Even though the Import and Export Wizard, incorporated into the SQL Server 2008 platform, greatly simplifies the creation of SQL Server Integration Services packages, it has its limitations. This article points out the primary challenges associated with using it to copy data between SQL Server 2008 and Excel and presents methods of addressing these challenges.
The Import and Export Wizard, incorporated into SQL Server 2008 platform, greatly simplifies the creation of SSIS packages, minimizing the need for editing their content in the Business Intelligence Development Studio. However, it is worth noting that our original example, which demonstrated this approach by exporting the content of a view in the AdventureWorksDW database to a text file, was chosen mainly for its exceptional simplicity, rarely experienced in real world scenarios. In this article, we will focus on more practical circumstances, by analyzing the process of copying data between SQL Server 2008 and Excel, pointing out primary challenges associated with such transfers, and presenting methods of addressing them.
To better understand potential issues that might surface in such situations, it is important to realize that SSIS has its own, distinct collection of data types, different from those used by data stores that constitute sources and destinations of ETL operations. Integration Services data types follow a consistent naming convention (each starts with the DT_ prefix followed by a type identifier); a comprehensive list can be found in the Integration Services Data Types article of SQL Server 2008 Books Online. It is relatively straightforward to match each data type to their counterparts in other databases, as you can determine by reviewing the table within the Mapping Integration Services Data Types to Database Data Types section of this article. However, as you can see there, in a majority of the cases, their relationships are not straightforward. These mappings are implemented by default when performing data extraction and loading with Import and Export Wizard, however, it is possible to modify them if the resulting conversion does not yield the desired outcome (for example, you might want to enforce a more-space efficient data type for columns if their data values fit within its range).
XML-formatted files, on which the mappings are based, reside in the Program Files\Microsoft SQL Server\100\DTS\MappingFiles folder. While they cover the most commonly encountered database products and data types, you can modify them if you deal with a less common set of conditions or if your requirements are not satisfied by defaults employed by the Import and Export Wizard. Alternatively, you can adjust them dynamically, while running the wizard, which, in addition, gives you the ability to designate desired column names and some of data type properties, such as length or code page. Keep in mind, however, that more elaborate options will likely necessitate the use of the Designer environment in the Business Intelligence Development Studio.
In addition to mappings between data types implemented by disparate data stores, SSIS might be forced to carry out internal data conversion (since its default mapping with a source might result in a different data type than the one with its destination). The rules governing such behavior are established based on the content of TypeConversion.xml file, located in the Program Files\Microsoft SQL Server\100\DTS\binn folder (just as with the mapping files, you can modify its content to implement desired functionality).
Let's now take a look at the role of these two mechanisms in exporting SQL Server-based data. To demonstrate this process, we will copy the content of a table from the AdventureWorksDW database to an Excel spreadsheet. To carry out this exercise successfully, make sure that you are running 32-bit version of the Import and Export Wizard (since the Excel driver is not available in 64-bit format). Once you launched it, point to the installation of SQL Server 2008 (or 2008 R2) on the Choose a Data Source page, specify the appropriate authentication method, and select AdventureWorksDW in the Database listbox. Next, on the Choose a Destination page, highlight Microsoft Excel entry, set an arbitrary Excel file path, and assign the Excel version to Microsoft Excel 2007 (note that you also have an option to designate that First row has column names). Pick Copy data from one or more tables or views once you reach the Specify Table Copy or Query page. Mark the checkbox next to the [dbo].[DimReseller] table in the Tables and views section of the Select Source Tables and Views page. Clicking on the Next> command button will present you with the Review Data Type Mapping page, divided into two sections with the upper one containing the list of tables (in our case, a single entry for [dbo].[DimReseller]) and the lower consisting of Data type mapping for each of its columns. In both of them, you will find icons providing visual clues indicating conversion status:
- green checkmark conveys a successful outcome (if the conversion is guaranteed not to cause a data loss or is not needed altogether, which you can establish by checking the state of the Convert checkbox)
- yellow exclamation point means that a conversion does take place and its processing should be reviewed (in order to determine whether a realistic possibility of data loss exists).
- red circle with a cross is a sign that existing data cannot be converted (which effectively will prevent you from executing the resulting package until this issue is resolved).
In addition, for each of the source and destination column pairs, you have the ability to specify the action to be carried out (Fail or Ignore) in response to data truncation and error condition.
Double clicking on individual entries in the Data type mapping section will display the Column Conversion Details window, where you can find more about changes that will be applied during the copy operation. In addition to the list of SSIS data types that have been auto selected for source and destination mappings (along with the names of their respective providers and mapping files that served as basis for this selection), the information presented also contains conversion steps (if applicable), as well as the reference to the file used to determine underlying conversion rules (i.e. Program Files\Microsoft SQL Server\100\DTS\binn\TypeConversion.xml). While it is possible to reject these default settings directly from the Review Data Type Mapping page (by clearing the checkbox in the Convert field for every column that data type conversion behavior should be adjusted manually), you will not be able to execute the package at that point (since the Run immediately entry on the Save and Run Package page will be disabled). Instead, you will be forced to save it first, using options presented on the Save and Run Package page. (Afterwards, you can edit the saved package via Business Intelligence Development Studio or by direct modifications to the .DTSX file).
Save the resulting SSIS package as an .DTSX file and open it in the Designer interface of Business Intelligence Development Studio (via File->Open-File... menu). As you can see, its structure is a bit more complex than the one created automatically when exporting a table from AdventureWorksDW database to a text file that we described in our earlier article. Its Control Flow consists of two tasks, named Preparation SQL Task 1 (which creates DimReseller sheet in the destination Excel workbook) and Data Flow Task 1 (handling actual data transfer and conversion). By double-clicking on the latter, you will be automatically redirected to the Data Flow tab. A quick examination will reveal that, in addition to OLE DB Source labeled Source - DimReseller (providing access to the [dbo].[DimReseller] table in AdventureWorksDW database) and Excel Destination called Destination - DimReseller (facilitating connection to the DimReseller sheet in the target Excel workbook), the tab also contains Data Conversion 0 - 0 transformation, which handles data conversion according to rules defined in the Program Files\Microsoft SQL Server\100\DTS\binn\TypeConversion.xml file.
If your intention is to modify these rules, you can accomplish this by selecting the appropriate entries in the Data Type listboxes for each Input Column and Output Alias combination listed in the file. In addition, for each such pair, you have the ability to specify (wherever applicable) their Length, Precision, Scale, and Code Page values.
Unfortunately, reversing this process by attempting to import data from Excel tends to be considerably more complex and error prone, primarily due to a lack of metadata that would ensure consistency of its data types. One of the more common issues surfaces in situations when an imported column contains values that can be interpreted as either numbers or text. OLE DB Provider for Jet 4.0 (with support of Excel ISAM driver) determines the best match based on a rather superficial analysis of the first few rows of data and assigns NULL to all those that do not match its selection. Similar algorithm is applied to text columns, which, depending on their length, are identified as containing either String (if less than or equal 255 Unicode characters) or Memo (otherwise) data type. Such potential problems are flagged in the Import and Export Wizard, so it is up to you to decide how you want to resolve them (note that this constitutes an improvement from earlier versions of SQL Server, where these decisions were made implicitly by the wizard in an arbitrary manner, introducing a possibility of data loss).
One way to address these problems is to explicitly set the Import Mode in the Extended Properties section of the connection string of the Excel Connection Manager (by adding IMEX=1 entry enclosed within a couple of semicolons). This can be done using its Properties window in the Designer mode of Business Intelligence Development Studio or by direct modification to the relevant XML element in the corresponding .DTSX file, (it is also possible to modify connection strings from the Connection Managers node of DTSExecUI.exe utility). Another approach involves increasing the number of rows sampled by the OLE DB provider by assigning a larger value (by default set to 8) to TypeGuessRows entry in the HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. In some cases however, these methods will not suffice, forcing you to resort to other workarounds or to use additional Data Flow transformations (which we will be covering in future articles). For additional factors that should be considered when dealing with Excel-based data during SSIS import and export operations, refer to SQL Server 2008 Books Online Excel Source and SQL Server 2008 Books Online Excel Destination articles in SQL Server 2005 Books Online.