A database administrator can create ad-hoc packages to implement ETL via SSIS or use a method based on software programming principles. Ad-hoc packages may seem advantageous but can lead to duplication of effort. Learn how to implement the second of these options by taking advantage of package variables.
When implementing Extraction, Transformation, and Loading tasks via SQL Server Integration Services, a database administrator generally has two options. One of them involves creating ad-hoc packages without much concern for their structure or future use (frequently relying on Import and Export Wizard) and the other is based on software programming principles, with emphasis on modularity and reusability. While the first approach may seem advantageous due to its simplicity, it is bound to lead to duplication of effort in the long run and is suited primarily for occasional use of SSIS. In this article, we will demonstrate how to implement the second of these options by taking advantage of package variables.
In our sample scenario, we will step through implementation of a package, which exports a range of rows (selected according to arbitrarily assigned criteria) from vEmployee view in the AdventureWorks database into a text file. Our intention is to show how to use variables to replace hard-coded values that determine row selection or identify data source and destination. This way, a request to change these values in the future can be carried out by simply modifying respective variables, eliminating the need to search for their references throughout the entire package. In our demonstration, we will first create a package with specific values assigned directly to relevant components. Afterwards, we will describe the process of replacing them with variables.
Start by creating a new project based on the Integration Services Project template in the Business Intelligence Development Studio. This will form the required folder structure within the location you specify, including a generic Package.dtsx file. Switch to the Data Flow tab of the Designer window and follow the hyper-linked instructions appearing in its center stating, "No Data Flow tasks have been added to this package. Click here to add a new Data Flow task". With the new Data Flow Task in place, drag and drop OLE DB Source from the Data Flow Sources section of the Toolbox onto the Designer interface, right-click on it and select Edit... from its context sensitive menu. In the resulting OLE DB Source Editor, use the New... command button next to the OLE DB connection manager listbox to display the Configure OLE DB Connection Manager window. Click on New... located in its lower portion to activate the Connection Manager window. Type in the name of the server hosting the source SQL Server instance in the Server name textbox, assign appropriate authentication method in the Log on to the server section, and specify AdventureWorks in the Connect to a database section underneath (you can use the Test Connection command button to verify that the resulting configuration is correct). Return to the OLE DB Source Editor window, select Table or view entry in the Data access mode listbox, followed by [HumanResources].[vEmployee] in the Name of the table or the view listbox. Switch to the Columns tab and choose the columns you intend to export (we will narrow down our selection to EmployeeID, FirstName, LastName, City, StateProvinceName, PostalCode, and CountryRegionName). Click on the OK command button to return to the Designer window.
As we mentioned earlier, we will limit rows copied to the target text file to those matching arbitrary criteria. For the sake of simplicity, we will use the content of the City column for this purpose, filtering out entries that are not equal to Seattle. To accomplish this, drag Conditional Split from the Data Flow Transformations section of the Toolbox, position it directly underneath OLE DB Source, and connect the two via Data Flow Path (represented by an arrow extending from the lower edge of the top component). Right click on the Conditional Split Transformation and use the Edit... option from its context sensitive menu to display the Conditional Split Transformation Editor window. In its lower portion, type a new Output Name (we will call it Local Employees, but the naming convention is purely arbitrary and simply represents the intended use) and define the condition that will determine whether an entry satisfies our criterion (which in this case should be set to City == "Seattle"). As the result, all rows with the City column set to Seattle will be redirected to the newly created output. Confirm your choices by clicking on the OK command button and return to the Designer interface.
To complete implementation of the first version of our package, drag Flat File Destination from the Toolbox and drop it directly underneath Conditional Split Transformation. Extend the arrow representing the Local Employees Data Flow Path to link them together. Use the Edit... option in the context sensitive menu of Flat File Destination to launch its Editor window. Click on the New... command button next to the Flat File connection manager listbox. You will be prompted to choose the format of the destination file (among Delimited, Fixed width, Fixed width with row delimiters, and Ragged right) first, followed by the file name and path (in the Flat File Connection Manager window). Once you have provided all necessary information, click on OK and return to the Designer window.
The procedure we just completed yields a fully functional package. You can test it by simply pressing F5, which will compile and execute its code, generating a text file (whose name and location you specified) containing data of AdventureWorks employees residing in Seattle. However, if we wanted to produce equivalent output for Redmond residents and store it in a different file (or reference another SQL Server instance), we would need to modify the properties of corresponding package components. This maintenance overhead can be to some extent mitigated by employing variables to store values of component properties that are subject to change.
In order to provide this extra flexibility, we will use variables to configure both connection managers (OLE DB and Flat File) as well as Conditional Split Transformation. In particular, srcServer and srcDatabase defined on the Package level will provide parameters necessary to establish OLE DB Source-based connection to vEmployee view. City scoped at the Data Flow Task level will serve as the comparison criteria in the Conditional Split Transformation. Package-wide dstFilePath will contain the file name and path of the Flat File Destination. Each of them will be part of the default User namespace (although it is certainly possible to define a custom namespace for this purpose).
To create Package-level variables, switch to the Control Flow tab of the Designer window, right click on any part of its unoccupied surface, and choose Variables from the context sensitive menu. (Alternatively, select the Variables entry in the SSIS top-level menu). In the resulting Variables window, click on the left-most toolbar icon and enter relevant data in the Name, Data Type (String in our case), and Value columns. To create component-level variables, select Data Flow Task on the Designer surface (or switch to the Data Flow tab) and repeat the same sequence of steps described above.
Now, we need to replace hard-coded values with references to the newly defined variables. In the case of connection managers, this is accomplished by using their Expressions property (appearing in their respective Properties windows, which you can activate by pressing the F4 key or by selecting the Properties Window option in the View top-level menu). In the Expressions entry, click on the ellipsis (...) button on its right hand side to display Property Expression Editor. Select ConnectionString in its Property column and use the ellipsis button again (this time in the Expression column) to activate Expression Builder. Note the Variables node in its upper left corner. Once you expand it, you should be able to find all of the package level variables, including those you just defined (which are easily identifiable by the User: prefix, designating their namespace). You can define the desired expression by either typing it directly in the Expression text box or by dragging its individual elements (such as variables, functions, or operators) from the top window panes. Assuming that you followed our naming convention, the expression representing the database connection string takes the form (wrapped here for better readability):
"Provider=SQLOLEDB;Data Source=" + @[User::srcServer] + ";Integrated Security=SSPI;" + "database=" + @[User::srcDatabase]
You would need to repeat the same sequence of steps for the Flat File Connection Manager, although in this case, the expression for its connection string should be set to @[User::dstFilePath]. In the case of Conditional Split Transformation, once you activate its editor window, replace the existing entry in the Condition column with UPPER(City) == UPPER(@[User::City]). (We are leveraging the UPPER function in order to make our comparison case-insensitive). As a result, our package remains fully functional, but changing its behavior requires only modification of variable values, rather than altering content or properties of its components.