In our previous two articles of this series, we discussed auditing and security related aspects of SQL Server 2000 Data Transformation Services. Now, we will combine these topics by presenting DTS Meta Data Services, which provides the ability to audit creation, execution, and modification of DTS packages as well as track associated data changes. In particular, you can track package version history or data lineage. This information is stored according to the rules of DTS Information Model, which can be accessed and manipulated by both SQL Server Enterprise Manager and programmatic methods (for review of SQL Server 2000 DTS, refer to its coverage on the DatabaseJournal Web site.
As we mentioned in our previous article, the ability to save packages to Meta Data Services is disabled by default. This is intentional, since its use is associated with an increase in resource utilization and affects package execution speed. In order to alter the default setting, in the Package Properties dialog box, which appears after selecting the Properties entry from context sensitive menu of the Data Transformation Services node in SQL Server Enterprise Manager, you need to mark the checkbox labeled "Enable Save To Meta Data Services". This will make available a new option "Meta Data Services" in the Location listbox in the Save DTS Package dialog box, displayed when saving packages from the DTS Designer interface. When you choose this option, you will also have a chance to specify the target server where the package meta data information will be stored, authentication information for connecting to it, and scanning settings. These settings determine whether and to what extent information about databases and their objects referenced by DTS packages will be scanned and written into equivalent meta data structures. The main check box determines if such meta data should be created to begin with. Once you enable it, you also need to decide whether you intend to:
- rescan all information about data stores referenced in the package into the Meta Data Services even if such information is already present from previous scans (corresponding to the "Scan all referenced catalogs into repository" option) or use existing information (corresponding to "Use scanned catalogs if already present in repository"), which saves time and resources but should be used only if data structure has not changed between scans.
- perform the activity defined by the previous selection for all data structures referenced in the package (corresponding to "Scan catalog always") or limit it only to the ones which have not yet been scanned (corresponding to "Scan catalog if not already present in Meta Data Services").
You also have an ability to import meta data for a particular database (rather than for all databases referenced by a DTS package). This can be accomplished by selecting the "Import Metadata..." item from the context sensitive menu of Meta Data Services Packages and Meta Data nodes under the Data Transformation Services folder in the SQL Server Enterprise Manager. Importing databases will allow you to view their information by clicking on the Meta Data node. The display is divided into three categories, grouped under Browse, Lineage, and Package headings:
Browse section provides a listing of Database Meta Data, including database (its name, SQL Server instance name and version), schema (in the case of SQL Server, this refers to the database owner), tables, their columns and characteristics for each (datatype, length, scale, precision, and nullability). For each table and column, you can find DTS packages that use it as their data store connection, as well as related tables and columns that function as a source or a destination for its data. This is done by selecting Package and Source/Destination items from the context sensitive menu of the icon representing individual tables and columns. For this type of statistic to be available, you need to import relevant package information into the meta data as well (which we will discuss next).
The second section deals with lineage, which provides the ability to track changes to data. In general, SQL Server 2000 Meta Data Services offers two types of lineage:
row-level data lineage - identifies the source and destination for a particular piece of data (a table row) as well as all transformations that were applied to it (including DTS package execution information). This is accomplished by assigning a globally unique identifier (GUID) to each value, which is used for tracking purposes.
column-level data lineage - maintains a record of columns involved in transformations (without tracking individual values in each). This allows correlating references between packages, which might be using the same columns as source or destination for their transformations. It also provides package revision and execution history, informing who, where, and when creates, runs, and modifies packages.
In order to help you understand this concept, we will demonstrate implementation of both types of lineage using a sample DTS package based on a template called Template Demo.dtt, stored in Program Files\Microsoft SQL Server\80\Tools\Templates\DTS folder. To create a package, right click on the Data Transformation Services node in the SQL Server Enterprise Manager and select the All Tasks -> Open Template option from its context sensitive menu. Point to the target file in the Select File dialog box, click on the Open command button, and choose the New Package entry in the Select Package dialog box. This will present you with the DTS Designer interface, containing a fairly rudimentary sample package, using a single, undefined Transform Data Task and two OLE DB Provider SQL Server connections, one pointing to Northwind database as the source and the other referencing tempdb as the destination. To define transformation properties, double click on it, and from the Properties dialog box, select the Shippers table as its source (on the Source tab). You can click on the Preview button to verify that the source table exists and has some data in it (three rows by default). Next, switch to the Destination tab. This will generate a Package Error informing you that the table does not exist at the destination, which makes sense, since we have not created it yet. Acknowledge the error, click on the Create button and verify that T-SQL code presented there will create the Shippers table at the destination. Switch to the Transformations tab and note that, without any customization, Transform Data Task simply copies data from Source to Destination. Click on OK to close the Properties dialog box. If you execute the package at this point, you should receive confirmation of its successful completion. You can also easily verify that tempdb will contain the Shippers table, with its content identical to Shippers in the Northwind database.
Note, however, that information about the package execution has not been recorded into Meta Data Services and no lineage statistics are available. In order to be able to track row-level data lineage, we will need to implement the following steps:
1. Import the source and target databases into the Meta Data Services.
2. Switch to the DTS Designer interface and select the Properties option from the Package top-level menu. On the Advanced tab in the DTS Package Properties dialog box, mark the "Show lineage variables as source columns" and "Write lineage to repository" checkboxes, click on OK to confirm your choice, and return to the DTS Designer.
3. Double click on the Transform Data task arrow to display the Transform Data Task Properties dialog box. Switch to the Transformations tab. You will notice that the list of columns in the Source changed and includes DTSLineage_Full and DTSLineage_Short. The first entry will store the globally unique identifier (GUID) representing the package version, the second one is intended for version checksum (unlike with GUID, its uniqueness is not guaranteed - although still very likely).
4. Now, you need to create identical columns at the destination. To accomplish this, switch to the Destination tab, and click on the Create command button. You will notice that the T-SQL statement now contains two additional fields Lineage_Full (of type UNIQUEIDENTIFIER) and Lineage_Short (of type INT). Rename the table identifier in the CREATE TABLE statement of the Create Destination Table dialog box to ShippersMDS and click on the OK button. This will create an appropriately structured table at the destination (Note that we could have done this at the very beginning as long as the appropriate options were selected on the Advanced tab of the package Properties dialog box - we went through initial steps purely to demonstrate changes that take place as the result of activating these options).
5. Now you need to modify transform properties from the Transformations tab. To change the mapping between source and destination columns, first click on Select All and then on Delete All. This will remove existing mappings. To create a new one, make sure that all Source and Destination columns are highlighted and click on the New button. From the "Create New Transformation" dialog box, select Copy Column entry, and click on OK twice to return to the Transformations tab.
6. Click on OK to close the Transform Data Task Properties dialog box and return to the DTS Designer window.
7. Assign to the package a meaningful name and save it to Meta Data Services using the Save As item from the Package menu. Ensure you select the Meta Data Services as the target location.
8. Execute the package.
At this point, all relevant row-level lineage information should be available. For example, in the Lineage section of the Meta Data node of SQL Server Enterprise Manager, you can type either Lineage Long or Lineage Short values (you can extract them from ShippersMDS table in the tempdb) in the corresponding text boxes, and click on the Find Lineage command button. This will display information about the Package responsible for creation of the corresponding table entry. Similarly, you can obtain information about its source using the Source/Destination context sensitive menu option..
To implement column-level data lineage, all that is necessary is to ensure that you mark the "Resolve package references to scanned catalog meta data" checkbox in the Scanning Options dialog box when saving the package into Meta Data Services. In our case, you can choose "Scan all referenced catalogs into repository" and "Scan catalog always" options.
Package section displays information about packages saved to Meta Data, including such information as, for example, their name, version, author, creation time, lineage, and execution statistics (account used to launch it and launch time). You should be able at this point to see reference to the package we just saved and executed.
Note that when saving to Meta Data Services, you lose the ability to assign owner and user passwords. While, this makes your packages less secure, you have the ability to audit their use. In addition, keep in mind that this functionality is used primarily for troubleshooting more complex data warehousing scenarios, where keeping track of transformations via DTS Packages would be otherwise fairly difficult.