SQL Server 2000 DTS Part 8 - DTS Designer Tasks - Transform Data Task

Friday Jan 23rd 2004 by Marcin Policht
Share:

The eighth article of this series examines the default behavior of the Transform Data Task followed with a discussion of its more advanced features, providing an explanation of why this task is often referred to as the Multiphase Data Pump.

A few more DTS Designer tasks are left to cover out of a total of 17 existing in SQL Server 2000 Data Transformation Services implementation. Arguably, the one that has undergone the most significant changes since the SQL 7.0 release is the Transform Data Task. In this article, we will look at its default behavior. We will follow with a discussion of its more advanced features, providing an explanation of why this task is often referred to as the Multiphase Data Pump.

In general, Transform Data Task copies data between source and destination data stores, transforming it during this process. The transformation can range from a simple column copy (which essentially does not perform any changes but simply copies content of a source column to a destination column) through file operations (reading and writing), string manipulations (case change, trimming), to ActiveX scripts (for more information on this subject, refer to our previous articles), which, in extreme cases, can involve custom COM objects.

Transform Data Task requires two existing relational data store connections. For the sake of simplicity, let's use the Northwind database as the source and tempdb as the destination (tempdb is convenient for testing since its content is cleared every time SQL server restarts). Once both connections are available, make sure that the data source is highlighted, and select the Transform Data Task item from the Task menu of the DTS Package Designer interface (or the corresponding icon from the toolbox on the left hand side of the window). When prompted for the destination data store, point to the other connection. To modify the properties of the task, double click on it or select Properties item from its context sensitive menu.

The Properties dialog box is divided into five tabs. The first two, appropriately named Source and Destination, allow you to designate data source and destination. While destination entry must contain a name of a table (this can be an existing or a new table), source data can be derived from a table, a view, or a SQL query. In addition, when using a SQL query, you can include in it parameters (with question marks as placeholders) which map to DTS global variables (for more information on DTS global variables refer to our earlier articles in this series). In our example, choose Shippers table as the source and create a corresponding table in the tempdb database. Transform Data Task will automatically attempt to match source and destination columns and apply Copy Column transform to each pair (this means that there is individual transform for each column in the data source). This is reflected by entries on the Transformations tab - each arrow from source to destination designates a separate transformation, named DTSTransformation__1, DTSTransformation__2, and DTSTransformation__3 respectively (you can see their names appearing in the Name listbox). If you click on Edit, while any of these transformations are selected, you will be presented with the Transformation Options dialog box. On the first tab, labeled General, you can find information concerning the type of transformation (Copy Column) and its sequence (a number indicating in which order this transformation will be executed). You can also change the transformation name by typing a new entry in the Name text box and alter columns mapping by clicking on the Properties command button. List of columns available in the resulting dialog box can be changed by modifying source and destination columns on the remaining two tabs of the Transformation Options dialog box (named appropriately Source Columns and Destination Columns). Once you have finished reviewing all available settings, click on OK to get back to the Transformations tab of the Transform Data Task Properties dialog box.

Typically, it is desirable from an efficiency perspective to limit the number of transformations. In our case, this can easily be accomplished by removing the three automatically created Copy Column transformations and replacing them with one, which includes all columns. To implement this, at the bottom of the dialog box click first on the Select All button and then on Delete All. At this point, you will notice that three arrows have been removed. Clicking on the New button will display the Create New Transformation dialog box. From there, choose the Copy Column option and click on OK. This will trigger the display of the Transformation Options dialog box. If you click on the Properties button, you will notice that this single transformation contains the mapping between all three columns in the source and destination tables. Once you click on OK, you will see the graphical representation of the transformation reflecting this new configuration. Use the Test button to test the outcome - you should receive confirmation that the package was executed successfully (you will also be able to view the results).

Now, let's see how we can accomplish the same result with a different approach using ActiveX Script. From the Transform Data Task Properties, use the Delete button to remove any existing transformations. Next, click on the Select All button and ensure that all columns in the Source and Destination are selected. Display the Create New Transformation dialog box with the New button and select the ActiveX Script entry. Click on OK. This will present you with the Transformation Options dialog box. Check its properties (using the Properties button). You will find that our steps resulted in the automatic creation of the following script:

Function Main()
	DTSDestination("ShipperID") = DTSSource("ShipperID")
	DTSDestination("CompanyName") = DTSSource("CompanyName")
	DTSDestination("Phone") = DTSSource("Phone")
	Main = DTSTransformStat_OK
End Function

As before, you can test it and you should receive confirmation of its successful execution. This shows you that you can use different transformations to arrive at the same result. While Copy Column method is faster, ActiveX Script offers more flexibility and can be used with lookups.

Lookups, configurable from the Lookups tab, allow you to retrieve a value or set of values via a DTS connection. If you intend to access the same database as Transform Data Task, you can use existing source or destination connections, however, you should keep in mind that a separate lookup connection offers the best performance. Each lookup entry consists of its unique name, the connection name, cache setting (number indicating how many lookup results are cached), and SQL Query that defines the lookup operation. The query is defined with DTS Query Designer and includes one or more parameters (designated by question marks and replaced with values derived from a data source or DTS global variables, when transformation is executed). More specifically, this is typically accomplished with the following ActiveX Script code:

DTSDestination("DestinationColumn") = DTSLookups("LookupQuery").Execute("ParameterList")

For example, let's imagine that we want to create a table at the destination that will contain, in addition to the information in the Shippers table, the total number of orders shipped by each company. This can be done with the following query:

SELECT	COUNT(OrderID) AS Expr1
FROM	Orders
WHERE	(ShipVia = ?)

Create this query on the Lookups tab and name it GetOrders. Next, alter the entry on the Destination table tab by creating a new table in tempdb called ShippersOrders using the following SQL Statement:

CREATE TABLE [ShippersOrders] (
[ShipperID] int NOT NULL, 
[CompanyName] nvarchar (40) NOT NULL, 
[Phone] nvarchar (24) NULL,
[Orders] int)

When you switch to the Transformations tab, you will notice that the Orders column appears in the destination table but it is not affected by the existing transformation. To change this, click first on Edit, then on the Properties button (in the Transformation Options dialog box). Finally, in the code section of ActiveX Script Transformation Properties dialog box, alter the code, so it looks like the following:

Function Main()
	DTSDestination("ShipperID") = DTSSource("ShipperID")
	DTSDestination("CompanyName") = DTSSource("CompanyName")
	DTSDestination("Phone") = DTSSource("Phone")
	DTSDestination("Orders") = DTSLookups("GetOrders").Execute(DTSSource("ShipperID"))
	Main = DTSTransformStat_OK
End Function

Executing this task will populate the extra column with values derived from the Orders table using the ShipperID column for each row in the source table. You should note, however, that in general it is better to avoid lookup queries for performance reasons and look for alternative solutions. For example, in our case, the same result can be obtained by specifying a different data source. Instead of using the Shippers table, we could define the following SQL Query:

SELECT Shippers.ShipperID, Shippers.CompanyName, Shippers.Phone, COUNT(OrderID) AS Orders
FROM Shippers
INNER JOIN Orders
ON ShipVia = ShipperID
GROUP BY Shippers.ShipperID, Shippers.CompanyName, Shippers.Phone

While we could still keep the same destination (ShippersOrders table in the tempdb database), we would alter the transformation and use a single Copy Column between data source and destination. The resulting Transform Data Task should perform more efficiently. Keep in mind that lookup queries should be used only when necessary (e.g. when some of the data resides in a non-relational database, stored procedure must be used, or joins generate exceptionally large amount of data).

In our next article, we will discuss the remaining configuration options of Transform Data Task and will look into its more advanced option that allows multi-phase data processing.

» See All Articles by Columnist Marcin Policht

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved