SQL Server 2000 DTS Part 9 - DTS Designer Tasks

Tuesday Feb 10th 2004 by Marcin Policht

The last installment of this series discussed the basic principles and applications of the Transform Data task. This article continues, focusing on its more advanced features.

In our previous article, we started discussing basic principles and applications of the Transform Data task. We will continue this topic here, focusing on its more advanced features. So far, we have described the purpose of four of the tabs of the Transform Data Task Properties dialog box in the most basic scenario, without resorting to the use of multiple processing phases.

The last tab of the Transform Data Task Properties dialog box, labeled Options, provides a number of choices controlling error handling and more advanced data transformation settings. Errors can be recorded in an exception file, with an arbitrarily assigned location and name. Its type can either be set to be compatible with the SQL Server 7.0 implementation (in case you are using parsers written for that version) or customized by selecting separate files for error messages, source rows, and destination rows (with .source and .dest extensions, respectively) whenever a transformation fails. In either case, you can assign formatting details, such as the characters used for row and column delimiters, as well as the text qualifier. The next section on the Options tab determines data movement parameters, such as maximum error count (the maximum number of errors that can occur before the task is terminated), fetch buffer size (the number of rows fetched from the source at a time), and first and last row (which allow you to limit the size of the transformation, skip header and footer rows, and resume a transfer after a partial failure). Finally, the last section on the Options tab, applicable strictly to SQL Server destinations, allows you to force fast data load (recommended from the performance point of view, although it affects recoverability and should be followed by a full backup). Once enabled, this setting makes it possible to configure a number of suboptions, such as transferring NULL values to a destination column that has a default value assigned, or behavior relating to constraint checks, table locking, and identity insert. You can also specify that data be processed in batches. The value of the insert batch size parameter indicates the number of rows to be treated as a single unit (when set to 0, all data is loaded in a single batch). We will explain the significance of this setting shortly.
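Most of these settings have counterparts in the DTS object model, so they can also be adjusted programmatically. The following stand-alone VBScript is only a sketch, based on the DataPumpTask2 property names (the package name "ShippersOrdersPackage" is hypothetical - substitute the name under which you saved your own package):

Option Explicit
Const DTSSQLStgFlag_UseTrustedConnection = 256
Dim oPackage, oTask, oPump

Set oPackage = CreateObject("DTS.Package2")
oPackage.LoadFromSQLServer "(local)", , , DTSSQLStgFlag_UseTrustedConnection, _
	, , , "ShippersOrdersPackage"

For Each oTask In oPackage.Tasks
	If oTask.CustomTaskID = "DTSDataPumpTask" Then
		Set oPump = oTask.CustomTask
		oPump.ExceptionFileName = "d:\logs\MPDPump.err"	' exception file location and name
		oPump.MaximumErrorCount = 10	' number of errors allowed before the task terminates
		oPump.FetchBufferSize = 1	' rows fetched from the source at a time
		oPump.FirstRow = 0	' 0 = start with the first row
		oPump.LastRow = 0	' 0 = no upper limit
		oPump.UseFastLoad = True	' fast data load (SQL Server destinations only)
		oPump.InsertCommitSize = 2	' insert batch size (0 = load everything as one batch)
	End If
Next

oPackage.SaveToSQLServer "(local)", , , DTSSQLStgFlag_UseTrustedConnection
Set oPackage = Nothing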

The main advantage of the SQL Server 2000 implementation of the Transform Data task over its SQL Server 7.0 predecessor is its granularity. The entire data transformation process can be divided into several distinct phases. By default, however, this functionality is hidden. In order to expose it, you need to go back to the SQL Server Enterprise Manager interface and select the Properties item from the context-sensitive menu of the Data Transformation Services folder. This will display the Package Properties dialog box, with the Designer section containing a single checkbox labeled "Show multi-phase pump in DTS Designer." Note also the other settings available on the same page (even though they are not related directly to tasks, you will find them useful in other situations). In particular, you can turn on caching in order to speed up the process of creating and opening packages in the DTS Designer (this setting causes DTS tasks, transforms, and OLE DB providers registered on the SQL Server to be cached, so they do not have to be reloaded every time the Designer is launched). Keep in mind, though, that if you mark the corresponding checkbox, you will need to use the Refresh Cache button (or restart the Designer) in order to make newly installed providers available. Enabling the debugging option simplifies troubleshooting of ActiveX scripts by employing the default Windows debugger for error handling. Finally, you can also allow saving packages to Meta Data Services, which we will discuss in a future article of this series.

Once the "Show multi-phase pump in DTS Designer" checkbox is marked, click on the OK button, open the Local Packages folder, and open the package we created previously (for steps describing its creation refer to the previous article of this series) in the DTS Designer window. Bring up the Properties dialog box for the Data Transform Task within this package. Make sure that the Source tab contains the following SQL query:

SELECT Shippers.ShipperID, Shippers.CompanyName, Shippers.Phone, COUNT(OrderID) AS Orders
FROM Shippers
INNER JOIN Orders
ON ShipVia = ShipperID
GROUP BY Shippers.ShipperID, Shippers.CompanyName, Shippers.Phone

Since we used tempdb as our destination data store, it is likely that you will need to recreate the target table (otherwise you will receive an error message indicating that the target table does not exist). To verify whether this is the case, display the content of the Destination tab. If the table structure is not listed there, click on the Create... button and use the following SQL statement:

CREATE TABLE [ShippersOrders] (
[ShipperID] int NOT NULL, 
[CompanyName] nvarchar (40) NOT NULL, 
[Phone] nvarchar (24) NULL,
[Orders] int)
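
Since tempdb is recreated every time SQL Server restarts, you may need to repeat this step before subsequent runs of the package. As an alternative to the Create... button, the table can be recreated with a short script. The following sketch (which assumes a local server and Windows authentication) uses ADO to drop and recreate ShippersOrders in tempdb:

Option Explicit
Dim oConn
Set oConn = CreateObject("ADODB.Connection")
oConn.Open "Provider=SQLOLEDB;Data Source=(local);" & _
	"Initial Catalog=tempdb;Integrated Security=SSPI"
' Drop the table if it is left over from a previous run, then recreate it
oConn.Execute "IF OBJECT_ID('tempdb..ShippersOrders') IS NOT NULL " & _
	"DROP TABLE ShippersOrders"
oConn.Execute "CREATE TABLE ShippersOrders (ShipperID int NOT NULL, " & _
	"CompanyName nvarchar (40) NOT NULL, Phone nvarchar (24) NULL, Orders int)"
oConn.Close
Set oConn = Nothing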

Next, switch to the Transformations tab. First, notice an additional list box, not present previously, labeled "Phases filter." By default, it contains the "Row transform" entry, since this is the only phase that existed in SQL Server 7.0 and the only one exposed by default in the SQL Server 2000 implementation. However, you can easily verify that the list box contains a number of other entries, corresponding to all phases (and sub-phases) of the Transform Data task. In order to explore the full functionality of the multi-phase data pump, we will use an ActiveX Script, so remove any existing transformations with the help of the Delete button. Click on New, ensure that the ActiveX Script entry is selected in the Create New Transformation dialog box, and confirm your choice.

In the Transformation Options dialog box, you will notice an additional tab named Phases. From here, you can specify which phases (and sub-phases) your transform will be using. As indicated before, only the Row transform function option is marked; we will modify this shortly. Modify the content of the Source and Destination tabs so that all available columns are listed as selected. Switch to the General tab and click on the Properties... button. This will display the ActiveX Script Transformation Properties dialog box, with all phases listed on the Phases tab (tabs appear in the lower portion of the dialog box). Select the checkboxes next to each of them and auto-generate the relevant code with the Auto Gen. button. As soon as this happens, the code will appear in the code window on the right-hand side.

As you can see within the displayed window, the multiphase data pump is divided into the following phases and sub-phases:

  • Pre Source Data - The first phase, executed only once per task, before the first row is fetched from the source. This can be used, for example, to create a log file, initialize data, or write a header row.
  • Row Transform - The second phase, encompassing the functionality of the entire SQL Server 7.0 Transform Data task, launched every time a new row is retrieved from the data source and completed prior to writing it to the destination.
  • Post Row Transform - The third phase, performing additional processing depending on the outcome of the second phase. This outcome determines which of its sub-phases (On Insert Success, On Insert Failure, On Transform Failure) is invoked. On Insert Success is triggered when processing was successful, On Insert Failure indicates problems at the destination (such as inserting null data into a non-nullable column), and On Transform Failure is commonly the result of type conversion errors (e.g. character to integer). Post Row Transform is typically followed immediately by the Row Transform phase, which fetches the next row from the data source, although whether this happens depends on a couple of factors. More specifically, a transform or insert failure can result in task termination, depending on the value of the (previously discussed) maximum error count parameter (configurable from the Options tab of the Transform Data Task Properties dialog box). Other possibilities include the Batch Complete phase, if the batch size has been reached (discussed next), or the Post Source Data phase, if the last row has been processed.
  • Batch Complete - The fourth phase, executed once per batch, as soon as the batch is completed. This is where the insert batch size parameter, mentioned earlier in this article (available from the Options tab of the Transform Data Task Properties dialog box), comes into play. With this functionality, you can divide the entire data load into more manageable pieces and handle cleanup and record-keeping tasks separately for each.
  • Post Source Data - The fifth phase, which takes place after the last row has been processed but still provides access to the data. It is frequently used to write footer rows to the destination (although it can also include a number of other activities, such as global variable cleanup).
  • Pump Complete - The final phase of the task. Since actions performed during this phase have no access to the source or destination, they are typically used for data-independent cleanup activities, such as manipulating global variables (which can serve, for example, as error counters).

Division of processing into phases provides several benefits. It offers more control over data manipulation as well as error handling and recovery. For example, a transformation can be restarted from the point of failure, without reloading data that has already been processed.

Let's take a look at a sample ActiveX Script demonstrating the multi-phase capabilities of the Transform Data task. For the sake of the example, let's assume we will use our existing transform, producing a table consisting of the content of the Shippers table along with the total number of orders for each ShipperID, and a batch size of two (which should produce two batches). We will record processing information, such as the "TotalRows" and "TotalBatches" values, in a text file whose name is stored in the "LogFile" global variable (a "FailedRows" counter can be handled in a similar fashion, as shown at the end of this article). Prior to any customization, the auto-generated code of our ActiveX Script transformation has the following format (Function Main is executed during the Row Transform phase):

Function PreSourceMain()
	PreSourceMain = DTSTransformstat_OK
End Function

Function Main()
	DTSDestination("Orders") = DTSSource("Orders")
	DTSDestination("Phone") = DTSSource("Phone")
	DTSDestination("CompanyName") = DTSSource("CompanyName")
	DTSDestination("ShipperID") = DTSSource("ShipperID")
	Main = DTSTransformStat_OK
End Function

Function TransFailureMain()
	TransFailureMain = DTSTransformstat_OK
End Function

Function InsertSuccessMain()
	InsertSuccessMain = DTSTransformstat_OK
End Function

Function InsertFailureMain()
	InsertFailureMain = DTSTransformstat_OK
End Function

Function BatchCompleteMain()
	BatchCompleteMain = DTSTransformstat_OK
End Function

Function PumpCompleteMain()
	PumpCompleteMain = DTSTransformstat_OK
End Function

Function PostSourceMain()
	PostSourceMain = DTSTransformstat_OK
End Function

Start by creating the "LogFile" global variable of type string from the DTS Package Properties dialog box (you can find more information on this subject in one of our earlier articles) and set its value appropriately (e.g. "d:\logs\MPDPump.log"). Then modify the ActiveX Script so it matches the following code:

'  Create the log file and initialize the row and batch counters
Function PreSourceMain()
	Const Overwrite = TRUE
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	Set oLogFile = oFSO.CreateTextFile(DTSGlobalVariables("LogFile").Value, Overwrite)
	DTSGlobalVariables("TotalBatches").Value = 0
	DTSGlobalVariables("TotalRows").Value = 0
	oLogFile.Close
	Set oFSO = Nothing
	PreSourceMain = DTSTransformstat_OK
End Function

'  Copy each source column to the destination column
Function Main()
	DTSDestination("Orders") = DTSSource("Orders")
	DTSDestination("Phone") = DTSSource("Phone")
	DTSDestination("CompanyName") = DTSSource("CompanyName")
	DTSDestination("ShipperID") = DTSSource("ShipperID")
	DTSGlobalVariables("TotalRows").Value = _
		DTSGlobalVariables("TotalRows").Value + 1
	MsgBox "Row #: " & DTSGlobalVariables("TotalRows").Value
	Main = DTSTransformStat_OK
End Function

Function TransFailureMain()
	TransFailureMain = DTSTransformstat_OK
End Function

Function InsertSuccessMain()
	InsertSuccessMain = DTSTransformstat_OK
End Function

Function InsertFailureMain()
	InsertFailureMain = DTSTransformstat_OK
End Function

Function BatchCompleteMain()
	DTSGlobalVariables("TotalBatches").Value = _
		DTSGlobalVariables("TotalBatches").Value + 1
	MsgBox "Batch #: " & DTSGlobalVariables("TotalBatches").Value 
	BatchCompleteMain = DTSTransformstat_OK
End Function

Function PumpCompleteMain()
	Const ForReading = 1, ForWriting = 2, ForAppending = 8
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	Set oLogFile = oFSO.OpenTextFile(DTSGlobalVariables("LogFile").Value, ForWriting)
	oLogFile.WriteLine "Total Rows:"  & vbTab & DTSGlobalVariables("TotalRows").Value
	oLogFile.WriteLine "Total Batches:"  & vbTab & DTSGlobalVariables("TotalBatches").Value
	oLogFile.Close
	Set oFSO = Nothing
	PumpCompleteMain = DTSTransformstat_OK
End Function

Function PostSourceMain()
	PostSourceMain = DTSTransformstat_OK
End Function
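
The "FailedRows" counter mentioned earlier can be handled in the same fashion, by taking advantage of the failure sub-phases. The fragment below is only a sketch (the return values are left at DTSTransformstat_OK, matching the auto-generated code; the Books Online samples demonstrate more elaborate handling): initialize DTSGlobalVariables("FailedRows").Value to 0 in PreSourceMain, increment it as shown here, and write it out with an additional WriteLine statement in PumpCompleteMain:

Function TransFailureMain()
	' Count rows that failed during transformation (e.g. type conversion errors)
	DTSGlobalVariables("FailedRows").Value = _
		DTSGlobalVariables("FailedRows").Value + 1
	TransFailureMain = DTSTransformstat_OK
End Function

Function InsertFailureMain()
	' Count rows rejected by the destination (e.g. a NULL inserted into a non-nullable column)
	DTSGlobalVariables("FailedRows").Value = _
		DTSGlobalVariables("FailedRows").Value + 1
	InsertFailureMain = DTSTransformstat_OK
End Function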

This trivial example, which displays row and batch numbers as they are being processed and writes the total number of rows and batches to a log file, should help you create more elaborate scripts. Refer to the Phased Transformation Samples topic in SQL Server 2000 Books Online for additional help on this subject (there you will find samples demonstrating the TransformFailed and InsertFailed functions). In our next article, we will cover the remaining two DTS tasks.

» See All Articles by Columnist Marcin Policht
