In the previous installment of our series, we discussed various approaches to debugging Control Flow components of SQL Server 2005 Integration Services. While this satisfied our primary goal of facilitating real-time monitoring and troubleshooting of the package workflow, it is important to note that majority of the techniques presented so far are not applicable when dealing with data manipulation, which takes place within the Data Flow portion of the SSIS packages (which separate design and implementation of control and data flows - unlike SQL Server 7.0/2000 Data Transformation Services). The purpose of this article is to demonstrate debugging methods, which are relevant within the context of SSIS Data Flow.
We will start with a quick overview of its characteristics, focusing on the ones that can be used for debugging purposes. The Data Flow interface within SSIS Designer of SQL Server 2005 Business Intelligence Development Studio features four main types of components:
- source components - serve as initial elements in the flow of data, extracting it from a variety of sources such as Data Reader (consuming data from .NET provider via ADO.NET-based connection manager), Excel (utilizing Excel connection manager to access data located in Microsoft Excel worksheets), Flat File (deriving data from delimited or fixed width text files using Flat File connection manager), Raw File (intended for data formatted by Raw File destination component and, therefore, requiring very limited processing), OLE DB (offering access to various OLE DB compliant data sources via OLE DB connection manager), or XML (capable of reading XML-formatted data files).
- destination components - function as outputs in the flow of data, delivering it to a variety of data store types. List of possible destinations is more extensive than sources (although it does not include XML files), providing, for example, the ability to process data using Analysis Services techniques (such as Data Mining Model Training, Dimension Processing, or Partition Processing).
- transformation components - perform modifications of data during its transition between source and destination components. Such changes might involve, for example, combining data from two (or more) sources, reversing this process by splitting it across multiple outputs, or aggregate and sort operations. In general, transformations belong to one of two categories - synchronous, for which modification of each individual input row results in exactly one output row (such as sorts) - and asynchronous, for which the number of input and output rows are different (such as aggregates).
- pipeline paths - connect the output of one Data Flow component (source or transformation) to the input of another (transformation or destination). Even though they might seem equivalent to precedence constraints, which provide an interface between a pair of Control Flow components, pipeline paths do not impose constraints, but simply channel data flow.
The Data Flow tab of the SSIS Designer shows the content of an individual Data Flow task, which constitutes an execution unit within the Control Flow (note that while an SSIS package has a single control flow, it can have any number of Data Tasks). Each of them is managed independently (you display the one you intend to work with by either double clicking on its graphical representation in the Control Flow area or by selecting Edit item from its context sensitive menu). To populate the Data Flow area, you simply drag and drop items from the Toolbox (which lists changes as soon as you switch away from the Control Flow tab). The list is divided into three categories - Sources, Transformations, and Destinations, which we presented earlier in this article. Within the second of these categories there is a Script Component, roughly equivalent to the Script Task, but with a fair share of unique characteristics.
If you review list of items available in the context sensitive menus of any of these components, you will notice that the Edit Breakpoints... option is not available. This is a clear indication that event based breakpoints, so useful for debugging Control Flow, did not find their way into the implementation of Data Flow. This apparently intentional design decision is likely the result of the fact that the majority of processing in this case is data driven, which switches the focus of debugging away from events. Breakpoints are also ignored when executing the Script Component code in the debugging mode, even though it is possible to create them by following the procedure that we described when discussing Script Task (since both of them utilize Microsoft Visual Studio for Applications for creating and modifying Visual Basic .NET code). There are, however, several alternatives, which compensate for lack of this functionality:
- color-coded component status indicators - are the simplest one to review and analyze and, in addition, are consistent with the ones associated with the Data Flow component execution status, with white (indicating that the component is idle, waiting to be invoked), yellow (designating running phase), green (marking successful completion), or red (signaling that errors were encountered during execution).
- pipeline path data record counter - provides a quick and convenient way to quantify the status of data processing by displaying the total number of data records transferred along pipeline paths.
Data Viewers - assigned to pipeline paths between two components,
to allow you to view the content of records currently passing through them. (This
way you can review data directly after it is extracted from its source, before
and after it is being processed by a transformation component, or directly
before it is stored in its final destination). Viewers are created from within the
Data Flow Path Editor window (which you can activate by either selecting Data
Viewers or the Edit item from the context sensitive menu of the data flow path
that you intend to debug). Once the Editor is displayed, with Data Viewers
section active, click on the Add... command button to display the tabbed window
of the Configure Data Viewer. On the General tab, you can select one of four
different types of viewing formats - Grid (with data displayed in a tabular
format, including all rows and arbitrarily chosen columns), Histogram (with all
values from a designated column displayed as a histogram - simplifying visual
analysis of their distribution), Scatter Plot (with data values from two
columns plotted on the X and Y axis, respectively - allowing comparing pairs of
numerical values) or Column Chart (with counts of individual values from a
selected data column appearing as separate columns in a chart). The label and
content of the second tab of the Configure Data Viewer window changes according
to the type of viewer currently selected on the General tab, giving you the
ability to designate a column (or columns) to be included in a grid, histogram,
plot, or chart. Obviously your choice will largely depend on the type of data
being processed (using scatter plot or histogram might not make sense in many
situations) - but note that you can assign multiple data viewers to the same
pipeline path if you desire to do so (using any variation of data columns for
each). After a Data Viewer is created, its visual indicator (a pair of glasses
with a grid behind it) appears next to the data path to which it has been
Following the package launch (triggered using the Start Debugging option from the Debug menu or by pressing the F5 function key), once the data flow reaches the point where the Data Viewer component has been added, a window with the configured viewer type (or multiple windows, if several different viewer types have been configured) will be automatically displayed within the Designer interface, one buffer at the time (a number of rows per buffer depends on their size). From within the Data Viewer window, you can click on the Continue button to load another buffer of data records, use the Detach button to proceed without further interruptions, or copy data to the Clipboard with the Copy Data button. Based on the statistics provided in the status bar, you can determine the number of rows and buffers that passed the data flow path.
- Progress pane of the Designer interface (which is automatically renamed to "Execution Results" as soon as the execution is completed), Output window (which you can activate from the Windows submenu of the Debug menu of the SSIS Designer interface), as well as other windows we described in our previous article (such as Locals, Watch, Command Window, Immediate Window, or Breakpoints) are also available and offer additional information about the Data Flow execution. Note that Data Flow Task is referenced in messages appearing in the Output and Progress windows as DTS.Pipeline. Interestingly, the Breakpoint window lists all of the data viewers you created.
- Row Count Data Flow Transformation component - provides the ability to obtain the count of rows processed within Data Flow. While this is essentially the same information that is already available based on the pipeline path data record counter (as well as being included in Data Viewers), the Row Count component offers two additional advantages. First of all, it stores the row count in a variable (you define it using an already familiar method - using Variables window within the SQL Server 2005 Business Intelligence Development Studio - and assign it to the Row Count component from the Component Properties tab of its Advanced Editor interface) that can be subsequently used within Control Flow or other Data Flow tasks. (Note, however, that the variable that stores the value calculated by the Row Count Data Flow Transformation component gets assigned its value after its Data Flow task has completed - this general rule applies to all SSIS variables which value is affected by execution of Data Flow). Furthermore, you can use methods described in our previous article to record it or follow steps described in the Microsoft Knowledge Base article KB 906560 to write it to the Windows Application Log. The other benefit of the Row Count component is its ability to serve as a temporary target for data during design and testing stages (despite the fact that it is not listed in the Destination components section of the Toolbox).
- Error Output of Data Flow components - The majority of data flow components (with a few exceptions of sources such as the Data Reader, transformations such as the Import Column or the OLE DB Command, or destinations such as SQL Server Mobile) support the error output feature (represented graphically by the red arrow pointing away from them), which allows customizing the execution behavior when an error is encountered. This might happen in the process of extracting data from its source, transforming its content (including data conversion or truncation), or loading to its destination. Error Output takes into consideration two types of situations involving data related issues - truncation (resulting in partial data loss, which in some circumstances might be expected or even desired) and errors (indicating typically a failure caused by incorrect or missing data as well as mismatched conversion operation). From the Error Output section (or Configure Error Output dialog box) of the respective component's Editor interface, you can specify for each data column what action should result, in case an error and truncation takes place. Your choices include Fail Component (the default option), Ignore Failure (delivering data to the regular output of transformation or source), or Redirect Row (redirecting problematic rows to error output - which can either lead to one of the Data Flow destinations or be further processed by one of the transformation components).
- Data Flow Transformation Script Component - equivalent of the Script Task in the Control Flow, gives you the ability to write Visual Basic .NET code executing in the context of Data Flow, serving as a custom source, destination or transformation component. While, as we mentioned before, breakpoints inserted into it are ignored during execution, there are several workarounds to this limitation. In our next article, we will describe in more detail the characteristics and functionality of this component, as well as how its features can be leveraged to provide more insightful debugging information.