SQL Server 2008 and 2008 R2 Integration Services - Slowly Changing Dimension Transformation - Historical Attributes

Wednesday Dec 14th 2011 by Marcin Policht

Recently we described the characteristics of Slowly Changing Dimension methodology, focusing in particular on its implementation in SQL Server 2008 R2 Integration Services. Its wizard-driven transformation does not offer full support for all existing change types. We continue our coverage of this subject by presenting an equivalent setup that eliminates this shortcoming.

In one of the recent tutorials published on this forum, we described characteristics of Slowly Changing Dimension methodology, focusing in particular on its implementation in SQL Server 2008 R2 Integration Services. Its wizard-driven transformation considerably simplifies configuration of all relevant Data Flow components, although it does not offer full support for all existing change types. For their comprehensive overview, you can refer to the aforementioned article, where you will also find  a description of a package that illustrates the processing of Type 1 changes (which simply overwrite existing rows during updates to dimension attributes, effectively precluding ability to provide historical context in subsequent analysis). We continue our coverage of this subject by presenting an equivalent setup that eliminates this shortcoming by preserving original records.

As before, our example will be based on a dataset containing various properties of products offered by AdventureWorks - a fictitious company manufacturing and selling bicycles (with sample databases downloadable from the CodePlex web site, where you will also find a detailed procedure describing their deployment). The resulting Analysis Services installation includes Product.dim dimension, corresponding to the [dbo].[DimProduct] table in the target AdventureWorksDW2008R2 Data Warehouse database, which we will update by adding entries to the [Production].[Product] table in the AdventureWorks OLTP database.

In order to demonstrate an impact of these updates, we will leverage the existing structure of the [dbo].[DimProduct] table, including Status (set to either Current or Expired values), as well as StartDate and EndDate columns, which can be used to identify whether a particular record represents a currently available product and identify the timespan of its availability. Effectively, such changes will employ the Type 6 methodology, which constitutes a hybrid of types 1, 2, and 3.

To accomplish our objective, launch Business Intelligence Development Studio and create a new project based on the Integration Services template. Switch to the Data Flow tab and activate the hyperlink displayed in the center of the main window stating No Data Flow task has been added to this package. Click here to add a new Data Flow task.

With the new Data Flow task in place and active, drag an OLE DB Source component from the Toolbox onto the Designer interface. Display its Editor window by selecting Edit... item from its context-sensitive menu. In the Connection Manager section, click on the New... command button and point to the target instance of the Adventure Works OLTP database. Once you have defined a new OLE DB connection manager, ensure that Table or view entry appears in the Data access mode listbox, and select [Production].[Product] table. Preview the input data to verify that your settings are correct and close the Editor window.

If you followed our previous tutorial describing the implementation of Type 1 updates using Slowly Changing Dimension transformation, you might recall that at this point we simply added it to our package, connecting its input with the output of OLE DB Source. This worked fine in the context of our simplistic example, whose sole purpose was to illustrate the basic setup that facilitates direct updates to non-key data in the underlying dimension table.

Unfortunately, using the same approach here will lead to rather confusing errors with our sample data. In order to address this issue, we will need to mitigate the absence of input values that map to non-nullable columns (in particular, Color, FrenchProductName, and SpanishProductName) in the target [dbo].[DimProduct] table in the AdventureWorksDW2008R2 Data Warehouse database. This can be mitigated by leveraging Derived Column transformation. Drag its icon from the Toolbox and position it under the OLE DB Source, extending the output of the latter until both are connected.

Display the Derived Column Transformation Editor window (by selecting Edit... entry in its context sensitive menu) and add the following to the lower portion:

  • FrenchProductName as Derived Column Name, <add as a new column> as Derived Column, "" as Expression, and Unicode string [DT_WSTR] as Data Type with 0 as its Length.
  • SpanishProductName as Derived Column Name, <add as a new column> as Derived Column, "" as Expression, and Unicode string [DT_WSTR] as Data Type with 0 as its Length.
  • Color as Derived Column Name, Replace 'Color' as Derived Column, ISNULL(Color) ? "NA" : Color as Expression, and Unicode string [DT_WSTR] as Data Type with 15 as its Length.

Next, add the Slowly Changing Dimensions Transformation to the Designer interface; position it directly under the Derived Column transformation, extending its Data Flow Path output until both components are connected. Double-click on the newly added transformation to trigger a wizard that guides you through all of the steps necessary to configure its settings (alternatively, you can use Edit... entry in its context sensitive menu).

On the Select a Dimension Table and Keys page, use Connection manager and Table or view listboxes to point to the target dimension table ([dbo].[DimProduct] in the AdventureWorksDW2008R2 Data Warehouse database). The wizard will automatically map columns in the input Data Flow Path (originating from the previously created OLE DB Source) and the dimension table as long as their names and data types are the same. Since, in our case, this does not cover the natural key, you will need to manually match the ProductNumber (as the Input Column) and ProductAlternateKey (as the Dimension Column) and specify Business key as the corresponding Key Type. Similarly, you can adjust other, not automatically correlated Input Columns entries with their Dimension Columns counterparts (starting with Name and EnglishProductName), but make sure that you leave the corresponding Not a key column field for each of them unchanged. Note that the Dimension Columns list does not include the primary key of the dimension table (functioning as the surrogate key) in order to accommodate changes which involve creating a new row with the identical value of natural business key.

Once you complete the process of column matching and confirm your choices, you will be prompted to decide how changes to column data will be handled (on the Slowly Changing Dimensions Columns page). In general, this involves assigning one of three possible Change Types to at least one of the non-key columns in the fact table or view:

  • Fixed Attribute - prevents any modifications to a given attribute by treating them as errors.
  • Changing Attribute - corresponds to Type 1 updates, resulting in replacing existing attributes with a new value included in the incoming data row.
  • Historical Attribute - applicable in situations where changes to designated non-key columns are saved in a new record, resulting in the existing one being marked as out of date.

Since our intention is to demonstrate Type 6 changes, we will use the third of these options (represented by the value of Change Type entry) and assign it to the SafetyStockLevel (selectable via the Dimension Columns drop-down list). For the sake of simplicity, we will limit our selection to a single column, although obviously it is possible (whenever applicable) to add others.

Next, you are presented with the Historical Attribute Options page, which requires you to choose between two methods of identifying rows containing out-of-date records. This page is displayed only if you specified one of the attributes to be Historical when setting the change type for Slowly Changing Dimension Columns. In order for this approach to work as expected, the dimension table needs to have a properly defined structure, allowing it to distinguish between up-to-date and out-of-date records. This can be handled in one of two ways:

  • The first method involves the use of a single column to show current and expired records. If you decide to follow this route, you need to choose the Column to indicate current record (which, in our case, happens to be Status in the dbo.DimProduct table), as well as Value when current (pick Current instead of True) and Expiration value (in our case, this is Expired, rather than False).
  • The second method allows you to use start and end dates to identify current and expired records. This is also a possibility when configuring our sample transformation, since its structure includes Start date column (StartDate) and End date column (EndDate). However, you are expected to designate the Variable to set date values (your choices include System::ContainerStartTime, System::CreationDate, and System::StartTime).

Options available on the Inferred Dimension Members page are intended to address situations where a fact table includes data that references entities not present in the dimension table. We will explore this scenario in one of our upcoming articles, so for the time being; simply make sure that Enable inferred member support checkbox is clear.

The last page of the wizard displays outputs that will be automatically configured by the transformation. If you selected only the Historical attribute option (as instructed), the list will include New Record Output, responsible for creation of new rows, and Historical Attribute Output, handling updates to existing ones.

This process yields a fully configured Slowly Changing Dimension transformation with both of these outputs. However, the overall structure of the autogenerated Data Flow task is more complex than the one associated with the Changing Attribute option.

The latter consisted only of a single OLE DB Command (processing T-SQL UPDATE statement that applies changes to the column you identified on the Slowly Changing Dimensions Columns page) and one OLE DB Destination (handling inserts of any new records, for which a match on the business key has not been found).

This one, on the other hand, includes a number of additional components. Historical Attribute Inserts Output leads to a Derived Column (its settings depend on the selection you made on Historical Attribute Options page - either preparing for assigning the current datetime value to EndDate column of an existing row or setting its Status to Expired). These modifications are subsequently carried out by the OLE DB Command, which executes the dynamically constructed UPDATE T-SQL statement against the target row, marking it as out-of-date.

The outgoing data flow is merged with the New Output data flow of the Slowly Changing Dimension transformation by the Union All component, leading to Derived Column and OLE DB Destination combo, responsible (respectively) for populating either the StartDate or Current Status fields of newly added rows and inserting them into the [dbo].[DimProduct] table. Note that the New Output will also contain those rows that have a matching business key and no changes in the columns you designated as historical attributes, as long as they are considered to be out-of-date in the dimension table, as indicated by their values in the EndDate or Status columns).

See all articles by Marcin Policht

Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved