SQL Server 2005 Integration Services - Checkpoints - Part 26

Friday Jun 9th 2006 by Marcin Policht

Part 26 of this series presents the checkpoint functionality. This feature, missing from Data Transformation Services in SQL Server 7.0 and 2000, can help you save you a significant amount of time and resources when rerunning packages, which partially failed during their initial execution.

In our previous article of this series, we discussed transactional support in SQL Server 2005 Integration Services. Such support is critical in order to ensure that arbitrarily designated Control Flow tasks are executed as a single logical unit, which either completes successfully or is rolled back in its entirety. Demand for this functionality exists in a variety of scenarios, whenever coherence among individual, inherently independent processes needs to be enforced. However, transactions fail to provide sufficient resolution in situations where it would be preferable to avoid rerunning certain processes, due to their significant computational cost (or which, by virtue of changes they introduce, should be performed only once). As a matter of fact, including them in a transaction might potentially cause this cost to increase if another task that is part of the same transaction fails (since this would trigger potentially equally expensive rollback). Excluding them from the scope of transaction does not prevent them from being executed again if the package needs to be relaunched. Fortunately, there is a new mechanism known as checkpoints, which helps to address this very issue.

While SQL Server has been relying from its inception on the concept of checkpoints for transactional support and recovery, its Integration Services implementation has been introduced as part of SQL Server 2005 release. In its new incarnation, checkpoints can be applied to Control Flow tasks (which implies that they can not be associated with individual components of the Data Flow) and functions by recording information,  at the time of failure, about the restart point (in addition to execution results of all completed containers) in a text file, whose location and name you arbitrarily assign. By referencing recorded content, the SSIS run-time engine can avoid rerunning all tasks that have already completed. (The checkpoint file also contains package configuration, as well as the most recent values of system and user-defined variables, in order to ensure that the package uses the same settings during subsequent reruns).

While checkpoints are applied to individual Control Flow tasks, some of their properties are configured on the package level. To view them, right click on the empty area of the Control Flow tab of the SQL Server 2005 Business Intelligence Development Studio and select Properties from the context sensitive menu. In the Properties window, scroll to the Checkpoints section. From here, you have an option to control the following settings:

  • SaveCheckpoints - must be set to True (the default is False) in order to enable checkpoint support,
  • CheckpointFileName - designates the location and name of the checkpoint log file. This must be configured in order for checkpoint functionality to work as expected.
  • CheckpointUsage - determines package behavior following the restart after a failure and takes on one of three possible values - Never (precludes use of checkpoints, regardless of configuration of the other two properties), Always (enforces checkpoint use providing that value of SaveCheckpoints property is True and the file that CheckpointFileName points to actually exists. If the latter condition is not satisfied, package execution fails), and IfExists (just as the Always setting, mandates checkpoint use if SaveCheckpoints has been enabled and CheckpointFileName property has a valid value. Unlike Always, following package failure, it will simply launch it from the beginning, in cases where the checkpoint file does not exist).

In addition, you also need to set the FailPackageOnFailure property (in the Execution section of the Properties window) to True for each task and container to which you want to apply checkpoint functionality. This way, their failure results in package termination and its subsequent launch starts by initiating the same checkpoint-enabled component that did not complete successfully (rather than running from the beginning of the Control Flow). Note that since containers are viewed in this case as individual units of work, their entire content (including all child containers and tasks within their scope, even if they already have been executed) will be processed again.

Now that we have presented the general concept and configuration settings of transactions and checkpoints in the context of SQL Server 2005 Integration Services, let's take a look at a simple example demonstrating their use. Launch Business Intelligence Development Studio and create a new project using the Integration Services Project template. Our sample package will consist of three Execute SQL tasks connected via Precedence Constraint to ensure their sequential execution. To implement it, drag the icon representing SQL Execute Task to the Control Flow area of the SSIS designer. Display its Editor window by right clicking on its outline and selecting the Edit option from the context-sensitive menu. Verify that the ConnectionType in the SQL Statement area of the General section is set to OLE DB, and configure a new connection to the AdventureWorks database on your instance of SQL Server 2005. Choose Direct input in the SQLSourceType entry, and type the following in the SQLStatement text box:

UPDATE HumanResources.Employee
SET SalariedFlag = 1
WHERE (EmployeeID = 1)

Next, drag another Execute SQL Task icon from the Toolbox into the Control Flow area, directly below the first one and connect the two via Precedence Constraint with its default settings. From its Editor dialog box, set ConnectionType and Connection entries identically to the first task and enter the following statement (which will fail due to the datatype mismatch) as the value of SQLStatement property with SQLSourceType set to Direct input:

UPDATE  HumanResources.Employee
SET  VacationHours = 'five jillion'
WHERE  (EmployeeID = 1)

Repeat the same procedure for the third Execute SQL Task, this time assigning the following value to the SQLStatement property (which copies to the row of data for Employee with ID of 2 number of VacationHours assigned to the employee with ID of 1):

UPDATE HumanResources.Employee
SET VacationHours = 
 (SELECT VacationHours
 FROM HumanResources.Employee
 WHERE (EmployeeID = 1))
WHERE (EmployeeID = 2)

Now, let's check how the various transaction settings might affect the outcome of execution of this package. Start by assigning Required value to the TransactionOption on the package level (to display the appropriate Properties window, right-click on the Control Flow area of the SSIS designer outside of any specific component, such as a task or container, and select Properties entry from the context sensitive menu) and selecting the Supported value for the same property for each of the tasks. Launching the package will result in the failure of the second task (as expected) triggering rollback of the changes introduced by the first task (since both are part of the same transaction initiated on the package level) - so the original value of SalariedFlag does not change. Note that the result would be different if you left the default value of the package TransactionOption (NotSupported) and set each of the Execute SQL Tasks with Required value, since this would result in three separate transactions (so failure of the second would not trigger rollback of the first).

Revert to the original settings, by assigning NotSupported value to the TransactionOption package property and Supported to all three Execute SQL Tasks. If you execute the package at this point, it will complete the first task successfully and fail on the second. Unlike previously though, the change to the SalariedFlag entry is permanent. Despite this, relaunching the package will still result in exactly the same sequence of actions, including execution of the first task. To eliminate this unnecessary step, assign a valid path and file name to the CheckpointFileName package property, set Checkpoint usage to IfExists, and SaveCheckpoints to True. Next, set the FailPackageOnFailure for the second task to True. Once configuration changes are made, start execution again. The initial run will be identical to the ones we have experienced previously. Successful completion of the first task is followed by the failure of the second. However, you should also notice that an XML formatted checkpoint file, whose name and location you specified as the value of the CheckpointFileName property, has been created and populated with relevant information. If you run the package again, the first task will be bypassed and flow of execution will start from the second one.

» See All Articles by Columnist Marcin Policht

Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved