In my last article of this series, I talked about Change Data Capture feature of SQL Server to capture DML changes happening on the tracked table and explained how it works. Then I talked about how it differs from Change Tracking. In this article, I am going to demonstrate practically how this feature can be leveraged.
Getting Started with Change Data Capture (CDC)
As discussed in my last article, you first need to enable CDC at the database level using the sys.sp_cdc_enable_db system procedure, which creates the change data capture objects that have database wide scope, including meta-data tables and required Data Definition Language (DDL) triggers. It also creates the CDC schema and CDC database user. You can verify if the database that is enabled for CDC is not by looking at the is_cdc_enabled column for the database entry in the sys.databases catalog view or by executing the following script. You need to be a member of sysadmin fixed server role in order to execute the below script to enable CDC at the database level:
USE master GO CREATE DATABASE LearningCDC GO USE LearningCDC GO --This command can be executed by a member of the sysadmin fixed server role EXECUTE sys.sp_cdc_enable_db; GO SELECT is_cdc_enabled, * FROM sys.databases WHERE name= 'LearningCDC' GO
Is CDC Enabled?
If you look under System Tables, you will notice many tables have been created after enabling Change Data Capture at the database level:
Many Tables Created
Next as a member of the db_owner fixed database role, you can enable CDC for the required table (a capture instance) using the sys.sp_cdc_enable_table system stored procedure. When CDC is enabled for a table, a CDC table (table which keeps track of history of changes along with meta-data about changes) and two table-valued functions are generated along with capture and cleanup jobs for the database if this is the first table in the database to be enabled for CDC. You can verify if the table is enabled for CDC by looking into the is_tracked_by_cdc column of the sys.tables catalog view.
Let’s first create a table and load some data with the following script:
CREATE TABLE dbo.Employee ( EmployeeID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(100), LastName VARCHAR(100), CurrentPayScale DECIMAL ) GO INSERT INTO dbo.Employee(FirstName, LastName, CurrentPayScale) VALUES ('Steve', 'Savage', 10000), ('Ranjit', 'Srivastava', 12000), ('Akram', 'Haque', 12000) GO SELECT * FROM dbo.Employee GO
Create First Table
Execute the script below to enable CDC on the above created table. With @role_name parameter, you can specify the name of the database role used to gate access to changed data; if it already exists then it is used, or else an attempt is made to create a database role with this name. With @supports_net_changes parameter you can enable support for querying for net changes. This means all changes that happen on a record will be summarized in the form of net change. By default its value is 1 if the table has a primary key or the table has a unique index that has been specified with @index_name parameter otherwise, the default value is 0. You can find more detail about its parameters here:
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Employee', @role_name = N'MyCDCUserRole', @supports_net_changes = 1 GO SELECT is_tracked_by_cdc, * FROM sys.tables WHERE name = 'Employee' GO
Now let’s run some DML statement against the CDC enabled table with this script:
INSERT INTO dbo.Employee(FirstName, LastName, CurrentPayScale) VALUES('Ahmad', 'Jamal', 10000) GO DELETE FROM dbo.Employee WHERE EmployeeID = 2 GO UPDATE dbo.Employee SET CurrentPayScale = 15000, FirstName = 'Akramul' WHERE EmployeeID = 3 GO UPDATE dbo.Employee SET CurrentPayScale = 18000 WHERE EmployeeID = 3 GO
When you enable CDC for a table, SQL Server creates the table (with this naming convention: cdc.<capture instance>_CT) and keeps recording DML changes happening to the tracked table in this table. For example, in the current example here is the result-set of changes captured for the above changes:
SELECT * FROM [cdc].[dbo_Employee_CT] GO
Result-set of Changes
As you can see above, along with the data there are some columns which capture meta information about the changes. For example, __$operation column captures the DML operation needed to apply the row of change data to the target data source. Valid values are 1 = delete, 2 = insert, 3 = value before update and 4 = value after update. __$update_mask is a bit mask representing columns that were changed during DML operations. It means delete (__$operation = 1) and insert (__$operation = 2) operation will have value set to 1 for all defined bits whereas for update (__$operation = 3 and __$operation = 4) only those bits corresponding to columns that changed are set to 1.
When you enable CDC, several functions are created to return changes. For example, cdc.fn_cdc_get_all_changes_<capture_instance> function returns one row for each change applied to the CDC tracked table within the specified log sequence number (LSN) range. If a source row had multiple changes during the specified range interval, each change is represented in the returned result set whereas cdc.fn_cdc_get_net_changes_<capture_instance> function returns one net change row for each source row changed within the specified LSN range. That is, when a source row has multiple changes during the specified LSN range, a single row that reflects the final content of the row is returned.
For example, as you can see below cdc.fn_cdc_get_all_changes_dbo_Employee function returns two rows for two updates of EmployeeID = 3:
DECLARE @MinimumLSN binary(10), @MaximumLSN binary(10) SET @MinimumLSN = sys.fn_cdc_get_min_lsn('dbo_Employee') SET @MaximumLSN = sys.fn_cdc_get_max_lsn() SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Employee (@MinimumLSN, @MaximumLSN, N'all'); GO
Two Rows Updated
Whereas cdc.fn_cdc_get_net_changes_dbo_Employee function returns net or final changes for EmployeeID = 3 in a single record:
DECLARE @MinimumLSN binary(10), @MaximumLSN binary(10) SET @MinimumLSN = sys.fn_cdc_get_min_lsn('dbo_Employee') SET @MaximumLSN = sys.fn_cdc_get_max_lsn() SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Employee (@MinimumLSN, @MaximumLSN, N'all'); GO
Final Changes in a Single Record
The above functions expect starting LSN and ending LSN and returns the changes between these two LSNs. You can use sys.fn_cdc_get_min_lsn to get start LSN for the specified capture instance and sys.fn_cdc_get_max_lsn to get ending LSN from cdc.lsn_time_mapping system table.
Sometimes, you would like to pull data based on a time range instead of LSN range and hence you can use the sys.fn_cdc_map_time_to_lsn function to get start LSN from cdc.lsn_time_mapping system table for the specified time.
CDC Jobs and Cleanup Process
You can disable CDC for either each individual table or disable it at the database level. As I said before, for each CDC enabled database there will be two jobs created as shown below. The first job captures the information from the SQL Server transaction log as it works in asynchronous manner whereas the second job cleans up the tracked table. The cleanup process occurs every three days by default (this is configurable and can be changed as per specific needs). For more intense environments, where you want to directly manage the CDC table cleanup process, you can leverage the manual method using the system stored procedure sys.sp_cdc_cleanup_change_table. When we execute this system procedure you need to specify the low LSN and any change records occurring before this point are removed and the start_lsn is set to the low LSN we specified.
SQL Server Agent
As these jobs are essential for CDC to work properly, SQL Server Agent must be running.
Disabling Change Data Capture (CDC)
You can disable CDC for either each individual table or disable it at the database level, which in effect will disable CDC for all the tables of the given database. You need to be a member of the db_owner fixed database role to disable CDC at the table level and a member of the sysadmin fixed server for disabling at database level.
--To disable CDC for a table in a database for a given capture instance EXEC sys.sp_cdc_disable_table @source_schema = N'dbo', @source_name = N'Employee', @capture_instance = N'dbo_Employee' GO --To disable CDC for the database in the context EXEC sys.sp_cdc_disable_db GO
Disabling CDC at the database level removes all associated CDC metadata, including the CDC user and schema and the CDC jobs.
You can execute the scripts below to get more information about CDC configuration:
--Returns CDC configuration information for a specified schema and table EXECUTE sys.sp_cdc_help_change_data_capture @source_schema = N'db', @source_name = N'Employee'; GO --Returns CDC configuration information for all tables in the database EXECUTE sys.sp_cdc_help_change_data_capture GO
More Information about CDC Configuration
In this article I demonstrated how you can leverage the Change Data Capture feature of SQL Server to track DML changes on the source table and how you can pull data incrementally from the tracked table.