In this first CDC article, we will track the changes that take place as SQL Server 2008s Change Data Capture feature is started on a user table. Next month, well look at the performance of CDC.
My focus here is not just showing how CDC works on the surface (there are many articles that cover that), but rather to see what we can find out about the various moving parts within SQL Server that actually make CDC function. Understanding the internals will help in an overall understanding of CDC.
Introduction: SQL Server 2008 Change Data Capture
SQL Server 2008's CDC functionality reads the transaction log to record changes in system tables associated with each table for which CDC is enabled. It writes those files to system tables in the same database, and those system tables are accessible through direct queries or system functions.
Lets walk through some example code and see what changes in each step.
Tracking the Parts Using Example Code
Initially, we start with a brand new database. To follow the changes, open Object Explorer and expand these folders:
- SQLServer Agent | Jobs
- (Once the database is created) Security | Roles | Database Roles
- (Once the database is created) Programmability | Stored Procedures | System Stored Procedures
- (Once the database is created) Programmability | Functions | Table-valued Functions
CREATE DATABASE TestCdc
This gives us the new database, but there are no new jobs. Expand the Tables folder and the System Tables folder. Empty so far.
Enable Change Data Capture on Database TestCdc
USE TestCdc -- Turn on Change Data Capture at database level EXEC sys.sp_cdc_enable_db
This step takes a few seconds. Now refresh the System Tables folder. There are now six system tables:
Notice that five of these are part of the cdc schema.
There are also four new rather odd looking SQL inline table-valued function entries in the sys.all_objects table:
(I used a non-proportional font with spaces marked with ◦ to make it clear that there are embedded and trailing spaces in these names.)
Here is the query that will return those functions:
SELECT [name] FROM sys.all_objects WHERE [name] LIKE '%...%'
There are multiple new stored procedures:
There are still no new jobs or roles.
Create a New Schema and User Table
CREATE SCHEMA MySchema CREATE TABLE MySchema.Person ( PersonID int IDENTITY PRIMARY KEY, FirstName varchar(32) NOT NULL, LastName varchar(32) NOT NULL, UpdateCt int NOT NULL DEFAULT 0 )
This creates a new user table, but the table is not yet enabled for CDC.
This step does not affect the list of functions/procedures/roles were tracking.
Enable New User Table for CDC
EXEC sys.sp_cdc_enable_table @source_schema = N'MySchema', @source_name = N'Person', @role_name = N'ChangeDataAccessRole', @supports_net_changes = 1
This enables the new user table for CDC. Refresh the folders listed above. We now have
- one new system table (cdc.MySchema_Person_CT)
- two new jobs (cdc.TestCdc_capture and cdc.TestCdc_cleanup)
- one new database role (ChangeDataAccessRole)
- two new table-valued functions
- fn_cdc_get_all_changes_MySchema_Person (Books Online reference)
- fn_cdc_get_net_changes_MySchema_Person (Books Online reference)
- three new stored procedures:
You can actually get the text for the stored procedures (sp_helptext 'cdc.sp_batchinsert_389576426' [your number will likely be different]). Interesting reading. It is clear that this stored procedure is generated and specific to this user table.
If you enable a second table for CDC, you will get a second set of sp_batchinsert_n, sp_insdel_n, and sp_upd_n procedures.
System Table cdc.MySchema_Person_CT
This table has the following columns:
Details regarding this table are covered well in other articles including Books Online.
The capture job is very interesting as you dig into it. From sysjobs, we see that this job is owned by sa, is in the category REPL-LogReader and has a description of CDC Log Scan Job.
From sysjobsteps, we see that it has two steps:
Starting Change Data Capture Collection Agent
RAISERROR(22801, 10, -1)
Starting the Change Data Capture Collection Agent job. To report on the progress of the operation, query the sys.dm_cdc_log_scan_sessions dynamic management view.
Change Data Capture Collection Agent
There are two logical next steps after reading those details, but well leave the second until weve inserted some data.
sp_helptext N'sys.sp_MScdc_capture_job' SELECT * FROM sys.dm_cdc_log_scan_sessions
Procedures sys.sp_MScdc_capture_job and sys.sp_cdc_scan
sys.sp_MScdc_capture_job is the procedure that kicks off the whole CDC process. It first does some security checks and then calls sys.sp_cdc_scan, which does the real work. sys.sp_cdc_scan executes the change data capture log scan operation.
By default, sys.sp_cdc_scan loops continuously with a five-second delay between loops. Inside each normal loop, it calls sp_replcmds. (It is interesting to note that this is a replication procedure. Books Online says that calling this procedure with the default (1) will return the next transaction waiting for distribution. The default @maxtrans in this procedure is 500.)
Last, it reads from sys.dm_cdc_log_scan_sessions to prepare for a call to sp_sqlagent_log_jobhistory to record job history information.
Database Role ChangeDataAccessRole
If you open the Properties window for the new ChangeDataAccessRole role, you can see that the role is owned by cdc and has SELECT permissions on the two table-valued functions listed above and nothing else.
Digging into these details gives us insight into how SQL Server 2008s CDC is implemented. So far, we havent even inserted a single record into the new table. Well pick that up next time as well as show performance comparison results.
Thoughts or comments? Drop a note in the forum.
- Microsoft SQL Server 2008 - Change Data Capture Part I from Database Journal
- Microsoft SQL Server 2008 - Change Data Capture Part 2 from Database Journal
- Microsoft SQL Server 2008 - Change Data Capture Part 3 from Database Journal
- Microsoft SQL Server 2008 - Change Data Capture Part 4 from Database Journal
- Overview of Change Data Capture from SQL Server Books Online
- Change Data Capture from SQL Server Books Online
- Tuning the Performance of Change Data Capture in SQL Server 2008 from Microsoft TechNet
Download the .sql files for this article.SqlCredit - Developing a Complete SQL Server OLTP Database Project
- SQL Server 2008's Change Data Capture - Tracking the Moving Parts
- Performance Testing - SQL Server 2008 versus SQL Server 2005
- Exploring SQL Server's Index INCLUDEs
- Exploring SQL 2005's Ranking Functions - NTILE() and ROW_NUMBER()
- SqlCredit - Part 19: Exploring SQL 2005's Ranking Functions - RANK() and DENSE_RANK()
- SqlCredit, Part 18: Exploring the Performance of SQL 2005's OUTPUT Clause
- SqlCredit - Part 17: Exploring SQL 2005's OUTPUT Clause
- SqlCredit - Part 16: The Cost of Bloat
- SqlCredit - Part 15: The Cost of Distribution
- SqlCredit - Part 14: The Cost of Translation
- SqlCredit - Part 13: More on Indexed Persisted Computed Columns
- SqlCredit - Part 12: Exploring Indexed Persisted Computed Columns
- SqlCredit - Part 11: Change Tracking Using History Records
- SqlCredit - Part 10: MAC Performance and Updating SqlCredit
- SqlCredit - Part 9: Message Authentication Codes
- SqlCredit Part 8: Comparing Encrypt/DecryptByCert and Encrypt/DecryptByKey
- SqlCredit Part 7: Performance Impact of EncryptByCert and DecryptByCert
- SqlCredit Part 6: Exploring EncryptByCert and DecryptByCert
- SqlCredit - Part 5: Adding Card, Vendor, and Purchase, Plus Much Refactoring
- SqlCredit - Part 4: Schema and Procedure Security
- SqlCredit - Part 3: Schema/Proc Updates and Automated Unit Testing
- SqlCredit - Part 2: Creating the Database, Tables, CRUD Procedures
- SqlCredit - Developing a Complete SQL Server OLTP Database Project