Snapshot Isolation level was introduced in SQL Server 2005 and has been available ever since. Snapshot isolation levels improve performance but there are some things to take into consideration when using this feature. Some people use it frequently as it minimizes blocking and improves performance/concurrency without knowing its impact on maintaining versions in tempdb, whereas some people stay away from it because of this extra overhead. Some people get confused about the two variants of snapshot isolation level (Read Committed Snapshot Isolation (RCSI) and Snapshot Isolation (SI)) and use one where the other is needed or vice versa.
In this article series, I am going to discuss what snapshot isolation levels are, their variants, why and when we should use them and how we should start using this feature with examples.
Understanding Snapshot Isolation Level
Isolation level controls how two or more transactions running simultaneously should be isolated from each other in terms of locking and blocking resources. Isolation level determines the level of concurrency and data consistency. Prior to SQL Server we had four isolation levels as briefly discussed below:
- Read Uncommitted – The transaction that uses this isolation level neither acquires shared locks to prevent others from modifying the data nor is blocked by conflicting locks acquired by other transactions. As this transaction can see the data that has been changed by other transactions but has not been committed, there is a possibility of dirty reads. This isolation level assures higher concurrency at the cost of data consistency.
- Read Committed – This is the default transaction isolation level in SQL Server and prevents dirty reads by not allowing reading of modified but not yet committed data by other transactions in the current transaction. A transaction with this isolation level acquires shared locks to prevent other transactions from modifying the data during read operation by that transaction. As a shared lock can be acquired only if there is no exclusive lock (needed for data modification) by other transactions, it ensures it reads only committed data.
- Repeatable Read – A transaction with Read Committed isolation level might face a problem of repeatable read, which means in a single transaction two data reads might see different sets of data if the data is changed between these two data reads. This happens as shared lock is acquired only while data is processed and released immediately after it. In other words, after the first data read the shared lock will be released and if the other transaction modifies the data (as other transactions can acquire exclusive locks) before a subsequent data read, the subsequent data read will see a different set of data than the previous data read.
- Serializable – Even though Repeatable Read isolation level provides consistency in repeatable reads, there is still the possibility of phantom read ( Phantom read means two reads from a single transaction return a different number of records even though they use exactly same predicates).
Serializable isolation level provides the highest level of data consistency but at the cost of greatly reduced concurrency. To avoid phantom, it uses range locks in the range of key values that match predicates of each statement executed in the current transaction. This way this isolation level blocks other transactions from inserting or updating any rows, which qualify the predicates used in the current transaction and hence the current transaction will keep on getting exactly the same records in the current transaction for the query. Please note the range locks are acquired and held till the completion of the transaction and hence this isolation is most restrictive and should be used only when absolutely necessary.
To avoid a repeatable read issue, you can use Repeatable Read isolation level with your transaction. The difference between Read Committed and Repeatable Read is, in the Read Committed isolation level the shared lock is released once the data gets processed without waiting for transaction completion whereas in Repeatable Read isolation levels the shared lock is held until the transaction completes either by committing or roll backing. Of course holding the shared lock till the end of the transaction improves the data consistency but reduces concurrency as well.
If you notice, in all of the above isolation levels (except in the case of Read Uncommitted where there is a possibility of dirty reads anyway), the data writers (exclusive lock) block data readers (shared lock) and data readers block data writers. SQL Server 2005 introduced two new snapshot based isolation levels. The idea behind these isolation levels is to not let data writers block data readers and vice versa. These new snapshot isolation levels use a row versioning concept where they maintain the version of previously committed data in version store (tempdb) and hence it allows data readers to continue reading the older committed/consistent version of data before the current transaction/statement began, even though current version is locked and being changed by other data writers.
Why, When and Where Should We Use Snapshot Isolation Levels?
When using snapshot isolation levels, when the same data is modified by many data writers, SQL Server might need to maintain multiple versions of the old data and hence proper envisioning and planning needs to be done for the tempdb database size and storage before utilizing this feature. SQL Server needs to maintain versions in version store as long as they might be needed by currently running operations and if they are not needed they are removed from the version store by the SQL Server.
When you use snapshot (how to use is discussed in the next section) isolation levels, any update (please note many updates to the same data in a single transaction does not create multiple versions but rather many updates from multiple transactions do) will be marked with a timestamp and will create a version with old committed data in version store and a pointer (14 bytes needed for pointer and additional overhead) is stored with the changed/new data. This storage of pointers will also add to the cost of using snapshot isolation level. If changes are very frequent, successive prior versions are stored in tempdb using a linked list structure and the newest committed value is always stored in a page in the database.
There are two variants of using snapshot isolation levels as discussed below:
- Read Committed Snapshot Isolation (RCSI) – This is also frequently referred to as statement level snapshot isolation level. This is, you can say, an extension of Read Committed isolation level but with increased concurrency. In this data readers don't get blocked by data writers but rather once enabled at database level, SQL Server starts maintaining a version of data being changed by data writers and data readers get data (last committed version before the statement starts, irrespective of the timing of transaction start) from versions stored in version store. As I said, to use it you just have to enable at database level and no code changes are required as it applies to all the transactions by default.
- Snapshot Isolation (SI) – This is also frequently referred to as transaction level snapshot isolation level. This increases concurrency along with data consistency at transaction level. Like RCSI, data readers aren't blocked by data writers but rather once enabled at database level and specified to be used in the code, SQL Server start maintaining a version of the data being changed by data writers and data readers get data (last committed version before the transaction start, in other words it provides transaction level consistent view of data) from versions stored in version store.
This requires code changes to use the SET command to with the transaction with which you want to use it.
Using either of the snapshot isolation levels requires enabling it at database level first though another important difference between RCSI and SI is, SI requires you to explicitly change the isolation level to SNAPSHOT for each transaction that you want to execute at the transaction level with SI and usage of this in legacy system requires code changes, whereas RCSI becomes the default for all the transaction without doing any code changes.
Now at the end, just to summarize the benefit of using snapshot isolation, data readers can get consistent data without being blocked by data writers running at the same time as the versions get stored in version store in the tempdb database before data is changed, whereas the cost of using snapshot isolation is more overheard on SQL Server in creating and maintaining versions and the increased size of tempdb for storage for version data and the increased size of each row for pointers. The overhead even becomes more when you use SI or transaction level snapshot, in which case versions are maintained till the end of the transaction as opposed to completion of the statement, as in case of RCSI.
SQL Server 2005 and later versions have 6 different isolation levels. I briefly talked about the earlier four, which have been in SQL Server for a long time and talked in detail about two new snapshot based isolation levels. To learn more about how to use this great feature and how they differ from each other with an example, please refer to the next article on the series.
In closing, I would like to reiterate, these new snapshot based isolation levels are great as they provide better concurrency but come at a cost. Please ensure you do a thorough study before using this feature and consider the size and load on tempdb. especially in the case of SI.