We are continuing our series which covers new and enhanced functionality implemented in SQL Server 2005 (based on its Beta 2 release), focusing on high availability and scalability area. We have presented so far database mirroring and failover clustering - now it is time for other features falling into the same category. In this article, we will describe improvements in online indexing and recovery procedures, as well as newly introduced database snapshots and snapshot transaction isolation level:
online indexing - allows execution of index data definition language statements in parallel with modifications and queries against underlying tables or views. In previous versions of SQL Server, such operations were not possible or limited at best. Creating, dropping, or rebuilding (which effectively drops and recreates) an index (typically performed as part of maintenance tasks geared towards optimizing query performance caused by index fragmentation) were precluding (or restricting to SELECT statements only) any type of access to indexed data (because of exclusive or shared locks that have been placed on it). Such behavior resulted, for example, from running DBCC DBREINDEX or dropping and recreating indexes. You could sometime improve the situation by running DBCC INDEXDEFRAG, which held locks for a much shorter time than the other two methods (although efficiency of this approach was limited when dealing with higher levels of fragmentation). SQL Server 2005 eliminates the need for exclusive locks completely by maintaining two copies of an index. While one of them is being rebuilt, the other can be used for supporting current activities (this one is subsequently dropped once the rebuild of the first one is completed).
To rebuild an index, you can either execute ALTER INDEX T-SQL statement with the REBUILD clause (with serves as a replacement for DBCC DBREINDEX) or invoke the CREATE INDEX T-SQL statement with the DROP_EXISTING clause. (You can also split the process into two separate steps - first dropping and then recreating the index with DROP INDEX and CREATE INDEX statements, respectively, but this approach is least efficient and therefore not recommended). All of these activities can be performed on-line by adding the WITH (ONLINE=ON) clause to each statement (the clause can be combined with CREATE INDEX, DROP INDEX, ALTER INDEX REBUILD, ALTER INDEX DISABLE, ALTER TABLE ADD CONSTRAINT, and ALTER TABLE DROP CONSTRAINT).
Note that online indexing has negative performance implications (due to the need of maintaining two copies of indexes), hence it is still advisable to perform it offline whenever possible. In addition, you should first evaluate, by determining the degree of fragmentation, whether there is a need for rebuilding an index or whether reorganizing the index instead, will suffice. This can be accomplished with sys.dm_db_index_physical_stats system function, which returns (among other values) the percentage of logical fragmentation for a particular (or all, if desired) indexes on a selected table or view. Index reorganization (recommended if fragmentation is no higher than 30%), which always performs as on-line an operation, sorts existing index pages in place (by physically reordering leaf level pages to match the logical sequence of leaf nodes) without dropping an index and creating a new one (and functions as an equivalent of SQL Server 2000 DBCC INDEXDEFRAG). You should resort to rebuilding an index if fragmentation percentage exceeds a 30% threshold (as explained, whether this operation is performed off- or on-line depends on whether the WITH ONLINE clause is applied to the associated T-SQL statement).
early restore access - allows connecting to a database during restore or recovery (invoked automatically whenever a database is brought online, for example, as the result of restarting SQL Server) as soon as committed entries in a transaction log are rolled forward. During the recovery process, entries in the transaction log designating transactions committed since the most recent checkpoint are reapplied to the database (during so-called "redo" phase), which is followed by rollback of all incomplete transactions (known as the "undo" phase). Both types of actions are necessary in order to maintain database consistency. Prior to SQL Server 2005, the database remained inaccessible for applications until the whole procedure was completed. With the new recovery model (available at this point only in SQL Server 2005 Enterprise Edition) this changes - the database becomes available as soon as the first "redo" part (involving replaying transactions committed after the most recent checkpoint) is finished. This does not mean, however, that access to all pages is granted - in order to prevent data consistency problems, data pages affected by uncommitted transactions are locked for the duration of the second part (requests for access to these pages need to wait until uncommited transactions are rolled back).
database snapshots - provides a read-only, "virtual" copy of a database, representing its state at a particular time. This functionality can be very useful in a number of scenarios, ranging from simplifying creation of historical reports reflecting content of the database at a specific moment in time to ensuring rapid recovery from risky system upgrades or accidental user errors.
Snapshot is different from a traditional backup due to the way it is implemented (which also affects its size). Initiation of a database snapshot marks a point after which the first change to any data page triggers creation of its copy, which retains its original content. Snapshot consists of all such copies, in addition to pages that have not been modified yet (including those that would be affected by transactions not committed yet at the time of snapshot creation). Creation of duplicate pages is accomplished using copy-on-write approach, where a copy of each database page, which is supposed to be modified, is saved as part of the snapshot before modification is applied. Since, initially, the snapshot refers to the same pages as its source, space requirements as well as the amount of time and processing resources needed to create it are minimal. However, following its creation, first-time modification to original pages increase the amount of related I/O operations and space occupied by it.
Snapshots work in different server configurations, including database mirroring and failover clustering. As a matter of fact, as we mentioned in our article on database mirroring, snapshots can be used to create a read-only copy of a mirror (the mirror can not be accessed directly since it is in recovery mode as long as the principal is operational), which in turn provides the ability to perform data reporting activities against it (to lower utilization of the principal).
At the same time, however, snapshots have some important limitations. Among the most apparent ones is their read-only status and reliance on underlying database, which have a number of other implications (for example, their properties, such as permissions or online/offline status are inherited and can not be independently changed). Snapshots also cannot be created for any of the system databases (model, master, and temp). It is not possible to back them up and they should never be viewed as a substitute for backup - if the source database fails, its snapshot is no longer accessible. While their dependency on NTFS (because copied pages are stored in "sparse files," which are NTFS-specific) is unlikely to constitute a major issue in most implementations, it will prevent you from using RAW partitions for database storage.
To create a database snapshot, use the standard CREATE DATABASE T-SQL statement including the unique name of the snapshot, logical name(s) of underlying database file(s), name of a sparse file where the snapshot will be stored, and the AS SNAPSHOT OF clause followed by the database name. (As mentioned before, you can create a snapshot of a mirror, in which case you use its name instead). For example, to create a snapshot of the pubs database, you could execute the following:
CREATE DATABASE [pubs_ss120604] ON (NAME = pubs, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL\data\pubs_ss120604.ss') AS SNAPSHOT OF pubs; GO
A list of existing snapshots appears in the Database Snapshots folder of SQL Server Management Studio. Once snapshots are created, you can take advantage of them by either reading their content (users can access both the copy and the source, with the latter exhibiting the same characteristics as any standard database) or by reverting to them, overwriting effectively current data in the original database.
The procedure of reverting has several prerequisites. First, you need to ensure that only a single snapshot for the target database exists, which means that you need to delete all of its remaining snapshots (to delete a snapshot, use the DROP DATABASE T-SQL statement including the snapshot name). You also need to remove all full-text catalogs from the database. After reverting is completed, the state of data reflects data retained in the snapshot, which means that all database changes following snapshot creation are lost. This includes entries in the transaction log, since reverting also clears its content. Therefore, it is recommended to perform full database backup immediately afterwards (no incremental or differential backups are possible at that point). To revert to a database snapshot, you need to run the RESTORE DATABASE T-SQL statement with the FROM DATABASE_SNAPSHOT clause, followed by its name.
snapshot isolation - a new type of transaction isolation level. Transactions can be viewed as a unit of work that satisfies four characteristics of the ACID model:
- atomicity (stating that the transaction can either succeed in its entirety or fail, without leaving any changes)
- consistency (conveying the notion that each transaction leaves a database in consistent state)
- isolation (indicating that multiple transactions scheduled to take place in parallel execute independently)
- durability (ensuring that the effect of each committed transaction is recorded and stored in the database).
Transaction isolation levels protect data from consistency and integrity problems in a multi-user environment and impact primarily locking mechanisms used by the database engine to manage simultaneous requests for read access to the same rows or tables by multiple transactions. Raising the isolation level increases the potential for blocking but decreases the likelihood of negative concurrency side effects (such as dirty reads or lost updates). Analogically, decreasing isolation level makes it possible for multiple transactions to operate on the same data, but at the cost of possible update or read anomalies.
SQL Server 2005 still supports levels available in earlier versions of SQL Server, such as Serializable, Repeatable Read, Read Committed, and Read Uncommitted (listed in the decreasing sequence of concurrency capabilities), however with the snapshot isolation it introduces a different approach to managing concurrent data access. Snapshot isolation does not lock rows of data involved in the transaction, but instead it creates their copies in tempdb database, assigning a version number to each. The copy is maintained until either transaction modifying the data is committed (at which point, it is deleted) or rolled back (resulting in restoring the original value). Snapshot isolation level is intended primarily for frequently repeating read operations on relatively dynamic data (which, with other isolation levels, would likely cause contention issues) and for databases where long-running transactions are common.
You can apply snapshot isolation on the session level, using the SET TRANSACTION ISOLATION LEVEL SNAPSHOT T-SQL statement, as long as the ALLOW_SNAPSHOT_ISOLATION database option has been turned ON (note that this can be further modified for individual SELECT statements with table-level locking hints). In addition, if the READ_COMMITTED_SNAPSHOT database option is ON, then all transactions operating with read-committed isolation level also use row versioning instead of locking.
In our next article, we will discuss the remaining scalability and high-availability features in SQL Server 2005 Beta 2.