SQL Server 2005 Part 5 - High Availability and Scalability Enhancements

Friday Jan 14th 2005 by Marcin Policht

This final installment of High Availability and Scalability Enhancements in SQL Server 2005 Beta 2, examines features not previously covered, such as a new method of table and index partitioning, backup and restore improvements, and new hardware support options.

In this article, we continue our discussion on high availability and scalability enhancements in SQL Server 2005 Beta 2 release. Features such as clustering, database mirroring, online indexing, fast recovery, database snapshots, and snapshot isolation transaction level have already been covered earlier in this series - now we will concentrate on the remaining features, such as a new method of table and index partitioning, backup and restore improvements, and new hardware support options.

  •   table and index partitioning - intended to improve performance of operations performed on large tables. The basic concept is straightforward and involves splitting a table into several units (called partitions), which can be accessed independently of each other, limiting impact of I/O intensive activities performed on the table's data (queries, data loads, backups and restores, maintenance tasks - such as index rebuilds and defragmentations, as well as operations that would result in lock escalation to the table level). The most common method of splitting data is horizontal partitioning, in which rows of a table matching mutually exclusive criteria (such as, range of dates or letters in alphabet, for datetime and character data, respectively) are placed in designated partitions.

    SQL Server 2000 implementation of horizontal partitioning of tables, known as Updateable Distributed Partitioned Views (supporting Data Modification statements, unlike its SQL Server 7.0 predecessor, capable of querying data only), was based on creating identically structured tables across separate databases. Separation of data loaded into tables was based on a CHECK constraint associated with one of their columns. The same partitioned view was then defined on each of the servers hosting partitioned data (this required creating linked server entries for each participant). Once the view was created, users could access underlying data transparently, without being required to know its physical location. Updateable Distributed Partitioned Views were a subject of several limitations (outlined in more details in the SQL Server 2000 Books Online), however the most relevant one was resulting from the fact that partitions had to be spread out across multiple SQL Server instances.

    SQL Server 2005 extended the horizontal partitioning concept, by allowing placing partitions of the same table within a single database. In order to fully realize performance and scalability advantages, such partitions should be assigned to separate filegroups based on appropriately defined criteria, which are established by partitioning functions and associated partitioning schemes. The partition function determines the distribution of rows of a table or index across partitions and is applied by comparing its formula against values in a designated column (called the partitioning column). While the partitioning column will typically contain numeric or date/time values, others are also permitted - with exception of timestamp, text, image, XML, varchar(max), varbinary(max), CLR user-defined, and alias data types. A partitioning scheme assigns each of the partitions to a specific filegroup. Note that in order to maximize performance and scalability benefits it is important to both properly select the partitioning column and define the partitioning function associated with it. The choice should depend on data distribution and the type of activities that are typically performed against the target table. For example, if data is used to produce monthly reports, then partitioning based on the date column using monthly intervals as partition boundaries would be a very likely candidate.

    The process of partitioning involves four main steps. The first, preliminary step consists of creating filegroups and their corresponding files. During the second step, you define a partition function using the CREATE PARTITION FUNCTION T-SQL statement, with ranges of values (set by the FOR VALUES clause) that determine the beginning and end of each partition (and subsequently, their number). For example, the following statement:

    AS RANGE LEFT FOR VALUES ('$30,000', '$90,000')

    is intended to create three partitions - the first one, containing values less then or equal to $30,000 (inclusive, based on the LEFT element of the AS RANGE clause), the second one between $30,000 (exclusive) and $90,000 (inclusive), and the third one encompassing values of $90,000 and higher.

    In the third step, you create a partition scheme using the CREATE PARTITION SCHEME T-SQL statement, which associates partition function to filegroups, determining physical layout of data. The association between partitions and filegroups is typically one to one, although other arrangements are also possible (including specifying extra unassigned partitions for future growth). Following our earlier example, we could use the following statement to create a partitioning scheme:

    AS PARTITION pfIncome
    TO ('fgIncome1', 'fgIncome2', 'fgIncome3')

    This would result in rows within each range being redirected into their respective filegroups. By specifying a different sequence of (already existing) filegroups - e.g. by specifying TO ('fgIncome1', 'fgIncome1', 'fgIncome2') - you could alter the mapping, allocating the first two partitions to the first filegroup.

    Once partitions are associated with their respective filegroups, you can proceed with the fourth step, in which you create a partitioned table or index using CREATE TABLE or CREATE INDEX statements. The ON clause within these statements identifies the target partition scheme and partitioning column, whose values are compared against the definition of the partitioning function.

    For example, the following statement would result in the creation of a table with the second column designated as the partitioning one:

    CREATE TABLE tbIncome (cSSN char(9), mIncome money)
    ON psIncome(mIncome)

    Once you start performing operations on the target table (such as data additions, modifications, or queries), the database engine will transparently redirect them to appropriate partitions. For more information on the procedure required to set up or modify data partitioning (including converting existing non-partitioned tables to partitioned ones and vice versa), refer to the relevant topics in the Books Online. You will also find there, a more elaborate partitioning example, illustrating this process with the AdventureWorks sample database included with the installation of SQL Server 2005.

  •   backup media set mirroring - increases the reliability of backups by performing a concurrent dump of a database to multiple (up to four) backup devices, without significant impact on performance. This greatly increases the chances for avoiding backup failures due to a faulty tape media or device. Note, however, that for mirrored media sets, equivalent backup devices are needed (in the case of tape drives, it is recommended to use the same hardware models from the same manufacturer). In order to perform mirrored backup of a database, you need to add the MIRROR TO clause to the BACKUP DATABASE T-SQL statement, followed by the paths of the DISK or TAPE backup devices. In case of a backup media error detected during the restore, you can replace the faulty one with its mirror.

    Another method increasing the reliability of backups involves checksum generation when running BACKUP, which can be enabled by including the CHECKSUM clause (backup is also capable of verifying existing checksums and detecting torn pages, assuming that the PAGE_VERIFY CHECKSUM and TORN_PAGE_DETECTION database options are enabled). With this clause present, the generated checksum is recorded on the backup media that can be subsequently used to verify the restore operation. Keep in mind, though, that this feature introduces negative performance implications (and hence it is disabled by default), so you should evaluate its impact before implementing it. You can confirm successful completion of backups and media set readability (including checksum, if applicable) by executing RESTORE VERIFYONLY against a media or backup set. This statement, enhanced in SQL Server 2005, emulates the restore procedure without actually performing it. Furthermore, providing that the checksum information has been recorded during backup, you can take advantage of it by using the CHECKSUM clause when running RESTORE. However, a potential problem with this approach is the fact that even a single error encountered during restore will result in its failure. If such behavior is not desired, you can prevent it by applying the CONTINUE_AFTER_ERROR option, allowing the restore to complete (while this is likely to leave a database in inconsistent state, you can follow it with a manual repair).

    Another backup reliability enhancement introduced in SQL Server 2005 is integration of full-text catalogs into your database backups. Full-text catalogs contain full-text indexes, which are used to speed up keyword-based queries of text data stored in SQL Server databases in char, varchar, nvarchar, and formatted binary columns. While full-text search catalogs are stored outside of SQL Server (as part of the file system), they are managed by one of SQL Server 2005 components (Microsoft Full-Text for SQL Server service) and administered with SQL Server utilities. Integrating full-text search catalogs into SQL Server backups and restores resolves the synchronization issues present in SQL Server 2000, where each of these procedures had to be performed separately. You can also include full-text catalogs when transferring databases by combining DETACH and ATTACH operations (this was not the case in SQL Server 2000).

  •   online restore - increases availability by allowing the target database to remain online for its duration. Online restore functionality is available only in the case of Enterprise and Developer editions of SQL Server 2005 (at the Beta 2 stage) and applies only to configurations where the database consists of multiple filegroups. It can be used when restoring files, pages, or when running piecemeal restore (performed in stages, by filegroups, starting with the restore of the primary one). Specifics of this process depend on a number of circumstances, such as, recovery mode, availability of transaction logs necessary to bring database to consistent state, and read-only status of filegroups that are to be restored. The online restore feature is useful in situations following a failure of a drive hosting an individual filegroup. Database administrators might be able to maintain a database on-line, while restoring this filegroup only, without impacting access to data residing in other filegroups.

  •   support for highly available and scalable hardware (since SQL Server 2005 relies here on functionality built into the Windows 2003 Server and specialized server hardware, you need to have both in place in order to take advantage of the features listed below):

    •   64-bit version availability - on both Intel (64-bit Itanium and Xeon EM64T) and AMD (Opteron with Direct Connect Architecture) platforms, when running 64-bit version of Windows 2003 Server - increases considerably the amount of supportable memory (up to the theoretical limit of 32TB), eliminating 4 GB limit inherent to 32-bit platform (although it is possible to push this limit to 32GB of RAM by taking advantage of Advanced Windowing Extensions in 32-bit versions of Windows 2000 and Windows 2003 Servers) and makes it comparable with other database management systems operating in 64-bit computing environments. The new version includes 64-bit versions of Analysis Services and Integration Services, according to the recently released Community Technology Preview. SQL Server 2005 has also been designed to operate more efficiently in the Non-Uniform Memory Architecture (NUMA) systems.

    •   hot-plug memory - takes advantage of the functionality present in Windows 2003 Server Enterprise and Datacenter Editions (providing that underlying hardware supports it as well) which makes it possible to add (but not remove) physical memory while the operating system is running. SQL Server 2005 is capable of detecting the change dynamically (previous versions of SQL Server could only recognize the amount of memory present at the startup). In order for this feature to be available, the SQL Server 2005 needs to have Address Windowing Extensions activated (using the awe enabled server configuration option) and -h option specified at the startup.

  •   increased number of instances supported on a per server basis - the number has been raised to 50 with the Enterprise Edition of SQL Server 2005 (from 16 in SQL Server 2000).

  •   reduced number of restarts required for changing configuration options - it is no longer necessary to restart SQL server to perform such modifications as adjusting CPU affinity or to perform changes to the AWE configuration (including, as described earlier, recognizing newly added memory).

This concludes our review of high availability and scalability enhancements in SQL Server 2005 Beta 2. In our next article, we will look into new security related features.

» See All Articles by Columnist Marcin Policht

Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved