IBM presents version 12 of its flagship database product Db2 as, "The ultimate enterprise database for business-critical transactions and analytics". What does this mean? In this slideshow we dive into the most critical changes in this version of Db2, including improvements for application enablement, DBA functions, on-line transaction performance and query performance.
In the past, on-line applications provided search capabilities to users by permitting the entry of key subsets or wildcards on the screen. For example, one could request a search for customers with a last name of “Smith*”, where the asterisk indicated any combination of zero or more letters. The results would include names like Smith, Smithe and Smithson. Typically, the application would present these results in alphabetical order. More complex searches might involve a search on a combination of last name and first name. The user might then select a function that initiated a “page down” process, where the following rows from the same query would be retrieved for display. This required application logic to remember the last row retrieved in order to continue executing the original query.
In Figure 1 (“Old Method”) we show the SQL required to retrieve table rows based on last name and first name, where the results must appear in alpha order. In the application this query would be placed in a cursor and rows fetched from the result to populate a results screen. The coding of the query, containing multiple columns, host variables and boolean logic could become quite complex, especially for queries on multi-column keys.
Db2 v12 provides two alternative syntaxes for doing the same thing. The first simplifies the old method by allowing specification of key combinations in parentheses; the second method uses the new OFFSET keyword and FETCH FIRST syntax to specify how many result rows to retrieve, thus greatly simplifying application logic.
In prior Db2 versions the maximum size of a dataset for a partitioned tablespace was 256 gigabytes, as specified with the DSSIZE parameter. This parameter was defined at the tablespace level, and held for all partitions. With a 4K page size this limited the maximum number of partitions to 64, giving a maximum total tablespace size of 16 terabytes. While it was possible to alter the DSSIZE parameter (from a minimum up to 256 GB), alterations required a reorganization of the entire tablespace, during which time the table was unavailable. In addition, should a single partition become full of active rows, changing the number of partitions or adding partitions also required a major reorg.
In Db2 v12 there is a new universal tablespace structure that allows the DBA to specify DSSIZE at the partition level. Further, the maximum partition size is increased to 1 terabyte. This change, along with allowing for more than 64 partitions, increases the maximum tablespace size to 4 petabytes. (1 petabyte = 1,000 terabytes). As a result, it is now possible for a single table to contain 256 trillion rows.
Db2 v12 also includes new DSSIZE changes to support larger indexes.
Some SQL queries analyzed by the Db2 optimizer result in access paths that require multiple searches through an index tree structure. Beginning with the index root page, Db2 must determine what non-leaf pages must be read into memory and, depending upon the results, if additional pages are required. These may be non-leaf pages at a lower level in the index tree, or leaf pages that contain key information and pointers to rows in the table. Each retrieval of a non-leaf page typically requires in an I/O, and multiple retrievals of pages can cause I/O waits.
Db2 v12 delivers a new feature called Index Fast Traversal. First, Db2 analyzes whether a query can take advantage of this new traversal method, i.e., whether the new method will be cost effective. If so, Db2 assembles index page information and stores it in a new memory area called index control that is outside the normal buffer pools reserved for data and index I/Os. Db2 then stores the data in a proprietary structure in that area, and allows the query to continue.
New queries that can benefit from this index information are automatically directed to the new area.
This new feature is currently restricted to unique indexes whose key lengths are no greater than 64 bytes. Initial measurements show a 20 percent reduction in CPU when index fast traversal is used.
One of the most difficult performance tuning situations for the DBA is that of heavy application workloads that do SQL inserts into a single table. There are several potential issues to contend with, including “hot spots” in the table where newly-inserted rows having similar keys should be added, system logs filling with new changed data for recovery purposes, CPU and I/O used to search for empty space in the table and index, and others.
In the past, DBAs have attempted to alleviate some of these bottlenecks through creative database design techniques. One method is to physically partition the table and randomly assign new rows to partitions, thus avoiding heavy changes to a single place in the table. Another method is to define the tablespace with frequent free space areas, initially empty of rows, so that newly-inserted rows can be added there.
Db2 v12 provides a new method to alleviate some of the constraints of a heavy insert workload. This is a new algorithm for determining where new rows are to be inserted, and is called insert algorithm 2. Db2 can use this new insert algorithm to speed up non-clustered inserts where the constraint is multiple concurrent threads doing inserts associated with delays finding available space for the new rows. While not a panacea for every situation, this new algorithm does provide constraint relief in many cases.
Estimates made by IBM indicate that a table with a single index can experience more than five million inserts per second using the new algorithm.
Many applications construct SQL statements based upon user requests, rather than pre-code all possible statements in the application. Pre-coding SQL statements, or static SQL, is much easier to tune for a variety of factors:
On the other hand, dynamic SQL by its very nature has been difficult to tune. Since statements are constructed immediately prior to execution, DBAs may have little or no idea what to expect from the application. While Db2 stores dynamic SQL statements and access path information in a cache, the limited size of the cache usually causes statement information to be deleted soon after storage. Problems can be exacerbated when DBAs make infrastructure changes such as adding indexes, changing memory configurations or upgrading software.
Db2 12 introduces a new feature called dynamic SQL plan stability. Dynamic SQL information previously stored in the dynamic statement cache is now stored permanently in the Db2 catalog. If a statement executes a second time, access path information can now be retrieved from the catalog, avoiding the CPU cost of invoking the Db2 optimizer.
Advertiser Disclosure: Some of the products that appear on this site are from companies from which QuinStreet receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. QuinStreet does not include all companies or all types of products available in the marketplace.