The Range-clustered Table: A New Kind of Table in DB2 UDB V8.1.4

Wednesday Mar 31st 2004 by DatabaseJournal.com Staff
Share:

There's a new type of table in DB2 version 8.1.4. This table is supported through the ORGANIZE BY KEY SEQUENCE clause on the CREATE TABLE statement and can dramatically improve performance for some workloads.

by Paul C. Zikopoulos and Roman B. Melnyk

This article outlines range-clustered table (RCT) support in the IBM DB2 Universal Database (DB2 UDB) products as of Version 8.1.4. RCTs represent a new table type that is supported through the ORGANIZE BY KEY SEQUENCE clause on the CREATE TABLE statement.

This new feature can dramatically improve performance for some workloads. RCTs exploit a method of organizing rows in a table that allows for very fast direct access to any given row, or set of rows, without indexes. This is accomplished through sequential numeric primary key values (for example, an employee ID). Transaction processing applications often generate sequential numbers for use as primary key values; such databases will benefit the most from implementing RCTs.

The database manager leverages the sequential nature of the key column values in an RCT to determine the location of any row in the table. Because the data is always kept in a specific order, the location of a particular row is based on its sequence number, which is logically mapped to a row number.

To qualify as an RCT, a table must have key values that are:

  • Unique
  • Not null
  • Of type SMALLINT, INTEGER, or BIGINT
  • Monotonically increasing
  • Within a predetermined range for each column in the key

DB2 UDB-supported table types

DB2 UDB now supports four table types: regular, append mode, multidimensional clustered (MDC), and range-clustered. Each table type has characteristics that make it useful when working in a particular business environment. The matrix in Figure 1 compares and contrasts the four table types.


Figure 1. A comparison of the benefits offered by each of the four supported table types. Range-clustered tables offer superior performance.

As a general rule of thumb:

  • Regular tables with indexes are general purpose tables that represent a good starting choice. If your data will be loosely clustered (not monotonically increasing), consider using a regular table and indexes.
  • Append mode tables are suitable if you are only adding new data and retrieving existing data, such as is the case in a banking scenario, in which changes to an account are tracked through debits, credits, and transfers, and customers can review the history of changes to their accounts.
  • Multidimensional clustered tables are used in data warehousing and large database environments. Clustering indexes on regular tables support one-dimensional clustering of data, but MDC tables provide the benefits of data clustering across more than one dimension.
  • Range-clustered tables are useful when the data is tightly clustered across one or more columns in the table. The smallest and largest values in the columns define the range of possible values. You use these columns to access rows in the table.

Advantages of using range-clustered tables

RCTs can result in significant performance advantages during query processing because fewer input/output (I/O) operations are required to complete transactions. RCTs require less cache buffer allocation because there are no secondary objects to maintain. Because RCTs do not require indexes, DBAs that utilize these objects will find savings in storage as well.

RCTs also require fewer locks. With regular tables, rows are locked to ensure that only one application or user has access to a row or group of rows at any given time. With RCTs, "discrete locking" is used instead of key and next-key locking. This method locks all rows that are affected by, or that might be affected by, a requested operation. The number of locks that are obtained depends on the isolation level. Qualifying rows in the RCT that are currently empty, but that have been preallocated, are locked. This avoids the need for next-key locking.

Locking only affected rows can actually result in increased concurrency. Consider a table T1 with seven rows (1, 2, 3, 4, 7, 8, and 9) and repeatable read (RR) isolation. If T1 is an RCT, the following actions could happen concurrently:

Application 1 executes the following query and locks rows 7, 8, and 9 in share (S) mode:

 select * from T1 where PrimaryKeyvalue >= 7
   and PrimaryKeyvalue <= 9

Application 2 executes the following insert operation to add row 6, and locks row 6 in exclusive (X) mode:

 insert into T1 values (6)

If T1 is a regular table with a Type 1 index, Application 2 will not only lock row 6, but will try to lock the next row (in this case, row 7) in exclusive mode, thereby conflicting with Application 1. As long as you do not have any overflows, RCT concurrency is always at least as good as the concurrency that can be achieved with indexes, and sometimes a bit better. The way we like to think about it is that RCT concurrency is equivalent to a Type 2 index with every non-existent row being pseudo-deleted. For insert and 1-row delete operations, RCTs do take fewer locks than would a table with an equivalent index. For selects in which every possible selected value exists, RCTs take the same number of locks, and for selects in which certain possible values have not yet been inserted, RCTs take more locks than would a table with an equivalent index.

by Paul C. Zikopoulos and Roman B. Melnyk

The ORGANIZE BY KEY SEQUENCE clause in DB2 UDB Version 8.1.4

Figure 2, which shows a partial syntax diagram for the CREATE TABLE statement, includes the syntax fragments that form a path to the ORGANIZE BY KEY SEQUENCE clause. You can specify one or more columns of the table that are to be included in the unique key that determines the sequence of the RCT. You can also specify a starting constant for the range of key values; this is optional, and if you do not specify a value, 1 is assumed. Whether or not you specify a starting constant, you must specify an ending constant for the range of key values.

Click for larger image


Figure 2. A partial syntax diagram for the CREATE TABLE statement, showing the syntax fragments that form a path to the ORGANIZE BY KEY SEQUENCE clause.

If you specify the DISALLOW OVERFLOW clause, key values will not be allowed to exceed the defined range. If, on the other hand, you specify the ALLOW OVERFLOW clause, key values will be allowed to exceed the defined range. In this case, overflow data will be placed in the overflow area, which is dynamically allocated. If you have an RCT supporting overflow that goes past the defined range, the resulting performance will be very poor. The overflow area is an emergency mechanism that will kill performance, but might allow enough time for you to unload the data and recreate the table with a wider range.

The following example shows how to create a range-clustered table called CUSTOMERS. This table will have two columns: CUSTOMER_ID and CUSTOMER_NAME. The CUSTOMER_ID column will become the unique key that determines the sequence of the RCT; in this case, the range of possible key values will be defined as 1 to 100, and key values will not be allowed to exceed this defined range.

CREATE TABLE customers (
    customer_id INTEGER NOT NULL,
    customer_name VARCHAR(80)
    )
  ORGANIZE BY KEY SEQUENCE (customer_id STARTING FROM 1 ENDING AT 100)
    DISALLOW OVERFLOW

If an application needed to access the row with a customer ID value of 2, the database manager would look for the second row in the CUSTOMERS table. This is quite straightforward, and data access is fast, because each row in an RCT has a predetermined offset from the logical start of the table.

If a row is updated and the key column values are modified, the updated row is copied to the new location, and the old copy of the row is deleted, maintaining the clustering of data in the table.

Inserting rows whose key values do not exceed the defined range is no problem, but if you try to insert a row whose key value exceeds the range (1000, for example), the insert operation fails:

INSERT INTO customers VALUES (1,'Jones')

INSERT INTO customers VALUES (5,'Benoit')

INSERT INTO customers VALUES (1000,'Hoffmann')
SQL1870N  A row could not be inserted into a range-clustered table
because a key sequence column is out of range.  SQLSTATE=23513

The following example shows how to create a range-clustered table in which the sequence of the RCT is determined by key values from two columns. The table, called ORDERS, will have three columns: ORDER_ID, CUSTOMER_ID, and PART_NAME. In this case, the ORDER_ID column and the CUSTOMER_ID column together will form a composite unique key that determines the sequence of the RCT.

CREATE TABLE orders (
    order_id INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    part_name VARCHAR(80)
    )
  ORGANIZE BY KEY SEQUENCE (order_id ENDING 1000,
    customer_id ENDING 100)
    DISALLOW OVERFLOW

Here are some important points to remember about RCTs:

  • Space is allocated statically for each possible key value and cannot be modified.
  • The space required for an RCT must be available when the table is created, and must be sufficient to contain the number of rows in the specified key range multiplied by the row size.
  • You cannot alter a key range.
  • If other columns in a table are defined as type VARCHAR, sufficient space to accommodate the specified maximum length is allocated for each row.
  • The order of rows in an RCT that allows overflow is not guaranteed.
  • Clustering indexes are not supported because RCTs are already clustered.
  • Table reorganization is not required for overflow-disallowed RCT tables because such tables are always clustered.

Conclusion

DB2 UDB range-clustered table support, which introduces a new table type in V8.1.4 through the ORGANIZE BY KEY SEQUENCE clause on the CREATE TABLE statement, can dramatically improve performance for some workloads, particularly in transaction processing environments. Be sure to consider exploiting this feature and its many advantages when you are ready to move to DB2 UDB V8.1.4.

About the Authors

Paul C. Zikopoulos, BA, MBA, is with IBM Canada Ltd. Paul has written numerous magazine articles and books about DB2. Paul has co-authored the books: DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). You can reach him at: paulz_ibm at msn.com.

Roman B. Melnyk, PhD, is with IBM Canada Ltd., specializing in database administration, DB2 utilities, and SQL. Roman has written numerous DB2 books, articles, and other related materials. Roman co-authored DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, and DB2 for Dummies. You can reach him at roman_b_melnyk at hotmail.com.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved