There's a new type of table in DB2 version 8.1.4. This table is supported through the ORGANIZE BY KEY SEQUENCE clause on the CREATE TABLE statement and can dramatically improve performance for some workloads.
by Paul C. Zikopoulos and Roman B. Melnyk
This article outlines range-clustered table (RCT) support in the IBM DB2 Universal Database (DB2 UDB) products as of Version 8.1.4. RCTs represent a new table type that is supported through the ORGANIZE BY KEY SEQUENCE clause on the CREATE TABLE statement.
This new feature can dramatically improve performance for some workloads. RCTs exploit a method of organizing rows in a table that allows for very fast direct access to any given row, or set of rows, without indexes. This is accomplished through sequential numeric primary key values (for example, an employee ID). Transaction processing applications often generate sequential numbers for use as primary key values; such databases will benefit the most from implementing RCTs.
The database manager leverages the sequential nature of the key column values in an RCT to determine the location of any row in the table. Because the data is always kept in a specific order, the location of a particular row is based on its sequence number, which is logically mapped to a row number.
To qualify as an RCT, a table must have key values that are:
- Not null
- Of type SMALLINT, INTEGER, or BIGINT
- Monotonically increasing
- Within a predetermined range for each column in the key
DB2 UDB-supported table types
DB2 UDB now supports four table types: regular, append mode, multidimensional clustered (MDC), and range-clustered. Each table type has characteristics that make it useful when working in a particular business environment. The matrix in Figure 1 compares and contrasts the four table types.
Figure 1. A comparison of the benefits offered by each of the four supported table types. Range-clustered tables offer superior performance.
As a general rule of thumb:
- Regular tables with indexes are general purpose tables that represent a good starting choice. If your data will be loosely clustered (not monotonically increasing), consider using a regular table and indexes.
- Append mode tables are suitable if you are only adding new data and retrieving existing data, such as is the case in a banking scenario, in which changes to an account are tracked through debits, credits, and transfers, and customers can review the history of changes to their accounts.
- Multidimensional clustered tables are used in data warehousing and large database environments. Clustering indexes on regular tables support one-dimensional clustering of data, but MDC tables provide the benefits of data clustering across more than one dimension.
- Range-clustered tables are useful when the data is tightly clustered across one or more columns in the table. The smallest and largest values in the columns define the range of possible values. You use these columns to access rows in the table.
Advantages of using range-clustered tables
RCTs can result in significant performance advantages during query processing because fewer input/output (I/O) operations are required to complete transactions. RCTs require less cache buffer allocation because there are no secondary objects to maintain. Because RCTs do not require indexes, DBAs that utilize these objects will find savings in storage as well.
RCTs also require fewer locks. With regular tables, rows are locked to ensure that only one application or user has access to a row or group of rows at any given time. With RCTs, "discrete locking" is used instead of key and next-key locking. This method locks all rows that are affected by, or that might be affected by, a requested operation. The number of locks that are obtained depends on the isolation level. Qualifying rows in the RCT that are currently empty, but that have been preallocated, are locked. This avoids the need for next-key locking.
Locking only affected rows can actually result in increased concurrency. Consider a table T1 with seven rows (1, 2, 3, 4, 7, 8, and 9) and repeatable read (RR) isolation. If T1 is an RCT, the following actions could happen concurrently:
Application 1 executes the following query and locks rows 7, 8, and 9 in share (S) mode:
select * from T1 where PrimaryKeyvalue >= 7
and PrimaryKeyvalue <= 9
Application 2 executes the following insert operation to add row 6, and locks row 6 in exclusive (X) mode:
insert into T1 values (6)
If T1 is a regular table with a Type 1 index, Application 2 will not only lock row 6, but will try to lock the next row (in this case, row 7) in exclusive mode, thereby conflicting with Application 1. As long as you do not have any overflows, RCT concurrency is always at least as good as the concurrency that can be achieved with indexes, and sometimes a bit better. The way we like to think about it is that RCT concurrency is equivalent to a Type 2 index with every non-existent row being pseudo-deleted. For insert and 1-row delete operations, RCTs do take fewer locks than would a table with an equivalent index. For selects in which every possible selected value exists, RCTs take the same number of locks, and for selects in which certain possible values have not yet been inserted, RCTs take more locks than would a table with an equivalent index.
by Paul C. Zikopoulos
and Roman B. Melnyk
The ORGANIZE BY KEY SEQUENCE clause in DB2 UDB Version 8.1.4
Figure 2, which shows a partial syntax diagram for the
CREATE TABLE statement, includes the syntax fragments that form a path to the
ORGANIZE BY KEY SEQUENCE clause. You can specify one or more columns of the
table that are to be included in the unique key that determines the sequence of
the RCT. You can also specify a starting constant for the range of key values;
this is optional, and if you do not specify a value, 1 is assumed. Whether or
not you specify a starting constant, you must specify an ending constant for
the range of key values.
Click for larger image
Figure 2. A partial syntax diagram for the CREATE TABLE
statement, showing the syntax fragments that form a path to the ORGANIZE BY KEY
If you specify the DISALLOW OVERFLOW clause, key values will
not be allowed to exceed the defined range. If, on the other hand, you specify
the ALLOW OVERFLOW clause, key values will be allowed to exceed the defined
range. In this case, overflow data will be placed in the overflow area, which
is dynamically allocated. If you have an RCT supporting overflow that goes past
the defined range, the resulting performance will be very poor. The overflow
area is an emergency mechanism that will kill performance, but might allow
enough time for you to unload the data and recreate the table with a wider
The following example shows how to create a range-clustered
table called CUSTOMERS. This table will have two columns: CUSTOMER_ID and
CUSTOMER_NAME. The CUSTOMER_ID column will become the unique key that
determines the sequence of the RCT; in this case, the range of possible key
values will be defined as 1 to 100, and key values will not be allowed
to exceed this defined range.
CREATE TABLE customers (
customer_id INTEGER NOT NULL,
ORGANIZE BY KEY SEQUENCE (customer_id STARTING FROM 1 ENDING AT 100)
If an application needed to access the row with a customer
ID value of 2, the database manager would look for the second row in the
CUSTOMERS table. This is quite straightforward, and data access is fast,
because each row in an RCT has a predetermined offset from the logical start of
If a row is updated and the key column values are modified,
the updated row is copied to the new location, and the old copy of the row is
deleted, maintaining the clustering of data in the table.
Inserting rows whose key values do not exceed the defined
range is no problem, but if you try to insert a row whose key value exceeds the
range (1000, for example), the insert operation fails:
INSERT INTO customers VALUES (1,'Jones')
INSERT INTO customers VALUES (5,'Benoit')
INSERT INTO customers VALUES (1000,'Hoffmann')
SQL1870N A row could not be inserted into a range-clustered table
because a key sequence column is out of range. SQLSTATE=23513
The following example shows how to create a range-clustered
table in which the sequence of the RCT is determined by key values from two
columns. The table, called ORDERS, will have three columns: ORDER_ID, CUSTOMER_ID,
and PART_NAME. In this case, the ORDER_ID column and the CUSTOMER_ID column
together will form a composite unique key that determines the sequence of the
CREATE TABLE orders (
order_id INTEGER NOT NULL,
customer_id INTEGER NOT NULL,
ORGANIZE BY KEY SEQUENCE (order_id ENDING 1000,
customer_id ENDING 100)
Here are some important points to remember about RCTs:
Space is allocated statically for each possible key value and
cannot be modified.
The space required for an RCT must be available when the table is
created, and must be sufficient to contain the number of rows in the specified
key range multiplied by the row size.
You cannot alter a key range.
If other columns in a table are defined as type VARCHAR,
sufficient space to accommodate the specified maximum length is allocated for
The order of rows in an RCT that allows overflow is not
Clustering indexes are not supported because RCTs are already
Table reorganization is not required for overflow-disallowed RCT
tables because such tables are always clustered.
DB2 UDB range-clustered table support, which introduces a
new table type in V8.1.4 through the ORGANIZE BY KEY SEQUENCE clause on the
CREATE TABLE statement, can dramatically improve performance for some
workloads, particularly in transaction processing environments. Be sure to
consider exploiting this feature and its many advantages when you are ready to
move to DB2 UDB V8.1.4.
About the Authors
Paul C. Zikopoulos, BA, MBA, is with IBM Canada
Ltd. Paul has written numerous magazine articles and books about DB2. Paul has
co-authored the books: DB2 Version 8: The Official Guide, DB2: The
Complete Reference, DB2 Fundamentals Certification for Dummies, DB2
for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2
Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified
Solutions Expert (Business Intelligence and Database Administration). You can
reach him at: paulz_ibm at msn.com.
Roman B. Melnyk, PhD, is with IBM Canada
Ltd., specializing in database administration, DB2 utilities, and SQL. Roman
has written numerous DB2 books, articles, and other related materials. Roman
co-authored DB2 Version 8: The Official Guide, DB2: The Complete
Reference, DB2 Fundamentals Certification for Dummies, and DB2
for Dummies. You can reach him at roman_b_melnyk at hotmail.com.