Cooling Your Overloaded Database with New DB2 UDB EEE Partitions

Tuesday Feb 25th 2003 by Staff

Today's server hardware is so powerful that quite often we find standalone SMP (Symmetric Multiprocessing) servers in different clustered and non-clustered configurations. Learn how to expand this type of system with additional hardware and DB2 UDB EEE database partitions.

Marin Komadina

Today's server hardware is so powerful that it is not unusual to find standalone, low-cost SMP (Symmetric Multiprocessing) servers in various clustered and non-clustered configurations. Data volume and the number of concurrent database connections on those systems keep rising. When hardware utilization approaches warning levels, we can simply expand the system with more hardware resources (disk space, processors, memory) and new DB2 UDB EEE database partitions.

This article covers:

  • DB2 UDB EEE Terminology
  • Partitioned Database Design Considerations
  • Existing Database Configuration Inventory
  • Procedure for Adding a New Database Partition
  • Future Enhancements

DB2 UDB EEE Terminology

Database Instance is a logical database manager environment consisting of several database engine processes and their memory allocations. A database is created within a database instance.

Node is a physical or logical machine hosting one or more DB2 UDB EEE instances or partitions.

Database Partition is a logical database unit with private data and index subsets, configuration files, and transaction logs. Database partitions can be configured to support intra-partition parallelism (an SQL query broken into smaller parts and executed in parallel on a single partition), inter-partition parallelism (an SQL query executed in parallel across several partitions), or both.

  • Catalog partition is the main database partition, which runs the DB2 UDB EEE instance manager. This partition has number 0 and holds the DB2 UDB EEE system tables, which are not distributed to any other partition.
  • Coordinator partition (dispatcher) is the partition that coordinates query execution, sends query parts to other partitions for execution, assembles the result data set, and returns it to the requester.
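
Intra-partition parallelism is switched on through instance and database configuration parameters. As an illustrative sketch (the parameter values shown here are assumptions, not taken from the article's system):

$ db2 update dbm cfg using INTRA_PARALLEL YES
$ db2 update dbm cfg using MAX_QUERYDEGREE 4
$ db2 update db cfg for artist using DFT_DEGREE ANY

Instance-level parameter changes take effect only after the instance is restarted.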

Node group is a named set of one or more database partitions belonging to the same database. A node group provides a logical layer between tablespaces and partitions.
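
As a sketch, the two user-defined node groups that appear later in this article (LOCAL on the catalog partition, GLOBAL on partition 1) could have been created with statements like:

db2 => create nodegroup local on nodes (0)
db2 => create nodegroup global on nodes (1)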

Partitioned Database Design Considerations

  • Hardware Configuration
    Partition design depends on the underlying hardware configuration. In physical partition design, we have separate partitions on separate hosts connected together with high-speed links. Today a logical partition design, where we have several standalone n-CPU machines with a large amount of RAM and several database partitions running on the same physical host, is often used. In this configuration, every partition has exclusive access to its own disk and memory. CPU resources are shared between partitions.

  • DB2 UDB EEE failover design
    The failover function is strictly dependent on underlying cluster software such as IBM HACMP (High Availability Cluster Multi-Processing), Veritas Cluster Server, Sun Cluster, and Microsoft Cluster Server. The main consideration in failover design is the proper distribution of data from the failed partition to all others after failover. Cluster software is unaware of partitioning schema inside the database. The default behavior of just switching the database partition from one node to another would create an unbalanced database configuration, with one or more overloaded nodes. For that reason we must configure the cluster software with a takeover list and DB2 UDB EEE with multiple partitions per node so that each surviving node takes over the same amount of data.

  • Recommended Data Volume per Partition
    For the best performance, it is recommended to hold to a maximum of 50-100 GB of user data per partition.

  • CPU Number Per Partition
    Optimal performance is typically achieved with 2 CPUs per partition.

  • Number of Partitions Per Table
    The number of partitions per user table varies and is dependent on table size. The largest table should span all partitions, while smaller tables should be partitioned using the join key as the partitioning key. The smallest tables should be grouped together on the catalog node.
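
For example, two tables that are frequently joined can be given the shared join column as their partitioning key, so that matching rows land on the same partition and the join stays collocated. A hedged sketch (table and column names are hypothetical):

db2 => create table orders (cust_id int not null, amount decimal(10,2)) partitioning key (cust_id) using hashing
db2 => create table customers (cust_id int not null, name varchar(50)) partitioning key (cust_id) using hashing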

  • Buffer Pool Configuration
    The buffer pool configuration depends on the tablespace page size. For every page size that differs from the default (4 KB), we have to calculate and create a new buffer pool for that tablespace.
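
A hedged sketch for an 8 KB page size (names, sizes, and the container path are hypothetical):

db2 => create bufferpool bp8k size 10000 pagesize 8k
db2 => create tablespace ts8k pagesize 8k managed by system using ('/db2/ts8k') bufferpool bp8k

The new buffer pool becomes active after the database is restarted.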


Existing Partitioned Database Inventory

For our example, we are using a Sun Solaris machine with 4 CPUs, 8 GB of RAM, and separate disk volumes for the database partitions. A DB2 UDB EEE 7.1 database runs on this machine with two database partitions (0 and 1).

Database General Information

$ db2level
DB21085I  Instance "db2inst1" uses DB2 code release "SQL07010" with level 
identifier "02010105" and informational tokens "DB2 v7.1.0", "s000510" and "".

 System Database Directory
 Number of entries in the directory = 1
 Database 1 entry:
 Database alias                  = ARTIST
 Database name                   = ARTIST
 Local database directory        = /home/artist
 Database release level          = 9.00
 Catalog node number             = 0

The catalog partition is NODE 0, and the database name is ARTIST, version 7.1.


Partition configuration

$ cat db2nodes.cfg

The database has two partitions defined on the same physical host.


Database Connection Ports Configuration

DB2_db2inst1      60000/tcp   # Connection port for DB2 instance db2inst1
DB2_db2inst1_END  60004/tcp   # End connection port     
db2cdb2inst1      50000/tcp   # Connection port for DB2 instance db2inst1
db2idb2inst1      50001/tcp   # Interrupt  port for DB2 instance db2inst1

Rsh Configuration

-rw-------   1 db2inst1 db2iadm1      18 Feb  6 20:18 .rhosts

Hostname  Userid  
ARTIST0A1 db2inst1

Each DB2 UDB EEE partition must have the authority to perform remote commands (rsh) on the other partitions. On a single SMP machine, only one entry is needed: the local host name.

Node Group Configuration

db2 =>  select * from sysibm.sysnodegroupdef
------------------ ------- ------
IBMCATGROUP              0 Y     
IBMDEFAULTGROUP          0 Y     
IBMDEFAULTGROUP          1 Y     
LOCAL                    0 Y     
GLOBAL                   1 Y  
  • IBMCATGROUP - default node group for table space containing system catalogs

  • IBMTEMPGROUP - default node group for system temporary table spaces

  • IBMDEFAULTGROUP - default node group for user table spaces

  • GLOBAL - user defined node group for big tables, partition 1

  • LOCAL - user defined node group for small tables, only catalog partition 0

Tablespace Configuration

db2 => select TBSPACE,TBSPACETYPE,NGNAME from syscat.tablespaces
TBSPACE            TBSPACETYPE NGNAME            
------------------ ----------- ------------------
SYSCATSPACE        S           IBMCATGROUP       
TEMPSPACE1         S           IBMTEMPGROUP      
ARTIST_BIG         S           GLOBAL            
ARTIST_SMALL       S           LOCAL    

Two tablespaces have been defined for user objects:

ARTIST_BIG   - tablespace for big tables, node 1
ARTIST_SMALL - tablespace for small lookup tables, node 0
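
As a sketch, these tablespaces could have been created in their node groups with statements like (the container paths are hypothetical):

db2 => create tablespace artist_big in nodegroup global managed by system using ('/db2/artist_big')
db2 => create tablespace artist_small in nodegroup local managed by system using ('/db2/artist_small')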

Test Table Definition

DB20000I  The SQL command completed successfully.

DB20000I  The SQL command completed successfully.

db2 => list tables
Table/View                      Schema          Type  Creation time
------------------------------- --------------- ----- --------------------------
PART_TEST                       DB2INST1        T     2003-02-10- 

db2 => select NODENUMBER(testcol1) PARTITION,COUNT(*) TOTAL 
	from part_test group by NODENUMBER(testcol1)
PARTITION   TOTAL
----------- -----------
          1        5120
  1 record(s) selected.

Table PART_TEST, with all of its 5120 rows currently on partition 1, will be used as the test table.
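
The original CREATE TABLE statements are not shown above. A plausible definition, consistent with the ARTIST_BIG tablespace and the NODENUMBER(testcol1) query (the second column is purely hypothetical), would look like:

db2 => create table part_test (testcol1 int not null, testcol2 varchar(100)) in artist_big partitioning key (testcol1) using hashing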


Procedure for Adding New Database Partition

This system performed very well until we doubled the data volume, which resulted in increased CPU and memory utilization. This can be solved by adding more memory and new CPUs. After that, we are going to create a new database partition on the added hardware and redistribute the user data across all partitions.

Hardware upgrade

We upgraded the hardware with 2 additional CPUs, 4 GB of RAM, and additional disk space for the new database partition.

Add new database partition

There are both an online and an offline method for adding a new database partition to an existing database. Since in both cases we need to restart the database to enable full read/write access to the new partition, we are going to use the offline method. When we add a new partition, a temporary tablespace is automatically created on it, and we need to decide where its containers should be located. The possibilities for the temporary tablespace are:

  • NO INDICATION uses the temporary tablespace container definition from node 0
  • LIKE NODE uses temporary tablespace containers like those already defined on another partition
  • WITHOUT TABLESPACES creates the partition without temporary tablespace containers
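
The LIKE NODE and WITHOUT TABLESPACES options are given directly on the ADD NODE command, for example:

$ db2 add node like node 1
$ db2 add node without tablespaces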

NO INDICATION is the default, and we will use that option. Now we are ready to add the new partition (2):

$ db2stop 
02-10-2003 16:51:13     0   0   SQL1064N  DB2STOP processing was successful.
02-10-2003 16:51:14     1   0   SQL1064N  DB2STOP processing was successful.
SQL1064N  DB2STOP processing was successful.

To register the new partition (2) we need to change the partition configuration file:

$ vi db2nodes.cfg
2 ARTIST0A1 2  -> new partition (2)

We will start virtual node 2 separately and create partition (2) on it:

$  export DB2NODE=2

$ db2start nodenum 2
02-10-2003 17:08:59     2   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.

$ db2 add node 
DB20000I  The ADD NODE command completed successfully.

The newly created partition has only a temporary tablespace associated with it:

db2 => list tablespaces
           Tablespaces for Current Database
 Tablespace ID                        = 1
 Name                                 = TEMPSPACE1
 Type                                 = System managed space
 Contents                             = System Temporary data
 State                                = 0x0000
   Detailed explanation:

db2 => list tablespace containers for 1
             Tablespace Containers for Tablespace 1
 Container ID                         = 0
 Name                                 = /export/home/db2inst1/db2inst1/NODE0002/SQL00001/SQLT0001.0
 Type                                 = Path

Next we will stop partition 2 and start the whole database:

$ db2stop nodenum 2
02-10-2003 17:12:16     2   0   SQL1064N  DB2STOP processing was successful.
SQL1064N  DB2STOP processing was successful.

$ export DB2NODE=0
$ db2start
02-10-2003 17:12:56     1   0   SQL1063N  DB2START processing was successful.
02-10-2003 17:12:57     0   0   SQL1063N  DB2START processing was successful.
02-10-2003 17:12:57     2   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.

The next step is to extend node group GLOBAL to a new partition:

db2 => alter nodegroup global add node (2)
SQL1759W  Redistribute nodegroup is required to change data partitioning for 
objects in nodegroup "GLOBAL" to include some added nodes or exclude some 
dropped nodes.  SQLSTATE=01618

db2 => select * from sysibm.sysnodegroupdef
------------------ ------- ------
IBMCATGROUP              0 Y     
IBMDEFAULTGROUP          1 Y     
IBMDEFAULTGROUP          0 Y     
LOCAL                    0 Y     
GLOBAL                   2 A   -> active pending status    
GLOBAL                   1 Y     

Node group GLOBAL is distributed over two partitions, 1 and 2, and has active pending status. Objects in node group GLOBAL are still only on node 1.

Note: Node group IBMDEFAULTGROUP is the default node group for new database objects. It is recommended to extend this node group over all partitions; however, we are not going to do that in our test.
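
If we did extend IBMDEFAULTGROUP, the steps would mirror those used for GLOBAL below (not executed in this test):

db2 => alter nodegroup ibmdefaultgroup add node (2)
db2 => redistribute nodegroup ibmdefaultgroup uniform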

Redistribute data

Data redistribution over all defined partitions in the node group is the last step, and it must be run from the catalog node. We are going to distribute data uniformly using the default hash function. The other data distribution methods (using a distribution file or a target partition map) require deeper application and data knowledge than we need for this test.

db2 =>  redistribute nodegroup GLOBAL uniform
DB20000I  The REDISTRIBUTE NODEGROUP command completed successfully.

Tables without a partitioning key cannot be redistributed. Note that the partitioning key does not need to be unique.

The redistribution log is located under ~/sqllib/redist:

                      Data Redistribution Utility
The following options have been specified:
Nodegroup name                  : GLOBAL
Data Redistribution option      : U
Redistribute Nodegroup          : uniformly
No. of nodes to be added        : 1
List of nodes to be added       : 2
No. of nodes to be dropped      : 0
List of nodes to be dropped     :
Delete will be done in parallel with the insert.
The execution of the Data Redistribution operation on:
 Begun at    Ended at    Table
 17.38.07    17.38.08    DB2INST1.PART_TEST
-- All tables in the nodegroup have been successfully redistributed. --

A successfully finished data redistribution resets the status flag for the node group:

------------------ ------- ------
IBMCATGROUP              0 Y     
IBMDEFAULTGROUP          1 Y      
LOCAL                    0 Y     
GLOBAL                   2 Y
GLOBAL                   1 Y     

The PART_TEST table now has its records equally distributed over the partitions in the node group:

PARTITION   TOTAL
----------- -----------
          1        2560
          2        2560

The data distribution can also be checked by reading the partition map (4096 entries mapping hash buckets to partitions) for table DB2INST1.PART_TEST:

$ db2gpmap -d ARTIST -m tab_distribution.txt -t DB2INST1.PART_TEST
Retrieving the partition map ID using table DB2INST1.PART_TEST
1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1

For performance reasons, IBM suggests updating statistics on the redistributed objects after redistribution:

db2 => runstats on table db2inst1.part_test with distribution and detailed indexes all shrlevel change
DB20000I  The RUNSTATS command completed successfully.



Future Enhancements

The main advantage of the DB2 UDB EEE database is optimal scalability. When you have more users, more data, and complex queries, just add additional hardware and DB2 will scale. Measurements for different database operations (load of all tables, index creation, runstats, single and multiple concurrent delete and insert operations) on a duplicated hardware configuration give an average scalability factor of 0.98. This is almost ideal; however, there are still some points that IBM could improve on in the future:

  • Implement memory sharing between multiple partitions on the same physical node
  • Solve the problem of expensive selects and updates on non-partitioning-key columns
  • Limit interprocess communication between partitions in multipartition installations
  • Implement partitioning models other than hash partitioning
  • Provide true online database expansion (after adding a new partition, full read/write access to it is possible only after a database restart)
  • Enable non-blocking data redistribution after adding or dropping partitions

» See All Articles by Columnist Marin Komadina
