Since relieving I/O constraints usually translates into faster execution times, it can become one of the major goals of an IT support group. Many applications have Service Level Agreements (SLRs) that promise quick turnaround or acceptable on-line response time. For an application suite such as a data warehouse that depends upon a regular nightly batch window to extract, transform and load data, lengthening elapsed times can cause the load jobs to extend into the beginning of the business day.
Physical I/O Constraints
I've written before about tuning CPU-constrained applications ("Tuning an IBM DB2 for z/OS Application: CPU Constraints").
Input-Output (I/O) constraints are, in a sense, the opposite problem ... and a common one.
Generally speaking, the perceived elapsed time for a DB2 database query depends upon multiple factors. The major factors are:
- Network transit time
- Query interpretation and access path selection by DB2
- CPU speed
- Physical I/O access to storage media
Of all these, physical I/O to storage media (usually disk or a disk array) is the slowest. When there are I/O constraints or bottlenecks in the system, access is slower still.
Balancing Resource Constraints
When involved in application performance tuning where the major resource constraint is elapsed time, the database administrator usually concentrates on reducing I/Os. This can be accomplished by trading (or balancing) I/O constraints with some resource available in relative abundance. Let's consider what resources might be available, and how in each case a balance can be achieved.
Abundance of storage media. Physical I/O access can be slow (milliseconds) when compared to network speeds (microseconds) or CPU instruction execution (nanoseconds). How can one speed up physical I/Os to a DB2 table? One way is to create multiple indexes on a table. These indexes might contain a major subset of the table columns, perhaps all that are required by common queries. In this fashion, DB2 can determine that the best access path to the table is through that index. Indeed, if all the columns required by the query are contained in the index, the access path is said to be index-only. In general, indexes take up far less disk space than tables. Accessing such an index may involve far fewer I/Os and thusly reduce the number of physical I/Os.
Another alternative is data compression. DB2 comes with a hardware-assisted data compression algorithm that the DBA can use when defining a table. Data loaded to the table can be reduced by fifty percent, or even as much as eighty percent in cases of character data. This reduction in size results in the table being stored in a much smaller area. Again, table access will result in fewer physical I/Os.
Abundance of CPU cycles. It is rare for IT support to say that there are excess CPU cycles available for applications to use. Indeed, many hardware vendors charge fees based on the amount of CPU consumed. Still, the DBA and application designer should review the possibilities and do a cost comparison of the alternatives.
In this case, we can take advantage of the availability of CPU cycles by two database design concepts called partitioning and parallelism.
Partitioning is the splitting of a table into multiple physical files. The most common variation is called horizontal partitioning. The DBA designates a number of table partitions, and how rows will be assigned to each partition. By far the most common designs specify that each partition contains table rows whose key values are within a certain range. For example, one common method for data warehouse tables is to partition by date, where each partition contains table rows for a particular day, month, or date range.
Parallelism is the ability of the database management system (DBMS) to logically split a database query into multiple pieces and process each piece separately in parallel. For example, consider a query against a data warehouse table requesting all rows for a certain year. Assume that the table is partitioned by month; that is, each partition contains data for a particular month of the year. It is now possible for the DBMS to (internally and transparently) split our query into twelve similar queries, all equal except that each of the twelve accesses rows for a single month. One large query for a year's worth of data now becomes twelve similar queries for one month each. Of course, the DBMS must also take the results of the twelve queries and combine them into a single result for the requestor.
We can now see how the combination of these two strategies can drastically reduce physical I/O times. The original query must access twelve months of data in sequence across the table. The twelve queries can be executed simultaneously, thus reducing the elapsed time by almost a factor of ten. The number of I/Os is almost the same in each case; however, by partitioning the table and by using parallelism, the same I/Os are done by twelve parallel processes.
The drawback of this method is CPU usage. The DBMS uses CPU to drive the I/Os being done. If the elapsed time is now reduced to one tenth the original elapsed time, the CPU used during the original period is now squeezed into that time. In fact, if CPU usage averaged ten percent for the original query, the parallel query may now peak the system at one hundred percent!
Clearly the DBA and application designer must review the alternatives and costs before implementing these techniques. For our example above, one possibility might be to artificially limit the maximum number of parallel tasks DB2 can use. Limiting the number of parallel tasks to five would provide some parallelism while limiting potential spikes in CPU usage.
Abundance of memory. DB2 uses blocks of memory for storing table and index data read from disk. These blocks or areas are called virtual pools. The DBA defines the sizes of these areas in DB2 configuration parameters; then, in table and index definitions, identifies which pools are used by which objects.
With an abundance of memory available, the DBA can greatly increase the sizes of some virtual pools to benefit some objects and queries against them. For example, if some tables are frequently used the DBA assigns them to their own buffer pool and allocates a large area. As these tables are accessed by queries DB2 reads them into memory. Later references to the object will probably find them still in memory, meaning that DB2 can retrieve data immediately rather than reading from disk. By accessing the table in memory DB2 avoids physical I/Os and greatly reduces elapsed times.
Good or Bad Tuning?
Many of these methods have advantages and disadvantages. Here are a few examples to review.
Situation #1. After analysis of accounting records, a DBA determines that the RunStats utility is responsible for almost 10% of the elapsed time used during the nightly batch runs. When 90% of these jobs are removed, elapsed times are greatly reduced.
Analysis. RunStats analyzes data distribution in tables and indexes and stores that information. When queries access the data, DB2 then uses these statistics to determine the lowest-cost access path to the data. By removing the RunStats jobs, the DBA may have significantly reduced the effectiveness of most or all of the queries against the data! A better choice would have been to review the need for the RunStats jobs. Those deemed necessary can still be run; more importantly, RunStats can be executed at any time after the data has changed and can be run simultaneously with other jobs (using the SHRLEVEL CHANGE option). To clarify, rather than execute jobs in a sequence such as Update Table A, RunStats Table A, Update Table B, etc., the second and third jobs in this list can be executed at the same time. The Update Table B job does not have to wait for the RunStats Table A job to finish.
Situation #2. The DBA reviews statistics for a long-running application. It contains an SQL statement that joins two partitioned tables; however, the access paths do not take advantage of parallelism. By implementing parallelism, the DBA cuts total execution time of the job from two hours to two minutes.
Analysis. As discussed previously, implementing parallelism can greatly reduce elapsed time. However, in this case the CPU time used by the original job over a two hour period is now squeezed into two minutes. The CPU may well be executing at 100% during that period, locking out all other jobs in the system! The DBA should review this decision carefully, perhaps monitoring the job during execution to ensure that CPU usage is not excessive. The DBA can also reduce or limit the maximum number of parallel tasks.
Situation #3. A popular online transaction typically takes several minutes to execute. After some analysis, the DBA creates three new indexes on tables that are joined in the application. Response time for the transaction now averages less than a second.
Analysis. In this case the DBA has traded disk space for elapsed time. While seemingly a good choice, the DBA must be aware of potential disadvantages. The first is that the large number of indexes on the table will lengthen the time it takes to load the table or to insert rows. This is because in addition to adding new rows to the table DB2 must also add new index entries. The second is a recovery issue. This table may be part of an application that is deemed critical for disaster recovery purposes. If so, and a disaster occurs, then the additional indexes will make table recover time longer. This is because in addition to recovering the table to a point-in-time, DB2 must also recover the indexes.
Physical I/O constraints can be addressed in multiple ways using resource constraint analysis. I/Os and elapsed times can be reduced through a variety of methods that involve balancing or trading available or abundant resources for the constrained resource.
Still, reducing an application's elapsed time by changing database design or DBMS configuration parameters should not be a substitute for good planning or standard performance tuning methods. You should use a strategy where you understand system-wide resource usage and develop application tuning methods that take into account the system as a whole.