Reporting Bad SQL Code in DB2

Friday Jun 27th 2003 by Marin Komadina
Share:

Poorly written SQL code can cause database slowdown. Marin Komadina explains how to collect and identify slow-running SQL statements.

A production DBA often faces the problem of a slow database caused by badly written SQL code. Even when tested by developers, SQL code can generate many performance problems. A regular pre-production test made on a manually generated, small dataset, when run on the real, big production dataset, can cause system breakdown. Some complex executions need hours to complete. In that situation the DBA needs a way to collect and identify slow-running SQL statements, and send the most critical statements back to developers for additional change.

This article covers:

  • Database and Event Monitor Switches
  • Describing a Problem
  • Finding a Top Sessions
  • Reporting a Bad SQL Code
  • Conclusion

Database and Event Monitor Switches

There are many different troubleshooting techniques for discovering and diagnosing problems caused by badly written SQL statements. One of them is enabling special monitor switches.

Monitor switches are used to instruct the database manager to collect applications statistics and performance. The Monitoring period is defined by starting and stopping the event monitor switch. Collected information is then written into a dedicated serial file, with full details of each SQL call that was executed. The Event monitor statistic files has to be further processed using the db2trace database utility and the result file is readable ASCI file.

Monitor switches can be controlled at the database manager level (database monitor switches) for the whole database or at the individual session level (event monitor switches).

Database manager monitor switches

Displaying the current status of the database manager monitor switches:

# db2 "get dbm monitor switches"
 
 Default database monitor switches
   Buffer pool                         (DFT_MON_BUFPOOL) = OFF
   Lock                                   (DFT_MON_LOCK) = OFF
   Sort                                   (DFT_MON_SORT) = OFF
   Statement                              (DFT_MON_STMT) = OFF
   Table                                 (DFT_MON_TABLE) = OFF
   Unit of work                            (DFT_MON_UOW) = OFF
  • DFT_MON_BUFFERPOOL - collecting number of reads and writes to the database
  • DFT_MON_LOCK - collecting locks wait statistics and deadlock occurrence
  • DFT_MON_SORT - collecting sort statistics
  • DFT_MON_STMT - collecting SQL statements information
  • DFT_MON_TABLE - collecting table statistics, rows written and rows read
  • DFT_MON_UOW - collecting timing information about Unit of Works

DB2 database version 8 has one new monitor switch DFT_MON_TIMESTAMP for collecting timestamps of monitored data. Database monitor switches are by default disabled and database manager will not collect any statistics.

Event monitor switches

On an application level, we can create private monitor switches independent of the database manager and other applications. Applications inherit their monitor switch settings from the database manager when connecting to a database.

Displaying current application event monitor switches:

db2 => get monitor switches
 
            Monitor Recording Switches
 
Switch list for node 0
Buffer Pool Activity Information  (BUFFERPOOL) = OFF  
Lock Information                        (LOCK) = OFF
Sorting Information                     (SORT) = OFF
SQL Statement Information          (STATEMENT) = OFF
Table Activity Information             (TABLE) = OFF
Unit of Work Information                 (UOW) = OFF

List of event monitor filters:

  • DATABASE - logging database deactivation
  • TABLES - logging changes for each accessed tables
  • DEADLOCKS - logging deadlock occurance
  • TABLESPACES - logging tablespace activity
  • BUFFERPOOLS - logging bufferpool activity
  • CONNECTIONS - logging connection activity
  • STATEMENTS - logging finished SQL statement information
  • TRANSACTIONS - logging commit and rollback activity

The syntax for event switches has an optional <<where>> conditional parameter, for further restricting data collection. Possible conditions are:

  • APPL_ID - application marker
  • AUTH_ID - authorization marker
  • APPL_NAME - application program name
  • LOCAL - Event monitor reports activity only for the node where it is running
  • GLOBAL - Event monitor reports activity from all nodes

Activating SQL statement collecting for a specified user will help us to find potential problems with application SQL code. The typical reason for system slowdown is long-running jobs that contain one or more expensive un-tuned SQL statement.

Describing a Problem

The test database ARTIST is a multinode DB2 database, ver. 7.2 on Sun Solaris. It is a multinode DB2 EEE database with 3 nodes.

The customer is complaining about a performance problem with the database. The system is very slow and every regular, routine process takes a long time to finish. Let's look at the situation on the system:

# top 
575 processes: 549 sleeping, 16 running, 1 zombie, 9 on cpu 
CPU states:  0.0% idle, 94.6% user,  5.4% kernel,  0.0% iowait,  0.0% swap
Memory: 16G real, 6336M free, 11G swap in use, 5538M swap free
 
   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 18864 db2udb1    4  10    0 1145M 1091M cpu/1   22:17  6.32% db2sysc
  4891 db2udb1    4  10    0 1154M 1096M cpu/6   20:55  6.04% db2sysc
 17753 db2udb1    4  10    0 1154M 1096M run     22:22  6.00% db2sysc
  4883 db2udb1    4  10    0 1154M 1096M run     20:37  5.97% db2sysc
  9571 db2udb1    4  20    0 1145M 1091M run     27:12  5.96% db2sysc
  9552 db2udb1    4  10    0 1145M 1091M cpu/4   32:44  5.96% db2sysc
 27413 db2udb1    4  11    0 1145M 1091M cpu/2    2:57  5.83% db2sysc
  9603 db2udb1    4  10    0 1153M 1095M cpu/5   21:58  5.74% db2sysc
 20471 db2udb1    4  20    0 1144M 1091M cpu/3    2:54  5.61% db2sysc
 28498 db2udb1    4   0    0 1145M 1091M run     21:55  5.60% db2sysc
 20564 db2udb1    4  10    0 1144M 1090M run      2:53  5.55% db2sysc
 20536 db2udb1    4  10    0 1144M 1091M run      2:55  5.53% db2sysc
 20469 db2udb1    4  10    0 1144M 1090M run      2:53  5.41% db2sysc
 27452 db2udb1    4  11    0 1145M 1091M cpu/4    2:58  5.40% db2sysc 
 26419 db2udb1    4  10    0 1145M 1091M run      3:01  5.35% db2sysc 

Output of <<top>> UNIX program will generate a list of running processes on the machine, with high CPU consumer on the top of the list. On our system, we have a serious problem with db2 processes, which are using all of the CPU power (0.0% Idle). From this output, we cannot identify from which node the request is coming. To discover node information for the running database process we have to use the following command:

>> /usr/ucb/ps -gauxwwr
USER       PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
db2udb1  16401  6.8  7.011515281116720 ?        R 16:00:35  1:09 db2agntp (ARTIST) 2
db2udb1  16410  6.8  7.011515201116384 ?        R 16:00:35  1:08 db2agntp (ARTIST) 2
db2udb1   7308  6.5  7.011720001118488 ?        O 02:37:56  2:51 db2agntp (ARTIST) 2
db2udb1   9579  6.5  7.111720081132232 ?        R 11:41:19  5:21 db2agntp (ARTIST) 1
db2udb1  17785  6.4  7.011720081118320 ?        O 11:24:34  1:17 db2agntp (ARTIST) 1
db2udb1  17763  6.4  7.011858241129736 ?        R 11:24:34  2:14 db2agntp (ARTIST) 1
db2udb1  29885  2.4  6.811318081097176 ?        R 05:28:48  8:28 db2pfchr 2
db2udb1  29883  2.2  6.811318001098376 ?        O 05:28:48  7:59 db2pfchr 1
db2udb1  29866  0.8  6.811318081097016 ?        O 05:28:47  7:47 db2pfchr 1
db2udb1   9551  0.6  7.011817201123160 ?        O 11:41:19  0:50 db2agntp (ARTIST) 2
db2udb1  29867  0.6  6.811318081096944 ?        O 05:28:48  5:31 db2pfchr 1
db2udb1  18450  0.4  7.011656481118216 ?        R 16:01:16  0:02 db2agent 2

The process is coming from database partitions 1 and 2. The most critical processes are:

  • db2agntp - Active subagent, used for SMP parallelism.
  • db2pfchr - Prefetcher, making I/O to get data to db2agent
  • db2agent - DB2 agent, handles requests from applications

This situation leads to the conclusion that some big, <<killer>> query has been started. Query runs in parallel, using several agents to process a result data set.

Finding Top Sessions

To find the slow-running SQL, we need to collect and analyze all SQL statements coming into the database, for a period of high system utilization. Previously presented "top" UNIX system commands can be used to identify a period for collecting SQL statements. After the system returns to low CPU utilization, we will stop collecting SQL statistics.

Listing applications connected to the database:

 db2 => list application global
 
Auth Id  Application    Appl.      Application Id                 DB       # of
         Name           Handle                                    Name    Agents
-------- -------------- ---------- ------------------------------ -------- -----
DB2INST1  db2bp          401        *N0.db2inst1.030520135855     ARTSIT   87   
PAYMENT1  DWH_CGN.EXE    355        776B6599.0DEA.030520135255    ARTSIT   41   
PAYMENT1  DWH_CGN.EXE    366        776B6599.0DED.030520135450    ARTIST   131  
PAYMENT9  java           377        82500A51.F327.030520061537    ARTIST   11   
PAYMENT9  java           345        43264E31.E222.020110034533    ARTIST   7   

We assume that the problem is coming from users PAYMENT1 and PAYMENT9. The maximum number of agents (131) has been allocated for PAYMENT1 user connection and it makes sense to expect that this is hogging the connection.

First, we need to create the statements monitor switches and to store their definition in the database catalog. On our system, statement collecting is only reasonable on node 1 and node 2, since node 0 (catalog) has no user objects.

db2 => connect to artist 
 
   Database Connection Information
 
 Database server        = DB2/SUN 7.2.5
 SQL authorization ID   = DB2INST1
 Local database alias   = ARTSIT

Enabling statement information collecting at database manager level:

db2 => update monitor switches using statement on
DB20000I  The UPDATE MONITOR SWITCHES command completed successfully.

Creating SQL event monitor for capturing the SQL statements for the user PAYMENT1:

db2 => create event monitor SQLNODE1_PAY1 for statements where AUTH_ID='PAYMENT1' write to 
file '/home/db2inst1/pay1_1' NONBLOCKED REPLACE ON NODE 1 LOCAL
DB20000I  The SQL command completed successfully.
db2 => create event monitor SQLNODE2_PAY1 for statements where AUTH_ID='PAYMENT1' write to 
file '/home/db2inst1/pay1_2' NONBLOCKED REPLACE ON NODE 2 LOCAL
DB20000I  The SQL command completed successfully.

Creating the SQL event monitor for capturing the SQL statements for the user PAYMENT9:

db2 => create event monitor SQLNODE1_PAY9 for statements where AUTH_ID='PAYMENT9' write to 
file '/home/db2inst1/pay9_1' NONBLOCKED REPLACE ON NODE 1 LOCAL
DB20000I  The SQL command completed successfully.
db2 => create event monitor SQLNODE2_PAY9 for statements where AUTH_ID='PAYMENT9' write to 
file '/home/db2inst1/pay9_2' NONBLOCKED REPLACE ON NODE 2 LOCAL
DB20000I  The SQL command completed successfully.

Activate the SQL event monitors for the user PAYMENT1:

db2 => set event monitor SQLNODE1_PAY1 state=1
DB20000I  The SQL command completed successfully.
db2 => set event monitor SQLNODE1_PAY1 state=1
DB20000I  The SQL command completed successfully.

Activate the SQL event monitors for user PAYMENT9:

db2 => set event monitor SQLNODE1_PAY9 state=1
DB20000I  The SQL command completed successfully.
db2 => set event monitor SQLNODE2_PAY9 state=1
DB20000I  The SQL command completed successfully.

Checking status for the SQL event monitor switch:

db2 => get monitor switches
 
            Monitor Recording Switches
 
Switch list for node 0
Buffer Pool Activity Information  (BUFFERPOOL) = OFF
Lock Information                        (LOCK) = OFF
Sorting Information                     (SORT) = OFF
SQL Statement Information          (STATEMENT) = ON  05-22-2003 10:12:49.890612
Table Activity Information             (TABLE) = OFF
Unit of Work Information                 (UOW) = OFF

db2 => select EVMONNAME,TARGET,NODENUM,MONSCOPE from syscat.eventmonitors
 
EVMONNAME          TARGET               NODENUM MONSCOPE
------------------ -------------------- ------- --------
SQLNODE1_PAY1      /home/db2inst1/pay1_1      1 L       
SQLNODE1_PAY9      /home/db2inst1/pay9_1      1 L       
SQLNODE2_PAY1      /home/db2inst1/pay1_2      2 L       
SQLNODE2_PAY9      /home/db2inst1/pay9_2      2 L       

Reporting Bad SQL Code

After the application activity has been finished, monitoring activity can be turned off. Stopping SQL event monitoring will empty all database buffers, and force writing all of the collected information to a file.

Stopping the SQL collection for users PAYMENT1 and PAYMENT9:

db2 => set event monitor SQLNODE1_PAY1 state=0
DB20000I  The SQL command completed successfully.
db2 => set event monitor SQLNODE1_PAY1 state=0
DB20000I  The SQL command completed successfully.
db2 => set event monitor SQLNODE1_PAY9 state=0
DB20000I  The SQL command completed successfully.
db2 => set event monitor SQLNODE2_PAY9 state=0
DB20000I  The SQL command completed successfully

The generated file is a non-readable binary file. We will use the standard DB2 system command db2evmon to format trace file to the readable format.

>> db2evmon -path /home/db2inst1/pay1_1 > sql_payment1_node1.txt
Reading /tmp/00000000.evt ...
>> db2evmon -path /home/db2inst1/pay1_2 > sql_payment1_node2.txt
Reading /tmp/00000000.evt ...
>> db2evmon -path /home/db2inst1/pay9_1 > sql_payment9_node1.txt
Reading /tmp/00000000.evt ...
Reading /tmp/00000001.evt ...
Reading /tmp/00000002.evt ...
Reading /tmp/00000003.evt ...
>> db2evmon -path /home/db2inst1/pay1_1 > sql_payment9_node2.txt
Reading /tmp/00000000.evt ...
Reading /tmp/00000001.evt ...
Reading /tmp/00000002.evt ...

sql_payment1_nodex.txt and sql_payment9_nodex.txt contains all of the executed SQL commands from users PAYMENT1 and PAYMENT9.

Checking and comparing content, in one of the event trace files, we find the query with highest CPU utilization:

  Statement Event ...
  Appl Handle: 377
  Appl Id: 82500A51.F327.030520061537
  Appl Seq number: 0001
  Creator  : PAYMENT9
  Package  : SQLLF000
  Cursor   : SQLCUR4
  Cursor was blocking: FALSE
  Text     : select u1.status, u2.status_old, max(u16.user_enrity_rel) user_entity,           
  max(u16.user_entitiy_lang) user_entitiy_lang, u11.customer_group , max(u14. customer_group)    
  customer_group,......
  -------------------------------------------
  Start Time: 05-20-2003 14:34:36.111156
  Stop Time:  05-20-2003 16:23:39.507116
  Exec Time:  594.477708 seconds   
  Number of Agents created: 11
  User CPU:   427.400000 seconds  
  System CPU: 161.260000 seconds
  Fetch Count: 0
  Sorts: 23
  Total sort time: 13.340000
  Sort overflows: 4
  Rows read: 1342544411
  Rows written: 0
  Internal rows deleted: 0
  Internal rows updated: 0
  Internal rows inserted: 0

The SQL statement under the Text field is our database killer. This statement has the highest CPU utilization and it is coming from user PAYMENT9.

Conclusion

The demonstrated procedure can be further extended with advanced reporting techniques.

All of the collected information is an ideal source for loading into a database for further analysis. Once it is in the database, we will be able to produce reports related to SQL statement execution since many performance problems are closely related to SQL statement execution.

» See All Articles by Columnist Marin Komadina

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved