Oracle: Collecting Real Time Wait Events

Thursday Mar 10th 2005 by DatabaseJournal.com Staff
Share:

There are two common ways of collecting wait event information: the Oracle extended trace (10046 event) and using V$ views. Learn how to collect wait event information using V$ views.


By Alexander Bubernak

Wait event tuning has become very popular in recent years. Many DBAs are moving away from the cumbersome Ratio approach of tuning and using wait events. Ratio tuning has been proven unreliable in general and it can lead to a hit or miss way of troubleshooting and fixing problems. The ratio tuner would also have to have various complex scripts and an advanced knowledge of Oracle internals to try to figure out or make an educated guess of where the problem may be. This process may have to be repeated many times before a problem is resolved.

Collecting wait events will show where a database or session is spending most of its time. Once this is known, you can drill down to the root cause of the wait. You also do not need a large amount of complex scripts; a simple query can show what event the database or session is waiting on.

All databases will have waits events even in a perfectly tuned database and application. Some events point to bottlenecks while others can point to the limitation of the server, CPU, IO, etc. and others are normal wait events like certain idle events.

Idle events are events that spend their time idle or waiting for an action. The time keeps incrementing for the event until an action occurs. An action can be a database process, user or application process. These wait events can be ignored most of the time. Be careful, because if the event is idle and you expect it not to be, then this can show where a potential problem is. For example, the event 'SQL*Net message from client' is a true idle wait event. The database is waiting for a response from the client to do something. If you log onto the database using SQL*Plus and not issue a command, this event's wait time will increase as it waits for a command from you. In this case, this idle event can be ignored. Now, if an application or batch program is processing code during this event, it can mean that the application or batch program is taking longer than it should to process.

There are two common ways of collecting wait event information: the Oracle extended trace (10046 event), which I discuss in article "Collecting Oracle Extended Trace (10046 event)", and using V$ views; which is the main topic of this article. You can miss events, timing and certain other details using V$ views instead of 10046 event because they only show the wait event at the moment in time you select from them. However, the events you will miss are probably so fast that they are not that significant for troubleshooting a large bottleneck in a database or user session at the time users report a performance problem.

I will focus this discussion on collecting wait events at the user session level. Database users are the ones that will notice the slow down and they do not care that everything looks good at the database level. They only know that they are losing production time because of the problem. Focusing on user waits, in my experience, will help overall database performance.

You also need to make certain that you collect properly scoped data. If a process runs for an hour, you can collect events for that hour and see where that process has spent most of its time. Now, if a user is logged on for an hour and everything is running fine except for the last 5 minutes and you look at the average or overall wait events you may miss the real bottleneck because that wait is only a small piece of the entire session.

Oracle provides a number of views to aid in tuning; a few are:

  • V$SYSTEM_EVENT, which gives you total system waits for an event since database startup.
  • V$SESSION_EVENT, which is at the session level.
  • V$SESSION_WAIT gives you what an active session is waiting on at that point in time.
  • V$SESSION, which gives you session information and has a column ROW_WAIT_OBJ# that tells you the object involved when joined with OBJECT_ID or DATA_OBJECT_ID in DBA_OBJECTS. ROW_WAIT_OBJ# column is in version 8i on up, but is not populated until version 9i and above.

Version 10g has some more V$ views for tracking history called V$%_HISTORY, including V$ACTIVE_SESSION_HISTORY (ASH), which samples non-idle wait events every second and new columns in V$SESSION to combine information in V$SESSION_WAIT. Oracle 10g specific wait events will be covered in much more detail in another article.

Detail of Views

V$SYSTEM_EVENT - Aggregated statistics for wait events since last database startup.

EVENT - Name of event.

TOTAL_WAITS - Number of times had to wait for an event.

TOTAL_TIMEOUTS - Total amount of times went to get an event resource and timed out waiting for the event.

TIME_WAITED - Sum of time waited for an event in centiseconds (1/100).

AVERAGE_WAIT - Average amount of time waiting for an event in centiseconds (1/100).

TIME_WAITED_MICRO - Sum of time for an event in microseconds (1/1000000). This column is 9i and above.

EVENT_ID - ID of the wait event. Column is in 10g.

V$SESSION_EVENT - Aggregated statistics for wait events since start of session. It has the same columns and definitions as V$SYSTEM_EVENT except for the following additional columns.

SID - Session ID

MAX_WAIT - Maximum time waited for an event.

V$SESSION_WAIT - Unlike the previous view, this view does not contain aggregated statistics; it contains statistics on the event the session is waiting on at the moment. The P(n) columns are an important piece of information used to help determine where the problem exists.

SID - Session ID.

SEQ# - Sequence number that uniquely identifies the wait event. This sequence is incremented with each wait event completion.

EVENT - Name of wait event.

P1TEXT - A description of what the P1 value is used for. Not reliable, use PARAMETER(n) value from V$EVENT_NAME to determine.

P1 - This value definition is dependent on the event. Use PARAMETER1 value from V$EVENT_NAME to describe what this value is used for.

P1RAW - Same value as P1 except in hexadecimal format.

P2TEXT - A description of what the P2 value is used for. Not reliable, use PARAMETER(n) value from V$EVENT_NAME to determine.

P2 - This value definition is dependent on the event. Use PARAMETER2 value from V$EVENT_NAME to describe what this value is used for.

P2RAW - Same value as P2 except in hexadecimal format.

P3TEXT - A description of what the P3 value is used for. Not reliable, use PARAMETER(n) value from V$EVENT_NAME to determine.

P3 - This value definition is dependent on the event. Use PARAMETER3 value from V$EVENT_NAME to describe what this value is used for.

P3RAW - Same value as P3 except in hexadecimal format.

WAIT_CLASS_ID - ID of the wait class. Column is in 10g.

WAIT_CLASS# - Number of the class. Column is in 10g.

WAIT_CLASS - The name of the wait class (Idle, Network, System I/O, etc.). Column is in 10g.

WAIT_TIME - The last amount of time that the session waited for an event. A 0 value means the session is currently waiting on the event. This value of 0 will not increase until the session completes the wait event. Time is in centiseconds.

SECONDS_IN_WAIT - If WAIT_TIME equals 0, then this is the number of seconds spent waiting on the event so far. This can give you complicated results when a timeout occurs like with enqueues because it will reset to 0, but this is beyond the scope of this article.

STATE - State of the wait event: WAIT_TIME=0 - WAITING, the process is in the state of waiting on an event. WAIT_TIME=-1 - WAITED SHORT TIME, wait time completed in less than a centisecond. WAIT_TIME=-2 - WAITED UNKOWN TIME, duration of wait is unknown because TIME_STATISTICS was set to false. WAIT_TIME>0 - WAITED KNOWN TIME, wait completed. WAIT_TIME is the duration of the wait event in centiseconds.

V$SESSION - Session details, the only listing column used in the scope of this article.

ROW_WAIT_OBJ# - This is the object id of the object involved and is joined to the DBA_OBJECTS.OBJECT_ID or DBA_OBJECTS.DATA_OBJECT_ID column. This column is only populated in version 9i and above. A value that is not -1 means that the session is waiting on a lock for this object. Previous versions of the database will need to use P(n) values from V$SESSION_WAIT to find objects in DBA_EXTENTS.

V$EVENT_NAME - This view displays the definition of P(n) columns from V$SESSION_WAIT and can be joined on the NAME column.

EVENT# - Number of the wait event.

NAME - Name of the event.

PARAMETER1 - Description of the value for P1.

PARAMETER2 - Description of the value for P2.

PARAMETER3 - Description of the value for P3.

WAIT_CLASS_ID - ID of the wait class. Column is in 10g.

WAIT_CLASS# - Number of the class. Column is in 10g.

WAIT_CLASS - The name of the wait class (Idle, Network, System I/O, etc.). Column is in 10g.


By Alexander Bubernak

Collecting Real Time Wait Events

The scenario from my example below is taken at a time when multiple users have called in complaining about performance for the past 10 minutes of running their application.

The examples here are just a brief overview of how to use collected real time wait event data.

Note: Parameter timed_statistics must be set to true to collect timing of wait events.

Let's create a table to store real time wait events for session information.

/*
   Author: Alex Bubernak
   This will create 
     real_time_session_waits table
*/

drop table real_time_session_waits
/
create table real_time_session_waits
( snap_tm date not null,
  seq# number not null,
  sid number not null,
  event varchar2(64) not null,
  p1 number,
  p2 number,
  p3 number,
  wait_time number,
  seconds_in_wait number,
 object_id number )
tablespace tools
/

Run the following anonymous SQL block to insert wait events for a given time period. For our example, we will use a 30-second sampling.

SQL> @collect_rt_waits
SQL> Enter time is seconds to run: 30

collect_rt_waits.sql

/*
   Author: Alex Bubernak

   This script will select from v$session_wait and v$session to 
   insert real-time wait event statistics in user created table.
   Inserts are run every second until run time is exceeded or
   you can modify dbms_lock.sleep(n) to any time you prefer.

   Remove truncate command to keep history.
   Modify script to only collect waits for a single session if
   necessary.

  Instructions:
  SQL> collect_rt_waits.sql
  SQL> Enter time in seconds to run: 30

*/
accept tm prompt 'Enter time in seconds to run: '
variable tm_t number;

truncate table real_time_session_waits -- comment out to keep history
/

declare

iteration number;
start_time date;

begin
 :tm_t := '&&tm';
 select sysdate into start_time from dual;

   iteration := 24*60*60/:tm_t;
 while sysdate < start_time+1/iteration loop
    insert into real_time_session_waits (SNAP_TM, 
                                 SEQ#,       
				 SID,            
				 EVENT,          
				 P1,             
				 P2,             
				 P3,             
				 WAIT_TIME,      
				 SECONDS_IN_WAIT,
				 OBJECT_ID )
				select 	sysdate,
                                        a.seq#,
					a.sid,
					a.event,
					a.p1,
					a.p2,
					a.p3,
					a.wait_time,
					a.seconds_in_wait,
					b.row_wait_obj#
				from v$session_wait a,
				     v$session b
				where a.sid = b.sid
                                -- and a.sid = 'session id' -- only insert a single session
				and a.event not in
				         ('pmon timer',
				          'rdbms ipc message',
				          'smon timer',
				          'dispatcher timer',
				          'Null event',
				          'parallel query dequeue wait',
				          'parallel query idle wait - Slaves',
				          'pipe get',
				          'PL/SQL lock timer',
				          'slave wait',
				          'virtual circuit status',
				          'lock element cleanup',
				          'WMON goes to sleep'
				          );
   dbms_lock.sleep(1); -- sleep for one second before checking and inserting next record. 
-- Can take this out for more detail or change value for longer 
-- runs.
 end loop;
commit;
end;
/

Let's display wait event counts and time by session and for our example filter idle events. Note: Remember what was said previously about idle events.

wait_cnt_by_sess.sql

/* 
   Author: Alex Bubernak
   Script displays wait event count and time for each sessions 
   completed wait event
*/
set linesize 150
col sid format 99999 heading "Session ID"
col event format a45 heading "Wait Event"
col cnt heading "Wait Count"
col wait heading "Wait time(cs)"
col pct format 999.99 heading "% of Time"
select sid,event, count(*) cnt, sum(wait_time) wait 
from real_time_session_waits
where wait_time > 0 -- this will limit wait times from sessions that have completed event and are > 1 sc.
-- and sid = 81 -- change to limit by sid
-- filter out idle events
and event not in ('SQL*Net break/reset to client',
	 	  'SQL*Net message to client',
		  'SQL*Net message from client',
		  'SQL*Net more data to client')
group by sid,event
order by wait
/
Session ID Wait Event                        Wait Count Wait time(cs)
---------- --------------------------------- ---------- -------------
         4 control file parallel write                1             0
        82 free buffer waits                          1             0
       139 db file sequential read                   14             1
       105 latch free                                15             1
        81 buffer busy waits                          1             1
        48 db file sequential read                   19             1
        41 db file sequential read                   28             2
        88 latch free                                18             2
       132 db file sequential read                   20             2
        81 db file sequential read                   19            11
         2 db file parallel write                    18            20
       143 latch free                                20           127
       133 db file sequential read                    9          1098
        82 db file sequential read                   10          1525
       172 db file sequential read                   10          1755
        81 db file sequential read                   20          2298

We see that session 81 waited 20 times for 'db file sequential read' for almost 23 seconds of the 30 second sampling.

Now let's modify the script to filter for session 81.

Session ID Wait Event                        Wait Count Wait time(cs)
---------- --------------------------------- ---------- -------------
        81 latch free                                 1             3
        81 buffer busy waits                          1             5
        81 db file sequential read                   20          2298

We can see three wait events with the most significant and largest amount of time are for 'db file sequential read' for the 30-second timeframe. The SQL statement involved should be analyzed to eliminate any unnecessary IOs, possibly spread IO across multiple volumes, partition the object if possible, etc.

From the above examples, we can see where the bottleneck in the session is, for the given timeframe. In the above, you can join REAL_TIME_SESSION_WAITS.OBJECT_ID with DBA_OBJECTS.DATA_OBJECT_ID to find the object or objects involved. In this case the P(n) columns would also be able to tell you what objects are involved when joined with DBA_EXTENTS.

SELECT  owner, segment_type, segment_name 
        FROM     dba_extents 
        WHERE    file_id = 'P1 value'
        AND      'P2 value' BETWEEN block_id AND block_id + blocks - 1;

We could also look at V$SESSION_EVENT for wait event statistics for the life of the session.

select 	event,
            total_waits,
	time_waited 
from v$session_event
where sid = 81
order by time_waited
/
EVENT                               TOTAL_WAITS TIME_WAITED
----------------------------------- ----------- -----------
buffer busy waits                            15           0
wait list latch free                          2           3
SQL*Net message to client                 41715           5
db file parallel read                         8           8
free buffer waits                             3          41
direct path read                           2001          47
log buffer space                              7         110
log file sync                                53         148
db file sequential read                      98        5568
latch free                                33361        6157
db file scattered read                    63065       10374
SQL*Net message from client               41714       79457

As you can see 'db file sequential read' is shown as a less significant wait event compared to the other three top events. This is not saying that the other events should not be examined, but that the problem was most noticeable to the users with a high 'db file sequential read' wait event time.

The above example only shows you a small piece of how to collect and use real time wait events.

Summary

Real time wait events are an excellent way of troubleshooting a problem as it is happening and are easier to setup and collect compared to 10046 event tracing. It does not have the accuracy or as much detail as the 10046 event, but can be a quick method to troubleshoot a problem as it is happening.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved