Disk Sorts - A Subtle Threat to Database Performance

Friday Jun 6th 2003 by DatabaseJournal.com Staff
Share:

Disk sorts are a near-silent performance vampire that can rob a database of good response times. Intelligently diagnosing the presence of disk sorts and then doing what you can to reduce or eliminate them altogether, you can remove this subtle threat to your database's performance.

by Robin Schumacher

Even though database vendors continue to proclaim the fact that they have "self-healing" databases, performance problems continue to plague many critical database systems. Added complexity in the database engines and a reduced DBA staff has added to the administrator's burden of keeping databases up and running at a rapid pace.

When troubleshooting performance problems, an Oracle DBA can employ a number of different techniques to uncover the root cause of database slowdowns. These methods include observing key ratios, using wait events, viewing performance details for heavy resource-hungry sessions, and tracking down inefficient SQL. The most in-vogue practices focus on bottlenecks (waits) and SQL interrogation, but smart DBAs also know when to put into play their favorite ratio and session scripts or monitors to fully expose a suspected performance problem.

I have to confess that when working through the database tuning process, I have often overlooked a subtle, but nasty threat to database performance - disk sorts. Excessive disk sort activity can make a wreck out of well-optimized SQL statements and give a black eye to even the most robust database server. How can this happen? More importantly, is your database suffering from disk sort problems? Let's take a look at this performance thief and see how to both find and fix disk sort activity on a database.

What is a Disk Sort?

All DBAs know that various operations and SQL statement executions can create sort activity in the database. For example, include one of Oracle's aggregation functions (MAX, MIN, etc.) in a SQL statement and you've introduced a sort operation to the database. A required sort is not necessarily a bad thing if it is efficiently performed. A short list of some of the most common SQL commands/clauses that can cause sorts include:

  • CREATE INDEX, ALTER INDEX ... REBUILD
  • DISTINCT
  • ORDER BY
  • GROUP BY
  • UNION
  • INTERSECT
  • MINUS
  • IN, NOT IN
  • Aggregation functions (MAX, MIN, AVG, SUM)
  • Certain unindexed joins
  • Certain correlated subqueries

A sort can become problematic if it requires disk I/O for it to complete. When a sort operation occurs, Oracle attempts to perform the sort in a memory space that is assigned by the DBA. In Oracle versions below Oracle9i, this memory sort area was controlled by the SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE initialization parameters. For Oracle9i and above, Oracle recommends the use of PGA_AGGREGATE_TARGET. If a sort is too large to be contained within this memory space, Oracle will continue the sort on disk - specifically, in the user's assigned temporary tablespace. This is where performance problems can begin to develop.

Most DBAs will tell you that they understand the impact of excessive physical I/O on their database and that they strive to keep it to a minimum. However, when they refer to physical I/O, they typically think about physical read activity needed to satisfy a SQL statement's request, but totally forget about disk sorts. They shouldn't. In my opinion, a disk sort can far outweigh the unwanted effects of regular disk I/O read activity because a disk sort involves both physical reads and physical writes. First, Oracle must perform physical writes to a session's temporary tablespace for the sort activity it cannot handle in memory. Then, the database must turn right around and read that information back from disk to build the result set requested by the SQL query. So in essence, it's a double whammy especially for large result sets that are the product of heavy sort requests.

by Robin Schumacher

Are Disk Sorts Affecting Your Database?

How can you tell if disk sorts are occurring on your system and causing an adverse effect on the performance of your database? There are a couple of routes you can take to determine if disk sorts are indeed present on your database:

  • Examine the memory/disk sort ratio
  • View wait events that may indicate disk sort activity
  • Check session activity regarding disk sort operations

The place to start is with a ratio of memory vs. disk sorts to see if disk sorts are coming into play on your database. The basic query to use is the following:

SELECT ROUND((100 * b.VALUE) /DECODE ((a.VALUE + b.VALUE), 
       0,1,(a.VALUE + b.VALUE)),2) "Percent memory sorts"
FROM   v$sysstat a, 
       v$sysstat b
WHERE  a.name = 'sorts (disk)'
AND    b.name = 'sorts (memory)'
 
Percent memory sorts
-------------------
95.5
 

Note that if Oracle has been up a long time, the cumulative numbers for memory sorts may hide more recent occurrences of disk sorts. What you hope to see (whether just using cumulative metrics or delta-based statistics) is a high value - at least 95% or more. Lower percentages could indicate a current or growing disk sort problem.

From a wait event perspective, there aren't really wait events that are solely devoted to disk sort activity. However, from past experience I can tell you that databases I've been involved with that suffered from disk sort activity seem to show a high percentage of time waited for the direct path read and direct path write events. A wait-based query you can use to check for these events is the following:

select event,
       total_waits,
       round(100 * (total_waits / sum_waits),2) pct_waits,
       time_wait_sec,
       round(100 * (time_wait_sec / greatest(sum_time_waited,1)),2)
       pct_time_waited,
       total_timeouts,
       round(100 * (total_timeouts / greatest(sum_timeouts,1)),2) 
       pct_timeouts,
       average_wait_sec
from
(select event,
       total_waits,
       round((time_waited / 100),2) time_wait_sec,
       total_timeouts,
       round((average_wait / 100),2) average_wait_sec
from sys.v_$system_event
where event in ('direct path read','direct path write')),
(select sum(total_waits) sum_waits,
        sum(total_timeouts) sum_timeouts,
        sum(round((time_waited / 100),2)) sum_time_waited
 from sys.v_$system_event
 where event not in 
 ('lock element cleanup', 
 'pmon timer', 
 'rdbms ipc message',
 'rdbms ipc reply',
 'smon timer', 
 'SQL*Net message from client', 
 'SQL*Net break/reset to client',
 'SQL*Net message to client',
 'SQL*Net more data from client',
 'dispatcher timer',
 'Null event',
 'parallel query dequeue wait',
 'parallel query idle wait - Slaves',
 'pipe get',
 'PL/SQL lock timer',
 'slave wait',
 'virtual circuit status',
 'WMON goes to sleep') AND 
 event not like 'DFS%' AND 
 event not like 'KXFX%')
order by 2 desc, 1 asc  

Output from a system suffering from disk sorts might look like this:


Click for larger image

Notice the high percentages of time waited attributed to these two events over all other events.

One other thing you can do at the global database level is to check the I/O activity of your temporary tablespaces that are devoted to disk sort operations. A query like this can be used:

select  c.name tablespace_name,
        sum(b.phyrds),
        sum(b.phywrts),
        sum(b.phyblkrd),
        sum(b.phyblkwrt),
        sum(b.readtim),
        sum(b.writetim) 
from    sys.v_$datafile a, 
        sys.v_$filestat b,
        sys.ts$ c ,
        sys.file$ d
where   ( a.file# = b.file#) and 
        ( a.file# = d.file#) and
        d.ts# = c.ts# 
group by c.name 
union all
select  c.name tablespace_name,
        sum(b.phyrds),
        sum(b.phywrts),
        sum(b.phyblkrd),
        sum(b.phyblkwrt),
        sum(b.readtim),
        sum(b.writetim) 
from    sys.v_$tempfile a,
        sys.v_$tempstat b,
        sys.ts$ c ,
        sys.x$kccfn v, 
        sys.x$ktfthc hc 
where   a.file# = b.file# and 
        a.file# = hc.ktfthctfno and
        hc.ktfthctsn = c.ts# and
        v.fntyp = 7 and
        v.fnnam is not null  and 
        v.fnfno = hc.ktfthctfno  and 
        hc.ktfthctsn = c.ts#
group by c.name
order by 1

Output from a database with high disk sort activity might look like this:


Click for larger image

Notice the high numbers of reads and writes for the TEMP and TEMP2 tablespaces above. This indicates a lot of disk sort activity. You should also check the read and write times to see if Oracle is experiencing I/O delays in reading from and writing to the temporary tablespaces.

From the global database level, you can move into examining disk sort activity at the session level to see if one or more sessions are causing the bulk of disk sorts. A query I like to use for this is the following:

select  b.sid sid,
        decode (b.username,null,e.name,b.username) user_name,
        b.machine machine_name,
        to_char(logon_time,'dd-mon-yy hh:mi:ss pm') logon_time,
        sum(decode(c.name,'sorts (disk)',value,0)) disk_sorts,
        sum(decode(c.name,'sorts (memory)',value,0)) memory_sorts,
        sum(decode(c.name,'sorts (rows)',value,0)) rows_sorted,
        round(100 * (sum(decode(c.name,'sorts (disk)',value,0)) /
        greatest(sum(decode(c.name,
        'sorts (memory)',value,0)),1,
        sum(decode(c.name,'sorts (memory)',value,0)))),2)
        disk_memory_sort_ratio
from    sys.v_$sesstat a,
        sys.v_$session b,
        sys.v_$statname c, 
        sys.v_$bgprocess e
where   a.statistic#=c.statistic# and 
        b.sid=a.sid  and 
        e.paddr (+) = b.paddr  and
        c.name in ('sorts (disk)',
                 'sorts (memory)',
                 'sorts (rows)')
group by b.sid, 
         decode (b.username,null,e.name,b.username),
         b.machine, 
         to_char(logon_time,'dd-mon-yy hh:mi:ss pm') 
order by 5 desc,6 desc  

From the query's output, you can see what sessions may be responsible for disk sort actions.

Finally, to see if any disk sorts are in progress on your database, you can use this query (for Oracle 8.0 and higher) to view the SQL statements causing current disk sorts along with other associated information:

select sql_text,
       sid,
       c.username,
       machine,
       tablespace,
       extents,
       blocks
from   sys.v_$sort_usage a,
       sys.v_$sqlarea b,
       sys.v_$session c
where  a.sqladdr = b.address and
       a.sqlhash = b.hash_value and
       a.session_addr = c.saddr
order by sid

by Robin Schumacher

How to Fix Disk Sort Problems

So, you've done some investigation and are horrified to find lots of disk sort activity on your database. Is there anything you can do to make things better?

The first step is to try to eliminate any needless sort activity regardless of whether it is done in memory or disk. For example, UNION ALL does not cause a sort in a query whereas UNION does (to eliminate duplicate rows). DISTINCT oftentimes is inappropriately coded and might possibly be eliminated from certain queries.

The next step is to begin investigating the appropriate use of the Oracle initialization parameters that affect sort operations. Before doing this, you might wonder if setting such parameters can really make a difference. Let's take a look at an Oracle 8.1.7 database that has only the default 65K SORT_AREA_SIZE parameter set. We will run a query that causes a disk sort and examine its performance metrics. When first run, its performance metrics look like this:

Notice the numbers for physical reads and physical writes. If you only reviewed the physical reads statistic, you might think the query is performing physical read activity for data to satisfy the query. However, when you also include the physical writes statistic, you can then confirm the fact that the query is instead involved in a disk sort and that is what is causing the physical I/O activity. Of course, the sorts (disk) statistic definitively answers the question about whether disk sorting is active for the query.

Since this is Oracle 8i, let's dynamically alter the session to include a larger sort area size (1MB). We can do this by issuing the following commands:

ALTER SESSION SET SORT_AREA_SIZE=1024000;
ALTER SESSION SET SORT_AREA_RETAINED_SIZE=1024000;

Then, we rerun our query and observe the following performance changes:

By enlarging the amount of memory available to the session for sort operations, we have been able to kiss all physical I/O related to disk sorts goodbye, with the end result being a reduction in overall response time of 75% (eight seconds down to two).

If you're using Oracle9i or above, then you will want to work with the new PGA_AGGREGATE_TARGET initialization parameter that replaces all the pre-9i parameters that were devoted to sorting. Nevertheless, whether you are working with Oracle8i or 9i, take care and do not be overly generous with memory sort areas as each session is assigned this amount, so for databases with large numbers of users, memory can quickly become a rare commodity on servers that do not have an abundance of RAM.

What if you have increased the amount of memory devoted to sorting, but you still have disk sorts occurring on your system? Keep in mind that for large databases, sometimes disk sorts are unavoidable. In these cases, what can you do to help improve performance?

First, ensure that you are intelligently managing your temporary tablespaces. This equates to using true TEMPORARY tablespaces and assigning them to non-RAID5 disks on your server. Even if the hardware vendor states that no write penalty is possible for their flavor of RAID5 (because of their caching, etc.), I would still go for non-RAID5 for any write intensive files like those used for disk sorting.

You also want to place your temporary tablespaces on their own fast physical drive/controller combination if at all possible. If you have a large user community, consider creating multiple temporary tablespaces on different devices and assigning half the users to one tablespace and the other half to the other temporary tablespace to reduce contention. Finally, be sure you monitor temporary tablespace activity to see if you have got things well balanced on your server.

Conclusion

Without a doubt, disk sorts are a near-silent performance vampire that can rob a database of good response times. However, by intelligently diagnosing the presence of disk sorts and then doing what you can to reduce or eliminate them altogether, you can remove this subtle threat to your database's performance.


About the Author

Robin Schumacher is vice-president of product management at Embarcadero Technologies, Inc., a leading supplier of database software tools. Robin has numerous years of experience with database engines and has been a feature writer and software reviewer for many database-related publications. He is the author of a new book entitled Oracle Performance Troubleshooting from Rampant Press and can be emailed at Robin.Schumacher@embarcadero.com.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved