What is the Latch Event and Why Does It Occur?

September 2, 2019

Posted in an Oracle forum some time back was a question on the latch event:


    latch: parallel query alloc buffer

It was noted by the person posting the question that there’s not much, if any, information on the latch event. The question posed in that forum was, essentially, just WHAT is that latch event and why does it occur? To answer this it will be necessary to understand some of the ‘ins-and-outs’ of parallel processing according to Oracle.

Parallel query processing allocates slaves to do the bulk of the work; as a result of that parallel coordinators allocate message buffers to handle intermediate results. Depending on the system configuration and the load the parallel slaves must handle a surprisingly large number of message buffers can be allocated. There are two equations that can be used to compute the number of buffers that will be required; those equations use the parallel degree value that is either calculated by Oracle or specified in a hint; for simplicity that value will be represented by pd. For a non-RAC Oracle database the number of buffers required would be calculated as:


    pd(pd+1)*3

RAC adds a bit of complexity and multiple instances and, as a result, the equation is modified to be:


    pd(pd+1)*4

On non-RAC systems various degrees of parallelism will be used to calculate the estimated number of buffers Oracle could allocate. Consider a query executed with a parallel degree set to 4; the first equation produces the following results:


    4(4+1) * 3

becomes:


    4(5) * 3

and simplifying:


    20 * 3 = 60

Four parallel slaves can allocate 60 buffers. Now consider setting a parallel degree of 16. It’s important to know that this buffer calculation is not proportional; a degree of 16 is four times the original setting, but the result is far from four times the original result:


    16(16+1) * 3

Combining like terms produces:


    16(17) * 3

which reduces to:


    272 * 3 = 816

A parallel execution using 16 slaves requires a total of 816 message buffers, almost 14 times the number required by the execution at a parallel degree of 4. Given the computing power in modern servers, it is not a ridiculous thought to set a parallel degree of 128. Performing that calculation produces the following result:


    126(128+1) * 3

Continuing as in previous calculations:


    128(129) * 3

The result is:

:


    16512 * 3 = 45936

This is a considerable increase over the buffers allocated for a parallel degree 4 execution and far larger than what would be expected if the result from the degree 4 execution is multiplied by 32. Having the “latch: parallel query alloc buffer” wait as one of the top waits isn’t a far-fetched idea after all. A determining factor affecting this wait is where the PX message pool (‘PX msg pool’) resides; both the large pool and the shared pool are locations where this pool is usually found. The query provided below can be executed to learn where it located in a given database configuration:


break on pool skip 1
column name format a32
 
select
    pool,
    name,
    bytes
from
    v$sgastat
where
      pool = 'large pool'
   or
      name like '%PX%'
   or
      name like '%para%'
order by
    pool, name;

An example of the results of this query follows:


BING @ quanghoo > break on pool skip 1
BING @ quanghoo > column name format a32
BING @ quanghoo > 
BING @ quanghoo > select
              2         pool,
              3         name,
              4         bytes
              5 from
              6         v$sgastat
              7 where
              8         pool = 'large pool'
              9    or
             10         name like '%PX%'
             11    or
             12         name like '%para%'
             13 order by
             14         pool, name;

POOL               NAME            BYTES
------------ -------------------------------- ----------
large pool  PX msg pool         491520
               free memory         7897088

shared pool   PX QC deq stats         1696
              PX QC msg stats         3080
              PX list of chunk lists  1088
              PX msg pool struct      56
              PX scan rate stats      1080
              PX server deq stats     1696
              PX server msg stats     3080
              PX subheap              156344

POOL         NAME                   BYTES
------------ -------------------------------- ----------
shared pool  PX subheap desc            256
             parallel kcbibr               32
             parallel kcbibr dbwr bitv     16
             parallel_max_servers          14720
             param hash values             15912
             parameter blocks              31824
             parameter handle              137584
             parameter string values       291240
             parameter table block         1050648
             parameter text value          9400
             parameter value memory        1416


21 rows selected.

BING @ quanghoo &gt

From the example above it can be seen that the ‘PX msg pool’, for this database, is in the large pool. Versions of Oracle older than 11.2 (there are still some running) could be using the shared pool; this could result in the “latch: parallel query alloc buffer” wait to experience exceptionally long wait times as the result of contention for the available resources.

For RAC configurations the equation changes slightly but that does not mean that the calculated results follow that trend. Using the same parallel degree settings from the first round of calculations the differences start small but soon increase dramatically:


    4(4+1) * 4

reduces to:


    4(5) * 4

and produces a final result of:


    20 * 4 = 80

an increase of 33% simply because this is Real Application Clusters. Moving on to a parallel degree of 16 the equation generates:


    16(16+1) * 4

resulting in:


    16(17) * 4

giving a final value of:


    272 * 4 = 1088

This is 272 more buffers as a result of a clustered database configuration.

Engineered systems, from Oracle and from other vendors, make it possible to have a parallel degree set to 128 or higher. Considering parallel execution at a degree of 128 the computed allocation becomes:


    128(128+1) * 4

Combining like terms:


    128(129) * 4

produces:


    16512 * 4 = 66048

66,048 is a large number of message buffers. Even 10 years ago a parallel degree of 128 could have crippled a transactional database and possibly some smaller data warehouses or data marts with waits for the ‘parallel query alloc buffer’ latch. Depending on the overall resource utilization (disk, memory, swap) some smaller engineered systems could still fall victim to this wait.

If parallel execution plans are rare in a database configuration this latch wait may be of little or no concern. If there is a reasonable chance that parallel execution can occur, then being aware that parallel buffer allocation can be an issue can make DBA life a bit less surprising and may shorten problem investigation and resolution times. This can all lead to a bit less stress on the DBA.

# # #

See all articles by David Fitzjarrell

What is the Latch Event and Why Does It Occur?

Get the Free Newsletter!

Latest Articles

SQL SORT BY Statement

Working with NULL Values in SQL

Working with SQL AND, OR, and NOT Operators

How to Use Databases With Python

Advertisers

Menu

Our Brands

What is the Latch Event and Why Does It Occur?

Related Articles

Best Certifications for Database Administrators

Best Courses for Database Administrators

TYPE Definition Change in Oracle 21c

Get the Free Newsletter!

Latest Articles

SQL SORT BY Statement

Working with NULL Values in SQL

Working with SQL AND, OR, and NOT Operators

How to Use Databases With Python

Advertisers

Menu

Our Brands