Posted in an Oracle forum some time back was a question on the
latch: parallel query alloc buffer
It was noted by the person posting the question that there's not much, if any, information on the
latch event. The question posed in that forum was, essentially, just WHAT is that latch event and why does it occur? To answer this it will be necessary to understand some of the 'ins-and-outs' of parallel processing according to Oracle.
Parallel query processing allocates slaves to do the bulk of the work; as a result of that parallel coordinators allocate message buffers to handle intermediate results. Depending on the system configuration and the load the parallel slaves must handle a surprisingly large number of message buffers can be allocated. There are two equations that can be used to compute the number of buffers that will be required; those equations use the parallel degree value that is either calculated by Oracle or specified in a hint; for simplicity that value will be represented by
pd. For a non-RAC Oracle database the number of buffers required would be calculated as:
RAC adds a bit of complexity and multiple instances and, as a result, the equation is modified to be:
On non-RAC systems various degrees of parallelism will be used to calculate the estimated number of buffers Oracle could allocate. Consider a query executed with a parallel degree set to 4; the first equation produces the following results:
4(4+1) * 3
4(5) * 3
20 * 3 = 60
Four parallel slaves can allocate 60 buffers. Now consider setting a parallel degree of 16. It's important to know that this buffer calculation is not proportional; a degree of 16 is four times the original setting, but the result is far from four times the original result:
16(16+1) * 3
Combining like terms produces:
16(17) * 3
which reduces to:
272 * 3 = 816
A parallel execution using 16 slaves requires a total of 816 message buffers, almost 14 times the number required by the execution at a parallel degree of 4. Given the computing power in modern servers, it is not a ridiculous thought to set a parallel degree of 128. Performing that calculation produces the following result:
126(128+1) * 3
Continuing as in previous calculations:
128(129) * 3
The result is:
16512 * 3 = 45936
This is a considerable increase over the buffers allocated for a parallel degree 4 execution and far larger than what would be expected if the result from the degree 4 execution is multiplied by 32. Having the "
latch: parallel query alloc buffer" wait as one of the top waits isn't a far-fetched idea after all. A determining factor affecting this wait is where the PX message pool ('PX msg pool') resides; both the large pool and the shared pool are locations where this pool is usually found. The query provided below can be executed to learn where it located in a given database configuration:
break on pool skip 1 column name format a32 select pool, name, bytes from v$sgastat where pool = 'large pool' or name like '%PX%' or name like '%para%' order by pool, name;
An example of the results of this query follows:
BING @ quanghoo > break on pool skip 1 BING @ quanghoo > column name format a32 BING @ quanghoo > BING @ quanghoo > select 2 pool, 3 name, 4 bytes 5 from 6 v$sgastat 7 where 8 pool = 'large pool' 9 or 10 name like '%PX%' 11 or 12 name like '%para%' 13 order by 14 pool, name; POOL NAME BYTES ------------ -------------------------------- ---------- large pool PX msg pool 491520 free memory 7897088 shared pool PX QC deq stats 1696 PX QC msg stats 3080 PX list of chunk lists 1088 PX msg pool struct 56 PX scan rate stats 1080 PX server deq stats 1696 PX server msg stats 3080 PX subheap 156344 POOL NAME BYTES ------------ -------------------------------- ---------- shared pool PX subheap desc 256 parallel kcbibr 32 parallel kcbibr dbwr bitv 16 parallel_max_servers 14720 param hash values 15912 parameter blocks 31824 parameter handle 137584 parameter string values 291240 parameter table block 1050648 parameter text value 9400 parameter value memory 1416 21 rows selected. BING @ quanghoo >
From the example above it can be seen that the 'PX msg pool', for this database, is in the large pool. Versions of Oracle older than 11.2 (there are still some running) could be using the shared pool; this could result in the "
latch: parallel query alloc buffer" wait to experience exceptionally long wait times as the result of contention for the available resources.
For RAC configurations the equation changes slightly but that does not mean that the calculated results follow that trend. Using the same parallel degree settings from the first round of calculations the differences start small but soon increase dramatically:
4(4+1) * 4
4(5) * 4
and produces a final result of:
20 * 4 = 80
an increase of 33% simply because this is Real Application Clusters. Moving on to a parallel degree of 16 the equation generates:
16(16+1) * 4
16(17) * 4
giving a final value of:
272 * 4 = 1088
This is 272 more buffers as a result of a clustered database configuration.
Engineered systems, from Oracle and from other vendors, make it possible to have a parallel degree set to 128 or higher. Considering parallel execution at a degree of 128 the computed allocation becomes:
128(128+1) * 4
Combining like terms:
128(129) * 4
16512 * 4 = 66048
66,048 is a large number of message buffers. Even 10 years ago a parallel degree of 128 could have crippled a transactional database and possibly some smaller data warehouses or data marts with waits for the 'parallel query alloc buffer' latch. Depending on the overall resource utilization (disk, memory, swap) some smaller engineered systems could still fall victim to this wait.
If parallel execution plans are rare in a database configuration this latch wait may be of little or no concern. If there is a reasonable chance that parallel execution can occur, then being aware that parallel buffer allocation can be an issue can make DBA life a bit less surprising and may shorten problem investigation and resolution times. This can all lead to a bit less stress on the DBA.
# # #