DB2 Instance and Fast Communication Manager - Part 2

Friday Jan 30th 2004 by Marin Komadina
Share:

The process of tuning an ESE DB2 database is automatic--almost. Several components still require human intervention and the Fast Communication Manager is one of them. Learn about the Fast Communication Manager Memory Structure, Troubleshooting, and Tuning in Part 2 of this series.

Managing DB2 Fast Communication Manager

In the last article, we discovered the DB2 instance memory structure and fast communication manager, together with all of the DBM parameters influencing memory utilization. In this article we we will continue along those lines with an inside view of FCM tuning and troubleshooting techniques for DB2 running on SUN Solaris.

This article covers:

  • Fast Communication Manager Memory Structure
  • FCM Troubleshooting
  • FCM Tuning
  • Conclusion

Fast Communication Manager Memory Structure

A DB2 partitioned database can be installed using one of the tree following configurations:

  1. Single database partition configuration (SDP) or Single Logical Node (SLN)
  2. Multiple database partition configuration (MDP) or Multiple Logical Nodes (MLN)
  3. Hybrid asymmetric system, having a mix of SDP and MDP database partitions

a.) Single database partition configuration has a defined single database partition per physical host. The global control block for that DB2 instance has the following FCM structures:

  • FCM Connection Entries
  • FCM Message Anchors
  • FCM Request Block

FCM memory structures are separated, having FCM buffers separated from the rest FCM memory structures. The FCM buffer Pool is compromised ofthe FCM Control Block and FCM Buffers. Communication between FCM processes is over UNIX sockets. For each physical node, the memory structure is identical.

b.) Multiple database partition configuration has several database partitions defined per physical host. All memory structures for the Fast Communication Manager are relocated in the FCM Buffer Pool, having the following characteristics:

  • FCM Connection Entries /one per partition
  • FCM Message Anchors /one per partition
  • FCM Request Block /one per partition
  • FCM Buffers /global per node
  • FCM Control Block /global per node

Inter-partition communication on the same host occurs through shared memory (direct) and not over UNIX sockets (FCM deamon). For each physical node, the memory structure is identical.

c.) Hybrid system is very rare, and it is unusual to find it in practice. Principles discussed for SDP as well for MDP configuration are also applicable to mixed systems.



FCM Troubleshooting



For the most part, a DBA will not know that FCM really exists. The DB2 instance has been pre-optimized for a variable load, with satisfying default parameters. However, FCM buffers are not self-tuned and need to be observed. Let's look at one of the frequent problems with FCM buffers:



Reactive database administrator



Early failure occurrence from the database message log (db2dump.log):



Click for example code



Considering log file and error messages inside, there is an indication of a serious problem with the database resource alloc_buffer. DBM has to react to that error condition either by closing applications or in extreme cases by closing the database.

Failure endpoint from the database message log, where DBM is closing all of the database connections:

base_sys_utilities  stopdbm    Probe:911
 
Database manager is stopped.
 
2003-08-26-17.39.55.191185   Instance:artist   Node:000
PID:27123(db2stop2)   Appid:none
base_sys_utilities  stopdbm    Probe:911
fast_comm_manager  sqlkf_process_goodbye   Probe:40
 
Node 0 has closed its connection.
 
2003-08-26-17.39.51.787015   Instance:artist   Node:002
PID:1510(db2fcmdm 4)   Appid:none
fast_comm_manager  sqlkf_process_goodbye   Probe:40
 
Node 2 has closed its connection.

2003-08-26-17.39.51.787015   Instance:artist   Node:001
PID:1510(db2fcmdm 4)   Appid:none
fast_comm_manager  sqlkf_process_goodbye   Probe:40
 
Node 1 has closed its connection.

Listing system message log, showing many FCM errors:


# cat /var/adm/messages

Aug 26 17:39:53 ARTIST01 DB2[1506]: [ID 212799 user.error] 
  DB2 (artist.002) fast_comm_manager  sqlkf_process_goodbye reports: probe id 40 with error 0 and alert num 0
Aug 26 17:39:53 ARTIST01 DB2[1506]: [ID 482771 user.error]
  extra symptom string provided: Node 1 has closed its connection.
Aug 26 17:45:14 ARTIST01 DB2[6731]: [ID 860831 user.error] 
  DB2 (artist.000) fast_comm_manager  sqlkfMlnPreConnect reports: probe id 31 with error 2055 and alert num 0
Aug 26 17:45:14 ARTIST01 DB2[6731]: [ID 139117 user.error]   
  extra symptom string provided: Failed to send hsh msg to node  = 
Aug 26 17:45:14 ARTIST01 DB2[6731]: [ID 702911 user.error]     data:   00000001              ....
Aug 26 17:45:14 ARTIST01 DB2[6731]: [ID 860831 user.error] 
  DB2 (artist.000) fast_comm_manager  sqlkfMlnPreConnect reports: probe id 31 with error 2055 and alert num 0
Aug 26 17:45:14 ARTIST01 DB2[6731]: [ID 139117 user.error]   
  extra symptom string provided: Failed to send hsh msg to node  = 
Aug 26 17:45:14 ARTIST01 DB2[6731]: [ID 702911 user.error]     data:   00000002              ....

Listing application error message:

Error: QueryEngine encountered error: Call SQLFetch got SQL_ERROR
[57011:-6040: on HSTMT] [IBM][CLI Driver][DB2/SUN] SQL6040C No available FCM-Buffer. SQLSTATE=57011

Explaining DB2 error SQL6040:


db2 => ? sql6040

SQL6040C No FCM buffers are available.  
Explanation:  No FCM buffer is available.  
The statement cannot be processed.  
User Response:  Try the request again.  
If the error persists, increase the number of FCM buffers (fcm_num_buffers)
 specified in the database manager configuration file, then try the request again.  
sqlcode:  -6040 
sqlstate:  57011

The reactive DBA has now all of the information needed to know how to fix the problem, (and he should hope that this would not happen again).

Proactive DB2 database administrator

The proactive DBA will start from the very first moment with regular monitoring of database activity. Having more user activity and larger tables results in the transference of more data between partitions. More data increases the demand for the number of request blocks and FCM buffers. More intra-partition operations cause a higher demand on the number of request blocks and message anchors.

To protect the database from periodic occurrences of unsuccessful transactions and the associated rollbacks due the lack of FCM resources, a DBA has to collect the FCM snapshot information for the database manager instance in the following manner:

Click for example code

A demonstrative FCM snapshot, consisting of FCM buffers, message anchors, connection entries and request blocks statistics is shown above. This information has been collected during instance activity by FCM deamon, and stored in the internal catalog tables. The collected statistics have all the elements necessary for proper FCM diagnostic. A careful look at the statistics will uncover a lack in the FCM resources.

For our test we have following instance FCM settings:


  >> db2 get dbm cfg | grep FCM_NUM_BUFFERS
   No. of int. communication buffers(4KB)(FCM_NUM_BUFFERS) = 1024

The number of FCM buffers is 1024 in 4KB blocks, resulting in total of 4MB.

The reconstructed FCM buffers situation just before getting database errors:

2003-08-26-16.00.59.631414   Instance:artist   Node:002
PID:5506(db2agntp (ARTIST) 2)   Appid:820B6595.04DC.030826134443
buffer_Q_services  sqlkqget   Probe:20   Database:ARTIST
DIA9999E An internal error occurred. Report the following error code :
"alloc_buffer".

db2 => get snapshot for FCM FOR ALL NODES
...
Node FCM information corresponds to            = 0
Free FCM buffers                               = 1024
Free FCM buffers low water mark                = 204
...

DB2 was working regularly until the high water mark for the FCM buffers come to a low value of 20%, or 204,8. Once the limit was reached, the application was disconnected and the transaction rolled back. DBM has on disposal 204 FCM buffers, and the DB2 internal mechanism will handle the priority for the FCM buffer requests between agents. The remaining buffers will be used only for medium and high-level priority operations (for example, a rollback).

When a number of free FCM buffers fall more then 10% (102), only high priority requests are served. A shortage of FCM_NUM_BUFFERS can cause serious performance degradation (DB2 may stop responding), proven in practice not even documented in the IBM manuals.

A different error message, SQL6043, indicates that DBM is running out of FCM request blocks:


db2 => ? sql6043

SQL6043C No FCM request blocks are available.
Explanation:  No FCM request block is available.
The statement cannot be processed.
User Response:  Try the request again.
If the error persists, increase the number of FCM request blocks (fcm_num_rqb) 
  specified in the database manager configuration file, then try the request again.
sqlcode:  -6043
sqlstate:  57011

As indicated in the error description, a number of FCM request blocks have to be extended. The DBM parameter fcm_num_rqb is in strict relation with the max_coordagents parameter:

fcm_num_rqb >= max_coordagents*2,5

Setting the number of request blocks to a value lower than what is defined by the formula returns an error.

There is no need to persue this or other FCM parameters since all parameters except the FCM buffers are self-configurable starting in DB2 version 8.1.

FCM Tuning

With DB2 version 7, the DBA has to collect the snapshot statistics and adjust the parameters accordingly. Fortunately, in DB2 version 8.1, most of the FCM configuration parameters are self-configurable besides FCM buffer parameter.

FCM_NUM_BUFFERS

FCM resources can be monitored via database snapshot monitoring by frequently taking snapshots and storing the output in a log file. When the number of free buffers falls below 25% of free FCM buffers, adding new FCM buffers is urgently needed. My opinion in this is in direct opposition to the IBM documentation, which states to add additional FCM buffers at a a low limit of 15%. Because a shortage of FCM buffers can cause database crashes and serious performance degradation, it is my opinion that additional FCM buffers should be added if the free buffers fall below 25%. IBM documentation, for example does not mention the possibility of a database crash due to the lack of FCM buffers. In reality this is happening.

Let's look at a DBM configuration with 204.1 buffers, where the level of 512 free buffers would signal the necessity of making a change.

Online, the FCM buffers extension for DB2 version 8.1:


db2 => update dbm cfg using FCM_NUM_BUFFERS 4096
DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully.

Changing the FCM_NUM_BUFFERS parameter in DB2 version 7 is similar, with only one exception. We need to restart DBM for changes to become active.

FCM_NUM_RQB

The following is an additional hint for DB2 version 7. The Parameter fcm_num_rqb has to be configured to support the maximum number of connections.

Listing DBM parameters related to the maximum connections:


$  db2 get dbm cfg |grep AGENTS
 Max number of existing agents               (MAXAGENTS) = 400
 Agent pool size                        (NUM_POOLAGENTS) = 40 (calculated)
 Initial number of agents in pool       (NUM_INITAGENTS) = 0
 Max number of coordinating agents     (MAX_COORDAGENTS) = (MAXAGENTS - NUM_INITAGENTS)
 Max no. of concurrent coordinating agents  (MAXCAGENTS) = MAX_COORDAGENTS
 Max number of logical agents          (MAX_LOGICAGENTS) = MAX_COORDAGENTS
 Max number of DARI processes                  (MAXDARI) = MAX_COORDAGENTS

According to the actual DBM setting, the calculation is next:

MAX_COORDAGENTS= MAXAGENTS-NUM_INITAGENTS = 400 0 = 400

My test system is configured to support the maximum number of 400 connections.

fcm_num_rqb >= max_coordagents*2,5= 400*2,5=1000

The current setting is sufficient for DBM defined agents.


# db2 get dbm cfg | grep FCM_NUM_RQB
 Number of FCM request blocks              (FCM_NUM_RQB) = 2048

Conclusion

With this article, Fast Communication Manager has been fully explained. There are some differences in FCM behavior between AIX and SUN platform in memory allocation. Intentionally I did not mix AIX into my article, leaving IBM to cover this area with some document in the future. Some things that every DBA should know before starting with any kind of tuning:

  • current instance and database settings
  • design/implementation structure
  • number of database partitions
  • number of active instance users
  • level of application code complexity

Once armed with the above information, a DBA's view becomes sharpened and he can predict potentional hot spots in self-tuned and high intelligence DB2 databases.

» See All Articles by Columnist Marin Komadina

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved