 

Oracle OEM Agent 13c and Perl

Thursday Oct 25th 2018 by David Fitzjarrell

Perl can sometimes 'misbehave' with the Oracle Enterprise Manager (OEM) agent for version 13c.  Read on to see what issues it can create and how to correct them.

Occasionally bugs creep into code trees and end up in production. Oracle is not immune to this eventuality, and in Oracle Enterprise Manager (OEM) the agents can exhibit some interesting behavior with regard to Perl. The OEM 13c agent, unfortunately, has a bug that causes some Perl processes to consume large amounts of CPU. Let's discuss this and investigate the patch that Oracle has provided.

The OEM agent for the 13c version of Enterprise Manager can, on some platforms, spawn Perl processes that appear to 'run away' with the CPU, triggering CPU alerts in monitoring software and leaving DBAs scratching their heads, especially those not familiar with the common UNIX/Linux tool top. Top reports on various aspects of the system, usually as averaged numbers, giving a reasonably good view of server performance. A typical top session looks like the following:


top - 14:47:11 up 15 days, 21:16, 15 users,  load average: 5.71, 5.16, 3.65
Tasks: 514 total,   1 running, 513 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.1 us,  1.8 sy,  0.0 ni, 48.1 id, 48.9 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32944620 total,   224868 free,  2903176 used, 29816576 buff/cache
KiB Swap:  4194300 total,  4067628 free,   126672 used.  6357708 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13168 oracle    20   0   10.0g   1.4g   1.3g D   4.0  4.3  10:12.54 ora_dw00_splerb
 4965 splunk    20   0  220432  56836   5588 S   1.3  0.2  74:42.96 splunkd
11304 oracle    20   0    9.9g  57820  54976 S   1.3  0.2   2:43.72 ora_vkrm_splerb
  574 root      20   0   13228   1700   1492 S   1.0  0.0  22:33.62 rngd
16727 oracle    20   0  160484   4840   3732 R   1.0  0.0   0:00.16 top
11312 oracle    20   0    9.9g   1.6g   1.6g D   0.7  5.1   0:43.54 ora_dbw0_splerb
11373 oracle    20   0    9.9g 229680 216964 S   0.7  0.7   4:10.27 ora_cjq0_splerb
  493 root       0 -20       0      0      0 S   0.3  0.0   9:21.23 kworker/1:1H
11276 oracle    20   0    9.9g  57240  54408 S   0.3  0.2   1:02.66 ora_psp0_splerb
11807 oracle    20   0 2189184 322576   4176 S   0.3  1.0  56:43.91 java
29163 oracle    20   0 3657680  85792  81204 S   0.3  0.3   2:10.49 ora_scmn_apnbbu
29187 oracle    20   0 3438312 187604 184436 S   0.3  0.6   1:41.53 ora_ckpt_apnbbu
30400 oracle    20   0    9.9g  61796  59024 S   0.3  0.2   0:01.10 ora_pmon_aecm
    1 root      20   0  125752   4836   3108 S   0.0  0.0  18:11.47 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.38 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:40.76 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   9:31.27 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S   0.0  0.0   8:38.35 rcuos/0
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:01.45 migration/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:04.62 watchdog/0
   13 root      rt   0       0      0      0 S   0.0  0.0   0:04.34 watchdog/1
   14 root      rt   0       0      0      0 S   0.0  0.0   0:01.42 migration/1
   15 root      20   0       0      0      0 S   0.0  0.0   0:40.30 ksoftirqd/1
   17 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   18 root      20   0       0      0      0 S   0.0  0.0   2:30.77 rcuos/1
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
   20 root      rt   0       0      0      0 S   0.0  0.0   0:06.24 watchdog/2
   21 root      rt   0       0      0      0 S   0.0  0.0   0:01.95 migration/2
   22 root      20   0       0      0      0 S   0.0  0.0   3:39.80 ksoftirqd/2
   24 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H
   25 root      20   0       0      0      0 S   0.0  0.0   2:39.00 rcuos/2
   26 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2
   27 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper
   28 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs

An overview of system statistics and performance appears in the first five lines of the report: the uptime, the load averages over 1, 5 and 15 minutes, the total number of tasks, CPU usage, and memory and swap used and available. The remainder of the output lists the busiest processes as they execute, refreshed every few seconds, and the process at the top of the list usually changes with each refresh of the screen. It's not abnormal for one process to stay at the top for two or more refreshes, but a process that remains there through many refreshes is usually suspect. Such processes often show CPU usage of 80% or higher, with 99% not unheard of for processes 'wildly' consuming CPU.
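The interactive display can also be captured non-interactively for later analysis. As a rough sketch (the snapshot path and the 80% threshold are arbitrary choices, and the %CPU column position assumes top's default layout), batch mode makes it easy to search a snapshot for heavy consumers:

```shell
# Capture one snapshot of top in batch mode (-b) with a single iteration (-n 1)
# so the output can be saved and searched rather than viewed interactively.
top -b -n 1 > /tmp/top_snapshot.txt

# Print PID, %CPU and command for any process at 80% CPU or higher.
# In top's default layout the process table starts at line 8 and
# column 9 is %CPU; adjust these if your top version differs.
awk 'NR > 7 && $9 >= 80 {print $1, $9, $12}' /tmp/top_snapshot.txt
```

Saving snapshots this way also provides evidence to attach to a support ticket after the offending process has been killed.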

In the case of the OEM agent for version 13c, a single Perl process can 'roost' at the top of the process list at a CPU usage of 99%. In such a case, the top output could appear as shown below:


top - 14:47:11 up 15 days, 21:16, 15 users,  load average: 5.71, 5.16, 3.65
Tasks: 514 total,   1 running, 513 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.1 us,  1.8 sy,  0.0 ni, 48.1 id, 48.9 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32944620 total,   224868 free,  2903176 used, 29816576 buff/cache
KiB Swap:  4194300 total,  4067628 free,   126672 used.  6357708 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22869 oracle    20   0   10.0g   1.4g   1.3g D  99.7  4.3   0:02.47 perl
 4965 splunk    20   0  220432  56836   5588 S   1.3  0.2  74:42.96 splunkd
11304 oracle    20   0    9.9g  57820  54976 S   1.3  0.2   2:43.72 ora_vkrm_splerb
  574 root      20   0   13228   1700   1492 S   1.0  0.0  22:33.62 rngd
16727 oracle    20   0  160484   4840   3732 R   1.0  0.0   0:00.16 top
11312 oracle    20   0    9.9g   1.6g   1.6g D   0.7  5.1   0:43.54 ora_dbw0_splerb
11373 oracle    20   0    9.9g 229680 216964 S   0.7  0.7   4:10.27 ora_cjq0_splerb
  493 root       0 -20       0      0      0 S   0.3  0.0   9:21.23 kworker/1:1H
11276 oracle    20   0    9.9g  57240  54408 S   0.3  0.2   1:02.66 ora_psp0_splerb
11807 oracle    20   0 2189184 322576   4176 S   0.3  1.0  56:43.91 java
29163 oracle    20   0 3657680  85792  81204 S   0.3  0.3   2:10.49 ora_scmn_apnbbu
29187 oracle    20   0 3438312 187604 184436 S   0.3  0.6   1:41.53 ora_ckpt_apnbbu
30400 oracle    20   0    9.9g  61796  59024 S   0.3  0.2   0:01.10 ora_pmon_aecm
    1 root      20   0  125752   4836   3108 S   0.0  0.0  18:11.47 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.38 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:40.76 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   9:31.27 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S   0.0  0.0   8:38.35 rcuos/0
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:01.45 migration/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:04.62 watchdog/0
   13 root      rt   0       0      0      0 S   0.0  0.0   0:04.34 watchdog/1
   14 root      rt   0       0      0      0 S   0.0  0.0   0:01.42 migration/1
   15 root      20   0       0      0      0 S   0.0  0.0   0:40.30 ksoftirqd/1
   17 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   18 root      20   0       0      0      0 S   0.0  0.0   2:30.77 rcuos/1
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
   20 root      rt   0       0      0      0 S   0.0  0.0   0:06.24 watchdog/2
   21 root      rt   0       0      0      0 S   0.0  0.0   0:01.95 migration/2
   22 root      20   0       0      0      0 S   0.0  0.0   3:39.80 ksoftirqd/2
   24 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H
   25 root      20   0       0      0      0 S   0.0  0.0   2:39.00 rcuos/2
   26 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2
   27 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper
   28 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs

The Perl process, which collects configuration data from the Oracle Home of a database, is consuming an inordinate amount of CPU and may need to be killed to clear the CPU usage alert. This is the result of bug 26113746, and a patch is available through Oracle Support. Unfortunately, the patch can't be applied to the agents without first applying a different patch, 25163555, to the Middleware home of the OEM server. That patch fixes an issue that caused the Agent Upload process to fail to update the OEM metadata because the 13.2 metadata type is missing. Both patches are required to fix the Perl issue in the OEM 13c agent.

Applying those patches will not be discussed here as that information can be found in the following documents:

https://support.oracle.com/epmos/faces/DocumentDisplay?id=2243040.1&displayIndex=7#CAUSE

https://support.oracle.com/epmos/faces/DocumentDisplay?id=2249682.1&displayIndex=3#FIX
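From the O/S side, a runaway agent Perl process can usually be spotted with ps before ever touching the database. A minimal sketch follows; the agent home path is an assumption and must be replaced with the path to your own agent installation:

```shell
# Hypothetical agent installation path -- substitute your actual agent home.
AGENT_HOME=/u01/app/oracle/agent13c

# List Perl processes, highest CPU first, restricted to the agent home.
# The [p] bracket trick keeps grep from matching its own command line.
ps -eo pid,pcpu,etime,args --sort=-pcpu | grep "[p]erl" | grep "$AGENT_HOME"
```

A long elapsed time (the etime column) combined with high pcpu is the usual signature of the bug described here.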

Unexpected CPU spikes are disconcerting, to say the least. When they are the result of a rogue Perl process (usually a very tame executable) things can get exasperating, since top doesn't display much useful information about the culprit. The DBA should begin by querying V$SESSION, as the relevant information is found in the MODULE column. EM agent sessions are listed with a module of 'emagent_SQL_oracle_database' and can be mapped to the offending O/S process by joining V$SESSION.PADDR to V$PROCESS.ADDR to return the relevant SPID:


SQL > select spid from v$session, v$process where paddr = addr and module = 'emagent_SQL_oracle_database';

SPID
------------------------
22869
3788

SQL >

Knowing that this errant process is associated with the EM agent makes it much easier to kill (it won't be a required database process or user session). Unfortunately, until the patches are applied, killing the process is the only way to free up the CPU and 'solve' the performance issue.
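As a sketch of that cleanup, once the SPID from the query is confirmed to belong to the agent's Perl collector, it can be terminated from the shell. The PID below is the example value from the query output above, not a real process on your system; verify with ps before killing anything:

```shell
# 22869 is the example SPID returned by the V$SESSION query -- confirm that
# it really is the agent's perl process before proceeding.
ps -p 22869 -o pid,user,pcpu,args

# Try a polite SIGTERM first; escalate to kill -9 only if the process survives.
kill 22869
sleep 5
ps -p 22869 > /dev/null || echo "process terminated"
```

The agent will spawn a fresh collection process on its next scheduled run, so no agent restart is normally needed after killing the stuck one.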

This problem doesn't affect all systems, but it is considered a generic bug and is treated as such. Knowing that something like this could affect you puts you in a better position to address it immediately should the need arise.
