Monitoring target systems is a critical responsibility of today's database and system administrators. DBAs and SAs are responsible for ensuring that our systems are available and that their performance is within acceptable parameters, and for watching for error conditions and unusual behavior patterns.
In this series of articles, we've been looking into the functionality behind the new Incident Management features introduced in Oracle EM 12c Cloud Control. We now continue with a look at the main functions of Incident Manager itself and gain an understanding of both rules and rule sets, which are a critical part of controlling which events are automatically turned into incidents to be tracked by the Incident Manager feature.
My previous articles focused mostly on the underlying events, the building blocks behind the incidents that can in turn be managed and monitored using the new Incident Management feature of Oracle Enterprise Manager 12c Cloud Control.
In this article we will take a look at the main functions of Incident Manager and gain an understanding of both rules and rule sets, which are a critical part of controlling which events are automatically turned into incidents to be tracked by the Incident Manager feature.
Incident Manager is the heart of the incident management functionality that Oracle introduced in EM 12c Cloud Control. From this interface a DBA can take care of all aspects of monitoring, tracking and investigating incidents, events and problems in their databases.
To access Incident Manager, from the Enterprise menu on the EM home page, select Monitoring and then Incident Manager. The top right-hand side of the page lists any incidents that have occurred. To see more detail, select an incident; the bottom right of the page then shows detailed information about it.
There are five tabs of information: General, Events, My Oracle Support Knowledge, Updates, and Related Events and Incidents. The General tab has a section for tracking the incident and, where available, a "Guided Resolution" section where you can drill into areas such as performance findings, metric details and recent changes.
On the left-hand side is a navigator-style section that you can use to look at different views of the incidents. These include My Open Incidents (those assigned to you), Unassigned Incidents, Unacknowledged Incidents, All Open Incidents, Unassigned Problems, All Open Problems and Events without Incidents.
You also have the ability to create your own views of the incidents in the left side navigation pane. Any views you create are specific to your EM account, and are not available to other EM users.
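One way to think about these views is as saved filters over the same list of incidents. The Python sketch below is only a conceptual model, not EM's API or its data model: the `Incident` fields, `build_views` and `apply_view` are hypothetical names chosen for illustration, and the assumption that each view only shows open incidents is mine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Incident:
    # Hypothetical fields, chosen only to illustrate the views.
    summary: str
    owner: Optional[str] = None   # EM user the incident is assigned to
    acknowledged: bool = False
    is_open: bool = True


def build_views(me: str) -> Dict[str, Callable[[Incident], bool]]:
    """Model a few of the standard views as filter predicates."""
    return {
        "My Open Incidents": lambda i: i.is_open and i.owner == me,
        "Unassigned Incidents": lambda i: i.is_open and i.owner is None,
        "Unacknowledged Incidents": lambda i: i.is_open and not i.acknowledged,
        "All Open Incidents": lambda i: i.is_open,
    }


def apply_view(name: str, incidents: List[Incident], me: str) -> List[str]:
    """Return the summaries of the incidents that the named view would show."""
    predicate = build_views(me)[name]
    return [i.summary for i in incidents if predicate(i)]


# Made-up sample data for the sketch.
incidents = [
    Incident("Tablespace full", owner="alice", acknowledged=True),
    Incident("Agent unreachable"),
    Incident("Listener down", owner="bob", is_open=False),
]
```

With this sample data, `apply_view("My Open Incidents", incidents, "alice")` returns only the tablespace incident. A custom view would simply be one more named predicate, visible only to the user who defined it.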
You can manage an incident by choosing the All Open Incidents view, clicking on the General tab and then choosing "Manage". From the Manage dialog you can change the status, assign the incident to an EM user for follow-up, change the priority, escalate the incident and add comments that other users can see.
If you wish to work on an incident (and thereby assign yourself as its owner), click the "Acknowledge" link on the General tab, which marks the incident as acknowledged. From here you can look at the My Oracle Support Knowledge tab for additional information (and even open an SR with Oracle for further assistance), or use the Guided Resolution if you would like to investigate further without necessarily contacting Oracle Support.
You can also suppress further messages or notifications from being processed on an incident, which you may want to do if it's currently being worked on, but it's not fully resolved yet. On the General tab, click on the "More" option and then choose "Suppress".
Generally, once the underlying problem or cause of an incident is fixed, the incident is automatically marked as cleared the next time it is evaluated. You can also manually clear outstanding events once they are resolved.
Not all events that happen in a database automatically cause an incident to be logged. We create rules so that one or more events actually generate an incident to be managed. Building up the rules takes time, so we also have the option of manually generating an incident for tracking. One view, "Events without Incidents", can be particularly useful here. It is good practice to look at this view periodically. If you see an event that is significant and you would like to track it as an "official" incident, select it and, from the General tab, select "More" and then Create Incident. You can assign an Incident #, assign it for tracking and change the status to "Work in Progress", and then you have the full functionality of Incident Manager to continue to monitor and track work on the incident. At a later time, you can always create a rule that allows that event (or events) to generate an incident automatically in the future.
EM 12c Cloud Control uses a combination of rules and rule sets to govern what action or actions should be taken when an event or incident occurs on a managed target.
Rules are essentially a set of directions for EM to follow, such as sending an email or automatically generating an incident when specified events happen.
Rules can perform any of the following actions:
- Create an incident
- Send a notification, such as sending an email or generating a help desk ticket
- Perform incident management actions (such as automatically escalating an alert if it's not worked by its assigned administrator in a specified period of time)
Rules consist of two parts: the event, incident or problem that the rule applies to, and the action that should take place.
By default, rules are processed in the order in which they are created or entered into a rule set, so this must be taken into account when creating rules and rule sets.
Let's say we want to generate an incident based on a combination of CPU and memory metrics (perhaps one that indicates a high system load). Then, based on whether it is at a warning or critical level, we want to send a page or an email. Lastly, if the incident is not closed within 3 days, it should be escalated to level 1. This is how the rules should be created:
        Criteria                       Condition          Action
Rule 1  CPU Util(%) metric,                               Create incident
        Memory Util(%) metric;
        events - warning or critical
Rule 2  Second incident of             Severity=Critical  Notify by page
        Warning/Critical severity      Severity=Warning   Notify by email
Rule 3  Incident open for 3 days                          Set escalation level
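The three rules can be read as a small pipeline in which each rule acts on the result of the one before it, which is why their order matters. The Python sketch below is a conceptual model only and does not use any EM API: the `Event` and `Incident` classes and the `rule_1`, `rule_2`, `rule_3` and `process` functions are hypothetical names. As a simplification, it creates an incident for either metric on its own rather than for a combination of the two.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    metric: str      # e.g. "CPU Util(%)"
    severity: str    # "Warning" or "Critical"


@dataclass
class Incident:
    severity: str
    days_open: int = 0
    escalation_level: int = 0
    actions: List[str] = field(default_factory=list)


# Metrics watched by Rule 1 in the example above.
WATCHED_METRICS = {"CPU Util(%)", "Memory Util(%)"}


def rule_1(event: Event) -> Optional[Incident]:
    """Create an incident for warning/critical CPU or memory events."""
    if event.metric in WATCHED_METRICS and event.severity in ("Warning", "Critical"):
        incident = Incident(severity=event.severity)
        incident.actions.append("Create incident")
        return incident
    return None


def rule_2(incident: Incident) -> None:
    """Notify by page for critical incidents, by email for warnings."""
    if incident.severity == "Critical":
        incident.actions.append("Notify by page")
    elif incident.severity == "Warning":
        incident.actions.append("Notify by email")


def rule_3(incident: Incident) -> None:
    """Escalate incidents that have been open for 3 days or more."""
    if incident.days_open >= 3:
        incident.escalation_level = 1
        incident.actions.append("Set escalation level to 1")


def process(event: Event, days_open: int = 0) -> Optional[Incident]:
    """Apply the rules in order; rules 2 and 3 need the incident from rule 1."""
    incident = rule_1(event)
    if incident is None:
        return None          # no incident, so nothing to notify or escalate
    incident.days_open = days_open
    rule_2(incident)
    rule_3(incident)
    return incident
```

If the rule that creates the incident ran last instead of first, the notification and escalation rules would have nothing to act on, which is the practical reason to plan the order before entering the rules.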
One or more rules that apply to a target or collection of targets can be grouped into rule sets. We use rule sets to help organize rules into manageable units. For example, a rule set could be created that would be applied to production systems, a second rule set for development systems etc. Or, a rule set for database targets, host targets, etc.
Enterprise rule sets can be used across the enterprise, and they can perform all supported actions. The ability to create enterprise rule sets is restricted. Any action a rule set performs is carried out with the privileges of the rule set's creator.
Private rule sets can be created by any administrator in order for that administrator to set up notifications about any of their targets. The only action these rule sets can take is to email the rule set owner.
Rule sets include the following: a name, a description, what the rule set applies to, an owner, a status (enabled or disabled) and a type (public/enterprise or private).
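To make the attributes and the enterprise/private distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not EM's internal representation: the `RuleSet` class, the `RuleSetType` enum, the `add_action` method and the action strings are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RuleSetType(Enum):
    ENTERPRISE = "enterprise"
    PRIVATE = "private"


@dataclass
class RuleSet:
    # The attributes listed above; the field names are illustrative only.
    name: str
    description: str
    applies_to: str
    owner: str
    enabled: bool = True
    kind: RuleSetType = RuleSetType.ENTERPRISE
    actions: List[str] = field(default_factory=list)

    def add_action(self, action: str) -> None:
        """Private rule sets may only email their owner."""
        if self.kind is RuleSetType.PRIVATE and action != "email owner":
            raise ValueError("private rule sets can only email the rule set owner")
        self.actions.append(action)
```

For example, calling `add_action("create incident")` on a rule set whose `kind` is `RuleSetType.PRIVATE` raises `ValueError`, while the same call on an enterprise rule set succeeds.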
There are several rule sets that are created and activated automatically once EM 12c Cloud Control is installed. These built-in rule sets will:
- Create an incident if a target goes down
- Create an incident for an agent unreachable error
- Create an incident for any critical metric alerts
- Create an incident for any service level agreement alerts
- Create an incident for compliance score violations
- Create an incident for any high-availability events
- Automatically clear metric alerts older than 7 days
- Automatically clear job status change events older than 7 days
- Automatically clear Application Dependency and Performance (ADP) alerts after 7 days
These rule sets cannot be modified or deleted; however, they can be disabled or enabled.
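The automatic-clear rules amount to an age check against a 7-day window. The Python sketch below shows that check in isolation. It is an assumption-laden illustration rather than how EM implements it: `split_stale`, the alert tuples and the dates are made up for the example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

RETENTION = timedelta(days=7)   # the 7-day window used by the built-in rules


def split_stale(alerts: List[Tuple[str, datetime]],
                now: datetime) -> Tuple[List[str], List[str]]:
    """Partition (name, raised_at) alerts into (kept, cleared) by age."""
    kept, cleared = [], []
    for name, raised_at in alerts:
        if now - raised_at > RETENTION:
            cleared.append(name)    # older than 7 days: cleared automatically
        else:
            kept.append(name)
    return kept, cleared


# Made-up sample data: one alert is 14 days old, the other 1 day old.
now = datetime(2012, 6, 15)
alerts = [
    ("CPU Util(%) warning", datetime(2012, 6, 1)),
    ("Tablespace warning", datetime(2012, 6, 14)),
]
kept, cleared = split_stale(alerts, now)
```

Here the 14-day-old CPU alert ends up in `cleared` and the 1-day-old tablespace alert stays in `kept`.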
Next month we'll be looking at the detailed how-to of creating rules and rule sets in Oracle Enterprise Manager 12c Cloud Control. Until then...