Monitoring target systems is a critical responsibility of today's database and system administrators. DBAs and SAs are responsible for ensuring that systems are available, keeping performance within acceptable parameters, and watching for error conditions and unusual behavior patterns.
Oracle has made great strides in providing tools and techniques that let administrators proactively monitor their systems for all of the above conditions, and then some.
This trend continues with a new feature in Oracle Enterprise Manager 12c Cloud Control called Incident Manager.
This new feature provides administrators with a single interface that can be used to view, manage, resolve and track all types of issues that have occurred within their monitored environments.
Incident Manager can be configured to tie into the My Oracle Support knowledge base.
Incident Manager goes well beyond an interface for viewing any hiccups that have happened. It also includes the ability to:
- assign the responsibility of looking after an incident or problem to a specific user
- acknowledge incidents
- set priorities for incidents
- track the status of any incidents
- escalate and/or suppress an incident as necessary
DBAs can also open help desk tickets, because Incident Manager can be configured with help desk connectors.
The notification system works with Incident Manager to ensure that EM administrators are notified in a timely manner. Notifications can be sent when incidents, events or problems are encountered.
In taking a look at this new feature, the following topics will be covered:
- Database Privileges
- Key Definitions
- Moving From 11g to 12c
- Working with Incidents, Events & Problems * future article
- Working with Rules and Rule Sets * future article
- Using Notifications * future article
Database Privileges Associated with Incident Manager
Several privileges govern a user's ability to view events and incidents and to manage them. It is important that these privileges are granted appropriately to every user who will be using Incident Manager.
View Event - grants permission to view an event as well as the ability to add comments to an event.
Manage Event - allows for making changes to an event such as closing it, escalating it, manually creating an incident and creating a ticket for an event. Also included is the ability to associate incidents and events together.
View Incident - allows a user to view and add comments to an incident.
Manage Incident - allows a user to take actions on an incident, which may include assigning, closing, escalating etc.
View Problem - allows a user to view and add comments to a problem.
Manage Problem - like managing incidents, this allows a user to manage problems.
Create Enterprise Rule Set - Rule sets are now used to set up and manage notifications (replacing Notification Rules in previous releases of OEM). This privilege allows users to edit and create rule sets and their included rules.
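The split between the View and Manage privileges can be pictured as a simple lookup from privilege to permitted actions. The sketch below is purely illustrative: the `Privilege` enum, the action names and the `can_perform` helper are hypothetical and are not part of any Enterprise Manager API. The action lists only paraphrase the descriptions above, and the actions listed for Manage Problem are an assumption based on the "like managing incidents" wording.

```python
from enum import Enum


class Privilege(Enum):
    VIEW_EVENT = "View Event"
    MANAGE_EVENT = "Manage Event"
    VIEW_INCIDENT = "View Incident"
    MANAGE_INCIDENT = "Manage Incident"
    VIEW_PROBLEM = "View Problem"
    MANAGE_PROBLEM = "Manage Problem"
    CREATE_ENTERPRISE_RULE_SET = "Create Enterprise Rule Set"


# Hypothetical action names, paraphrased from the privilege descriptions.
ALLOWED_ACTIONS = {
    Privilege.VIEW_EVENT: {"view_event", "comment_on_event"},
    Privilege.MANAGE_EVENT: {
        "close_event",
        "escalate_event",
        "create_incident_from_event",
        "create_ticket_for_event",
        "associate_event_with_incident",
    },
    Privilege.VIEW_INCIDENT: {"view_incident", "comment_on_incident"},
    Privilege.MANAGE_INCIDENT: {
        "assign_incident",
        "close_incident",
        "escalate_incident",
    },
    Privilege.VIEW_PROBLEM: {"view_problem", "comment_on_problem"},
    # Assumption: mirrors the incident actions ("like managing incidents").
    Privilege.MANAGE_PROBLEM: {
        "assign_problem",
        "close_problem",
        "escalate_problem",
    },
    Privilege.CREATE_ENTERPRISE_RULE_SET: {"create_rule_set", "edit_rule_set"},
}


def can_perform(granted, action):
    """Return True if any granted privilege permits the given action."""
    return any(action in ALLOWED_ACTIONS[p] for p in granted)
```

In this model, a user holding only View Incident can comment on an incident but cannot close it; closing requires Manage Incident to be granted as well.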
Key Definitions
There are three key terms that come into play with Incident Manager: Events, Incidents and Problems.
An event is something significant that happens to an entity that is detected by Enterprise Manager. Entities can range from specific targets like databases or hosts to jobs that are being run on one of these targets and beyond.
Events all have attributes, which include the event type (the list of event types is very long), the severity, the object it happened to, the message associated with the event, the timestamp and the category it belongs to.
Any event that is detected with a Fatal (a target is down) or Critical (immediate action needed) severity will automatically generate an Incident.
The other severities, Warning (attention needed, but the target is still functioning), Advisory (caution needed, also used for compliance reporting) and Informational (the event occurred but does not need to be addressed), will not generate an Incident.
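The event attributes and the severity rule described above can be summarized in a few lines of Python. This is a minimal sketch for illustration only: the `Event` and `Severity` types and the `generates_incident` function are hypothetical names, not Enterprise Manager objects, and the field names simply mirror the attributes listed in the text.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Severity(Enum):
    FATAL = "Fatal"                  # a target is down
    CRITICAL = "Critical"            # immediate action needed
    WARNING = "Warning"              # attention needed, still functioning
    ADVISORY = "Advisory"            # caution needed, compliance reporting
    INFORMATIONAL = "Informational"  # occurred, no action needed


@dataclass(frozen=True)
class Event:
    event_type: str      # one of the many event types
    severity: Severity
    target: str          # the object the event happened to
    message: str
    timestamp: datetime
    category: str


def generates_incident(event: Event) -> bool:
    """Only Fatal and Critical events create an incident automatically."""
    return event.severity in (Severity.FATAL, Severity.CRITICAL)
```

For example, an event built with `Severity.FATAL` for a database that is down would return `True` from `generates_incident`, while a `Severity.WARNING` event on the same target would return `False`.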
A problem is specifically an Oracle software error. A problem is often identified when the recommended action for an incident is to raise a Service Request with Oracle Support. Problems are generally addressed with the Support Workbench feature that Oracle introduced with 11g.
For example, if a critical ORA-600 error is thrown by the database, it would be flagged as a problem. The error would also be recorded automatically in the Automatic Diagnostic Repository (ADR).
Problems, like events, can be combined into Incidents so that they can be tracked using the new Incident Manager features in OEM 12c Cloud Control.
Incidents are the level at which the system is monitored, rather than each individual event that has occurred. An incident has one or more events or problems associated with it. The events or problems can even come from different event categories. For example, it might make sense to combine events on both CPU and Memory utilization into one Incident that reflects system resource availability issues.
The purpose of combining events or problems into incidents is to allow for monitoring at a level that is meaningful and significant to DBAs based on their environments. This also means that DBAs can focus on a smaller group of items initially, with the ability to drill into the details as necessary.
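The CPU and Memory example can be sketched as a small data structure in which one incident holds several events from different categories. Everything here is hypothetical and for illustration only: the `Event` and `Incident` classes, the category labels and the sample messages are made up. Treating the incident's severity as the worst severity among its events is an assumption of this sketch, not something stated above.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Assumed ranking, from least to most severe.
SEVERITY_ORDER = ["Informational", "Advisory", "Warning", "Critical", "Fatal"]


@dataclass
class Event:
    category: str
    severity: str
    message: str


@dataclass
class Incident:
    summary: str
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        """Associate another event with this incident."""
        self.events.append(event)

    def categories(self) -> Set[str]:
        """Event categories represented in this incident."""
        return {e.category for e in self.events}

    def worst_severity(self) -> str:
        """Most severe level among the member events (sketch assumption)."""
        if not self.events:
            raise ValueError("an incident needs at least one event")
        return max((e.severity for e in self.events), key=SEVERITY_ORDER.index)


# One incident that groups CPU and memory events for the same host.
resources = Incident("System resource availability")
resources.add(Event("CPU utilization", "Warning", "CPU utilization is high"))
resources.add(Event("Memory utilization", "Critical", "Memory utilization is high"))
```

The DBA would then watch the single `resources` incident and drill into `resources.events` only when the details are needed.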
A rule tells EM to take a particular action when a problem, event or incident occurs. The most common action is to send an email notification; however, rules can also do other things, such as running a script or even raising an incident from an occurrence of an event or problem.
A group of rules that operate on the same object, such as a group of targets, is called a Rule Set. Rule sets are used to help manage rules. For example, one rule set can be created to group the rules used on production systems, and a separate rule set can be added for the rules used on development or test systems.
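The relationship between rules and rule sets can be modeled roughly as follows. This is a conceptual sketch under stated assumptions, not how Enterprise Manager implements or exposes rule sets: the `Rule` and `RuleSet` classes, the dictionary-based occurrence format and the target names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical shape of an occurrence,
# e.g. {"kind": "event", "severity": "Critical", "target": "prod_db1"}
Occurrence = Dict[str, str]


@dataclass
class Rule:
    name: str
    applies_to: Callable[[Occurrence], bool]  # when the rule fires
    action: Callable[[Occurrence], str]       # what the rule does


@dataclass
class RuleSet:
    name: str
    targets: List[str]                        # objects the rules operate on
    rules: List[Rule] = field(default_factory=list)

    def handle(self, occurrence: Occurrence) -> List[str]:
        """Run every matching rule for occurrences on this set's targets."""
        if occurrence["target"] not in self.targets:
            return []
        return [
            rule.action(occurrence)
            for rule in self.rules
            if rule.applies_to(occurrence)
        ]


# A rule set for production systems with a single email rule.
production = RuleSet(
    name="Production rules",
    targets=["prod_db1"],
    rules=[
        Rule(
            name="Email on critical",
            applies_to=lambda o: o["severity"] == "Critical",
            action=lambda o: f"email DBA about {o['target']}",
        )
    ],
)
```

A separate `RuleSet` with its own targets and rules could be defined for development or test systems, which keeps the production rules from firing on non-production targets.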
Migrating from OEM 11g Grid Control to OEM 12c Cloud Control
Notifications on metrics and alerts that an administrator created in previous versions of OEM will be migrated to 12c. This means that any notifications that were configured will continue to work the same way in 12c.
The notification rules are converted to event rules and an incident will automatically be created for any of these converted event rules.
Administrators will then be able to take full advantage of the new Incident Management features to fine-tune and manage all of their existing notifications.
In the next article, we will look in more depth at Incidents, Events and Problems and at how to use Incident Manager to manage them.