Synopsis. Oracle 10g Release 2 (10gR2) has improved significantly the methodology for tracking performance metrics within the database. This article the last in this series demonstrates how to implement adaptive thresholds and use their metrics to effectively detect significant performance threshold violations while tuning out the "noise" from false positives.
In the prior article in this series, I discussed how Oracle 10gR2's new adaptive thresholds feature can improve the detection of threshold violations for about 15 critical metrics and how they can filter out false positives to provide more accurate warning mechanisms. In this article, I will spend a lot less time on theory. Instead, I will concentrate on how simple it is to set up adaptive thresholds with some actual examples of how to create metric baselines using both static metrics and rolling time periods. I will also demonstrate how to set up adaptive thresholds using metric baselines to elevate your Oracle 10gR2 database's warning and critical thresholds to new heights of sensitivity and meaningfulness.
Before I create any metric baselines, however, I need to activate the metric baseline feature from within Oracle 10g Enterprise Manager Database Control. When I select the Metric Baselines breadcrumb for my test database, the Enterprise Manager screen shown in Figure 2.1 is displayed when no metric baselines have been enabled yet. After I click on the Enable Metric Baselines button, Enterprise Manager requests a final confirmation; then, when I click on the OK button, metric baselines are automatically activated.
Creating a Metric Baseline
My next step is to create a metric baseline. Oracle 10gR2 allows me to do this using either statistics gathered from a rolling time period immediately prior to today, or from a static but specific range of dates. Figure 2.2 shows the Enterprise Manager screen that determines which method to use. Note that I can use a time frame as short as the past seven days for the rolling time period the minimum for the creation of any metric baseline up to the maximum of a rolling 91-day time period.
On the other hand, to create a static metric baseline, I instead specify the name of the new static baseline as well as a range of dates. This date range must correspond to a minimum of seven contiguous days. The time period chosen should be representative of the type of adaptive thresholds that utilize these metrics. For example, if I know that my database has been performing in a relatively stable fashion during a particular time period, the corresponding statistics would be valuable metrics for comparison against future performance.
Once I've selected a time frame, I can just click on the Compute Statistics button, and Oracle 10gR2 will determine if there are minimum sufficient data to facilitate creation of at least one adaptive threshold measurement. Figure 2.3 shows Enterprise Manager's response if there are insufficient data. Note that Oracle marks with a warning symbol any metrics that cannot be determined; also, no data will be displayed when I click into any of the "eyeglass" icons for each faulty metric.
Even in this case, all is not lost! I could also change the variables that are used to calculate the time series for the metric static baseline. For example, I could change the granularity of the Day Grouping series from By Hour of Day to By Day of Week; I could also modify the Week Grouping to reflect Weekends and Weekdays instead of By Day of Week, or even eliminate a grouping scheme altogether. Once I've done this, I would simply click on the Compute Statistics button, and Oracle 10gR2 will once again determine if there are minimum sufficient data to create the static metric baseline.
On the other hand, if sufficient data do exist, then Oracle will display a checkmark to indicate this. Figure 2.4 shows Enterprise Manager's response when there are sufficient data to create the metric baseline. In this case, I have created a new static metric baseline, MB1 using all available data for the time period between March 12, 2005 and March 21, 2005. I can simply click into each of the "eyeglass" icons to see how Oracle 10gR2 will calculate the values for the metric baseline. Clicking the OK button on this page completes the creation of the metric baseline, establishing it for immediate use with adaptive thresholds. (Remember from the previous article that Oracle 10gR2 permits the creation and retention of multiple static baselines for use by adaptive thresholds.)
Now that I've established a valid static metric baseline, it is time to use that baseline to activate appropriate adaptive thresholds. As shown in Figure 2.5, I will select the MB1 baseline from the Metric Baselines screen by clicking the Set Adaptive Thresholds button.
Oracle then passes control to the Manage Adaptive Thresholds page (Figure 2.6). From this screen, I can set specific adaptive thresholds for each of the displayed metrics. In this example, I have selected just the Number of Transactions (per second) adaptive threshold for editing; once selected, I can then set up adaptive thresholds for the metric by clicking on the Edit button on this page.
Figure 2.7 shows the two different sets of options available for setting adaptive thresholds:
- If I choose the Significance Level option, Oracle will use percentiles to determine when an adaptive threshold has been breached. For this example, I've set the warning and critical levels for Number of Transactions to High and Very High. Oracle will then raise a warning alert when the number of transactions measured reach 95% of the adaptive threshold value as of the time specified; likewise, Oracle will raise a critical alert when 99% is reached.
- The other option, Percentage of Maximum, Oracle will determine when an adaptive threshold has been breached based on the specified percentages, using the 99th percentile value for the calculation. For example, if I set the warning and critical threshold values to 125% and 150% respectively for the Number of Transactions metric, and the 99th percentile value is 10 logins as of a particular point in time, then a warning alert would be raised at 13 and 16 logins, respectively.
- Note that in either case, I can also specify the total concurrent number of threshold violations that must occur before an alert is raised.
If I click on the "eyeglass" icon for any of the Metric Names displayed, Oracle 10gR2 will show me a graph (Figure 2.8) that demonstrates how adaptive thresholds will be applied for the selected metric. Note that there are two distinct sets of adaptive thresholds: one for normal daytime hours (7AM 7PM) and one for normal nighttime and early morning hours (7PM 7AM). When I created the MB1 static metric baseline, I selected the By Day and Night option for grouping the baseline's statistics.
Using Adaptive Thresholds: It's the Deflection That Counts
Now that I have established at least one adaptive threshold, I can monitor them via what I like to call a deflection graph. For each of the three categories of baselined metrics (Performance Metrics, Workload Volume Metrics, and Workload Type Metrics), I can choose and position three different metrics for display in the deflection graph. For this example, I have selected the nine metrics as shown in Figure 2.9 on the Configure Normalization Metrics page. I accessed this page by clicking on the Baseline Normalized Metrics breadcrumb from the Metrics Baseline page (see Figure 2.4).
Since all of the metrics have been selected, Oracle 10gR2 provides a series of graphs that show the deflection from the established adaptive thresholds for the metrics. Figure 2.10 shows a typical display of these deflections on the Baseline Normalized Metrics page, which I have accessed from its corresponding breadcrumb on the Metric Baselines page. The most interesting feature of these graphs is that they have no X-axis; indeed, they all essentially use the same axis because they really represent the common deflection as measured across all adaptive thresholds.
Testing Adaptive Thresholds: Observing the Deflection
Note that the graph that represents the Number of Transactions metric shows virtually no deflection from the established adaptive thresholds, while many of the other metrics do indeed show some apparent deflections. However, that's about to change dramatically in my next demonstration.
I first executed the SQL code in Listing 2.1 against my Oracle database at 7:55 PM, passing it a value of 40 to simulate a rapid increase in the number of transactions per second (i.e. over 40/sec). At 8:01 PM, I once again ran the same code, but passed in a value of 12. In both cases, this caused an obvious and observable deflection in the graph on the Baseline Normalized Metrics page for the Number of Transactions (per second) graph as shown in the highlighted section of the graph in Figure 2.11.
When I executed the same SQL code with a value of six (6) at 8:05 PM, however, Oracle doesn't even detect the deflection because the upper boundary of significance is below ten (10) transactions per second as of that the time of day. This is the essence of adaptive thresholds: Oracle will only signal that a warning or critical threshold has been breached when the violation exceeds the threshold that's in force as of the specific point in time.
One final point of interest on the page that displays these deflection graphs: I can also "tune out" even more of the "white noise" for the adaptive thresholds in force by selecting the High setting instead of the Medium or Low setting for Noise Reduction as shown in Figure 2.11. These settings tend to filter out even more of the less-noticeable deflections, thus yielding a more obvious view of the true "spikes" that have been detected so far.
Deactivating Adaptive Thresholds
If I want to deactivate all adaptive thresholds, I simply choose the Disable Metric Baselines button on the Metric Baselines page. As Figure 2.12 shows, Oracle 10gR2 prompts me for a final confirmation; once I click on the OK button, all adaptive thresholds will be deactivated.
Finally, a warning: Oracle strongly recommends gaining experience with adaptive thresholds in a test environment before implementing them in a production environment. Since these thresholds are more complex to understand than regular arithmetic thresholds, and since it is possible to tune out "noise" accidentally, it is not impossible to configure them incorrectly the first time out. I also heartily recommend running a representative workload against the test environment to insure that threshold violations are being trapped as expected!
With the addition of Adaptive Thresholds to my arsenal of DBA tools, Oracle 10gR2 lets me monitor several critical performance metrics for my database based on their significance as of a specific point in time. With a few simple mouse clicks in Oracle 10gR2 Enterprise Manager, I can easily construct the necessary baseline metrics for either a specified historical time frame or based on a rolling time period for the last several days, weeks, or even months. Once the baselines are established, a few more mouse clicks set up the adaptive thresholds that Oracle 10gR2 then monitors for violations. These adaptive thresholds help to eliminate those irritating "false positives" that plain arithmetic thresholds tend to detect erroneously, but they also tend to detect "near-miss" violations that plain arithmetic thresholds most often ignore.
References and Additional Reading
Even though I've hopefully provided enough technical information in this article to encourage you to explore with these features, I also strongly suggest that you first review the corresponding detailed Oracle documentation before proceeding with any experiments. Actual implementation of these features should commence only after a crystal-clear understanding exists. Please note that I've drawn upon the following Oracle 10gR2 documentation for the deeper technical details of this article:
B14214-01 Oracle Database New Features Guide
B14231-01 Oracle Database Administrator's Guide
B16241-01 Oracle Database Enterprise Manager Concepts