Oracle Database 11g: Database Replay, Part 2

Friday May 30th 2008 by Jim Czuprynski

Part 2 of this series - demonstrates how Oracle 11g Database Replay can be used to capture and prepare a workload from a current Oracle 11g production database environment (P+0) and then replay that identical workload in an Oracle 11g testing environment that represents the next iteration (P+1) of the database system.

Synopsis. Oracle Database 11gR1 offers the ability to capture applications’ workloads in a production environment and then replay that workload against a test environment to determine the impact of proposed system, database or application modifications on database performance. This article – the second in this series – demonstrates how Oracle 11g Database Replay can be used to capture and prepare a workload from a current Oracle 11g production database environment (P+0) and then replay that identical workload in an Oracle 11g testing environment that represents the next iteration (P+1) of the database system. This offers an Oracle DBA the chance to analyze and isolate the vectors that might cause detrimental performance or divergence between current and future environments well before any actual changes are rolled out.

A friend of mine who teaches future cardiologists describes the time-honored tradition under which surgeons learn their trade “watch one, assist one, do one.” The prior article in this series provided a primer into the theory and best practices behind Oracle Database 11g’s new Database Replay features, and while understanding the theory behind a complex set of features is certainly a noble goal, it’s time to move on to the assist one / do one phases. This article will focus on how to:

  • Capture an actual workload from an Oracle 11g database
  • Capture the corresponding Automatic Workload Repository (AWR) data for that recorded workload
  • Prepare the test database environment for eventual workload replay
  • Transfer the production workload configuration to the testing environment
  • Preprocess the production workload for replay
  • Replay the workload on the test system
  • Analyze the replayed workload for any performance issues or divergence

Simulated Application Environment

A word about my testing environment for this article: For simplicity’s sake, I’m going to use the same database for both capture and playback operations. I’m using the basic Oracle 11g “seed” database with all sample schemas installed. The database will be run in ARCHIVELOG mode with Flashback Logging activated so that I can use the FLASHBACK DATABASE command to quickly “rewind” the database to an appropriate starting point for replay.

Phase 1: Recording a Workload

To set up a valid capture and reply scenario, I’ve constructed several new users, tables, indexes, and related PL/SQL objects:

  • A new user, ADMIN, will be used as a repository for global administrative objects within my Oracle 11g database. I also created a new table for storage of primary key values. The code to reproduce these objects is shown in ADMINSetup.sql.
  • PL/SQL package ADMIN.PKG_SEQUENCING controls assignment of new primary key values. The source code for its package specification is shown in pkg_sequencing.spc, and the initial version of its package body is shown in pkg_sequencing_v1.bdy.
  • Another user, AP, will encapsulate the schema for a simulated Accounts Payable system, including new tables AP.VENDORS, AP.INVOICES, and AP.INVOICE_ITEMS. The code to create this new schema and its related objects is found in APSetup.sql.
  • To populate AP schema objects, I’ve set up package AP.PKG_LOAD_GENERATOR. The source code for its specification and body can be found in pkg_load_generator.spc and pkg_load_generator.bdy, respectively.
  • Finally, the code in APInitialization.sql loads table AP.VENDORS with a few hundred rows of simulated data, creates 25 new Invoices in table AP.INVOICES and corresponding Invoice line item detail in table AP.INVOICE_ITEMS, and gathers initial optimizer statistics for all objects in the ADMIN and AP schemas. It also creates a directory object, DBRControl, that Database Replay will use to store the resulting scripts and XML files generated during workload capture.

Setting Up for Workload Capture

Now that my source database’s environment is initialized, I’ll initiate the capture of an actual workload. Figure 2.1.1 below shows the initial screen for Database Replay, which is accessed from the Database Replay link under the Real Application Testing section on the Software and Support tab of the EM Database Control facility:

Figure 2.1.1: Workload Capture Setup: Initial Screen

Once I’ve selected the first task, I’m presented with a checklist that lists all prerequisite steps that I’ll need to acknowledge before my capture session is allowed to proceed.

Figure 2.1.2: Workload Capture Setup: Plan Environment Checklist

Oracle 11g next presents me the ability to either restart or not restart the database before officially capturing the workload, and it lets me filter out unnecessary session activity (e.g. from EM itself!). Note that I’ll follow Oracle 11g’s recommended best practice for “clean captures”: I’ll accept EM’s suggestion to stop and restart the test database to establish an effective start time for the capture.

Figure 2.1.3: Workload Capture Setup: Options

Next, I’m presented with the chance to name the capture session and specify the directory object into which all replay scripts will be captured.

Figure 2.1.4: Workload Capture Setup: Setting Parameters

Oracle 11g next needs me to specify the name of a scheduled EM Database Control task that will handle the workload’s capture. Figures 2.1.5 and 2.1.6 show the final confirmation screens for that task’s setup.

Figure 2.1.5: Workload Capture Setup: Assigning EM Task Name

Figure 2.1.6: Workload Capture Setup: Final Task Review

Finally, Oracle 11g requests one last confirmation ...

Figure 2.1.7: Workload Capture Setup: Task Submission

... and the capture is initiated. Once Oracle 11g displays this screen, it’s essentially waiting for me to start the representative workload against the source database.

Capturing an Actual Workload

To simulate multiple executions of similar code by different users, I’ve prepared a simple shell script named RandomLoadGenerator.sh that executes simple queries that do CPU intensive computations, generates complex queries against the AP schema, and also “grinds through” the insertion of several thousand rows into the new AP schema’s tables. I’ve also configured my source database environment to use multiple service names, one for each class of users, as shown in SI_Services_tnsnames.ora. I’ve added those service names as potential connection aliases in my database’s TNSNAMES.ORA configuration file.

I started this workload against my P+0 database environment. Once it was complete, I logged back into EM Database Control and viewed the results of its execution as shown in Figure 2.2.1 below, and then clicked on the Stop Capture button to terminate the workload capture.

Click for larger image

Figure 2.2.1: Workload Capture: Reviewing Capture Task Status

Oracle 11g then asks for confirmation of the capture task’s termination, and displays a stopwatch until the capture has been completed.

Figure 2.2.2: Workload Capture: Confirming Capture Termination

Figure 2.2.3: Workload Capture: Confirming Capture Termination

Just as capture termination completes, Oracle 11g asks if I’d like to capture the corresponding Automatic Workload Repository (AWR) data for the recorded workload. As shown in Figure 2.2.4 below, I’ve selected to capture any related AWR snapshot(s) for later comparison reporting.

Figure 2.2.4: Workload Capture: Requesting AWR Snapshot Generation

Once the capture is complete, I can review the Oracle 11g’s capture results to determine if the capture was successful and if it contains sufficient data. If I find that it’s insufficient, the FLASHBACK DATABASE command set allows me to “rewind” the database to just before capture started and then rerun the capture. I also selected the View Workload Capture Report button and generated a Database Capture Report that summarizes the completed capture.

Phase 2: Preparing for Replay

Even though I’ve successfully completed capturing a sufficient workload within my P+0 database environment, I still have some work to do before I can play it back against the P+1 environment.

Resetting the P+0 Environment

Since I’m using the same database for both the source and target environments, I’ll first need to reset my current P+0 database system to the state immediately before I started workload capture. My database is in flashback logging mode, so I’ll simply use Recovery Manager’s FLASHBACK DATABASE command to “rewind” the database to its initial state:

$> rman target /
RMAN> shutdown immediate;
RMAN> startup mount;
RMAN> reset database to incarnation 6;
RMAN> flashback database to scn= 4162947;

Migrating to the P+1 Environment

My next step is to apply the changes necessary to bring my database environment to the P+1 state. For this rather simple demonstration, I’ll make two changes that are almost certain to impact the performance of my database in P+1 mode:

  • I’ve prepared an improved version of procedure ADMIN.PKG_SEQUENCING.NEXT_ID that uses sequences instead of table ADMIN.NEXT_IDS to determine the next primary key values for tables in the AP schema. This should dramatically increase the performance of procedure AP.PKG_LOAD_GENERATOR.RANDOMDML, which creates a large number of random entries in the AP.INVOICES and AP.INVOICE_ITEMS tables, during workload replay.
  • I’ll also drop the index on AP.INVOICES.CUSTOMER_ID and recalculate statistics on the AP schema. Since procedure AP.PKG_LOAD_GENERATOR.RANDOMQUERY often uses this index to select rows efficiently when it generates random queries against view AP.RV_INVOICE_DETAILS, I should expect to see a noticeable regression in its performance during replay of the workload.

“Massaging” the Workload

Now that my P+1 environment is in place, I’m ready to preprocess the workload for playback. Once again, I’ll use EM Database Control to initiate the preprocessing sequence. Figure 2.3.1 shows the results of selecting the Preprocess Workload option from the main Database Replay panel once I’ve selected the DBRControl directory object as the target for preprocessing:

Figure 2.3.1: Preprocess Captured Workload: Selecting a Captured Workload

Once I’ve chosen the desired workload, Oracle 11g reminds me that I’ve got to use the same database version to eventually replay the selected workload ...

Figure 2.3.2: Preprocess Captured Workload: Database Version Warning

... and then sets up a new EM scheduled task to complete the preprocessing.

Figure 2.3.3: Preprocess Captured Workload: Scheduling the Preprocessing Task

Oracle 11g prompts for a final confirmation before submitting the preprocessing task, and then it fires it off immediately.

Figure 2.3.4: Preprocess Captured Workload: Final Confirmation

Phase 3: Replaying the Workload

Since my target database has now been reset to its prior state just before workload capture was initiated and all my P+1 changes are now in place, at last I’m ready to initiate the replay of the workload. I’ll use EM Database Control to initiate the replay, as shown in Figure 2.4.1 below.

Click for larger image

Figure 2.4.1: Workload Replay: Starting Point

Once I’ve selected a directory that already contains a captured workload, Oracle 11g makes sure that I’ve reviewed all required prerequisites for a successful replay, including what external references (e.g. external files and directories) need to be resolved before proceeding, as shown in Figures 2.4.2 and 2.4.3 below.

Figure 2.4.2: Workload Replay: Confirming Prerequisites

Figure 2.4.3: Workload Replay: Confirming References to External Systems

Once all prerequisites have been confirmed, it’s time to set up the actual EM task that will perform the replay. Figure 2.4.4 shows how I set up the basics for this task ...

Figure 2.4.4: Workload Replay: Setting Up Replay Task

... and Figure 2.4.5 shows how I remapped the current connection strings generated during the initial workload capture so that the identical connection strings are used during this database replay operation.

Figure 2.4.5: Workload Replay: Setting Up Connection Strings

As I described in the Database Replay primer article, Oracle 11g allows me to alter the frequency at which a workload is played back with extreme granularity. To keep this demonstration simple, I’ve accepted the default options shown in Figure 2.4.6.

Figure 2.4.6: Workload Replay: Customizing Replay Options

It’s now almost time to start the Workload Replay Client (WRC) to play back the previously captured workload. As Figure 2.4.7 illustrates, my next step is to start the WRC with the appropriate parameters and then return to this screen and select the Next button to initiate the workload’s automatic replay:

Figure 2.4.7: Workload Replay: Initiating Workload Replay Clients

I’ll start the replay session by opening a terminal session and invoking the WRC client executable. Note that I first changed my present working directory to /home/oracle/DBRControl before starting up the WRC:

$>  wrc system/oracle@orcl mode=replay replaydir=.
Workload Replay Client: Release - Production on Thu May 22 19:28:59 2008
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Wait for the replay to start (19:28:59)

Returning to EM Database Control, I simply click on the Next button, and Oracle 11g acknowledges that it has taken control of any active database replay operations ...

Figure 2.4.8: Workload Replay: Controlling Replay Operations

... and that’s also reflected from within the terminal sessions themselves.

Wait for the replay to start (19:28:59)
Replay started (19:31:17)

Back at EM Database Control, the status of the database replay operation is constantly being monitored ...

Figure 2.4.9: Workload Replay: Monitoring Active Replay Operations

..; until the replay operation is completed.

Wait for the replay to start (19:28:59)
Replay started (19:31:17)
Replay finished (19:37:02)

Figure 2.4.10: Workload Replay: Completed Replay Operation

Phase 4: Regression Analysis

Oracle 11g gives me several tools for comparing the captured vs. replayed workloads. As shown in Figures 2.5.1 and 2.5.2 below, the most succinct comparison is the one that Database Replay provides immediately after a successful replay operation has completed:

Click for larger image

Figure 2.5.1: Replay Regression Analysis, Part 1

Click for larger image

Figure 2.5.2: Replay Regression Analysis, Part 2

As these summaries illustrate, it’s obvious that the changes I’ve made to the database’s objects have had an overall positive effect because the time it took to execute the captured workload is significantly longer than the time it took to replay the identical workload, and that immediately indicates that the system modifications have positively increased database throughput. It’s just as important to note, however, that there were no deleterious effects introduced. This is easiest to see in the second half of the regression analysis screen because there is absolutely no divergence between the generated data, and there were no unexpected errors generated.

Reporting Analyses

Finally, Database Replay offers several reports in HTML format that analyze the results of the completed database replay operation:

  • DB Replay Report. This report compares the execution of the captured workload to that of the replayed workload, and it searches for the source of any possible data and/or error regressions.
  • AWR Report. This report provides an Automatic Workload Repository (AWR) report that summarizes and analyzes the database’s overall performance between the database replay operation’s starting and ending time periods.
  • ASH Report. At an even lower level of detail, this report shows exactly which SQL statements and wait events caused the largest impact on database performance by looking at the contents of the database’s Active Session History (ASH) buffer during the execution of the replay operation.

Next Steps

In the final article in this series, I’ll expand on the scenario that played out in this article by tackling one of the thornier concerns most Oracle DBAs have for systems that are about to migrate to Oracle 11g: How will my current database perform in a brand-new environment? And to make it even more relevant and interesting, I’ll target a two-node Oracle 11g Real Application Cluster (RAC) clustered database as the testing environment.

References and Additional Reading

While I’m hopeful that I’ve given you a thorough grounding in the technical aspects of the features I’ve discussed in this article, I’m also sure that there may be better documentation available since it’s been published. I therefore strongly suggest that you take a close look at the corresponding Oracle documentation on these features to obtain crystal-clear understanding before attempting to implement them in a production environment. Please note that I’ve drawn upon the following Oracle Database 11gR1 documentation for the deeper technical details of this article:

B28274-01 Oracle Database 11gR1 Performance Tuning Guide

B28279-02 Oracle Database 11gR1 New Features Guide

B28419-02 Oracle Database 11gR1 PL/SQL Packages and Types Reference

» See All Articles by Columnist Jim Czuprynski

Mobile Site | Full Site