Oracle 10g DataPump, Part 1: Overview

Friday Mar 25th 2005 by Jim Czuprynski

Synopsis. Oracle 10g's new DataPump utility is designed as the eventual replacement for the original Oracle Import and Export utilities. This article - the first in a series - provides an overview of the DataPump's suite of tools for extracting, transforming, and loading data within an Oracle database.

Oracle's Import and Export utilities offer a great way to create, retain, and reload a point-in-time snapshot of a database's structure and contents at the table, schema, tablespace, or database level. Database exports also have a place as a potential third line of defense in an in-depth disaster recovery plan (please view my article on Becoming A Master of Disaster for a more detailed discussion of this concept).

I have also used these utilities on a regular basis to migrate data from our production environment to multiple development environments, but as reliable as these utilities are, they do have some drawbacks that needed to be addressed:

Operations are difficult to restart. If an export session died because it ran out of disk space, or if the session was terminated deliberately or accidentally, the only options are to restart the entire export from the beginning, or to build a new export parameter file that picks up at the point of failure and run the export again.

Execution is client-side and single-threaded. The Import and Export utilities use a client process to execute their respective operations. This means that the most powerful processing resource available - the database server itself - cannot be enlisted to help control the Import or Export operation. Also, there is no way to utilize parallel processing on the database server while exporting data, so each Export operation is essentially a single, serial, potentially long-running task.

Dump files can grow enormously. The Export utility creates one or more export dump files during each session; during a complete dump of a database, these files can obviously grow quite large. Since it is difficult to estimate the final size of the completed export files in advance, the operation may fail due to disk space constraints. Also, to import just one table or database object from a multi-schema export file, the Import utility has to read all dump files to locate the object before it can be restored - obviously not an optimal situation!

Tuning mechanisms are limited. Methods to tune a poorly executing Export or Import operation are relatively limited and somewhat arcane. Experimenting with different settings for the BUFFER parameter - which controls the maximum number of rows fetched into the Export utility's buffer at one time - is one of the few recommended options for improving performance.

Progress monitoring is difficult. While a DBA can certainly monitor the command session that is running the operation, or review the spooled output from an operation, there is no simple way to tell how efficiently an Export or Import operation is performing, or what percentage of a large operation has been completed.

Database object filtering features are limited. While it is certainly possible to export and import selected individual tables from a database or schema, filtering features are relatively limited. For example, it is impossible to export all tables with common names across schemas within the same export session. It is also impossible to select specific database object types other than tables (like packages, stored procedures, or triggers) for inclusion in an export dump file.

Enter The DataPump: Features Overview

Fortunately, Oracle 10g has addressed many of these issues with the DataPump, a new set of utilities that significantly expands the capabilities of the original Export and Import utilities.

Server-side parallel execution. DataPump Import and Export operations are scheduled and processed within the database server itself as a DataPump job - a much more effective method than client-side execution. In addition, if the database server's OS and hardware can adequately support it, DataPump export operations can take advantage of parallel processes to write out multiple data streams simultaneously.
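
For illustration, a schema-mode Export that asks for four parallel worker processes might be invoked as follows; the connection string, directory object, and file names here are placeholders rather than values from the listings later in this article:

    expdp system/password SCHEMAS=hr DIRECTORY=export_dir DUMPFILE=hr_%U.dmp LOGFILE=hr_exp.log PARALLEL=4

The %U substitution variable in the dump file name lets DataPump generate as many dump files as the parallel streams require.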

Improved control and restartability of operations. Since the database server controls the DataPump, a DBA can start an Export or Import operation and then detach from it, allowing the operation to run to completion without further intervention. The DBA can also reconnect to the operation from another command-line session, interrogate its current status, and choose to pause the operation if necessary. And if it appears that the operation is dangerously close to failing - for example, because of insufficient disk space for an export file - the DBA can even add more disk files "on the fly" without canceling the existing operation. This provides a level of control far beyond anything the original Export and Import utilities offer.

Simplified monitoring of operational status. DataPump provides a new STATUS parameter that shows the status of an ongoing Export or Import operation. When STATUS is specified at operation startup, Oracle will report the operation's status to the terminal session of the command-line interface at the specified interval. The DBA can also reattach to a detached DataPump job, issue a STATUS request, determine how far the operation has progressed, and then detach from the job again.
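
For example, adding STATUS=60 to the command line asks DataPump to write a status summary to the terminal every 60 seconds while the job runs; all other values shown here are purely illustrative:

    expdp system/password SCHEMAS=hr DIRECTORY=export_dir DUMPFILE=hr.dmp STATUS=60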

Automated performance tuning. Unlike the traditional Export and Import utilities, DataPump implements self-tuning mechanisms to ensure that tables are exported and imported in the most efficient manner. Gone are the BUFFER, COMMIT, COMPRESS, CONSISTENT, DIRECT, and RECORDLENGTH parameters.

Improved database object filtering. DataPump implements filtering of the database objects to capture during an Export operation or include as part of an Import operation with a new set of mutually-exclusive parameters, INCLUDE and EXCLUDE. These parameters allow the DBA to specify almost any database object for inclusion or exclusion based on its object type and object name. This dramatically improves flexibility when exporting and importing; for example, it is now possible to gather objects with similar names across an entire database.
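
As a sketch of the syntax, the following parameter file entries would limit a full-database export to just those tables whose names begin with EMP; the directory and file names are hypothetical. Placing the name clause in a parameter file also sidesteps the operating-system quoting issues that arise when it is typed directly on the command line:

    DIRECTORY=export_dir
    DUMPFILE=emp_tables.dmp
    LOGFILE=emp_tables.log
    FULL=y
    INCLUDE=TABLE:"LIKE 'EMP%'"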

Export dump file control enhancements. In prior releases, the DBA could specify more than one dump file as a destination for an Export session; however, once the session commenced, no additional files could be specified. DataPump Export improves upon this by permitting parameterized file names similar to the naming conventions for redo logs and RMAN backup files.
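
A parameter file along these lines (names and sizes are illustrative only) caps each dump file at two gigabytes and lets DataPump spin off additional files from the %U template as the export grows:

    DIRECTORY=export_dir
    DUMPFILE=full_db_%U.dmp
    FILESIZE=2G
    LOGFILE=full_db.log
    FULL=y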

DataPump Concepts

The DataPump is the next logical progression in a series of improvements that have appeared during the last few Oracle releases. The two most dramatic upgrades involve the ability to read data extremely quickly via direct path unloading, and the capability to write data directly to external tables.

Direct Path Unloads. Prior releases of Oracle introduced the concept of the direct path load, which permits the server to bypass SQL parsing of INSERT statements so that data can be inserted directly and with blinding speed into database tables, provided the tables meet certain conditions. Building on this concept, Oracle 10g adds the capability to perform direct path unloads of database tables within the same limits that apply for direct path loads.

Writing To External Tables. Oracle 9i Release 2 introduced the ability to read data from files stored outside the database as external tables using existing Oracle SQL*Loader access methods. Oracle 10g expands upon this feature by including a new access method, ORACLE_DATAPUMP, that allows the server to write directly to an external file.
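
To see the new access method in isolation, here is a minimal sketch of an external table that writes its contents to a dump file as it is created; it assumes an EXPORT_DIR directory object and the standard HR sample schema:

    CREATE TABLE hr.employees_unload
      ORGANIZATION EXTERNAL (
        TYPE ORACLE_DATAPUMP
        DEFAULT DIRECTORY export_dir
        LOCATION ('employees_unload.dmp')
      )
      AS SELECT employee_id, last_name, hire_date, salary
           FROM hr.employees;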

DataPump Export uses these two enhancements to unload data extremely quickly. Even better, DataPump needs virtually no assistance in deciding which method is fastest for exporting data, and is relatively self-tuning. For example, the concept of direct vs. conventional exports is now meaningless, and several of the tuning parameters (e.g. BUFFER, RECORDLENGTH) have been deprecated.

The DataPump engine will choose to write to external tables over direct path unload whenever a table meets any of the following criteria:

  • The table is part of a cluster.
  • There is an active trigger on the table.
  • Fine-grained access control is enabled on the table for INSERT operations.
  • One of the table's columns has a referential integrity constraint.
  • The table has a LOB column, and that LOB column has a domain index.
  • The table contains a BFILE column or VARRAY column, and those columns have an embedded, opaque TYPE.

DataPump Components

DataPump consists of three components that work in concert to manage and perform operations:

Command-Line Interface. Like the original Export (EXP.EXE) and Import (IMP.EXE) command-line interfaces, the DataPump provides two command-line interfaces, EXPDP.EXE and IMPDP.EXE, for controlling DataPump Export and Import operations, respectively.

DataPump also expands the command-line interface's capabilities by providing Interactive-Command Mode. This mode allows the DBA to start a DataPump operation and then disconnect from it by simply pressing CTRL+C. Later, the DBA opens another DataPump session, specifies the operation's job name via the ATTACH directive, and reconnects to the running job. The DBA can then issue a series of commands, from taking the pulse of the operation via the STATUS directive, to adding more export dump files, or even terminating a long-running operation if desired.
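
A hypothetical reattachment session might look like the following; the job name is only an example, and the directives shown (STATUS, ADD_FILE, STOP_JOB) are a small subset of what Interactive-Command Mode offers:

    expdp system/password ATTACH=hr_full_export

    Export> STATUS
    Export> ADD_FILE=hr_extra_01.dmp
    Export> STOP_JOB=IMMEDIATE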

DBMS_METADATA. Introduced in Oracle 9i, this supplied PL/SQL package provides methods to extract and format metadata - literally, "data about data" - from an Oracle database. At its core, DBMS_METADATA stores metadata in XML format for ultimate flexibility in its presentation of that metadata.

DataPump Export uses DBMS_METADATA to gather all metadata about the database objects being processed then stores that metadata in the Export operation's master table. DataPump Import uses the corresponding master table to extract information from the export dump file(s), and then uses DBMS_METADATA to create the DDL statements it needs to create new database objects as part of the Import operation.
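
Although DataPump calls DBMS_METADATA behind the scenes, the package can also be queried directly. For example, this SQL*Plus query returns the complete DDL for a single table in the HR sample schema:

    SET LONG 20000
    SELECT DBMS_METADATA.GET_DDL('TABLE', 'EMPLOYEES', 'HR') FROM dual;

The related GET_XML function returns the same metadata in its underlying XML representation.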

DBMS_DATAPUMP. At the heart of the DataPump is the new DBMS_DATAPUMP PL/SQL supplied package. This package contains methods for exporting data from and importing data into any Oracle database. EXPDP and IMPDP are actually making calls to this new package to perform their respective operations.

What is especially powerful about DBMS_DATAPUMP is that a DBA can also utilize it directly, either in anonymous PL/SQL blocks or within a stored procedure, stored function, or package, to create custom DataPump Export and Import jobs. I will demonstrate how to accomplish this in the final article in this series when we look at some of the other advanced features of the DataPump.
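
As a small preview, here is a minimal sketch of an anonymous PL/SQL block that starts a schema-mode Export through the API; the job name, dump file name, and directory object are assumptions chosen for illustration:

    DECLARE
      h NUMBER;  -- job handle returned by DBMS_DATAPUMP.OPEN
    BEGIN
      -- Define a schema-mode Export job
      h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT',
                              job_mode  => 'SCHEMA',
                              job_name  => 'HR_API_EXPORT');
      -- Name the dump file and the directory object that will receive it
      DBMS_DATAPUMP.ADD_FILE(handle    => h,
                             filename  => 'hr_api_export.dmp',
                             directory => 'EXPORT_DIR');
      -- Restrict the job to the HR schema
      DBMS_DATAPUMP.METADATA_FILTER(handle => h,
                                    name   => 'SCHEMA_EXPR',
                                    value  => 'IN (''HR'')');
      -- Start the job and detach; it keeps running on the server
      DBMS_DATAPUMP.START_JOB(h);
      DBMS_DATAPUMP.DETACH(h);
    END;
    /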

DataPump Jobs, Master Processes, and the Master Table

DataPump also implements major (and in my opinion, elegant!) improvements to the execution and management of Export and Import operations. Each DataPump operation uses a master process to manage and control the export or import session. Each master process is identified by a unique job name; DataPump assigns one automatically if none is specified at execution time.

The master process controls the DataPump operation by invoking at least one worker process that actually performs the Export or Import. (This is extremely similar to the way Recovery Manager (RMAN) manages backup, restoration and recovery processing.) If the DataPump operation has requested additional parallel processes for multiple-threaded processing, more than one worker process will be created and managed.

The master process also creates a master table - a regular user table named after the DataPump job - within the schema of the user account that invoked the operation. How the master table is used differs slightly depending on the type of DataPump operation.

During an Export operation, the master table is created when the job commences, contains all target objects for the Export, and is retained until the job finishes successfully, at which point it is dropped. However, if the job is paused or fails due to error, the Export process uses the master table to determine what objects still need to be exported and how best to resume the job. Once the job completes successfully, the master table's contents are written to the Export file itself for use by an Import process.

On the other hand, an Import operation reads the master table from the Export dump file and then creates the master table under the appropriate user account. During an Import, the master process uses the master table to decide which database objects need to be created and in what order before any data is actually imported.

A Simple DataPump Export Scenario

To demonstrate an example of how easy it is to use DataPump for exporting data, I will create a dump file containing all database objects stored in the Human Resources (HR) schema for use in a later DataPump Import operation. (The next article will greatly expand upon the capabilities of DataPump Export, I assure you!)

Listing 1.1 shows how to create a DIRECTORY database object for storage of all DataPump dump files. DataPump also uses the DIRECTORY object as a default location for log files and parameter files.
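
In general terms, creating such a directory object takes a statement like the one below, issued by a suitably privileged user; the directory name and operating-system path are placeholders rather than the values used in Listing 1.1:

    CREATE DIRECTORY export_dir AS '/u01/app/oracle/export_files';
    GRANT READ, WRITE ON DIRECTORY export_dir TO hr;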

Listing 1.2 provides a set of queries to display listings of what object types can be processed with the DataPump, and whether a filter can be applied against the object type at the database, schema, or table level.
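
The object types that DataPump can filter are published in three data dictionary views new to Oracle 10g: DATABASE_EXPORT_OBJECTS, SCHEMA_EXPORT_OBJECTS, and TABLE_EXPORT_OBJECTS. A query along these lines shows the top-level object types available for filtering at the schema level:

    SELECT object_path, comments
      FROM schema_export_objects
     WHERE object_path NOT LIKE '%/%'
     ORDER BY object_path;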

Listing 1.3 shows the DataPump Export command issued, its corresponding parameter file, and the log file that documents the results of the DataPump Export operation.
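
In broad strokes, a schema-mode export of HR like the one in Listing 1.3 can be driven entirely from a parameter file; the exact names below are illustrative only:

    $> expdp system/password PARFILE=hr_export.par

    Contents of hr_export.par:
      DIRECTORY=export_dir
      DUMPFILE=hr_schema.dmp
      LOGFILE=hr_export.log
      SCHEMAS=hr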

A Simple DataPump Import Scenario

Using DataPump to import data is equally simple. To illustrate, I have constructed the following scenario:

  • The DataPump Export dump file created in the previous scenario will be used to import data into a new schema, HR_OLTP, that will eventually become the target of a new online transaction processing application under development.
  • The new schema needs all of the tables from the existing HR schema and those tables' related objects, i.e. referential constraints, indexes, sequences, and triggers.
  • No other database objects should be loaded to this new schema from the existing HR schema.

See Listing 1.4 for the SQL to create the new schema, the DataPump Import command issued, its corresponding parameter file, and the log file that documents the results of the DataPump Import operation.
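
Without reproducing Listing 1.4, the key ingredients of such an import are the REMAP_SCHEMA parameter, which redirects HR's objects into HR_OLTP, and INCLUDE filters that restrict the job to tables and their related objects. A hypothetical parameter file might contain entries such as these:

    DIRECTORY=export_dir
    DUMPFILE=hr_schema.dmp
    LOGFILE=hr_oltp_import.log
    REMAP_SCHEMA=hr:hr_oltp
    INCLUDE=TABLE
    INCLUDE=INDEX
    INCLUDE=CONSTRAINT
    INCLUDE=REF_CONSTRAINT
    INCLUDE=SEQUENCE
    INCLUDE=TRIGGER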

Monitoring DataPump Operations

Oracle 10g provides two new views, DBA_DATAPUMP_JOBS and DBA_DATAPUMP_SESSIONS, that allow the DBA to monitor the progress of all DataPump operations. Listing 1.5 shows two queries that return valuable information about ongoing DataPump jobs and sessions.
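
As a quick sketch, queries along these lines report the state of active jobs and tie each job back to its database sessions; the join to V$SESSION is one possible way to surface session details and is not necessarily the form used in Listing 1.5:

    SELECT owner_name, job_name, operation, job_mode, state, degree, attached_sessions
      FROM dba_datapump_jobs;

    SELECT dps.owner_name, dps.job_name, s.sid, s.serial#
      FROM dba_datapump_sessions dps
      JOIN v$session s ON s.saddr = dps.saddr;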

Conclusion

Oracle 10g's new DataPump offers significant improvements over the original command-line-based Export and Import utilities, including multi-threaded processing, improved object filtering capabilities, and restartability. The next article in this series will dive deeply into utilizing the DataPump effectively to accomplish some typical (and not so typical!) "real-world" export and import tasks.

References and Additional Reading

While there is no substitute for direct experience, reading the manual is not a bad idea, either. I have drawn upon the following Oracle 10g documentation for the deeper technical details of this article:

B10750-01 Oracle Database New Features Guide

B10825-01 Oracle Database Utilities
