Oracle 10g DataPump, Part 2: Implementation

Thursday Apr 28th 2005 by Jim Czuprynski
Synopsis. Oracle 10g's new DataPump is designed as the eventual replacement for the Oracle Import and Export utilities. This article - the second and final in this series - focuses on using the DataPump's features in real-world scenarios, including creation of custom DataPump jobs with DBMS_DATAPUMP.

The previous article in this series identified some of the deficiencies of the original Export and Import utilities, and then provided a broad overview of Oracle 10g's new DataPump utility. This article concentrates on how to use the DataPump effectively for common import and export tasks, including how to create custom DataPump tasks that can be scheduled via the Oracle 10g Scheduler. I will start with some real-world scenarios that show how Oracle 10g's DataPump Export utility provides plenty of flexibility in handling typical export requirements.

DataPump Export Scenarios

Creating a Nightly Disaster Recovery Export Scheme. As part of my personal list of Oracle disaster recovery "best practices," I have made a habit of building a nightly disaster recovery export scheme for a new Oracle database shortly after the database's creation. In addition, whenever I create a new schema for an existing database, I make sure to add that schema to the existing nightly export scheme. At a minimum, I ensure that at least the contents of all tables are exported regularly.

Setting up a nightly export scheme via the DataPump Export utility is relatively simple. Listing 2.1 shows the DataPump parameter file used to generate the export files. Since my target database for this example is relatively small, I am using the FULL parameter to generate an export that contains the entire contents of the database including all metadata, all database objects, and of course the data itself. I am also using the STATUS parameter to tell the DataPump to write the status of the ongoing operation to the log file at regular intervals - in this case, every 30 seconds. Log 2.1 shows the resulting log file generated during this DataPump Export operation.
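Listing 2.1 is not reproduced inline here, but a minimal sketch of a full-export parameter file along these lines might look like the following (the EXPORT_DIR directory object and the file names are assumptions for illustration):

# Hypothetical nightly full-export parameter file
DIRECTORY=EXPORT_DIR
DUMPFILE=nightly_full.dmp
LOGFILE=nightly_full.explog
FULL=Y
STATUS=30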

Estimating an Export's Size. Export operations like the nightly disaster recovery export scheme I have just described can obviously use quite a bit of storage, especially for a large database. Thankfully, DataPump offers improved mechanisms for estimating an export's size. Setting the ESTIMATE_ONLY parameter to Y tells DataPump to calculate the estimate without actually performing the export, while the companion ESTIMATE parameter accepts a value of either BLOCKS (the default), which bases the estimate on the number of physical blocks that will be accessed, or STATISTICS, which bases the estimate on stored database statistics (providing, of course, that statistics have been gathered for the objects being exported).

Listing 2.2 shows the DataPump parameter file that generates the probable sizes of the export files for the nightly disaster recovery export. Note that I have set ESTIMATE_ONLY to Y and ESTIMATE to STATISTICS, and that the DUMPFILE directive has been removed (in fact, it will cause an error if included during an estimate-only operation). See Log 2.2 for the resulting log file generated during this DataPump Export operation.
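Again strictly as a sketch, since Listing 2.2 itself is not reproduced inline, the estimate-only variant of the parameter file might reduce to something like this (the file name and directory object are assumptions):

# Hypothetical estimate-only parameter file -- note the absence of DUMPFILE
DIRECTORY=EXPORT_DIR
LOGFILE=nightly_full_estimate.explog
FULL=Y
ESTIMATE_ONLY=Y
ESTIMATE=STATISTICS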

Multiple Server Processes and Dump Files During Exports. Another common request I receive regularly involves producing a subset of a production database to refresh the same tables in a development database. For example, recently one of my developers asked me to export the data from just one schema from our production database so she could import that data into another database to use it for application regression testing. In addition, the developer wanted to save the data on alternate media - in this case, standard 650MB CD-ROMs - for archival purposes.

The new DataPump Export makes quick work of this type of request. To demonstrate, I will configure the FILESIZE directive to limit each file to 650 MB, and I will provide for multiple export dump files to be written by specifying the %U variable in the DUMPFILE parameter. By embedding %U in the file name, DataPump automatically substitutes a two-digit number in the range 01 to 99 each time a new file is created.

To speed up this operation, I will specify the use of multiple server processes by setting the PARALLEL parameter to 2 to enable two DataPump worker processes to distribute the load. Oracle recommends that there are at least as many DUMPFILEs available as the number of PARALLEL processes specified.

Listing 2.3 shows the DataPump parameter file that generates these export files, and Log 2.3 contains the resulting log file generated during this operation.
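Listing 2.3 is likewise not reproduced inline, but a parameter file implementing this approach might resemble the following sketch (the schema name, directory object, and file names are assumptions):

# Hypothetical multi-file, parallel export parameter file
DIRECTORY=EXPORT_DIR
DUMPFILE=sh_refresh_%U.dmp
LOGFILE=sh_refresh.explog
SCHEMAS=SH
FILESIZE=650M
PARALLEL=2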

Exporting Selected Tables. As part of our regular development cycle, I need to refresh data held in several tables for a database used by our HR department for training Payroll application users. This database contains a mere fraction of the total transactional data in our production databases - just enough to provide examples that match our current training class documents - but any outstanding payroll transactions need to be applied to this "baseline" information at the start of each training class.

To make quick work of this task, I will use DataPump Export to dump out just the data from the required tables by specifying them as values for the TABLES parameter. Listing 2.4 shows the corresponding DataPump parameter file to perform this operation, and Log 2.4 displays the resulting log file generated.
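Since Listing 2.4 is not reproduced inline, here is a sketch of what a tables-only, data-only export parameter file for this scenario might contain (the table names, directory object, and file names are assumptions drawn from the sample HR schema):

# Hypothetical parameter file for exporting selected tables' data
DIRECTORY=EXPORT_DIR
DUMPFILE=payroll_refresh.dmp
LOGFILE=payroll_refresh.explog
CONTENT=DATA_ONLY
TABLES=HR.EMPLOYEES,HR.JOBS,HR.JOB_HISTORY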

Exporting Different Versions of a Database Object. Occasionally a developer needs to see how data in a table was stored before an operation took place, or which version of a particular database object was in force at a particular date and time. DataPump Export can export an object as of a particular system change number (SCN) specified for the FLASHBACK_SCN parameter, or as of a particular time specified for the FLASHBACK_TIME parameter. (Note that these options are mutually exclusive.)

Listing 2.5 shows an example of a DataPump Export parameter file that would export all objects in the HR schema while reflecting the state of those objects as of April 10, 2005 just before midnight.
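As a hedged sketch of what such a parameter file might contain (the file names and directory object are assumptions), the flashback export could be expressed like this:

# Hypothetical flashback export of the HR schema as of a point in time
DIRECTORY=EXPORT_DIR
DUMPFILE=hr_asof.dmp
LOGFILE=hr_asof.explog
SCHEMAS=HR
FLASHBACK_TIME="TO_TIMESTAMP('10-04-2005 23:59:00', 'DD-MM-YYYY HH24:MI:SS')"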

DataPump Import Scenarios

Oracle 10g's DataPump Import utility also demonstrates its power and flexibility when it is aimed at some real-world data and object import requirements:

Importing From Multiple DataPump Export Dump Files. DataPump Import automatically uses all available export dump files during an import operation as long as the dump file name qualifier specified for the DUMPFILE parameter matches that of the desired export files (including that handy %U variable).

To illustrate, I will use the export files I created via the example in Listing 2.4 as the data source. I will target those few tables that need to be refreshed in the Human Resources schema for the Payroll application refresh scenario. Listing 2.6 shows the DataPump Import control file used to accomplish this, and Log 2.6 contains the resulting log file generated.
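Listing 2.6 is not reproduced inline, but an import parameter file for this scenario might be sketched as follows, assuming the source dump files were written with a %U file name qualifier (the table names, file names, and directory object are likewise assumptions):

# Hypothetical import from a set of dump files matching the %U qualifier
DIRECTORY=EXPORT_DIR
DUMPFILE=payroll_refresh_%U.dmp
LOGFILE=payroll_import.explog
TABLES=HR.EMPLOYEES,HR.JOBS,HR.JOB_HISTORY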

Importing Selected Objects. Just as the original Import utility did, DataPump Import can load data from an existing DataPump Export dump file into existing tables. DataPump Import has enhanced this capability by providing a more robust set of parameters, INCLUDE and EXCLUDE, to specify limits on which objects should be imported.

Listing 2.7 shows the DataPump Import control file that illustrates this feature. I will use the disaster recovery nightly export dump file I generated in a previous example as the source for reloading data into the Sales History (SH) schema. I have specified the tables I want to reload in the INCLUDE parameter, and I have also instructed DataPump Import to first truncate the tables before loading them by specifying a value of TRUNCATE for the TABLE_EXISTS_ACTION parameter.
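A sketch of such an import control file might look like the following (the table names within the SH schema and the file names are assumptions):

# Hypothetical reload of selected SH tables from the nightly full export
DIRECTORY=EXPORT_DIR
DUMPFILE=nightly_full.dmp
LOGFILE=sh_reload.explog
SCHEMAS=SH
INCLUDE=TABLE:"IN ('SALES','COSTS')"
TABLE_EXISTS_ACTION=TRUNCATE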

Log 2.7 contains the resulting log file from this import operation. Note that the DataPump wisely did not truncate any tables upon which a foreign key constraint depended, and gently reminded me of this fact by logging an error message during the attempted TRUNCATE operation.

Generating SQLFILES Instead of Importing Data. DataPump Import also offers the ability to write to a SQLFILE the DDL statements that it would otherwise have executed to re-create the objects and tables stored in the export dump file, instead of actually performing the import.

To demonstrate, I will set up a DataPump Import control file (shown in Listing 2.8) that redirects the DDL to a SQLFILE stored within the export directory; Log 2.8 shows the SQLFILE output that results from this type of operation. The SQLFILE output produced is virtually identical to the type of metadata output generated by DBMS_METADATA.GET_DDL when that function is run against a database's objects - not surprising, of course, since DataPump relies on the DBMS_METADATA package to generate all metadata during export and import operations.
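Since Listing 2.8 is not reproduced inline, a SQLFILE-generating control file might be sketched as simply as this (file names and directory object are assumptions):

# Hypothetical control file: write DDL to a SQLFILE instead of importing
DIRECTORY=EXPORT_DIR
DUMPFILE=nightly_full.dmp
LOGFILE=ddl_extract.explog
SQLFILE=nightly_full_ddl.sql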

Remapping Database Object Attributes During DataPump Imports. Another handy feature of DataPump Import is the ability to remap database tables and objects stored in one schema to another completely new schema. See Listing 1.4 from the prior article in this series for an example of how to implement this feature.
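For reference, the heart of that technique is the REMAP_SCHEMA parameter; a hedged sketch of a control file that remaps the HR schema's objects into a hypothetical HRTEST schema might look like this:

# Hypothetical remap of HR objects into the HRTEST schema during import
DIRECTORY=EXPORT_DIR
DUMPFILE=nightly_full.dmp
LOGFILE=hr_to_hrtest.explog
REMAP_SCHEMA=HR:HRTEST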

Monitoring and Controlling DataPump Jobs

As mentioned in the prior article, the DataPump utility provides a new interactive mode that gives an Oracle DBA extensive control over DataPump Export and Import operations.

Switching Between Logging and Interactive Client Mode. To illustrate, I will initiate a long-running DataPump Export based on the parameter file shown in Listing 2.9:

C:\RMANCMD>expdp system/oracle parfile=c:\rmancmd\longexport.dpectl

Export: Release 10.1.0.2.0 - Production on Saturday, 16 April, 2005 12:44

Copyright (c) 2003, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SYSTEM"."LONGEXPORT":  system/******** parfile=c:\rmancmd\longexport.dpectl
Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA

Now I will immediately switch this job to interactive client mode by typing CTRL-C:

Export>

Viewing Job Status. From interactive mode, I can issue the STATUS command to view the job's current status:

Export> status
Job: LONGEXPORT
  Operation: EXPORT
  Mode: FULL
  State: EXECUTING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: c:\oracle\export_dir\long_export_%u.dmp
    size: 262,144
  Dump File: C:\ORACLE\EXPORT_DIR\LONG_EXPORT_01.DMP
    size: 262,144
    bytes written: 4,096

Worker 1 Status:
  State: EXECUTING

Closing and Reattaching To A Job. To switch from interactive client mode back to logging mode, I can issue the CONTINUE_CLIENT command. If I want to detach from the client session but leave the running job executing, I can issue the EXIT_CLIENT command:

Export> exit_client
C:\RMANCMD>

Likewise, to reattach to a running job after closing the client connection, I can reopen a session while specifying the ATTACH=<job_name> parameter on the command line:

C:\RMANCMD>expdp system/oracle attach=longexport

Export: Release 10.1.0.2.0 - Production on Saturday, 16 April, 2005 13:03
Copyright (c) 2003, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

Job: LONGEXPORT
  Owner: SYSTEM
  Operation: EXPORT
  Creator Privs: FALSE
  GUID: 0CC46AD48B4B4C5AB17B9CBD2C85810C
  Start Time: Saturday, 16 April, 2005 13:03
  Mode: FULL
  Instance: zdcdb
  Max Parallelism: 1
  EXPORT Job Parameters:
  Parameter Name      Parameter Value:
     CLIENT_COMMAND        system/******** parfile=c:\rmancmd\longexport.dpectl
     DATA_ACCESS_METHOD    AUTOMATIC
     ESTIMATE              BLOCKS
     INCLUDE_METADATA      1
     LOG_FILE_DIRECTORY    EXPORT_DIR
     LOG_FILE_NAME         LongExport.explog
     TABLE_CONSISTENCY     0
  State: IDLING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: c:\oracle\export_dir\long_export_%u.dmp
    size: 5,242,880
  Dump File: c:\oracle\export_dir\long_export_01.dmp
    size: 5,242,880
    bytes written: 4,096

Worker 1 Status:
  State: UNDEFINED
Export>

Controlling A Job's Execution. If I decide to add more dump files to the job, I can do that via the ADD_FILE command, and if I want to change the number of active worker processes, I can issue the PARALLEL command:

Export> status

Job: LONGEXPORT
  Operation: EXPORT
  Mode: FULL
  State: EXECUTING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: c:\oracle\export_dir\long_export_%u.dmp
    size: 262,144
  Dump File: C:\ORACLE\EXPORT_DIR\LONG_EXPORT_01.DMP
    size: 262,144
    bytes written: 4,096

Worker 1 Status:
  State: EXECUTING

Export> parallel=2

Export> status

Job: LONGEXPORT
  Operation: EXPORT
  Mode: FULL
  State: EXECUTING
  Bytes Processed: 0
  Current Parallelism: 2
  Job Error Count: 0
  Dump File: c:\oracle\export_dir\long_export_%u.dmp
    size: 262,144
  Dump File: C:\ORACLE\EXPORT_DIR\LONG_EXPORT_01.DMP
    size: 262,144
    bytes written: 4,096

Worker 1 Status:
  State: EXECUTING

Worker 2 Status:
  State: WORK WAITING

I can also halt the execution of the job via the STOP_JOB command:

Export> stop_job

At this point, I have two options. If I want to restart the job, I can reattach to the job and issue the START_JOB command. The job will resume at its current point. On the other hand, if I want to end the job - perhaps because it is running too long, was executed in error, or is consuming too much space - I can reattach to the session and then issue the KILL_JOB command:

C:\RMANCMD>expdp system/oracle attach=longexport

Export: Release 10.1.0.2.0 - Production on Saturday, 16 April, 2005 13:03

Copyright (c) 2003, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

Job: LONGEXPORT
  Owner: SYSTEM
  Operation: EXPORT
  Creator Privs: FALSE
  GUID: 0CC46AD48B4B4C5AB17B9CBD2C85810C
  Start Time: Saturday, 16 April, 2005 13:03
  Mode: FULL
  Instance: zdcdb
  Max Parallelism: 1
  EXPORT Job Parameters:
  Parameter Name      Parameter Value:
     CLIENT_COMMAND        system/******** parfile=c:\rmancmd\longexport.dpectl
     DATA_ACCESS_METHOD    AUTOMATIC
     ESTIMATE              BLOCKS
     INCLUDE_METADATA      1
     LOG_FILE_DIRECTORY    EXPORT_DIR
     LOG_FILE_NAME         LongExport.explog
     TABLE_CONSISTENCY     0
  State: IDLING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: c:\oracle\export_dir\long_export_%u.dmp
    size: 5,242,880
  Dump File: c:\oracle\export_dir\long_export_01.dmp
    size: 5,242,880
    bytes written: 4,096

Worker 1 Status:
  State: UNDEFINED

Export> kill_job
Are you sure you wish to stop this job ([y]/n): y
C:\RMANCMD>

What to Do If a DataPump Job Crashes

Each DataPump job uses a master table, created within the schema of the user account executing the job, to control the requested operation's activities, record its current status, and maintain its restartability. The master table is named the same as the DataPump job itself, even if the job name was generated automatically (the default behavior when no name is specified via the JOB_NAME parameter).

If a DataPump job crashes, however, this master table will remain behind in that schema. Oracle will not permit another job with the same name to be started until the master table is removed from the schema via the DROP TABLE <job_name> command.
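A quick way to spot a leftover master table is to query the DBA_DATAPUMP_JOBS view and then drop the orphaned table; the sketch below assumes the crashed job was SYSTEM's LONGEXPORT job from the earlier examples:

-- List DataPump jobs and their states; an orphaned job typically shows NOT RUNNING
SELECT owner_name, job_name, state
  FROM dba_datapump_jobs;

-- Drop the leftover master table so the job name can be reused
DROP TABLE system.longexport;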

Building a Custom DataPump Job

As I mentioned in my prior article, DataPump utilizes three components to do its work: a command-line interface to control DataPump operations; DBMS_METADATA, for dumping and processing DDL; and the DBMS_DATAPUMP PL/SQL package supplied with Oracle 10g that actually performs DataPump Export and Import operations. One really neat byproduct is that it is now possible to create a stored procedure or package that calls DBMS_DATAPUMP procedures and functions directly to create and manage DataPump jobs.

To illustrate, I will create a new stored procedure named HR.SP_EXPORT. This new procedure will control a DataPump Export job that dumps out all metadata and data for all tables and their dependent objects in the Human Resources (HR) schema. Most of the DBMS_DATAPUMP procedure and function names are extremely similar to those used in DataPump interactive mode, so I found coding this simple example to be quite intuitive (though not trivial).

Listing 2.10 shows the stored procedure source code I used to implement this example, and Log 2.10 shows the resulting DataPump Export log file generated when this export session was executed.
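Listing 2.10 is not reproduced inline, but a minimal sketch of such a procedure, built on documented DBMS_DATAPUMP calls, might look like the following (the job name, file names, and the EXPORT_DIR directory object are assumptions, and error handling is deliberately simplified):

CREATE OR REPLACE PROCEDURE hr.sp_export
AS
	hdl NUMBER;
BEGIN
	-- Open a schema-mode export job (job name is hypothetical)
	hdl := DBMS_DATAPUMP.OPEN(
		 operation => 'EXPORT'
		,job_mode  => 'SCHEMA'
		,job_name  => 'HR_EXPORT_JOB');

	-- Attach a dump file and a log file in the EXPORT_DIR directory object
	DBMS_DATAPUMP.ADD_FILE(
		 handle    => hdl
		,filename  => 'hr_export.dmp'
		,directory => 'EXPORT_DIR');
	DBMS_DATAPUMP.ADD_FILE(
		 handle    => hdl
		,filename  => 'hr_export.explog'
		,directory => 'EXPORT_DIR'
		,filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

	-- Restrict the job to the HR schema
	DBMS_DATAPUMP.METADATA_FILTER(
		 handle => hdl
		,name   => 'SCHEMA_EXPR'
		,value  => 'IN (''HR'')');

	-- Start the job, then detach; the job keeps running on the server
	DBMS_DATAPUMP.START_JOB(hdl);
	DBMS_DATAPUMP.DETACH(hdl);
EXCEPTION
	WHEN OTHERS THEN
		DBMS_DATAPUMP.STOP_JOB(hdl);
		RAISE;
END;
/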

Since I can create virtually any DataPump task as a stored procedure or package, this also grants me the flexibility to schedule this task using the new Oracle 10g Scheduler. Here is an example of how to schedule this new stored procedure to execute on a nightly basis at 23:00 every day:

BEGIN
	DBMS_SCHEDULER.CREATE_JOB (
		 job_name => 'HR_EXPORT'
		,job_type => 'PLSQL_BLOCK'
		,job_action => 'BEGIN HR.SP_EXPORT;END;'
		,start_date => '04/18/2005 23:00:00.000000'
		,repeat_interval => 'FREQ=DAILY'
		,enabled => TRUE
		,comments => 'Performs HR Schema Export nightly at 11 PM'
    );
END;
/

For more information on the Oracle 10g Scheduler, please review my earlier series on this set of scheduling tools.

Conclusion

As these practical examples have shown, Oracle 10g's DataPump Export and Import utilities offer a more impressive range of control over export and import operations than ever before. The new interactive command mode available during DataPump operations grants an Oracle DBA a high level of control over DataPump tasks. Finally, Oracle DBAs can create customized DataPump jobs by calling DBMS_DATAPUMP procedures and functions directly, and these jobs can be scheduled via the Oracle 10g Scheduler. These new capabilities are a welcome addition to any Oracle DBA's "tool belt" and offer significant alternatives to the original Import and Export utilities.

References and Additional Reading

While there is no substitute for direct experience, reading the manual is not a bad idea, either. I have drawn upon the following Oracle 10g documentation for the deeper technical details of this article:

B10750-01 Oracle Database New Features Guide

B10825-01 Oracle Database Utilities
