Deleting Historical Data on Oracle Databases

Friday Aug 29th 2003 by Marin Komadina
Share:

Archiving historical database data is an important, complex and in most cases forgotten task of database projects. Marin Komadina examines the problem of deleting historical data and presents some practical tips on how to execute and check the deleting performance.

Archiving historical database data is an important, complex and in most cases forgotten task of database projects. Historical data has to be extracted and deleted from the database as a part of the archiving procedure. Extracted data is kept for several years as archived data, while the database retains only the actual, active set of data. If we do not archive, the data volume will increase, and so will the project costs. Just buying a new disk is not a permanent solution for handling expanding archive data. An Oracle DBA must understand the application model before stepping into implementing an archive procedure. A simple archive plan has the following basic steps:

  1. Prepare data to be archived
  2. Extract and archive data to be deleted
  3. Delete historical data from database

In this article, we will examine the problem of deleting historical data and present some practical tips on how to execute and check the deleting performance.

  • Highwather Mark and Delete Operation
  • Delete Operation Measurements
  • Code Sample for Measurement Test
  • Final Recommendations
  • Conclusion

Highwater Mark and Delete Operation

Highwater Mark (HWM)

For each object, Oracle maintains a record of the highest relative block used to hold data.

HWM is the last data block formatted to receive data, not necessarily carrying data. The table highwater mark (HWM) is the number stored in the table segment header. Adding new data to the table moves HWM to the next block, or to the last used block. Oracle optimizer relies on the HWM during calculation for the full table scan, and on full table scans, Oracle optimizer will always scan all database blocks
until the HWM pointer. This highwater mark is maintained in multiples of five blocks. Oracle has provided a way to find HWM for the table using the DBMS_SPACE package, which is not dependent on the analyzed status of the table. Deleting data from the table leaves the HWM unchanged. The next table insert will use the first free block on the free block list, which is beyond the highwater mark causing the table to continue growing.

Delete Operation

Delete command example:

SQL > delete * from artist.tb1 where INSERT_DATE > TO_CHAR('12-05.2001');

SQL > commit;

Click for larger image

The delete operation will scan the table until it reaches the HWM (Highwater Mark) position and afterwards will remove records which match the desired where condition. The deletion operation will not change the HWM point and the next full table scan will scan again until the HWM, regardless of whether data exists in the table or not. All the previous extent remains allocated for the table. When the matching column has an index on it, optimizer will most likely scan only the index for the matching records.

All deleted entries from the index tree will be marked as deleted but still retain allocated space in the index tree.

Oracle's internal mechanism will generate the rollback content to provide backup information for deleted table data in case the operation is cancelled. On commit, deletions are marked as permanent. The database changes will be logged in the redo log file and transferred in the rollback segments.

Delete Operation Measurements

For making performance measurements I choose Oracle database 9.0.1 Enterprise Edition on the Sun Solaris operating system. Let's assume in database schema ARTIST we have one big table ARTIST_TEST with 19856 records. From this table we are going to delete 6768 records using following syntax:

delete from artist_test where SECOND_COLL='JAVA CLASS';

The table has only one index that is never dropped or rebuilt. Before running every test we will make regular compute-analysis of the table.

SQL code to rebuild table and prepare system for each measurement:

SQL> create table artist_test (
  2  no  number, first_col number,  second_coll varchar2(1000),  rest  varchar2(1000), 
  3  constraint artist_pk primary key (no)) tablespace artist
  4  /
Table created.
SQL> insert into artist_test select  rownum,OBJECT_ID,OBJECT_TYPE,rpad('H',100) from all_objects;
19856 rows created.
SQL> commit;
Commit complete.
SQL> analyze table artist_test compute statistics; 
Table analyzed.
SQL> alter system flush shared_pool;
System altered.

The Command ANALYZE TABLE will provide fresh information for the optimizer and the command ALTER SYSTEM FLUSH SHARED_POOL will erase all previously cached statements from the shared pool. During a test, there is regular activity on the database that simulates the real production system. To obtain as much precise information for own session activity, a special view has been created named "V$ROGER" under sys account. This view is just a link to the regular database dynamic performance view "V$MYSTAT" provided by Oracle for collecting local session information.

SQL> create or replace view v$roger as select sn.name,ms.value 
	from v$mystat ms,v$statname sn where ms.statistic# = sn.statistic#;
View created.
SQL> create public synonym v$roger for v$roger
Synonym created.
SQL> grant select on v$roger to public;
Grant succeeded.

The delete statement statistics are generated using special event 10046. Setting this event will start tracing with level 8, generating in the trace file standard SQL_TRACE information, bind values and wait events. Short description for that event:

SQL> alter session set events '10046 trace name context forever, level 8';
Session altered. 

10046 EVENT levels: 
        1  - Enable standard SQL_TRACE functionality (Default) 
        4  - As Level 1 PLUS trace bind values 
        8  - As Level 1 PLUS trace waits 
             This is especially useful for spotting latch wait etc. 
             but can also be used to spot full table scans and index scans. 
        12 - As Level 1 PLUS both trace bind values and waits

A trace file will be processed with regular Oracle tool TKPROF using following syntax:

# tkprof artist_ora_23641.trc sql_cost3.prf sort=exeela,fchela explain=artist/artist

Timing information is generated using the database procedure GET_TIME from the DBMS_UTILITY package provided by Oracle.

time_before:=DBMS_UTILITY.GET_TIME;
time_after:=DBMS_UTILITY.GET_TIME;
time_elapsed:=time_after-time_before;


Code Sample for Testing



Test Case 1 - Basic Delete

delete from artist_test 
    where SECOND_COLL='JAVA CLASS';


Test Case 2 - Delete with PLSQL and forall loop, committing only once

DECLARE 
TYPE artist_value IS TABLE OF VARCHAR2(1000) INDEX BY BINARY_INTEGER; 
del_value artist_value; 
rowkey PLS_INTEGER; 
BEGIN 
SELECT rowid BULK COLLECT INTO del_value FROM artist_test WHERE SECOND_COLL='JAVA CLASS'; 
IF del_value IS NOT NULL THEN 
FORALL rowkey IN del_value.FIRST..del_value.LAST 
DELETE artist_test WHERE rowid = del_value(rowkey); 
commit;
END IF; 
END;
/

Test Case 3 - Delete using anonymous PL/SQL block, looping through the cursor, and deleting one tow at a time. Loop has counter, programmed to trigger commit after each 1000 records.

SQL> declare
        rec_read       number(6)    := 0;
        rec_rest       number(6)    := 0; 
        vc_var_out   varchar2(41) := 'Delete Archive Data';
        cursor rec_sql is select SECOND_COLL,rowid from artist_test 
			where SECOND_COLL='JAVA CLASS' order by SECOND_COLL,rowid;
begin
       for rec_sql_run in rec_sql loop
            rec_read := rec_read + 1 ;
            rec_rest := rec_rest + 1 ;
            delete from artist_test A where A.rowid = rec_sql_run.rowid ;
            if rec_rest > 1000 THEN
                 COMMIT;
                 rec_rest := 1;
                 dbms_output.put_line('COMMIT AT..'|| rec_read);
             END IF;
        end loop;
end;
/  

Test Case 4
a.) Deleting using PL/SQL stored procedure, commit each 1000 records. ( exec delete_table(1000))
b.) Deleting using PL/SQL stored procedure, commit only once ( exec delete_table(100000))

CREATE OR REPLACE PROCEDURE delete_table (commit_size in number default 1000) is
  sel_id      INTEGER;
  del_id      INTEGER;
  exec_sel    INTEGER;
  exec_del    INTEGER;
  cur_stmt    VARCHAR2(2000);
  del_rowid   ROWID;
BEGIN
  cur_stmt := 'SELECT rowid FROM artist_test where SECOND_COLL=''JAVA CLASS''';
  sel_id := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(sel_id,cur_stmt,dbms_sql.v7);
  DBMS_SQL.DEFINE_COLUMN_ROWID(sel_id,1,del_rowid);
  exec_sel := DBMS_SQL.EXECUTE(sel_id);
  del_id := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(del_id,'delete from artist_test where rowid = :del_rowid',dbms_sql.v7);
 LOOP
   IF DBMS_SQL.FETCH_ROWS(sel_id) >0 THEN
      DBMS_SQL.COLUMN_VALUE(sel_id,1,del_rowid);
      DBMS_SQL.BIND_VARIABLE(del_id,'del_rowid',del_rowid);
      exec_del := DBMS_SQL.EXECUTE(del_id);
      if mod(dbms_sql.last_row_count,commit_size) = 0 
      then
     COMMIT;
      end if;
   ELSE 
       exit;
   END IF;
 END LOOP;
  COMMIT;
  DBMS_SQL.CLOSE_CURSOR(sel_id);
  DBMS_SQL.CLOSE_CURSOR(del_id);
END;
/

Test Case 5 - Delete using standard anonymous PL/SQL block, using execute immediate function and committing only once.

SQL > declare
    proc_rows number := 0;
  begin
    execute immediate 'begin 
                        delete from artist_test where SECOND_COLL=''JAVA CLASS''; 
                        :x :=sql%rowcount;
                        commit;
                        end;' 
    using OUT proc_rows;
    dbms_output.put_line('Records processed........'|| proc_rows);
end;
/

Final Recommendations

The result from tracing files, the v$roger session performance view and from the timing information obtained using DBMS_UTILITY.GET_TIME procedure was collected and sorted in the following table:

Delete Test

Elapsed Time (sec)

Execute Count

Parse Count

Parse Count Hard

Recursive Calls

Enqueue requests

Sorts (memory)

Basic Delete

35,75

156

87

29

2839

33

61

PL/SQL Loop forall

26,94

296

168

36

5008

46

109

PL/SQL Loop
commit 1000

21,25

7219

342

36

18873

302

67

Stored Procedure
commit once

15,52

7105

197

42

25830

54

118

Stored Procedure
commit 1000

12,03

7100

197

38

25700

50

114

Execute Immediate

9,87

298

162

42

5084

50

113

Elapsed Time - real time needed to finish test

Execute Count - number of execution recorded for the session

Parse Count - total number of parsing calls for the session (soft and hard)

Parse Count Hard - number of hard parsing calls for the session

Recursive Calls - number of recursive calls for the session

Enqueue Requests - number of locking requests for he session

Sorts (memory) - number of sorts in memory for the session

The best result was using EXECUTE_IMMEDIATE calls inside an anonymous PL/SQL block, during test case 5. This is not surprising, since Oracle suggests this method for mass data deletion. The EXECUTE IMMEDIATE call came first time in version Oracle8i. It is supposed to be a replacement for DBMS_SQL package. EXECUTE_IMMEDIATE calls parses and immediately executes a dynamic SQL statement or a PL/SQL block, which is preferred and faster than other coding tests.

I did not look in the tests for other potential problems, rising CPU usage and disk I/O utilization due to mass delete action. The assumption was that we were going to delete from the big table via a regular batch job, on system offloaded from regular users. In our test case, ARTIST_TEST table was not partitioned and distributed
over different disk devices, which is an additional point to consider for performance
improvements.

Every DBA has to test all existing methods and find one that best suite.

Some last recommendations to speed a delete operation:

  • do not delete all the data in one shot ; break data into chunks and commit in between
  • delete indexes before and recreate after
  • for tables with many deletes, try to avoid bitmap indexes which cause locking problems and performance degradation 
  • use parallel DML when possible
  • check for index on the FK columns, which have to be locked when a row is deleted from the parent (each row deleted in the Master is causing a full table scan in one or both of the cascaded tables.)
  • deleting using ROWID (fetching the entire ROWID result set into a cursor)

Conclusion

Oracle 9i has several new and improved features aimed at making the availability, manageability and serviceability of the database easier. Beside Oracle 9i's strong partition orientation, at least 1/5 of the running Oracle databases have no implemented partitions.

Deleting historical data at the end of month becomes a question of existence:

"To delete old data and continue working or close the business due to bad performance and high costs."

» See All Articles by Columnist Marin Komadina

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved