Creating NOT NULL Columns in Huge Oracle Tables

Thursday Jun 10th 2004 by Amar Kumar Padhi
Share:

Databases are often taxed by applying SQL statements to enormous tables. One such activity is when we add a new NOT NULL column with default value to a huge transaction table. Amar Padhi shares several methods to add a new column with default value, comparing the advantages and disadvantages of each.

Databases are often taxed by applying SQL statements to enormous tables. One such activity is when we add a new NOT NULL column with default value in a huge transaction table. By 'huge' I mean the number of records. I will discuss here, the addition of a new column with default value specifically; however, the methods discussed below can be used for other kinds of batch processing also.

The main concern in performing such activities is to reduce the downtime as well as structural changes (privileges, synonyms, exporting/importing objects, rollback segments, temporary tablespace etc.). Sometimes we also need to focus on reducing resource utilization (if the process is required to run online!).

If the concerned table has records numbering in the thousands, a direct statement would suffice in an acceptable time frame. However, if the concerned table has records numbering in the millions, then the updates need to be revisited and the database settings for undo segments and temporary tablespace sizes need to be considered. A huge DML/DDL activity would take lot of time and would result in space usage problems and heavy resource utilization, hence such batch processing is normally scheduled in off peak hours.

There are many ways of creating a column with a default value or updating an existing column. Some of the options used to speed up heavy DML jobs are: using parallel processing, the export/import utility, partitions or by simply breaking the activity into multiple jobs. Below is comparison of some of the methods that are commonly used.

For examples in this article, I will make use of a table, MTL_TRX, present in my database with 2.2 million records. It has a unique index on the TRX_ID column. I have also specified the approximate time taken by each method; this may vary according to setup.

Direct Column addition

Look at the following statement.

SQL> alter table mtl_trx add dumind varchar2(1) default 'N' not null;

Adding a new column with a default value takes considerable time. Here Oracle not only creates the column but also updates it with the default value; this will generate redo/undo information. Most often, such statements on massive tables will either hang or abort with an error.

The above statement took about 30-35 minutes to execute on my system (excluding statistics generation) with properly sized undo segments.

Advantage: No structural/system changes are done apart from the column addition, perfect for tables that are not huge!

Disadvantage: Resource intensive, time consuming, undo segment errors quite common, no commits possible at intervals.

The methods that follow are intended as a work around for preventing the above disadvantages.

Using an existing index

We can make use of an existing column that has an index on it for creating a not null column. By using an index, the workload can be split in multiple updates.

SQL> alter table mtl_trx add chrind varchar2(1);

Table altered.

SQL> update mtl_trx
  2  set    chrind = 'L'
  3  where  trx_date < '01-jan-04';

SQL> commit;

SQL> update mtl_trx
  2  set    chrind = 'L'
  3  where  trx_date >= '01-jan-04' and trx_date < '01-feb-04';

and so on..

Once the above updates are complete, the column can be marked as not null. The complete process takes around 24-26 minutes; this can be further reduced by running the updates in different sessions. This option is also used with partitions and in multiple job processing as covered subsequently.

Advantage: time reduced, No structural/system changes are done apart from the column addition, commits are possible at intervals.

Disadvantage: Initial groundwork required for running the updates, explicit coding is required if the updates are to be run in parallel.

Parallel Processing

Parallel processing can be used for running huge updates. This is achieved by setting the table's degree to a higher value. This would result in an UPDATE statement being fired in multiple threads. The number of threads invoked would be equal to what is set as the table degree. The maximum threads used by Oracle does not exceed the value specified by the initialization parameter PARALLEL_MAX_SERVERS.

The tables V$PX_PROCESS, V$PX_SESSION, V$PX_SESSTAT and V$PX_PROCESS_SYSSTAT are used to monitor parallel processing activities. Before running the update process, one large rollback segment is made available for it, all the rest are made offline. If using Undo tablespace, the size should be verified and increased if necessary. In my case, I keep a rollback segment of 500MB online.

session 1:

SQL> select name, value from v$parameter 
  where name = 'parallel_max_servers';

NAME                                   VALUE
-------------------------------------- --------------
parallel_max_servers                   8


SQL>  alter rollback segment rbs01 offline;  
 -- This is done for all rbs (not system!) 
     except the big one.

Rollback segment altered.


SQL> alter table mtl_trx parallel 
  (degree 6);


SQL> alter table mtl_trx add lovind varchar2(1);

Table altered.

SQL> update mtl_trx set lovind = 'X';

Session 2 (While the above statement is running):

SQL> select * from v$px_process;

SERV STATUS           PID SPID             SID    SERIAL#
---- --------- ---------- --------- ---------- ----------
P000 IN USE            62 5478              26      19690
P001 IN USE            65 5480              81        946
P002 IN USE            68 5482              78       1465
P003 IN USE            70 5484               9      16174
P004 IN USE            71 5486              56       6246
P005 IN USE            72 5488              82        272

6 rows selected.


SQL> select sid, serial# from v$session where program like '%P00%';

       SID    SERIAL#
---------- ----------
         9      16174
        26      19690
        56       6246
        78       1465
        81        946
        82        272

6 rows selected.


SQL> select saddr, sid, serial#, server#, degree,  req_degree
  2  from   v$px_session
  3  where sid in (9, 26, 56, 78, 81, 82);

SADDR           SID    SERIAL#    SERVER#     DEGREE REQ_DEGREE
-------- ---------- ---------- ---------- ---------- ----------
8C0A7ED8         26      19690          1          6          6
8C0C4E0C         81        946          2          6          6
8C0C34C8         78       1465          3          6          6
8C09EFAC          9      16174          4          6          6
8C0B7B80         56       6246          5          6          6
8C0C5678         82        272          6          6          6

6 rows selected.


SQL> select * from v$px_process_sysstat;

STATISTIC                           VALUE
------------------------------ ----------
Servers In Use                          6
Servers Available                       0
Servers Started                        18
Servers Shutdown                       12
Servers Highwater                       6
Servers Cleaned Up                      0
Server Sessions                         6
Memory Chunks Allocated                 1
Memory Chunks Freed                     0
Memory Chunks Current                   1
Memory Chunks HWM                       1

STATISTIC                           VALUE
------------------------------ ----------
Buffers Allocated                  104146
Buffers Freed                      104128
Buffers Current                        18
Buffers HWM                            28

15 rows selected.

Session 1:

The session  completes update.

2205961 rows updated.

SQL> commit;

Commit complete.

SQL> alter table mtl_trx modify lovind default 'X' not null;

Table altered.

SQL> alter table mtl_trx noparallel; 

Table altered.

SQL> select degree from dba_tables where table_name = 'MTL_TRX';

DEGREE
----------
         1

The last step of setting the degree back to 1 should not be skipped, as it may affect online queries. The total process took around 16-18 minutes (excluding statistics generation). This can be further reduced by increasing the number of threads.

Advantage: Oracle provided threading option is used, process time reduced.

Disadvantage: Resource intensive, undo segment errors are common if sizing is not done before hand, no commits are possible in intervals, the complete process fails if a single thread returns an error (rolling back changes takes considerable time), temporary alteration of degree is required for the table (the initialization parameter PARALLEL_MAX_SERVERS may also need to be set).

Exp/Imp

Data can be exported and re-imported with a default value on the column. I personally don't recommend this, as the idea of taking data out of the table in a production database is risky and sometimes dependencies exists between objects (e.g.: materialized views, foreign key relations etc.). A copy could be made for safety sake but this would depend on space availability.

Running export at OS level:

$ exp userid=apps/apps file=mtl_trx.dmp log=1.log tables=mtl_trx rows=y triggers=y grants=y indexes=y

Export: Release 8.1.7.4.0 - Production on Wed May 26 15:06:32 2004

(c) Copyright 2000 Oracle Corporation.  All rights reserved.

Connected to: Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production
With the Partitioning option
JServer Release 8.1.7.4.0 - Production
Export done in US7ASCII character set and US7ASCII NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                        MTL_TRX    2205961 rows exported
Export terminated successfully without warnings.

Instead of truncating, I rename the tables and the indexes and create a new structure. This will have the new column with the default value.

SQL> rename mtl_trx to bkp_mtl_trx;

Table renamed.

SQL> alter index mtl_trx_u1 rename to bkp_mtl_trx_u1;

Index altered.

SQL> create table mtl_trx as select * from bkp_mtl_trx where 1 = 2;

Table created.

SQL> alter table mtl_trx add xtcind varchar2(1) default 'N' not null;

Table altered.

data imported back:

$ imp userid=apps/apps file=mtl_trx.dmp log=2.log buffer=2097152 ignore=y commit
=y tables=mtl_trx

Import: Release 8.1.7.4.0 - Production on Wed May 26 15:59:18 2004

(c) Copyright 2000 Oracle Corporation.  All rights reserved.


Connected to: Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production
With the Partitioning option
JServer Release 8.1.7.4.0 - Production

Export file created by EXPORT:V08.01.07 via conventional path
import done in US7ASCII character set and US7ASCII NCHAR character set
. importing APPS's objects into APPS
. . importing table                      "MTL_TRX"    2205961 rows imported
Import terminated successfully without warnings.

The new column is populated with the default value when the import takes place. The total process takes around 20-24 minutes (excluding statistics generation).

Advantage: Process time reduced, commits are possible at intervals with BUFFER and COMMIT setting in import, no Undo-space issues.

Disadvantage: Complete downtime is required, structural changes are done, sufficient space is required at both OS and database level, a risk involved as data is being moved out of the database, the more indexes that are present on the table the more the process time increases, connected objects like triggers and materialized views have to be evaluated for their role.

Using multiple jobs

I have used this method in many places for DML activities, with or without conditions, and found it to be quite fast (most of the time even faster than the other methods). The processes can be tuned to work in the available resources. The idea is to break a big job into smaller units that can be run independently of each other. These smaller jobs can be executed in groups at proper intervals. In case a job fails, the same can be re-run and will not require the whole process to be started again. Changes are independently committed in each job, making them separate transactions.

This process does not require any structural changes except for the new column addition. The number of jobs can be altered to moderate the processing. The Oracle provided job scheduler, (DBMS_JOB), is used to submit and run the jobs.

Steps followed:

1. Add the new column to the table with no default value.

2. Submit jobs to update the table. Each job will update a certain number of rows and commit the changes.

3. Once all of the jobs are completed, modify the column to specify a default value and make it not null.

To break an update on a huge table into smaller units of work, we require a unique column or a non-unique column that has good cardinality for the values present. If an index does not exist on such a column, one needs to be created temporarily. Preferably, it should be a key column such as TRX_ID (unique) of MTL_TRX table in my case. We need to decide how many jobs we should generate and what will be the load for each job based on the identified column.

For example, my table, MTL_TRX, has 2.2 million records; I want to submit 7000 records approximately in each job. Therefore, about 300 threads need to be created. This information can be hardcoded in the script, but I thought of going a step further and making a generic routine that could be used for all such processing. The package JNC_SLICE_JOB takes input (minimum key value, maximum key value and number of threads) from the calling program and stores the job breakup information in a PL/SQL table of records. This information is then used by the calling program to submit job requests.

Running the routine from outside produces the following result. The routine code is present at the end of the article.

SQL> set serverout on
SQL> declare
  2    l_status varchar2(200);
  3  begin
  4    jnc_slice_job.slice(1000, 5000, 8, l_status);
  5    dbms_output.put_line(l_status);
  6  end;
  7  /
1000 : 1499       --the from and to range for each thread.
1500 : 1999
2000 : 2499
2500 : 2999
3000 : 3499
3500 : 3999
4000 : 4499
4500 : 5000
OK

Inputs provided are: minumum value of a key column (1000), 
                     maximum value of the key column (5000), 
                     no of threads to create (8)
Ouput returned is  : status of the execution, OK (if successful), oracle error (if failure).
The table SLICE_TAB is populated with the processing range for each job. This 
table can then be referred in the calling routine to submit the jobs.

The code mentioned here deals with only numeric data and can be further modified to work with VARCHAR2 datatypes or even ROWID. Now, coming back to our original requirement, the job processing parameters are set as below:

SQL> select name, value 
  2  from   v$parameter
  3  where  name like 'job%';

NAME                             VALUE
-------------------------------- -----
job_queue_processes              20
job_queue_interval               30

The above setting means that 20 jobs are to be run every 30 seconds. I can also modify the count of jobs to be executed on the fly, based on system load.

alter system set job_queue_processes = 25;

The following script is called to submit Oracle jobs.

set serverout on
declare
  l_thread  pls_integer := 300;
  l_stanum  number := 0;
  l_endnum  number := 0;
  l_status  varchar2(200);
  l_job_id  number;
  l_job     varchar2(4000);
  errexc    exception;
begin
  execute immediate 'alter table mtl_trx add invind varchar2(1)';

  select min(trx_id)
  into   l_stanum
  from   mtl_trx;

  select max(trx_id)
  into   l_endnum
  from   mtl_trx;

  jnc_slice_job.slice(l_stanum, l_endnum, l_thread, l_status);
  if l_status != 'OK' then
    dbms_output.put_line(l_status);
    raise errexc;
  end if;

  for i in nvl(jnc_slice_job.slice_tab.first, 0) .. nvl(jnc_slice_job.slice_tab.last, 0) loop
    l_job :=  'begin update mtl_trx set invind = ''N'' where  trx_id between ' 
              || jnc_slice_job.slice_tab(i).minlmt
              || ' and '   
              || jnc_slice_job.slice_tab(i).maxlmt 
              || ';' 
              || '  commit; end;';

    dbms_job.submit(l_job_id, l_job, sysdate, null);
  end loop;
  commit;
  dbms_output.put_line('Please monitor jobs submitted..');
end;
                    

On running the above script, 300 jobs are submitted and execution is started at once. The job processing can be monitored from DBA_JOBS and DBA_JOBS_RUNNING tables.

SQL> select job, what from user_jobs;

       JOB WHAT
---------- ------------------------------------------------------------------------
     14070 begin update mtl_trx set invind = 'N' where  trx_id between 22214424 and..
     14071 begin update mtl_trx set invind = 'N' where  trx_id between 22236581 and..
     14072 begin update mtl_trx set invind = 'N' where  trx_id between 22258738 and..
     14073 begin update mtl_trx set invind = 'N' where  trx_id between 22280895 and..
     14074 begin update mtl_trx set invind = 'N' where  trx_id between 22303052 and..
     14075 begin update mtl_trx set invind = 'N' where  trx_id between 22325209 and..
     14076 begin update mtl_trx set invind = 'N' where  trx_id between 22347366 and..
     14077 begin update mtl_trx set invind = 'N' where  trx_id between 22369523 and..

Upon completion of all the jobs, the column is marked as not null with a default value.

SQL> alter table mtl_trx modify invind default 'N' not null;

Table altered.

The complete process takes 6-9 minutes (excluding statistics generation). The execution of jobs in threads is quite fast as the WHERE condition uses the unique index present on the column specified. This is acceptable timing without any structural changes being done. The timing can be further brought down by reducing the time interval or increasing the number of jobs.

Advantage: Process time is considerably reduced. Commits are possible at intervals. Each job is a separate transaction and can be run again independently in case of failure (and avoid re-doing the whole work again). The option of submitting in chunks gives us the advantage of running it online also, multiple undo segments are used, no structural changes are required apart from column addition, and updates based on a logical analysis in the code are possible.

Disadvantage: Requires initial groundwork and proper planning to have the process in place, initialization parameters are altered for a short time.

Using Partitions

Partitioning is used in CBO environments. This feature allows us to break a large table down into smaller segments. If a new column is being added to an already partitioned table, we can modify the statement to make use of the partitions and save time.

For this example, I recreated the MTL_TRX table with partitions on the transaction date column. The table is analyzed and the update is carried out with optimizer mode set to FIRST_ROWS.

create table mtl_trx_2
(TRX_ID       NUMBER,
 TRX_DATE     DATE,
 ITEM_ID      NUMBER,
 SHOP         VARCHAR2(10),
 TRX_TYPE_ID  NUMBER,
 QTY          NUMBER,
 COST         NUMBER)
partition by range (trx_date)
(partition p1 values less than (to_date('01-dec-2003', 'dd-mon-yyyy')),
 partition p2 values less than (to_date('01-jan-2004', 'dd-mon-yyyy')),
 partition p3 values less than (to_date('01-feb-2004', 'dd-mon-yyyy')),
 partition p4 values less than (to_date('01-mar-2004', 'dd-mon-yyyy')),
 partition p5 values less than (to_date('01-apr-2004', 'dd-mon-yyyy')),
 partition p6 values less than (to_date('01-may-2004', 'dd-mon-yyyy')),
 partition p7 values less than (to_date('01-jun-2004', 'dd-mon-yyyy')),
 partition p8 values less than (to_date('01-jul-2004', 'dd-mon-yyyy')));

This allows me to split data month-wise in different segments. I will now create the new column without any default value.

SQL> alter table mtl_trx add stcind varchar2(1);

Table altered.

The update of default value is carried out partition-wise. This will result in only one segment being updated at a time. The update time is reduced as a full-table scan is avoided and Oracle concentrates only on the relevant partition that is referred to by the WHERE clause condition (partition pruning feature of the optimizer).

SQL> update mtl_trx
  2  set    stcind = 'A'
  3  where  trx_date >= '01-apr-04' and trx_date < '01-may-04';

214930 rows updated.

and so on..

I run the partition-wise updates individually for different segments. The column is finally marked as not null once all of the updates are complete. The total process takes me around 10-12 minutes.

Advantage: Process time is reduced, commits possible at intervals (if multiple updates are run separately).

Disadvantage: Table should be partitioned, no other structural/system change apart from the column addition, an undo space problem may occur if the partition size is too large.

Conclusion

As per my analysis, the process of running jobs in smaller units of work has proven to be quite flexible, convenient and fast in the long run. Other methods can also be used, depending on the size of the tables being dealt with.

As the database size increases, doing such massive activities becomes more cumbersome and prone to failure. We need to look out for other alternatives, as mentioned above, to make things easier and faster. At all times we need to make sure that existing data is not being put at risk.

» See All Articles by Columnist Amar Kumar Padhi

---------------------------------------------------------------------------------------

--Script : jnc_slice_job.sql
--Subject: Routine for slicing a given value and placing it in table format.
--Note   : The process will pick up the threading information from the calling routine.
---------------------------------------------------------------------------------------

create or replace package jnc_slice_job is
  procedure slice(pi_min_id in             number,
                  pi_max_id in             number,
                  pi_threads in            number,
                  pio_status in out nocopy varchar2);

  type slice_row is record(minlmt  number, maxlmt number);
  type slice_typ is table of slice_row
      index by binary_integer;

  slice_tab  slice_typ;
end jnc_slice_job;
/

create or replace package body jnc_slice_job is

procedure slice(pi_min_id in             number,
                pi_max_id in             number,
                pi_threads in            number,
                pio_status in out nocopy varchar2) is

  l_min_id   number := pi_min_id;
  l_max_id   number := pi_max_id;
  l_threads  number := pi_threads;
  l_thread_load number := 0;
  l_dif      number := 0;
  l_cnt      number := 0;

begin
  slice_tab.delete;

  pio_status := 'OK';

  l_dif := l_max_id - l_min_id;

  l_thread_load := round(l_dif/l_threads);

  while l_min_id < l_max_id loop
    l_cnt := l_cnt + 1;
    slice_tab(l_cnt).minlmt := l_min_id;
    l_min_id := l_min_id + l_thread_load;

    if l_min_id >= l_max_id then
      slice_tab(l_cnt).maxlmt := l_min_id;   
    else
      slice_tab(l_cnt).maxlmt := l_min_id - 1;   
    end if;
  end loop;

  for i in slice_tab.first .. slice_tab.last loop
   dbms_output.put_line(slice_tab(i).minlmt || ' : ' || slice_tab(i).maxlmt);
  end loop;  

exception
  when others then
    pio_status := 'Error in slicing. ' || substr(sqlerrm, 1, 150);
end;

end jnc_slice_job;
/

sho err



Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved