Oracle Optimizer: Moving to and working with CBO

Monday Jul 28th 2003 by Amar Kumar Padhi
Share:

Learn about the Oracle Optimizer and various efficient ways of moving to Cost Based Optimizer. Part 1 of a five part series.

If your application was developed in earlier releases of Oracle (v. 7 and earlier), chances are that your database is running in Rule based optimizer. This article will help you understand the Oracle Optimizer and various efficient ways of moving to Cost based Optimizer. This is Part 1 of a five part series.

Part 1
  1. What is Optimizer?
  2. Why Optimize?
  3. Available Optimizers
  4. Why is RBO being removed?
  5. Why move to CBO?
Part 2
  1. Initialization parameters that affect CBO
  2. Internal Oracle parameters that affect CBO
Part 3
  1. Setup changes for migrating to CBO
  2. Generating Statistics
  3. DML Monitoring
Part 4
  1. Hints
  2. Stored outlines
  3. Statistics for SYS schema
Part 5
  1. New Privileges
  2. How to analyze execution plans in CBO?
  3. Oracle Applications 11i specific information for CBO
  4. Conclusion

1. What is Optimizer?

In Oracle, a query may be executed in more than one way. The execution plan that has the best ranking or the lowest cost is the one that will return output with the fastest rate and optimal utilization of resources. The execution plan is generated by the Optimizer. Optimizer is an 'engine' running in the database that is dedicated to deriving a list of execution paths based on various conditions and then choosing the most efficient for running a query. Once an execution plan choice is made, it is then carried out to arrive at the output.

In Oracle, Optimizer relates to DML statements.

2. Why Optimize?

You know it! Optimizing a query aims at executing it in the shortest time and with optimal use of resources, thus making it fast and efficient. By resources, here I mean CPU utilization, hard disk I/O, memory consumption and to some extent, network operations. Irrespective of how big or rich your server is in terms of these resources, improper or sub-optimal queries will always be expensive and may drag your session or impact other process on the server.

The extent to which a query is expensive will depend on lot of factors, including the size of the result set to be fetched, the size of the data being scanned to retrieve the result set and the load on the system at that point in time. Proper optimization of statements will save your users lot of runtime wastage and unwanted resource utilization.

3. Available Optimizers

Oracle has two modes for Optimizer to decide on the best execution plan, Rule based and Cost based. This article concentrates on Cost Based Optimizer and Rule based is described in brief.

3.1 Rule Based Optimizer (RBO)

RBO follows a simple ranking methodology. Fifteen ranking points are designed in this optimizer. When a query is received, the optimizer evaluates the number of points that are satisfied. The execution path with the best rank (lowest number) is then chosen for executing the query. The fifteen-point ranking is mentioned below.

  1. Single row by ROWID
  2. Single row by cluster join
  3. Single row by hash cluster with unique or primary key
  4. Single row by unique or primary key
  5. Cluster join
  6. Hash cluster key
  7. Indexed cluster key
  8. Composite key
  9. Single column indexes
  10. Bounded range on index columns
  11. Unbounded range on indexed columns
  12. Sort merge join
  13. MAX or MIN on indexed column
  14. ORDER BY on indexed columns
  15. Full table scan

For example, If I fire a query on a table that has two columns that are searched for exact match (equal-to) in the where clause condition, one being the primary key and the other column has a non-unique key, RBO will prefer the primary key (rank 4) to the non-unique key (rank 9).

When more than one table is accessed in a query, the optimizer needs to decide which should be the driving table. The RBO generates a set of join orders, each with a different table as the first table. Then the most optimal plan is chosen from the resulting set of execution plans.

The optimizer evaluates the execution plans for various conditions such as (fewest nested-loop, fewest sort-merge joins, table with the best ranking access path, etc.). If there is still a tie, the optimizer chooses the execution plan for which the first table appears later in the query's FROM clause. Hence, it is a conventional coding practice to put the driving table at the extreme right, followed by other tables in order of access in the FROM clause, i.e., the ordering of tables based on their access is from right to left.

Please note that the operators being used for searching the columns also play a role in deciding the ranking. Sometimes even the age of an index is considered for ranking!

For example the below table shows what index is used if column1 and column2 have indexes on them and if both are being referred in the where clause with "=" operator.

Example:

 select * from am79 where col1 = 1 and col2 = 'amar';
      -- here both col1 and col2 are indexed. 

 -------------------------------------------------------------------------------------
              Normal index types                   |     Index used in RBO
 column1(a)     column2(b)    column1+column2(c)   |     
 -------------------------------------------------------------------------------------
 non-unique                       non-unique             c
 non-unique       non-unique                             a + b
 non-unique       non-unique      non-unique             c
 unique           non-unique                             a
 unique                           non-unique             a
 unique           unique                                 b (the most recent index created)
 unique           unique          unique                 c
 -------------------------------------------------------------------------------------
 -The above is tested on Oracle 8.1.7.1.
 -In case of non-unique single column indexes, both indexes are used.
 -In case of unique indexes, they are not combined for execution plan, any one is taken.
 -Preference is given to the index available with the "=" operator column, than with
 others operators.
 -Don't create bitmap & function-based indexes, these will not work in RBO.
 -------------------------------------------------------------------------------------

RBO was the preferred choice for most setups in earlier releases of oracle as the execution paths were consistent and uniform. Queries would behave the same way if run on different databases of the same application.

3.2 Cost based optimizer (CBO)

CBO follows Expense calculation methodology. All execution plans are tagged with a cost, the one with the lowest cost will be chosen. The higher the cost the more resources will be used by the execution plan, the lower the cost, the more efficient the query is.

CBO uses all available information-statistics and histograms stored in the dictionary, user provided hints and supplied parameter settings to arrive at the cost. CBO generates all possible permutations of access methods and then chooses what fits best. The number of permutations depends on the number of tables present in the query and can sometimes be around 80,000 permutations or even more! Please refer to the parameter section in part 2 of this series for setting related parameters.

CBO may also perform operations such as query transformation, view merging, OR transformation, push join predicates, etc. that would change the original statement and alter existing or add new predicates, all with the aim of deriving new access plans that could be better than the existing ones. Note that transformation does not affect the data that is returned, only the execution path. Please refer to the parameter section in part 2 of this series for information related to this.

3.2.1 Statistics

Statistics provide critical input in order for CBO to work properly; these are generated for data storing objects and include information such as the number of rows in a table, distinct values in a column, number of leaf blocks in an index, etc. The more accurate the statistics, the more efficient the results provided by Optimizer. Please refer to the Generating statistics section in part 3 of this series for how this information is generated and how best we can maintain it.

Statistics may be exact or estimated. Statistics generated with a COMPUTE clause analyzes all of the data in the object. This gives the optimizer accurate information to work on and arrive at a good execution plan.

Statistics generated with an ESTIMATE clause analyzes data in the object to the extent of sample size mentioned. Sample size may be specified as number of rows or percentage of rows that should be randomly analyzed to generate the statistics. Optionally block sampling may also be specified. This saves on time if there are many huge tables in the system. The guarantee of good execution plans will depend on how close the estimated value is to the exact values. You can try out your setup at different sample sizes to arrive at an appropriate figure or have different estimation levels for different types of tables, but the idea is to get as close to accuracy as feasible.

Statistics are stored in a data dictionary in tables owned by SYS user. The following views display the statistics collected for tables, columns and indexes.

For Tables

DBA_TABLES
NUM_ROWS - Number of rows.
BLOCKS - Number of used blocks.
EMPTY_BLOCKS - Number of empty blocks that have never been used.
AVG_SPACE - Average free space (in bytes) in blocks allocated to the table. All empty and free blocks are considered for this.
CHAIN_CNT - Number of chained or migrated rows.
AVG_ROW_LEN - Average row length in bytes.
LAST_ANALYZED - Date when the table was last analyzed.
SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE.
GLOBAL_STATS - For partitioned tables, YES - statistics collected as a whole, NO - statistics are estimated from statistics.
USER_STATS - Set to YES if user has explicitly set the statistics for the table.

Statistics for individual partitions of a table can be seen from DBA_TAB_PARTITIONS. Cluster statistics is available from DBA_CLUSTERS. Object table statistics are present in DBA_OBJECT_TABLES.

For Columns

DBA_TAB_COLUMNS
NUM_DISTINCT - Number of distinct values.
LOW_VALUE - Lowest value
HIGH_VALUE - Highest value
DENSITY - Density of the column
NUM_NULLS - Number of records with null value for the concerned column.
NUM_BUCKETS - Number of buckets in histograms. Refer Histograms section.
SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to total rows if COMPUTE.
LAST_ANALYZED - Date when the table was last analyzed.

DBA_TAB_COL_STATISTICS shows similar data. Partitioned Table column statistics can be viewed from DBA_PART_COL_STATISTICS and DBA_SUBPART_COL_STATISTICS.

For Indexes

DBA_INDEXES
BLEVEL - Depth of the index, from root to leaf.
LEAF_BLOCKS - Number of leaf blocks.
DISTINCT KEYS - Number of distinct keys.
AVG_LEAF_BLOCKS_PER_KEY - Average number of leaf blocks in which each distinct key appears, should be 1 for unique indexes.
AVG_DATA_BLOCKS_PER_KEY - Average number of blocks in the table that are pointed to by a distinct key.
CLUSTERING_FACTOR - A count that determines the ordering of the index. Index is ordered if count is closer to the number of blocks, i.e., entries in single leaf tend to point to rows in same blocks in the table. Index is randomly ordered if closer to the number of rows, i.e., entries in single leaf are pointing to rows spread across multiple blocks.
NUM_ROWS - Number of rows indexed.
SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE..
LAST_ANALYZED - Date when the table was last analyzed.
GLOBAL_STATS - For partitioned indexes, YES - statistics collected as a whole, NO - statistics are estimated from statistics.
USER_STATS - Set to YES if user has explicitly set the statistics for the index.
PCT_DIRECT_ACCESS - For secondary indexes on IOTs, percentage of rows with valid guess.

Statistics for individual partitions of indexes can be seen from DBA_IND_PARTITIONS and DBA_IND_SUBPARTITIONS.

Dictionary tables related to Histogram information are discussed later.

3.2.2 Available CBO Modes

CBO has two available modes in which to run, ALL_ROWS and FIRST_ROWS.

FIRST_ROWS aims at returning the first row(s) of the statement as soon as possible. This mode tells optimizer to give response time prime importance. It prefers nested-loop joins. FIRST_ROWS uses cost as well as some thumb rules to process the first set of rows. Examples of thumb rules - Plans using indexes are preferred over plans having full table scans as access path, ORDER BY clause can induce index access, etc.

As of release 9i, the number of rows to be returned in the first hit can also be mentioned in the parameter, FIRST_ROWS_n (n could be 1, 10, 100 or 1000). This could be set as per the application requirements.

ALL_ROWS processes all rows for a given query before returning the output. It forces optimizer to consider minimal use of resources and best throughput. ALL_ROWS prefers sort-merge joins.

For an OLTP system, FIRST_ROWS would be the ideal option for fast response time. ALL_ROWS is meant for batch processing applications. Note, a plan that produces the first n rows with the fastest response time might not be an optimal plan if requirement is to obtain the entire result, so decide as per the need of the application.

CBO is dynamic and tunes its execution plans as the database grows in size. So do not be taken aback if the same query that works perfectly in one database setup is behaving badly in some other database of the same application. This would happen if the setup and statistics differ between the two databases. To prevent such behavior, you may consider using optimizer plan stability, which is covered later in this series.

3.2.3 Basic CBO Terms

The following terms will be used quite often when analyzing statements in CBO.

Cost
The COST computed in CBO is a unit of expense involved with each operation. The logic as to how the cost is actually derived is not documented or made external. Moreover, this may change across releases.

Cardinality
The number of rows in the table or number of distinct row links in the index. The cardinality of a query is the number of rows that is expected to be returned by it.

Selectivity
The number of distinct values. The distinct values of a column being indexed are known as its selectivity. For example, if a table has 10000 rows and an index is created on a column having 4000 distinct values, then the selectivity of the index is (4000/10000) * 100 = 40%. Unique index on not null columns have a selectivity of 100%.

Transitivity
It is a process of generating additional predicates for a query by CBO. This enables optimizer to consider additional execution paths. For example if predicates are provided in query of the type A=B and B=C, the optimizer may add an additional predicate that indicates A=C.

Statistics
Much required information gathered for various data holding objects. This information is vital for the CBO to decide on execution plans.

Join Methods
Oracle uses joins like Hash, sort-merge and nested loops. A query may run faster using one type of join as compared to other methods. This should be evaluated for individual queries.

FTS
FTS or Full Table Scan relates to a query sequentially scanning a table from the first block to the last allocated block. This could be very expensive for big tables and should be avoided.

Index scan
Relates to random access of a table by use of one or more indexes on the table.

3.2.4 Minimum requirement

To start using CBO the minimum requirement is to set the optimizer mode to FIRST_ROWS or ALL_ROWS (or CHOOSE) and generate statistics for the objects. However, this will not ensure that your system is working at its best. Please refer to part 2 (Initialization parameters) for information regarding related initialization parameters.

Irrespective of the Optimizer mode settings, CBO is automatically invoked if one of the following is satisfied:

  • If hints are used.
  • If table is partitioned.
  • If tables are set for parallel.

4. Why is RBO being removed?

Oracle 9i release 2 will be the last version that officially supports RBO. Oracle recommends all partners and customers to certify their applications with CBO before this version is no longer supported. Though RBO will be available in Oracle 10i, it will no longer be supported.

As per a published Oracle note, the existence of RBO prevents Oracle from making key enhancements to its query-processing engine. Its removal will permit Oracle to improve performance and reliability of the query-processing components of the database engine.

Presently, Oracle support for RBO is limited to bug fixes only and no new functionality will be added to RBO.

5. Why move to CBO?

Key benefits that come to mind:

1. Oracle stopped developing for RBO environment a long time back.

2. RBO will subsequently be removed from the Oracle database.

3. RBO has a limited number of access methods compared to CBO.

4. All the new features require CBO. CBO is enabled to identify these features, and how to evaluate their cost. Most of these features will be of importance for any setup; e.g. Index organized tables, bitmap indexes, Function-based indexes, reverse-key indexes, Partitioning, Hash joins, Materialized views, parallel query, star joins, etc.

5. Metalink Support.

Once RBO is no longer supported, Oracle support will not be available.

6. CBO has matured.

Prior to Oracle 7, RBO could outperform CBO in some situations. Moreover, CBO would not behave as expected and often choose bad execution plans. CBO has been improved across releases and today it is a much better alternative considering the benefits and advances towards new features.

7. Distributed and remote queries are more reliable.

In RBO, it was difficult to fine tune queries that used database links and has tables from both local and remote database. CBO outperforms RBO in this regard. In CBO, the local optimizer is aware of the statistics present in the remote table and is able to make better decisions on execution plans. RBO may not consider indexes on remote databases, but CBO has access to statistics and information regarding indexes on a remote database and can decide on an execution plan.

» See All Articles by Columnist Amar Kumar Padhi

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved