Re-thinking SQL Queries Using Sets in Oracle

Interesting questions can be posted in the Oracle database forums and one recent post generated a great deal of attention. A query and part of the execution plan were posted with the poster asking how to get rid of the table scan. The table has a primary key index so, on the face of it, it is interesting that the primary key wasn’t used. Of course to provide any sort of insight or solution the table and index definitions need to be provided. In a response from the original post the CREATE TABLE and CREATE INDEX statements were provided. Let’s take a look at these objects and see how this question was addressed. I am indebted to both Jonathan Lewis and Andrew Sayer for providing the various query re-writes offered here.

Looking at the definition for the ‘problem’ table (LF_HOTEL_TEMP) we find only two columns, both of which are in the primary key definition. The tables were created:


SQL> create table LF_HOTEL_TEMP
  2  (
  3    HOTEL_CODE VARCHAR2(4000),
  4    SERVICE_ID NUMBER(3)
  5  )
  6  /

Table created.

SQL> 
SQL> create table LF_TS_ROOMTYPE_PROPERTIES
  2  (
  3    HOTEL_CODE    VARCHAR2(20),
  4    ROOM_TYPE     VARCHAR2(500),
  5    BOARD_TYPE    VARCHAR2(500),
  6    ROOM_AMT      FLOAT,
  7    SERVICE_ID    NUMBER,
  8    CURRENCY_CODE VARCHAR2(10)
  9  )
 10  /

Table created.

SQL> 
SQL> create table LF_HB_ROOMTYPE_PROPERTIES
  2  (
  3    HOTEL_CODE    VARCHAR2(20),
  4    ROOM_TYPE     VARCHAR2(500),
  5    BOARD_TYPE    VARCHAR2(500),
  6    ROOM_AMT      FLOAT,
  7    SERVICE_ID    NUMBER,
  8    CURRENCY_CODE VARCHAR2(10)
  9  )
 10  /

Table created.

SQL> 
SQL> create table LF_GTA_ROOMTYPE_PROPERTIES
  2  (
  3    HOTEL_CODE    VARCHAR2(20),
  4    ROOM_TYPE     VARCHAR2(500),
  5    BOARD_TYPE    VARCHAR2(500),
  6    ROOM_AMT      FLOAT,
  7    SERVICE_ID    NUMBER,
  8    CURRENCY_CODE VARCHAR2(10)
  9  )
 10  /

Table created.

SQL> 

Then the primary key and additional indexes were created:


SQL> alter table lf_hotel_temp
  2    add constraint lf_hotel_temp_PK primary key (HOTEL_CODE,service_id)
  3  /

Table altered.

SQL> 
SQL> create  index LF_hb_roomtype_prop_IDX on lf_hb_roomtype_properties (HOTEL_CODE)
  2  /

Index created.

SQL> 
SQL> create  index LF_ts_roomtype_prop_IDX on lf_ts_roomtype_properties (HOTEL_CODE)
  2  /

Index created.

SQL> 
SQL> create index LF_gta_roomtype_prop_IDX on lf_gta_roomtype_properties (HOTEL_CODE)
  2  /

Index created.

SQL> 

The original poster reported 278,000 rows in the LF_HOTEL_TEMP table so data was generated to reproduce that number of rows. Next the remaining tables were populated so that none of the three remaining tables shared any data:


SQL> begin
  2  	     for i in 1..278000 loop
  3  		     insert into lf_hotel_temp
  4  		     values(i, mod(i,999)+1);
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 
SQL> insert into lf_ts_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id < 511;

142058 rows created.

SQL> insert into lf_ts_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id between 313 and 642;

91740 rows created.

SQL> insert into lf_gta_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id between 271 and 823;

153743 rows created.

SQL> insert into lf_hb_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id between 571 and 999;

119262 rows created.

SQL> COMMIT;

Commit complete.

SQL> 

Let’s look at the original query and its execution plan:


SQL> set autotrace on linesize 140
SQL> 
SQL> -- original query
SQL> SELECT a.hotel_code
  2    FROM lf_hotel_temp a
  3  WHERE a.service_id = : p_service_id
  4  	    AND (NOT EXISTS (SELECT *
  5  	       FROM lf_ts_roomtype_properties b
  6  	      WHERE a.hotel_code = b.hotel_code)
  7  	     or NOT EXISTS (SELECT *
  8  	       FROM lf_gta_roomtype_properties b
  9  	      WHERE a.hotel_code = b.hotel_code)
 10  	    or	NOT EXISTS (SELECT *
 11  	       FROM lf_hb_roomtype_properties b
 12  	      WHERE a.hotel_code = b.hotel_code));

HOTEL_CODE                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------
1998                                                                                                                                        
999                                                                                                                                         
5994                                                                                                                                        
...
243756                                                                                                                                      
235764                                                                                                                                      
238761                                                                                                                                      
                                                                                                                                            

278 rows selected.


Execution Plan
----------------------------------------------------------                                                                                  
Plan hash value: 4111332730                                                                                                                 
                                                                                                                                            
-----------------------------------------------------------------------------------------------                                             
| Id  | Operation          | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |                                             
-----------------------------------------------------------------------------------------------                                             
|   0 | SELECT STATEMENT   |                          |     3 |    33 |   512   (4)| 00:00:01 |                                             
|*  1 |  FILTER            |                          |       |       |            |          |                                             
|*  2 |   TABLE ACCESS FULL| LF_HOTEL_TEMP            |   278 |  3058 |    94  (16)| 00:00:01 |                                             
|*  3 |   INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX  |     1 |     7 |     3   (0)| 00:00:01 |                                             
|*  4 |   INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX |     1 |     7 |     1   (0)| 00:00:01 |                                             
|*  5 |   INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX  |     1 |     7 |     1   (0)| 00:00:01 |                                             
-----------------------------------------------------------------------------------------------                                             
                                                                                                                                            
Predicate Information (identified by operation id):                                                                                         
---------------------------------------------------                                                                                         
                                                                                                                                            
   1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE                                                              
              "B"."HOTEL_CODE"=:B1) OR  NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"                                          
              WHERE "B"."HOTEL_CODE"=:B2) OR  NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"                                         
              "B" WHERE "B"."HOTEL_CODE"=:B3))                                                                                              
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))                                                                                    
   3 - access("B"."HOTEL_CODE"=:B1)                                                                                                         
   4 - access("B"."HOTEL_CODE"=:B1)                                                                                                         
   5 - access("B"."HOTEL_CODE"=:B1)                                                                                                         
                                                                                                                                            


Statistics
----------------------------------------------------------                                                                                  
         11  recursive calls                                                                                                                
          0  db block gets                                                                                                                  
       2077  consistent gets                                                                                                                
        872  physical reads                                                                                                                 
          0  redo size                                                                                                                      
       4119  bytes sent via SQL*Net to client                                                                                               
        570  bytes received via SQL*Net from client                                                                                         
         20  SQL*Net roundtrips to/from client                                                                                              
          0  sorts (memory)                                                                                                                 
          0  sorts (disk)                                                                                                                   
        278  rows processed                                                                                                                 


SQL> 

All but LF_HOTEL_TEMP use the indexes to speed access to the data; this may be due to the fact that LF_HOTEL_TEMP is a two-column table and the primary key index will be larger than the table since it contains the table data and a rowid for each row. The main idea illustrated by this problem involves being able to find a logically equivalent way of writing the SQL. Jonathan Lewis decided that set operations might produce a ‘better’ plan and re-wrote the query as shown below, producing a plan accessing LF_HOTEL_TEMP by the primary key index:


SQL> -- JL solution
SQL> var p_service_id number
SQL> exec : p_service_id := 1

PL/SQL procedure successfully completed.

SQL> 
SQL> select  /*+ dynamic_sampling(0) */
  2  	     hotel_code
  3  from    lf_hotel_temp
  4  where   service_id = :p_service_id
  5  minus   (
  6  	     select  hotel_code
  7  	     from    lf_ts_roomtype_properties
  8  	     where   hotel_code is not null
  9  	     intersect
 10  	     select  hotel_code
 11  	     from    lf_gta_roomtype_properties
 12  	     where   hotel_code is not null
 13  	     intersect
 14  	     select  hotel_code
 15  	     from    lf_hb_roomtype_properties
 16  	     where   hotel_code is not null
 17  	     )
 18  ;

HOTEL_CODE                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------
100899                                                                                                                                      
101898                                                                                                                                      
102897                                                                                                                                      
...
999                                                                                                                                         
9990                                                                                                                                        
99900                                                                                                                                       
                                                                                                                                            

278 rows selected.


Execution Plan
----------------------------------------------------------                                                                                  
Plan hash value: 775735246                                                                                                                  
                                                                                                                                            
--------------------------------------------------------------------------------------------------                                          
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |                                          
--------------------------------------------------------------------------------------------------                                          
|   0 | SELECT STATEMENT      |                          |     1 |  2159 |     4 (100)| 00:00:01 |                                          
|   1 |  MINUS                |                          |       |       |            |          |                                          
|   2 |   SORT UNIQUE NOSORT  |                          |     1 |  2015 |     1 (100)| 00:00:01 |                                          
|*  3 |    INDEX FULL SCAN    | LF_HOTEL_TEMP_PK         |     1 |  2015 |     0   (0)| 00:00:01 |                                          
|   4 |   INTERSECTION        |                          |       |       |            |          |                                          
|   5 |    INTERSECTION       |                          |       |       |            |          |                                          
|   6 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     1 (100)| 00:00:01 |                                          
|*  7 |      INDEX FULL SCAN  | LF_TS_ROOMTYPE_PROP_IDX  |     4 |    48 |     0   (0)| 00:00:01 |                                          
|   8 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     1 (100)| 00:00:01 |                                          
|*  9 |      INDEX FULL SCAN  | LF_GTA_ROOMTYPE_PROP_IDX |     4 |    48 |     0   (0)| 00:00:01 |                                          
|  10 |    SORT UNIQUE NOSORT |                          |     4 |    48 |     1 (100)| 00:00:01 |                                          
|* 11 |     INDEX FULL SCAN   | LF_HB_ROOMTYPE_PROP_IDX  |     4 |    48 |     0   (0)| 00:00:01 |                                          
--------------------------------------------------------------------------------------------------                                          
                                                                                                                                            
Predicate Information (identified by operation id):                                                                                         
---------------------------------------------------                                                                                         
                                                                                                                                            
   3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))                                                                                        
       filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))                                                                                        
   7 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
   9 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
  11 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
                                                                                                                                            


Statistics
----------------------------------------------------------                                                                                  
          1  recursive calls                                                                                                                
          0  db block gets                                                                                                                  
       5996  consistent gets                                                                                                                
       2416  physical reads                                                                                                                 
     234680  redo size                                                                                                                      
       4119  bytes sent via SQL*Net to client                                                                                               
        570  bytes received via SQL*Net from client                                                                                         
         20  SQL*Net roundtrips to/from client                                                                                              
          0  sorts (memory)                                                                                                                 
          0  sorts (disk)                                                                                                                   
        278  rows processed                                                                                                                 

SQL> 

Andrew Sayer then provided another pair of re-writes, shown below with the plans returned by Oracle 12.1.0.2:


SQL> -- Andrew Sayer
SQL> 
SQL> with existance_check as (select hotel_code
  2  			     from  (select distinct hotel_code
  3  				     from  lf_ts_roomtype_properties
  4  				     where  hotel_code is not null
  5  				   union all
  6  				     select distinct hotel_code
  7  				     from  lf_gta_roomtype_properties
  8  				     where  hotel_code is not null
  9  				   union all
 10  				     select distinct hotel_code
 11  				     from  lf_hb_roomtype_properties
 12  				     where  hotel_code is not null
 13  				   )
 14  			     group by hotel_code
 15  			     having count(*) = 3
 16  			     )
 17  SELECT a.hotel_code
 18    FROM lf_hotel_temp a
 19  WHERE a.service_id = : p_service_id
 20  	   AND NOT EXISTS (SELECT *
 21  	       FROM existance_check b
 22  	     WHERE a.hotel_code = b.hotel_code)
 23  /

HOTEL_CODE                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------
252747                                                                                                                                      
131868                                                                                                                                      
118881                                                                                                                                      
...
73926                                                                                                                                       
197802                                                                                                                                      
192807                                                                                                                                      
                                                                                                                                            

278 rows selected.


Execution Plan
----------------------------------------------------------                                                                                  
Plan hash value: 1887592732                                                                                                                 
                                                                                                                                            
---------------------------------------------------------------------------------------------------------------                             
| Id  | Operation                | Name                       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |                             
---------------------------------------------------------------------------------------------------------------                             
|   0 | SELECT STATEMENT         |                            |  2899 |  5738K|       |  3454   (9)| 00:00:01 |                             
|*  1 |  HASH JOIN ANTI          |                            |  2899 |  5738K|  5744K|  3454   (9)| 00:00:01 |                             
|*  2 |   TABLE ACCESS FULL      | LF_HOTEL_TEMP              |  2899 |  5704K|       |    98  (20)| 00:00:01 |                             
|   3 |   VIEW                   |                            |   611K|  7162K|       |  2843   (9)| 00:00:01 |                             
|*  4 |    FILTER                |                            |       |       |       |            |          |                             
|   5 |     HASH GROUP BY        |                            |   611K|  7162K|       |  2843   (9)| 00:00:01 |                             
|   6 |      VIEW                |                            |   611K|  7162K|       |  2736   (5)| 00:00:01 |                             
|   7 |       UNION-ALL          |                            |       |       |       |            |          |                             
|   8 |        HASH UNIQUE       |                            |   312K|  3660K|  6136K|  1387   (5)| 00:00:01 |                             
|*  9 |         TABLE ACCESS FULL| LF_TS_ROOMTYPE_PROPERTIES  |   312K|  3660K|       |    74  (15)| 00:00:01 |                             
|  10 |        HASH UNIQUE       |                            |   176K|  2063K|  3464K|   796   (5)| 00:00:01 |                             
|* 11 |         TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES |   176K|  2063K|       |    55  (13)| 00:00:01 |                             
|  12 |        HASH UNIQUE       |                            |   122K|  1438K|  2416K|   553   (5)| 00:00:01 |                             
|* 13 |         TABLE ACCESS FULL| LF_HB_ROOMTYPE_PROPERTIES  |   122K|  1438K|       |    37  (14)| 00:00:01 |                             
---------------------------------------------------------------------------------------------------------------                             
                                                                                                                                            
Predicate Information (identified by operation id):                                                                                         
---------------------------------------------------                                                                                         
                                                                                                                                            
   1 - access("A"."HOTEL_CODE"="B"."HOTEL_CODE")                                                                                            
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))                                                                                    
   4 - filter(COUNT(*)=3)                                                                                                                   
   9 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
  11 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
  13 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
                                                                                                                                            
Note                                                                                                                                        
-----                                                                                                                                       
   - dynamic statistics used: dynamic sampling (level=2)                                                                                    
                                                                                                                                            


Statistics
----------------------------------------------------------                                                                                  
         37  recursive calls                                                                                                                
          0  db block gets                                                                                                                  
       2664  consistent gets                                                                                                                
       1056  physical reads                                                                                                                 
      37440  redo size                                                                                                                      
       4119  bytes sent via SQL*Net to client                                                                                               
        570  bytes received via SQL*Net from client                                                                                         
         20  SQL*Net roundtrips to/from client                                                                                              
          0  sorts (memory)                                                                                                                 
          0  sorts (disk)                                                                                                                   
        278  rows processed                                                                                                                 

SQL> 
SQL> with existance_check as (select hotel_code
  2  			     from   lf_ts_roomtype_properties
  3  			     where  hotel_code is not null
  4  			   intersect
  5  			     select  hotel_code
  6  			     from   lf_gta_roomtype_properties
  7  			     where  hotel_code is not null
  8  			   intersect
  9  			     select hotel_code
 10  			     from   lf_hb_roomtype_properties
 11  			     where  hotel_code is not null
 12  			     )
 13  SELECT a.hotel_code
 14    FROM lf_hotel_temp a
 15  WHERE a.service_id = : p_service_id
 16  	   AND NOT EXISTS (SELECT *
 17  	       FROM existance_check b
 18  	     WHERE a.hotel_code = b.hotel_code)
 19  /

HOTEL_CODE                                                                                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------
252747                                                                                                                                      
131868                                                                                                                                      
118881                                                                                                                                      
...
73926                                                                                                                                       
197802                                                                                                                                      
192807                                                                                                                                      
                                                                                                                                            

278 rows selected.


Execution Plan
----------------------------------------------------------                                                                                  
Plan hash value: 1347650198                                                                                                                 
                                                                                                                                            
-------------------------------------------------------------------------------------------------------------                               
| Id  | Operation              | Name                       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |                               
-------------------------------------------------------------------------------------------------------------                               
|   0 | SELECT STATEMENT       |                            |  2899 |  5738K|       |  3048   (6)| 00:00:01 |                               
|*  1 |  HASH JOIN RIGHT ANTI  |                            |  2899 |  5738K|  2880K|  3048   (6)| 00:00:01 |                               
|   2 |   VIEW                 |                            |   122K|  1438K|       |  2736   (5)| 00:00:01 |                               
|   3 |    INTERSECTION        |                            |       |       |       |            |          |                               
|   4 |     INTERSECTION       |                            |       |       |       |            |          |                               
|   5 |      SORT UNIQUE       |                            |   312K|  3660K|  6136K|            |          |                               
|*  6 |       TABLE ACCESS FULL| LF_TS_ROOMTYPE_PROPERTIES  |   312K|  3660K|       |    74  (15)| 00:00:01 |                               
|   7 |      SORT UNIQUE       |                            |   176K|  2063K|  3464K|            |          |                               
|*  8 |       TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES |   176K|  2063K|       |    55  (13)| 00:00:01 |                               
|   9 |     SORT UNIQUE        |                            |   122K|  1438K|  2416K|            |          |                               
|* 10 |      TABLE ACCESS FULL | LF_HB_ROOMTYPE_PROPERTIES  |   122K|  1438K|       |    37  (14)| 00:00:01 |                               
|* 11 |   TABLE ACCESS FULL    | LF_HOTEL_TEMP              |  2899 |  5704K|       |    98  (20)| 00:00:01 |                               
-------------------------------------------------------------------------------------------------------------                               
                                                                                                                                            
Predicate Information (identified by operation id):                                                                                         
---------------------------------------------------                                                                                         
                                                                                                                                            
   1 - access("A"."HOTEL_CODE"="B"."HOTEL_CODE")                                                                                            
   6 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
   8 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
  10 - filter("HOTEL_CODE" IS NOT NULL)                                                                                                     
  11 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))                                                                                    
                                                                                                                                            
Note                                                                                                                                        
-----                                                                                                                                       
   - dynamic statistics used: dynamic sampling (level=2)                                                                                    
                                                                                                                                            


Statistics
----------------------------------------------------------                                                                                  
         14  recursive calls                                                                                                                
          0  db block gets                                                                                                                  
       1848  consistent gets                                                                                                                
          0  physical reads                                                                                                                 
          0  redo size                                                                                                                      
       4119  bytes sent via SQL*Net to client                                                                                               
        570  bytes received via SQL*Net from client                                                                                         
         20  SQL*Net roundtrips to/from client                                                                                              
          3  sorts (memory)                                                                                                                 
          0  sorts (disk)                                                                                                                   
        278  rows processed                                                                                                                 

SQL> 

Four approaches, four different plans, but all generating the same result set. In the reported statistics from Jonathan Lewis’ query the redo size is much larger than from the other plans; however the plan for Jonathan’s query is the only one using an index on LF_HOTEL_TEMP. It is possible that very small data sets will produce other execution plans, with more of them using the indexes; however using a data set sized the same as the one the original poster is using makes for more ‘honest’ comparisons. In general, the set operations appear to be more efficient than the non-set operations.

Using set operations in a relational database may not be the most obvious of choices, but it is good to remember that SQL is, at its heart, a set-based language. When thinking about alternative ways to solve a problem the best way may not be the one that is most commonly used; this problem brings that into focus. Many people don’t think about database data in terms of sets, but, really, that is what a SQL query generates, a result set. Thinking about data that way may open up more efficient ways to return those sets.

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles