Function use is common in IBM DB2 SQL. However, use and misuse of functions can affect query access paths and performance. Here are a few tips to help you tune IBM DB2 queries and avoid using functions that consume precious CPU resources.
Most of the common functions fall into two groups: scalar functions and aggregate functions.
A scalar function is defined as: "An SQL operation that produces a single value from another value and is expressed as a function name, followed by a list of arguments that are enclosed in parentheses." Some common scalar functions are:
- Character Manipulation: SUBSTR, LTRIM, RTRIM, LOCATE, LOWER, UPPER, POSSTR, LEFT, RIGHT, STRIP
- Numeric Conversion: CAST, INTEGER, DECIMAL, CHAR, DIGITS
- Date / Time Calculation: DATE, TIME, TIMESTAMP, HOUR, MINUTE, DAYOFMONTH
Aggregate functions derive their results by using values from one or more rows. They are also referred to as set functions. Some common aggregate functions include MIN, MAX, SUM, and AVG.
Scalar Function Invocation Effects on Predicate Evaluation
During execution, WHERE clause predicates are evaluated in two stages:
- Stage 1
- Indexable predicates (those that can match index entries)
- Non-indexable predicates
When DB2 can evaluate certain predicates at stage 1, the query that contains the predicate takes less time to run.
Within each stage, predicates are evaluated by type:
- Equals (has an equal operator and no NOT operator; also includes predicates like C1 IS NULL)
- Range (contains one of the following operators: >, >=, <, <=, LIKE, or BETWEEN)
Here are some general performance rules that the DBA and developer should know. For more details about predicates, query tuning, database monitoring and database performance tuning see the DB2 Performance and Tuning manual reference at the end of this article.
- Use stage 1 predicates whenever possible. Stage 1 predicates are better than stage 2 predicates because they disqualify rows earlier and reduce the amount of processing that is needed at stage 2. In terms of resource usage, the earlier a predicate is evaluated the better.
- Write queries to evaluate the most restrictive predicates first. When predicates with a high filter factor are processed first, unnecessary rows are screened as early as possible, which can reduce processing cost at a later stage. However, a predicate's restrictiveness is only effective among predicates of the same type and at the same evaluation stage.
Another interesting factor is that within each type predicates are evaluated in the order encountered in the SQL statement; so predicate order may affect performance. For multiple predicates of the same stage and the same type you should code the more restrictive predicate first. This allows DB2 to eliminate rows quickly from consideration; in addition, you can reduce unnecessary CPU usage by avoiding complex function invocations. Last, for predicates that are equally restrictive, you can order them so that complex and CPU-intensive functions are executed last (if at all).
Consider the following predicates, both Stage 2, both equals predicates:
SUBSTR(TYPE_CODE,2,1) = '5' AND MONTH(PER_DATE) = MONTH(START_DATE)
You should code the most restrictive predicate first. However, if these two predicates are equally restrictive, then code them in the order shown. In general, a single invocation of the simple SUBSTR function will consume less CPU than the two invocations of MONTH.
Another alternative would be to re-code the first predicate as follows:
TYPE_CODE LIKE '_5%'
This is because the LIKE operator is Indexable and Stage 1.
Re-Coding Predicates in a More Efficient Form
The previous example of re-coding a SUBSTR predicate with LIKE shows that there may be multiple ways of coding the identical predicate logic. Here are a few ideas.
Re-code initial substrings as either LIKE or BETWEEN to remove function invocation overhead and make predicate stage 1:
old SUBSTR(TYPE_CODE,2,1) = '2' new TYPE_CODE LIKE '2%' or, TYPE CODE BETWEEN '2 ' and '2999'
Re-code arithmetic expressions involving column values so that the column name appears alone on the left side of the expression. In the example below, :hv1 is a host variable containing a percentage:
old SALARY + :hv1 * SALARY > 50000 new SALARY > 50000 + 50000 / :hv1 or, SALARY > 50000 / (1 + :hv1)
Although numerically equivalent to the first predicate, the first change above will cause the SQL statement to fail if the value of :hv1 is equal to zero; hence, the second form would be preferred.
old HIRE_DATE - 1 DAYS < CURRENT DATE new HIRE_DATE < CURRENT DATE + 1 DAYS
Here, the date comparison remains the same, but the HIRE_DATE column appears by itself on the left of the "<" operator, making this a stage 1 predicate.
One last point on using SUBSTR. A major disadvantage of using this function in a predicate is that DB2 cannot use data distribution statistics to estimate the number of rows that may qualify. The optimizer will use a default filter factor, perhaps leading to an inefficient access path.
In cases where you are able to develop multiple equivalent predicates, I recommend executing Explains for all of them to determine whether the access path chosen by DB2 is efficient.
Aggregate Function Evaluation
If your query involves aggregate functions such as MAX and SUM, you can improve query performance by coding the SQL so that the functions are evaluated during data access, rather than later after sorting or data retrieval.
Use EXPLAIN to determine when DB2 evaluates the aggregate functions. In the PLAN_TABLE, column COLUMN_FN_EVAL shows when an SQL aggregate function is evaluated:
- R While the data is being read from the table or index
- S While performing a sort to satisfy a GROUP BY clause
- blank After data retrieval and after any sorts
Code the query so that every aggregate function that it contains meets the following criteria:
- Ensure that no sort is needed due to a GROUP BY. Execute an Explain, see columns SORTN_GROUPBY and SORTC_GROUPBY.
- Ensure that there are no stage 2 predicates.
- Do not use the DISTINCT function.
- For a Join, ensure that the aggregate function is on the last table joined. Execute an Explain, see column PLANNO.
- The aggregate function(s) are executed on single columns, with no arithmetic expressions.
- The aggregate function may not be one of the following: STDDEV, STDDEV_SAMP, VAR, VAR_SAMP.
Functions provide a convenient way to manipulate data. However, they also consume CPU time and sometimes prevent the optimizer from selecting an efficient access path. Develop alternatives and use the Explain facility to avoid wasting resources.
IBM • DB2 V9.1 Performance Monitoring and Tuning Guide SC18-9851; (see Chapter 16. Tuning your queries)
Sheryl Larsen Top Ten SQL Performance Tips
IBM developerworks Tuning DB2 SQL Access Paths
MyDeveloperConnecton SQL Performance and Tuning
IBM Systems Magazine An Intuitive Approach to DB2 for z/OS SQL Query Tuning