Synopsis. Oracle 10g has extended the already robust features of Structured Query Language (SQL) with a plethora of intriguing new capabilities. This article the final in this series reviews the new features that Oracle 10g provides for advanced data modeling and inter-row calculations via the new MODEL clause.
The previous article in this series reviewed Oracle 10g's enhancements to analytical functions, including the use of partitioned outer joins to densify data within multiple dimensions. As that article promised, I will now delve into the new MODEL clause that grants any Oracle developer or DBA the power to produce complex inter-row calculations using just the database engine, thus eliminating the need to massage data output via a third-party tool.
The Scenarios: Oh, What A Tangled Web Our Users Weave ...
As I mentioned in my previous article, these two scenarios were among the more complex I had seen recently:
- A user in Marketing needed a report that broke down sales within time periods, product types, and regions, including percentages of the total sales calculated at different control breaks.
- An Accounting department user needed a report that calculated the eventual value of a company's outstanding cash balance if particular assets were invested at different rates of interest, or over different time periods.
My first inclination would have been to tell my developers that they should only use Oracle to aggregate the required data, then export the data and whip up the final results through a third-party tool - perhaps nothing more complex than a Microsoft Excel spreadsheet - to accomplish these analyses. However, that was before I explored Oracle 10g's new MODEL clause.
Overview: The MODEL Clause
Let's explored how I used the MODEL clause to solve the first scenario above. I will modify the SH.SALES_MIDWEST_ONLY view created for the previous article so that view now selects a slightly different subset of products, but still within the Midwestern states. Listing 3.1 shows the revised view.
Next, I will construct some sample SQL to demonstrate the new MODEL clause:
SELECT state, prod, total_sales FROM sh.sales_midwest_only WHERE prod BETWEEN 125 AND 130 GROUP BY state, prod MODEL PARTITION BY (state) DIMENSION BY (prod) MEASURES (SUM(sold) AS total_sales) IGNORE NAV UNIQUE DIMENSION RULES UPSERT ( total_sales = total_sales[prod=125] + total_sales[prod=126], total_sales = total_sales[prod=127] + total_sales[prod=128] + total_sales[prod=129] + total_sales[prod=130], total_sales = total_sales - total_sales ) ORDER BY state, prod;
When this SQL statement is executed, Oracle retrieves the selected result set, places the result set into memory and then allows the MODEL clause to divvy up all returned rows into a multi-dimensional array. For example, a three-dimensional array can be visualized as a cube that has been cut into several horizontal slices, vertical slices, and layers. Continuing this cubist analogy (apologies to Pablo Picasso!), the MODEL clause also lets me apply rules that perform inter-row calculations on the data that is stored at each intersection of these slices and layers. (For the sake of simplicity and my own sanity, I promise to keep all examples under four dimensions.)
But wait it gets better! MODEL also lets me create additional sets of values for example, brand new rows that include the same columns as the other rows in the result set by direct assignment of values to the new row's cells. I find it useful to picture this feature just as if I'd inserted a new row into an Excel spreadsheet, and then added a formula, perhaps a SUM() or just simple addition or subtraction, to calculate values into that new row's cells based on the contents of other cells in the spreadsheet.
Since this may be the first time that you have seen the MODEL clause in action, I will break down this rather complex statement into its components. Let's first take a look at the statements that classify the returned columns of this query into three distinct types:
- Partition Columns. The PARTITION BY statement tells Oracle which column(s) will be used to split up the result set into partitions. The partitions are defined by the values present for each column specified in this clause. In the example above, I am using the values returned for the STATE column to partition my result set.
- Dimension Columns. The DIMENSION BY statement tells Oracle which column(s) will be used to split up the result set into dimensions, which are used to break out rows within a partition. I am using the values returned for just one column the Product Identifier to create my dimension values for the result set.
- Measure Columns. Finally, the MEASURES statement tells Oracle the column(s) on which inter-row calculations may be performed. It helps me to visualize the result set if I think of the values in the partition and dimension columns as subscripts for each intersecting cube in the result set, and the values stored within each cube as the candidates for the measurements I am going to be taking.
Enforcing Result Set Uniqueness. Two statements let me control how to handle uniqueness within the returned result set. UNIQUE DIMENSION, the default, tells Oracle that the columns specified in the PARTITION BY and DIMENSION BY columns must form a unique key for the query. In this example, however, I have specified UNIQUE SINGLE REFERENCE to tell Oracle to only check single-cell references on the right-hand side of the rule instead of the entire result set.
Handling Missing Values. What happens when there are no values present at an intersection? MODEL also provides the IGNORE_NAV statement to tell Oracle to ignore missing (i.e. null and absent) values. Since I specified IGNORE_NAV in this example, Oracle will place a zero in any numeric columns, 01-JAN-2000 for dates, an empty string (not a NULL!) for character datatypes, and a NULL value for any other datatypes. If I had not specified IGNORE_NAV, Oracle would instead assume the default, KEEP_NAV, was in force and would instead return NULLs for both null and absent values.
Rules and Cell Assignments
Next, I will turn my attention to how the MODEL clause defines the business rules for calculating values, and how those rules will be applied to the result set:
Applying Rules to Measure Columns. Each value that is found in a MEASURE column can be operated upon by one or more rules. Much like a simple algebraic equation, a rule has two components:
- The left-hand side of the rule simply specifies which cells are to be updated (or created see the next sections!) as a result of the rule's right-hand side instructions.
- The right-hand side evaluates to the values to be assigned to the cells specified on the left-hand side of the rule.
Rule Processing Order. MODEL also lets me define the order in which rules can be applied to the result set. Since I specified the default method, SEQUENTIAL ORDER, in this query, Oracle will apply the rules to the result set in the order that the rules have been specified. However, if AUTOMATIC ORDER is picked instead, Oracle will select which rule is to be processed based on the order of the individual rules' dependency. It is important to note that if AUTOMATIC ORDER is selected, the end result is that a cell will be assigned a value just once. On the other hand, when SEQUENTIAL ORDER is specified, a cell could be assigned a value several times based on the order of the rules being applied, and the value based on the last applied rule becomes the final value.
Processing Rule Assignments. MODEL provides two methods for processing these value assignments:
- The UPDATE directive (not to be confused with the UPDATE DML statement) tells Oracle to only apply the rules to the cells referenced on the left hand side of the rule that it finds in the multi-dimensional result set. If no match is found, nothing happens.
- On the other hand, the UPSERT directive instructs Oracle to apply the rules to all cells that match the left-hand side of the rule specification, as well as insert new rows for any that do not exist in the result set. It is important to note that this is the default behavior, but that it only matters when positional referencing is being used on the left-hand side of the rule, and one cell is being referenced.
Cell Assignments. The most complex set of statements describe which cells should participate in the rules, so naturally I have saved them for last. Fortunately, there are just two types, symbolic and positional.
- A symbolic reference uses a Boolean condition to tell the rule which specific single-dimension column the rule should be applied to. For example, in our sample SQL statement, this rule:
total_sales = total_sales[prod=125] + total_sales[prod=126],
tells Oracle to create a new row in the total_sales dimension that's indexed by value 99910, and add all values in the rows indexed by Product ID 125 and Product ID 126 into the columns in that new row.
- As its name implies, a rule with a positional reference determines which column in the dimension to apply the rule based on the column's relative position in the array described by the DIMENSION BY clause. For example, I can tell a rule to calculate a value in a target cell in the current row based on the difference between two source cells. However, one of the source cells could come from the row above the current row and the other source cell could be two rows below the current row. I will demonstrate positional reference in a later example.
Listing 3.2 shows the original SQL statement, and the results returned from its execution.
Another important difference in symbolic versus positional referencing is the way NULL values are treated. For example, if I was referencing a single cell in a two-dimensional array and I used a symbolic reference, the rule would not be applied; but if I used a positional reference for that same single cell, the rule would be applied. A slightly modified version of the previous query that uses positional rather than symbolic references is shown in Listing 3.3, and it effectively demonstrates this difference.
It Never Ends: Performing Multiple Calculations Repeatedly With ITERATE
Finally, the MODEL clause offers the ITERATE directive to perform the same calculation a specific number of times until either the maximum iterations have been completed, or a threshold value has been reached that forces the iteration to be terminated. To complete the example solution for my second scenario calculating outstanding cash balances based on the value of particular assets -- I have constructed a new table, SH.BALANCE_SHEETS, and populated it with some sample data representing several balance sheet accounts (see Listing 3.4).
Here are the business rules my users have supplied for projecting cash balances:
- Compute the value of Cash Discounts (Account #2020) as the sum of Cash on Hand (#1010) plus Accounts Receivable (#2010), then multiply by 30% to reflect Net 30 as our typical Cash Discount terms.
- Calculate the value of Earned Interest (Account #2030) by adding together Cash on Hand plus Accounts Receivable, then subtracting Cash Discounts, and then multiplying by 7.5% to reflect our expected interest rate.
- Derive Taxable Dividends (Account #3020) by adding together Cash on Hand and Accounts Receivable, subtracting Cash Discounts, adding Earned Interest, and then multiplying by 15% to reflect our expected tax rate.
- Finally, feed these resulting balances back into the calculation as the beginning balance for each account during the next iteration of the calculation.
This would be relatively simple to do in an Excel spreadsheet, of course, but with ITERATE I can now also perform it right on the Oracle server. Here is the SQL statement I constructed against SH.BALANCE_SHEETS to accomplish this:
SELECT acct#, fb FROM sh.balance_sheets MODEL DIMENSION BY (acct#) MEASURES (balance fb) RULES ITERATE (3) ( fb['3020'] = ((fb['1010'] + fb['2010'] - fb + fb) * 0.15) ,fb['2030'] = ((fb['1010'] + fb['2010'] - fb) * 0.075) ,fb['2020'] = (fb['1010'] + fb['2010'] * 0.30) );
Listing 3.5 shows the results of executing this query for one, two, and three iterations.
Oracle 10g's new inter-row calculation capabilities significantly expand the already-powerful suite of analytical functions so that any developer or Oracle DBA can perform complex, advanced data modeling and reporting in multiple dimensions, all without the use of third-party software to perform these analyses. Oracle 10g's extensions to Structured Query Language have insured it is among the most robust, flexible, and fully featured of any of the relational premier database management systems.
References and Additional Reading
While there is no substitute for direct experience, reading the manual is not a bad idea, either. I have drawn upon the following Oracle 10g documentation for the deeper technical details of this article:
B10736-01 Oracle Database Data Warehousing Guide, Chapter 22
B10750-01 Oracle Database New Features Guide
B10759-01 SQL Reference