MDX in Analysis Services: Optimizing MDX: More on Location, and the Importance of Arrangement

Monday Jan 26th 2004 by William Pearson

Bill Pearson continues the Optimizing MDX sub-series with more on the use of processing location to optimize our queries. He then leads an exploration of the optimization of set operations and expression management to help us assemble efficient MDX queries.

About the Series ...

This is the eleventh tutorial article of the series, MDX in Analysis Services. The series is designed to provide hands-on application of the fundamentals of MDX from the perspective of MS SQL Server 2000 Analysis Services ("Analysis Services"); our primary focus is the manipulation of multidimensional data sources, using MDX expressions, in a variety of scenarios designed to meet real-world business intelligence needs.

For more information on the series, as well as the hardware / software requirements to prepare for the tutorials we will undertake, please see the first lesson of this series: MDX Concepts and Navigation.

Note: At the time of writing, Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples. The screen shots that appear in this article were taken from a Windows 2003 Server, and may appear somewhat different from coinciding views in other operating systems.

Introduction

In our last tutorial, Control Location of Processing, we began a three-part mini-series, Optimizing MDX. The objective of this "triptych" is to explore methods for optimizing the performance we obtain from our MDX queries. We introduced types of intervention, or classifications of approaches, that we can take to enhance the performance of MDX, with the intent of covering several of the types as we progress through the mini-series, while introducing practice examples that reinforce an awareness we can transfer to our daily work with MDX. In the first lesson of the Optimizing MDX set, we introduced the first intervention type: control of the location of query execution.

We focused on two of the main ways to effect control of location, the Large Level Threshold property and the Execution Location parameter, and performed hands-on practice with the settings involved with each. In this lesson, we will pick up where we left off, and:

  • Continue in our considerations of the types of intervention we can use to optimize MDX queries;
  • Explore optimization of set operations and syntax arrangement considerations.

More on Location

In our last lesson, we focused on two of the main ways to control processing location, the Large Level Threshold property and the Execution Location parameter, as a part of our discussion of the first intervention type, control of location of query execution. While these two options provide perhaps the most straightforward ways to control where a query is evaluated and executed, there are additional, less direct ways to force processing in a desired location. We will consider some of these approaches to conclude our discussion of the location control intervention type.

Other Means of Influencing Execution Location

At the individual query level, no means is readily available for a client application to direct execution location. We can, however, mandate that large-level operations execute at the client through the use of indirect means. The specification of a named set for use within the query will force processing at the client level. We can, therefore, create a named set (using either of the CREATE SET or WITH SET clauses), containing members of a large level, at the client, and then use the same named set within a query to force client-based execution.
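To illustrate (a minimal sketch, not one of our numbered practice queries; the set name is our own), a named set defined at the client over the large Product Name level ties any query that references it to client-based execution:

-- Sketch: a client-defined named set over a large level forces
-- client-based execution of the queries that use it (create the
-- set first, then run the SELECT separately)
CREATE SET [Warehouse].[All Product Names] AS
'{[Product].[Product Name].Members}'

SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
Head ([All Product Names], 10) ON ROWS
FROM Warehouse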

Other options for indirect control of the processing location include calculated members and calculated cells. The manner in which a calculated member is created is important in determining its location-fixing effects. Using the CREATE MEMBER or WITH MEMBER clause within a query to define a calculated member at the server will produce a calculated member that can be processed equally successfully at the server or at the client. By contrast, using CREATE MEMBER to produce a calculated member within a session will result in forced client-based execution of any query that houses it.
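To make the distinction concrete, consider the following minimal sketch (the calculated member here is our own invention for illustration, built from measures in the Warehouse cube; the two statements would be run separately):

-- Query-scoped definition: can be resolved at the server
-- or at the client with equal success
WITH MEMBER [Measures].[Avg Units per Order] AS
'[Measures].[Units Shipped] / [Measures].[Units Ordered]'
SELECT
{[Measures].[Avg Units per Order]} ON COLUMNS
FROM Warehouse

-- Session-scoped definition: queries that reference this
-- member are forced to execute at the client
CREATE MEMBER [Warehouse].[Measures].[Avg Units per Order] AS
'[Measures].[Units Shipped] / [Measures].[Units Ordered]'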

Calculated cells may also force client-based processing. Again, the manner in which the calculated cells are defined is important in determining their location-fixing effects. A calculated cell that is created with the CREATE CELL CALCULATION clause, at either the client or the server, can be processed at the server. By contrast, the use of the WITH CELL CALCULATION clause at the client will result in a query whose processing will be client-based.
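As a sketch of the query-scoped form (the calculation name and formula are hypothetical, purely for illustration), a calculated cell defined at the client with the WITH CELL CALCULATION clause ties the processing of its query to the client:

-- Hypothetical calculated cell: uplift Units Shipped by ten percent
WITH CELL CALCULATION [ShippedUplift]
FOR '([Measures].[Units Shipped])'
AS '[Measures].[Units Shipped] * 1.1'
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
{[Time].[1998]} ON ROWS
FROM Warehouse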

The existence of two conditions can force a query to process on the server: a reference to a filter operation within the query, and (consistent with our discussion regarding large levels in our last lesson) a large dimension level. Let's take a look at the mechanics behind this in a little more detail.

We will first fire up the MDX Sample Application, having seen in past lessons how it provides an excellent platform from which to learn about MDX and, as in this case, about the data and the metadata in our cube. Many of the MDX operations that might be performed from a client application can be simulated here or elsewhere, as we demonstrate in an article in our DatabaseJournal Analysis Services series, Drilling Through to Details: From Two Perspectives. The Sample Application affords us another point of view of the interplay of the OLAP data source and MDX.

NOTE: It is important to remember that client applications will differ in many ways. Individual settings, design characteristics, capabilities, and other considerations will likely mean differences in operation and performance using the techniques we describe in our lessons, as in other references.

1.  Start the MDX Sample Application.

We are initially greeted by the Connect dialog, shown in Illustration 1.


Illustration 1: The Connect Dialog for the MDX Sample Application

The illustration above depicts the name of my server, MOTHER1, and properly indicates that we will be connecting via the MSOLAP provider (the default).

2.  Click OK.

The MDX Sample Application window appears.

3.  Clear the top area (the Query pane) of any remnants of queries that might appear.

4.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

5.  Select the Warehouse cube in the Cube drop-down list box.

The MDX Sample Application window should resemble that depicted in Illustration 2, complete with the information from the Warehouse cube displaying in the Metadata tree (left section of the Metadata pane).


Illustration 2: The MDX Sample Application Window (Compressed View)

We will create an MDX query that helps us to "qualify" a second query - that is, to confirm that it contains elements that cause it to fall within the two conditions, exposed above, that force a query to process on the server.

6.  Type the following query into the query pane of the Sample Application:


-- MXAS11-1:  Qualification through Count
WITH
   MEMBER [Measures].[Count] AS
   'COUNT({[Product].[Product Name].Members})'
SELECT
{[Measures].[Count]} ON COLUMNS
FROM Warehouse

Again, our intent here is to ascertain that an upcoming example expression ([Product].[Product Name].Members) will "qualify" as meeting conditions that would force it to process on the server. Therefore, in our first query, we are simply obtaining a count of the members of the given level.

7.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 3.


Illustration 3: Results Dataset, Count Query

8.  Save the query in a convenient location as MXAS11-1.

We see that [Product].[Product Name].Members refers to a genuine Large Level, because the number of members in the Product Name level of the Product dimension (1,560) exceeds the Large Level Threshold we set in our last lesson (750). For that matter, it exceeds even the default threshold that existed before our modifications (1,000). (The number of members is also verifiable at the RDBMS level, in the FoodMart 2000.mdb sample database that is installed with MSSQL Server 2000 Analysis Services.)

Let's use the level whose population we have just quantified in the COUNT query above to illustrate. The inclusion of a filter within our query will also be a driver for server-based processing, as we shall see.

9.  Create the following new query:

-- MXAS11-2:  Qualification through Count & Filter
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
TopCount ([Product].[Product Name].Members, 
7, [Measures].[Units Shipped]) ON ROWS
FROM Warehouse

The use of TopCount() above provides an instance where server-based execution is likely to be appropriate, for the reasons we have already exposed. Most of the large level will be pruned away by the filter action before the result is returned. As we noted earlier, the existence of a filter operation within the query is another driver for server processing.

10.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 4.


Illustration 4: Results Dataset, TopCount() Query

11.  Save the query as MXAS11-2.

While level-size and filter requirements are good criteria to use in most cases for determining the likelihood of forced server-based processing, there are scenarios where even meeting or exceeding the parameters of these two criteria will not force a query to execute on the server. Examples of these situations include the presence of a function or functions within the query (say a user-defined function that is registered solely on the client) that cannot execute on the server. Also, as is somewhat obvious, a query that is executed against a local cube will not process on a remote server.
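As a sketch of the first scenario (the library and function names here are hypothetical), a query that calls an external function registered only at the client cannot be evaluated at the server, regardless of the level sizes involved:

-- Hypothetical: register a function library at the client only
USE LIBRARY "C:\Libs\MyClientLib.dll"

-- Any query referencing one of its functions must then
-- execute at the client
WITH MEMBER [Measures].[Scored Shipments] AS
'MyClientLib!ScoreValue([Measures].[Units Shipped])'
SELECT
{[Measures].[Scored Shipments]} ON COLUMNS
FROM Warehouse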

We have seen that options exist, in the form of the Execution Location and Large Level Threshold properties, to influence the physical location in which a query is processed. These two options within the query processing location intervention type can allow us to realize performance gains by assigning resources appropriately. While the Analysis Server provides for tuning in its provision for the establishment of a Large Level Threshold, as we saw in our previous session together, optimal syntax arrangement in our queries is perhaps a more important consideration. Before we attempt to optimize performance in this way, we need a solid understanding of how the PivotTable Service works.

The Importance of Optimal Arrangement

To paraphrase some of what we have learned so far in the Optimizing MDX mini-series: queries that contain levels whose member populations exceed the Large Level Threshold will be processed by the server, while queries whose levels' member populations do not exceed the threshold are within the processing capabilities of the client-based PivotTable Service.

The manner in which the PivotTable Service processes queries can become a consideration in determining the location of processing. The service acts in such a way that each set defined in the query (and every combination existing therein) is fully materialized in memory before operations proceed. Unsurprisingly, the resulting demands on resources can be crippling for even the most robust machines.

Let's consider an example:

12.  Create the following new query:


-- MXAS11-3:  Pre-optimized Set Operation
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
TopCount (
   CrossJoin (
      [Warehouse].[City].Members,
      [Product].[Product Name].Members
   ),
   7,
   [Measures].[Units Shipped]
) ON ROWS
FROM Warehouse

We have expanded our existing query well beyond its last incarnation, adding another dimension and a CrossJoin() function, as if to enhance it to meet (as an illustration) a specific need of an information consumer. We know that the query will be processed on the client if the population of the [Product].[Product Name] level is less than the Large Level Threshold (1,000 by default; 750 per the setting from our last lesson).

An important fact to remember is that the threshold refers to a level's members; it does not refer to the tuples that exist within a given set in a query. Say our Large Level Threshold has been set at 2,200. We already know that we have 1,560 named products (from the count we performed above); we also have 23 warehouse cities. The memberships of both levels fall well below the threshold of 2,200, and thus qualify for client processing, and our results dataset is only the top seven city-product tuples from the perspective of units shipped. From the scope of our query, however, we have far more combinations, in the way of resource requirements, with which to contend.

Our query will assemble 35,880 combinations (1,560 x 23) - all to arrive at the small results dataset that we obtain in the next step.

13.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 5.


Illustration 5: Results Dataset, CrossJoin() added to Query

Our query executes in a matter of moments, even though it is completing myriad more steps than we may have considered. However, things might have turned out quite differently in similar - but much larger - scenarios that we might encounter in the real world.

14.  Save the query as MXAS11-3.

Not taking into account the real cost of the actual combinations, and relying upon the Large Level Threshold and other level-member-based safeguards to protect us, may actually place a load upon the client that would challenge even a robust server. At the heart of this resource-intensive situation lies our old friend the CrossJoin() function, although there are many other similar potential participants in inefficient query construction and operation. We will examine ways to manage these scenarios in the next section.

The Role of Expression Arrangement in Optimization

As many of us have learned, both in working with SQL and with MDX, the greatest strides in performance tuning can often be gained by tuning the expressions themselves. The simple rearrangement of a query can produce dramatic performance gains, as we will see in this section. We have discussed the control of resource use, to some extent, through the direction of a query's location of execution. We also exposed a scenario in the previous section where, even though the query might contain level populations that fall below the threshold for server processing, client processing can be more expensive than is apparent.

Let's return to our example above, and see what we can do in the way of arranging expressions to make our query less resource intensive. To do so, we will resume where we left off within the Sample Application:

1.  Create a new query to rephrase the last (saved as MXAS11-3), arranged as follows:

-- MXAS11-4:  Optimized Set Operation
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
TopCount (
   Generate ( [Warehouse].[City].Members,
      TopCount ( CrossJoin ( {[Warehouse].CurrentMember},
         [Product].[Product Name].Members
         ), 7, [Measures].[Units Shipped]
      )
   ), 7, [Measures].[Units Shipped]
) ON ROWS
FROM Warehouse

In our rearrangement of query MXAS11-3, we are exploiting the Generate() function to rephrase our query to "determine, for each warehouse city in turn, the top seven tuples that we obtain from cross-joining that city with the named products; then marry the per-city results together and take the top seven overall."

The number of combinations that result within this query is 1,721 (1 x 1,560 named products, plus 7 x 23 cities). This represents a reduction of over 95 percent from the 35,880 combinations of the pre-optimized query.

As we will see from the next step, the results will be identical to those in the pre-optimized version.

2.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 6.


Illustration 6: Results Dataset, Optimized CrossJoin() Query

3.  Save the query as MXAS11-4.

Through this small exercise, we have illustrated the dramatic performance gains that can be obtained by simply rephrasing a query. To summarize the approach, we might keep in mind that we can replace complex CrossJoin() operations with a Generate() and CrossJoin() combination, as we have demonstrated above. The following general rule captures the pattern:

CrossJoin (X, Y) =
Generate (X, CrossJoin ({X.CurrentMember}, Y))

Or

CrossJoin (X, Y) =
Generate (Y, CrossJoin (X, {Y.CurrentMember}))
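By way of illustration of the second form (a sketch only, not one of our numbered practice queries), we could have iterated over the named products instead of the cities in query MXAS11-4. The results are the same; note, however, that the choice of which set Generate() iterates governs how many tuples are materialized along the way, so the two forms are equivalent in output but not necessarily in cost:

-- Sketch: the second form applied to our TopCount() example
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
TopCount (
   Generate ( [Product].[Product Name].Members,
      TopCount ( CrossJoin ( [Warehouse].[City].Members,
         {[Product].CurrentMember} ), 7, [Measures].[Units Shipped] )
   ), 7, [Measures].[Units Shipped]
) ON ROWS
FROM Warehouse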

Let's take a look at another scenario, where the objective of our query is the summation of two measures, Units Shipped and Warehouse Sales, for a given set of products, stores, and warehouses (the Drink and Food product families, in all non-U.S. stores and warehouses, in our example). We can approach the query in a couple of ways, using calculated members to effect the summations.

Let's create a straightforward query first, where we create calculated members on the dimensions involved and combine them with the measures we seek, to obtain the correct calculations.

1.  Create the following new query:

-- MXAS11-5:  Pre-optimized Set Operation # 2
WITH 
MEMBER [Product].[Non-US Total] AS
'SUM ({[Product].[All Products].[Drink],  
[Product].[All Products].[Food]})'
MEMBER [Store].[Non-US Total] AS
'SUM ({[Store].[All Stores].[Canada],   
[Store].[All Stores].[Mexico]})'
MEMBER [Warehouse].[Non-US Total] AS
'SUM ({[Warehouse].[All Warehouses].[Canada],  
[Warehouse].[All Warehouses].[Mexico]})' 
SELECT
{[Measures].[Units Shipped],  
[Measures].[Warehouse Sales]} ON COLUMNS,
{[Time].[1998]} ON ROWS
FROM Warehouse
WHERE ( [Product].[Non-US Total],
[Store].[Non-US Total], [Warehouse].[Non-US Total] )

Now, let's run the query and note the results.

2.  Execute the query using the Run Query button.

The results dataset appears as depicted in Illustration 7.


Illustration 7: Results Dataset, Pre-Optimized Summation Query

3.  Save the query as MXAS11-5.

Now let's consider another approach to obtaining the same objective. This time, we will place each set of calculation logic into a respective calculated measure.

4.  Create the following new query:


-- MXAS11-6:  Optimized Set Operation # 2
WITH
MEMBER [Measures].[Total Units Shipped] AS
   'SUM (
      {[Warehouse].[All Warehouses].[Canada],
       [Warehouse].[All Warehouses].[Mexico]},
      SUM (
         {[Store].[All Stores].[Canada],
          [Store].[All Stores].[Mexico]},
         SUM (
            {[Product].[All Products].[Drink],
             [Product].[All Products].[Food]},
            [Measures].[Units Shipped]
   )))'
MEMBER [Measures].[Total Warehouse Sales] AS
   'SUM (
      {[Warehouse].[All Warehouses].[Canada],
       [Warehouse].[All Warehouses].[Mexico]},
      SUM (
         {[Store].[All Stores].[Canada],
          [Store].[All Stores].[Mexico]},
         SUM (
            {[Product].[All Products].[Drink],
             [Product].[All Products].[Food]},
            [Measures].[Warehouse Sales]
   )))'
SELECT
{[Measures].[Total Units Shipped],
[Measures].[Total Warehouse Sales]} ON COLUMNS,
{[Time].[1998]} ON ROWS
FROM Warehouse

Again, we will run the query and observe the results.

5.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 8.


Illustration 8: Results Dataset, Optimized Summation Query

6.  Save the query as MXAS11-6.

We note that the results are identical for the two approaches to the query (with the exception of the rounding of the Warehouse Sales amounts and the measure names, either of which could be adjusted, were they significant concerns). The second approach, however, performs appreciably better, as would be evident were the query extracting a much larger amount of data from a real-world data source. The following general rule summarizes the use of the commutative nature of the SUM function (and applies equally to the MAX, MIN, and TopCount() functions):


SUM (CrossJoin (X,Y), b) = 
SUM(X, SUM(Y, b))
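As a quick check on the rule (a minimal sketch against the Warehouse cube, not one of our numbered practice queries; the measure names are our own), both calculated members below should return the same value:

-- Left side of the rule: sum over the materialized cross-join
WITH MEMBER [Measures].[CrossJoin Sum] AS
'SUM (CrossJoin ({[Store].[All Stores].[Canada],
[Store].[All Stores].[Mexico]},
{[Warehouse].[All Warehouses].[Canada],
[Warehouse].[All Warehouses].[Mexico]}),
[Measures].[Units Shipped])'
-- Right side: nested SUMs, with no cross-join to materialize
MEMBER [Measures].[Nested Sum] AS
'SUM ({[Store].[All Stores].[Canada],
[Store].[All Stores].[Mexico]},
SUM ({[Warehouse].[All Warehouses].[Canada],
[Warehouse].[All Warehouses].[Mexico]},
[Measures].[Units Shipped]))'
SELECT
{[Measures].[CrossJoin Sum], [Measures].[Nested Sum]} ON COLUMNS,
{[Time].[1998]} ON ROWS
FROM Warehouse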

In both of the performance enhancement examples we have explored above, and in any other potential performance enhancement scenario we might contemplate, we would need to consider the cost / benefit tradeoffs, actual performance metrics against different sources, and whether the differences in performance are too negligible to justify devoting much time to further tuning. The point is that MDX expressions themselves can be tuned, in the manner suggested above or through other approaches, to maximize performance, and that beginning with the MDX statements themselves is usually the best approach to any optimization effort. Once the base query is tuned to an optimal state, we can consider client / server balancing, hardware upgrades, and perhaps further measures to increase performance.

Summary and Conclusion ...

In this lesson, we continued our focus on the use of control of location as a primary intervention type within the realm of options available to us for MDX query optimization. We performed a practice exercise to reinforce the concepts exposed. We then extended our considerations of the available types of intervention to include the optimization of set operations and syntax arrangement considerations. Within our exploration of the importance of the optimization of set operations, we undertook practice examples that illustrated some ways we can rearrange queries to enhance performance, often significantly.

In our next lesson, the final article of the Optimizing MDX mini-series, we will expose methods of caching to load a commonly used slice of a cube into memory, making for faster retrieval in subsequent operations. Our discussion will include various aspects of cache creation, and uses of caching within MDX. In addition, we will touch upon other performance enhancement options, including external functions and cube design modifications and augmentation.

» See All Articles by Columnist William E. Pearson, III
