MDX Essentials: The CROSSJOIN() Function: Breaking Bottlenecks

Monday Feb 7th 2005 by William Pearson
Share:

Explore CROSSJOIN() considerations that arise when dealing with medium-to-large datasets. MSAS Architect Bill Pearson leads a practice session that explores ways to speed processing in queries where CROSSJOIN() tends to present bottlenecks.

About the Series ...

This article is a member of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements for getting the most out of the lessons included, please see my first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

Overview

In this article, we will return to a function we introduced in an earlier article, CROSSJOIN(), whose mission it is to combine two sets of dimension members into one. We will discuss how CROSSJOIN() works in general, and then discuss how its performance can become an issue in medium to large-sized data sets. We will then undertake a multiple-step practice example intended to assist us in gaining an understanding of the issues, before exposing approaches to minimizing the performance overhead that we can encounter in using CROSSJOIN() to reach our reporting and analysis needs.

To accomplish this objective, we will undertake the following steps in this article:

  • Create a copy of the Warehouse sample cube for use in our practice exercise;
  • Prepare the cube further by processing;
  • Discuss the operation of the CROSSJOIN() function, and factors that may make its use suboptimal in our queries;
  • Enhance a suboptimal query employing a CROSSJOIN() function by substituting NONEMPTYCROSSJOIN();
  • Provide further enhancement by leveraging the set count parameter in the NONEMPTYCROSSJOIN() function;
  • Explain the results we obtain from the steps we take to accomplish the solution.

The CROSSJOIN() Function: Breaking Bottlenecks

Introduction

As many of us know, the CROSSJOIN() function is highly useful anytime we wish to generate a cross-product of members in two different sets. The ability to specify "all possible combinations" is convenient - indeed, the most straightforward way to perform such a combination of two sets. Unfortunately, the indiscriminate use of the CROSSJOIN() function, like many other MDX functions, can slow reporting and analysis dramatically. This is often due to a failure to understand how the function performs set combinations, and how its action can lead to huge results datasets when applied to large cubes.

NOTE: For a detailed introduction to the CROSSJOIN() function, see my Database Journal article Basic Set Functions: The CrossJoin() Function.

In combining two sets, CROSSJOIN() combines every member of the first set (all from a single dimension) with every member of the second, creating a "Cartesian" effect as a result. Combining two sets, for example, with the following query will illustrate a scenario, on the scale of a decidedly small sample cube (the Warehouse sample that installs with MSAS), where we experience a taste of the consequences when the number of members in set 1, times the number of members in set 2, times the member population in set 3, results in many combinations.


SELECT 
    {[Measures].[Warehouse Profit]} ON COLUMNS,
   {CROSSJOIN([Warehouse].[Warehouse Name].Members,
        CROSSJOIN([Store].[Store Name].Members,
           [Product].[Product Name].Members))} ON ROWS
FROM  
   [WAREHOUSE]

This query would generate 898,560 combinations (24 individual Warehouses times 24 Stores times 1,560 distinct Products). A similar scenario, with the cube sizes we see these days, could be crippling to performance. This is aggravated by the fact that sparsity is common enough in large cubes, and CROSSJOIN() results on those cubes would likely have a much higher sparsity factor. (The results dataset produced by the above query yields only a tiny fraction of combinations (tuples) with non-empty measures.)

The process of generating all possible combinations, empty or not, lies behind the performance drag, and becomes even more pronounced when we go a step further and attempt to perform query operations, as we shall see in the practice example, that must wait for the combinations to be assembled, and then be applied to the resulting dataset. The time consumed in assembling a large number of empty combinations has been wasted when they are "tossed" in a subsequent step in the march toward the ultimate results.

For purposes of our practice procedure, we will assume that we have been asked by management of a hypothetical client to investigate degradation in performance of a query. The query was originally constructed at the request of a group of information consumers in the Corporate Planning department, shortly after the implementation of MSAS at the FoodMart organization. The creator of the query, who initially wrote the MDX in a way that seemed intuitive, intended to optimize it later. Unfortunately, he was laid off in the wake of sudden moves by management to move all developer functions offshore.

Attempts to communicate with the offshore support team were abandoned when it was learned that the building housing the group had been destroyed in a natural disaster that had swept the region.  (The inability to obtain assistance with this issue paled, we are told, with the loss of
documentation, code, and other collateral for a host of enterprise projects in process at the time of the disaster.  Corporate financial information has also "gone missing" - a fact that has been seized upon by the media in short order.)

We listen closely to the requirement. We then develop a plan to examine the query under consideration, before offering options for improving its performance. As is typically the case, we decide to work with a copy of the affected cube (in this case, the Warehouse cube) to allow the original to remain isolated.

Practice

Preparation

Create a Clone Cube

Let's get started by creating a clone of the Warehouse sample cube, which, along with the FoodMart database that contains it, accompanies an MSAS installation. This will allow us to keep the original sample cube intact for other uses.

1.  Open Analysis Manager, beginning at the Start menu.

2.  Expand the Analysis Servers folder by clicking the "+" sign to its immediate left.

Our server(s) appear.

3.  Expand the desired server.

Our database(s) appear, in much the same manner as shown in Illustration 1.

Click for larger image

Illustration 1: A Sample Set of Databases Displayed within Analysis Manager

4.  Expand the FoodMart 2000 database.

5.  Expand the Cubes folder.

The sample cubes appear, as shown in Illustration 2.


Illustration 2: The Sample Cubes in the FoodMart 2000 Database

NOTE: Your local databases / cube tree will differ, depending upon the activities you have performed since the installation of MSAS (and the simultaneous creation of the original set of sample cubes). Should you want or need to restore the cubes to their original state, simply restore the database under consideration. For instructions, see the MSSQL Server 2000 Books Online.

6.  Right-click on the Warehouse sample cube.

Again, we are making a copy of the Warehouse cube to isolate it. Our lesson will involve the execution of demanding queries upon the cube we use within the practice example. Our intention is to work with an isolated cube to leave the original fully available to other users.

7.  Select Copy from the context menu that appears.

8.  Right-click on the Cubes folder.

9.  Select Paste from the context menu that appears.

The Duplicate Name dialog appears.

As noted in previous articles, we cannot have two cubes of the same name in a given MSAS database.

10.  Type the following into the Name box of the Duplicate Name dialog:

MDX27_CROSSJOIN_TESTING

The Duplicate Name dialog appears, with our modification, as depicted in Illustration 3.


Illustration 3: The Duplicate Name Dialog, with New Name

TIP: As I have mentioned elsewhere in this and other series, the foregoing is also an excellent way of renaming a cube, (a "rename" capability is not available here, as it is in many Windows applications). Simply create a duplicate, give it the name to which you wish to rename the old cube, and then delete the old cube, as appropriate. This also works for MSAS databases, dimensions and other objects.

11.  Click OK to apply the name change, and create the cube.

The new cube, MDX27_CROSSJOIN_TESTING, appears in the cube tree, among those already in place. We now have a copy of the Warehouse cube, within which we can perform the steps of our practice exercise. Let's process the new cube to "register" it with Analysis Services, and to reach the "processed" state required for querying.



Process the Clone Cube

1.  Right-click the new MDX27_CROSSJOIN_TESTING cube.

2.  Select Process... from the context menu that appears, as depicted in Illustration 4.



Click for larger image

Illustration 4: Select Process... from the Context Menu

The Process a Cube dialog appears, as depicted in Illustration 5, with the processing method defaulted to Full Process (as this is the first time the cube has been processed).




Illustration 5: Full Process Selected in the Process a Cube Dialog

3.  Click OK to begin processing.

Processing begins. The Process viewer displays various logged events, then presents a green Processing completed successfully message, as shown in Illustration 6.


Illustration 6: Indication of Successful Processing Appears (Compact View)

4.  Click Close to dismiss the viewer.

We are now ready to work with the MDX27_CROSSJOIN_TESTING cube in the MDX Sample Application.

5.  Exit Analysis Manager when ready.

Procedure

We now have a clone cube with which we can pursue our examination of CROSSJOIN() optimization. Let's initialize the MDX Sample Application, as a platform from which to perform our practice exercises, taking the following steps:

1.  Start the MDX Sample Application.

We are initially greeted by the Connect dialog, depicted in Illustration 7.

Click for larger image

Illustration 7: The Connect Dialog for the MDX Sample Application

The illustration above depicts the name of my server, MOTHER1, and properly indicates that we will be connecting via the MSOLAP provider (the default).

2.  Click OK.

The MDX Sample Application window appears.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the new MDX27_CROSSJOIN_TESTING cube in the Cube drop-down list box.

5.  Click File --> New to open a blank Query pane.

The MDX Sample Application window should resemble that shown in Illustration 8, complete with the information from the MDX27_CROSSJOIN_TESTING cube displaying in the Metadata tree (left section of the Metadata pane).


Illustration 8: The MDX Sample Application Window (Compressed View)

We will begin creating our query with a focus on returning results efficiently. We request the specific requirements for the query from the information consumers. They explain that they have requested to see Warehouse Profit, based upon individual Store (denoted in the cube as Store Name), Product (denoted as Product Name), and Store Type ("Store Type") combinations, for Supermarket Store Types specifically, for operating year 1997. They want to see the top fifteen combinations only, in descending order, as the final presentation.

We obtain the original query, to which we have referred above, from the MSAS instance on the computer abandoned by the developer upon his lay off. The query appears as follows:


SELECT 
    {[Measures].[Warehouse Profit]} ON COLUMNS,
   TOPCOUNT( {CROSSJOIN([Warehouse].[Warehouse Name].Members,
CROSSJOIN([Store].[Store Name].Members, 
       [Product].[Product Name].Members))}, 15, ([Store Type].
           [All Store Type].[Supermarket], [Measures].[Warehouse Profit]))
               ON ROWS
FROM  
   [MDX27_CROSSJOIN_TESTING]
WHERE
    ([Time].[1997]) 

The consumers with whom we are interacting tell us that the query does, indeed, give them the results they want, in the appropriate layout. We determine that we will create an identical query in the Sample Application, upon which we will apply enhancements to tune its performance. We will save each step as a separate query to allow us to "fall back," if necessary, to a previous step, as we incrementally modify the query.

1.  Create the following new query (identical, except for comment line, to the original):


--MDX28-1:  Original Query (Suboptimal)
SELECT 
    {[Measures].[Warehouse Profit]} ON COLUMNS,
   TOPCOUNT( {CROSSJOIN([Warehouse].[Warehouse Name].Members,
CROSSJOIN([Store].[Store Name].Members, 
       [Product].[Product Name].Members))}, 15, ([Store Type].
           [All Store Type].[Supermarket], [Measures].[Warehouse Profit]))
               ON ROWS
FROM  
   [MDX27_CROSSJOIN_TESTING]
WHERE
    ([Time].[1997]) 

2.  Execute the query using the Run Query button.

After running for a few seconds, the results dataset appears as depicted in Illustration 9.


Illustration 9: The Results Dataset - Original Approach

3.  Save the query as MDX28-1.

Based upon what we know about the CROSSJOIN() function, we can readily see that the query above can be optimized. First, we note that the query as originally written creates every Warehouse Name (24), Store Name (24), and Product Name (1,560) combination. This means, once again, that the query generates 898,560 combinations, many of which (as we mentioned above) are empty, a frequent occurrence in CROSSJOIN() results. Furthermore, in addition to taking the time and resources to generate all combinations, we are next performing our TOPCOUNT() function on all combinations, most of which, again, are empty in the first place.

Let's take the first step to make the query more efficient. We will substitute the NONEMPTYCROSSJOIN() function for the CROSSJOIN(). Our objective will be to filter out the empties. This alone will give us a significant boost in processing speed.

4.  Create the following new query:


--MDX28-2:  Introduce NONEMPTYCROSSJOIN() as initial improvement
SELECT 
    {[Measures].[Warehouse Profit]} ON COLUMNS,
   TOPCOUNT( {NONEMPTYCROSSJOIN([Warehouse].[Warehouse Name].Members,
        [Store].[Store Name].Members, 
                [Product].[Product Name].Members)} , 15, 
                   ( [Store Type].[All Store Type].[Supermarket], 
                       [Measures].[Warehouse Profit])) ON ROWS
FROM  
   [MDX27_CROSSJOIN_TESTING]
WHERE
   ([Time].[1997])

In this version of the query, we employ the formidable NONEMPTYCROSSJOIN() function to improve the query speed dramatically. As many of us are aware, NONEMPTYCROSSJOIN() returns the "non-empty" crossjoin of multiple sets. In accepting two or more sets for its input, nesting of the function is not required, unlike CROSSJOIN(), which can handle only two sets per function, and thus does require nesting (as we saw in the original query above). NONEMPTYCROSSJOIN() filters out calculated members in this case, as well, and so it might not be the best approach in a scenario where calculated members are to be returned (we will examine the optimization of CROSSJOIN() under such a scenario in another article).

5.  Execute the query using the Run Query button.

The query executes, this time palpably faster, and the results dataset appears as shown in Illustration 10.

Click for larger image

Illustration 10: The Results Dataset - Substitution of NONEMPTYCROSSJOIN()

6.  Save the query as MDX28-2.

Next, we will take advantage of another feature of the NONEMPTYCROSSJOIN() function, the set count parameter. We will do so within a modification to our last query.

7.  Change the comment line in the query to read as follows:

-- MDX28-3: Enhance NONEMPTYCROSSJOIN() through use of set-count parameter

8.  Save the query as MDX28-3, to protect MDX28-2.

9.  Place the cursor between .Members, in the sixth line of the query - counting the comment line - and the ")" character, as depicted in Illustration 11 (partial view).


Illustration 11: Placing the Cursor ... (Partial View of the Query)

10.  Press the ENTER key four times.

The query, with our inserted space, resembles that shown in Illustration 12.


Illustration 12: Query with Inserted Space (Partial View of the Query)

11.  Insert a comma (",") after .Members (at the point where we placed the cursor in Step 9).

12.  Insert the following between the line containing .Members, and the following line (containing " )}, 15, ").

{[Store Type].[All Store Type].[Supermarket]}, 3)  

The query, with the syntax inserted into the space we created above, resembles that depicted in Illustration 13.


Illustration 13: Query with Inserted Syntax (Partial View of the Query)

13.  Remove the entire line originally containing the following:

( [Store Type].[All Store Type].[Supermarket], 

The line to be removed appears circled in Illustration 14.


Illustration 14: Line to be Removed ... (Partial View of the Query)

14.  Remove one of the two parentheses (")") to the right of the following:

[Measures].[Warehouse Profit]

which appears just before the ON ROWS keywords, as shown in Illustration 15.


Illustration 15: Remove the Extra Parenthesis (Partial View of the Query)

15.  Remove the parenthesis (")") at the left of the following line:

) } , 15,

which appears just below the line containing

{[Store Type].[All Store Type].[Supermarket]}, 3) 

which we added above. The location of the character to remove is shown in Illustration 16.


Illustration 16: Remove the Left Parenthesis (Partial View of the Query)

The query, with modifications made in the last several steps, resembles that depicted in Illustration 17.


Illustration 17: Complete Query with Modifications

16.  Execute the query, using the Run Query button, once again.

The query executes, once again, and the results dataset appears as shown in Illustration 18 (identical to our last results dataset).


Illustration 18: The Results Dataset - After Final Modifications

17.  Resave the query as MDX28-3.

NOTE: Because I am aware that making modifications in stages as we did above might prove tedious for some, the full syntax appears below, for cut and paste transfer, if desired. If you cannot obtain the results shown above, you might find it easier to work with the version below (I have tidied it up slightly, as well, but the syntax is identical).


-- MDX28-3: Enhance NONEMPTYCROSSJOIN() through use of set-count parameter
SELECT 
    {[Measures].[Warehouse Profit]} ON COLUMNS,
    TOPCOUNT( {NONEMPTYCROSSJOIN
        ([Warehouse].[Warehouse Name].Members,
           [Store].[Store Name].Members, 
               [Product].[Product Name].Members, 
                    {[Store Type].[All Store Type].[Supermarket]}, 3)   } ,
                         15, [Measures].[Warehouse Profit])
                             ON ROWS
FROM  
   [MDX27_CROSSJOIN_TESTING]
WHERE
([Time].[1997])

To summarize our latest set of enhancements, we have taken the NONEMPTYCROSSJOIN() version to which we had evolved in our last query, and then restructured it to take advantage of the set count parameter. As we noted earlier, NONEMPTYCROSSJOIN() accepts two or more sets for crossjoining purposes. NONEMPTYCROSSJOIN() also allows for an optional set count parameter, which is used to determine the sets, beginning with the left most set we supply ("Set 1"), that will be included in the results dataset that appears.

In our first use of the NONEMPTYCROSSJOIN() function in this lesson, we placed the [Store Type].[All Store Type].[Supermarket] set in the Numeric Expression position of the TOPCOUNT() function, which is laid out as shown below:

TopCount(«Set», «Count»[, «Numeric Expression»])

for the following syntax (excerpted from our second query, MDX28-2):


         ( [Store Type].[All Store Type].[Supermarket], 
             [Measures].[Warehouse Profit])) ON ROWS

In this query, we enforced our "Supermarkets only" requirement by combining [Store Type].[All Store Type].[Supermarket] with the desired measure, [Measures].[Warehouse Profit] in the Numeric Expression portion of the function. While the overall approach in the second query resulted in dramatically better performance than the original query, we went it one better in the third approach.

In our third query, MDX28-3, we transferred the "filter" set, [Store Type].[All Store Type].[Supermarket] to the group of sets included within the NONEMPTYCROSSJOIN(), to boost performance even more. This gives us the same effect from a filter standpoint; the only way it might not work for us in meeting the consumers' requirement would be in the fact that we already have a result dataset that appeals to them from a presentation perspective.

In fact, we do not want to change the presentation at this point. Simply including [Store Type].[All Store Type].[Supermarket] in the NONEMPTYCROSSJOIN() sets, and leaving all else unchanged, would result in the results dataset depicted in Illustration 19.


Illustration 19: Results with Simple Transfer of "Filter" Set to NONEMPTYCROSSJOIN()

Fortunately, the NONEMPTYCROSSJOIN() function comes to our rescue with the set count parameter. This allows us to have the best of both worlds: enhanced performance and the presentation the information consumers have requested. By placing a "3" in the set count parameter, we are crafting the function to employ all four sets in "determining emptiness," but to return (in the results dataset) sets 1, 2, and 3 - and therefore to leave [Store Type].[All Store Type].[Supermarket] out of the picture. We are thus able to meet the business requirement.

As we mentioned earlier, NONEMPTYCROSSJOIN() may not be the best approach in scenarios where calculated members are involved. The function's inherent removal of calculated members might rule out its selection as an option. I will address alternatives for these scenarios in a later article.

18.  Exit the MDX Sample Application when ready.

Summary ...

In this article, we examined the use of the CROSSJOIN() function, and factors that can render this otherwise powerful tool suboptimal within our queries. We discussed a business need as defined by a hypothetical group of information consumers, in which we were asked to tune an MDX query for more optimal performance. As in the other articles of this series, after introducing our topic, we prepared for our practice exercise by creating and processing a "clone" of a sample cube (in this case the Warehouse cube.)

We next proceeded to examine the query under consideration, and determined that it contained a CROSSJOIN() function that formed the nucleus of the performance issues noted by the information consumers. We enhanced the query in successive steps, discussing the reasons for our modifications as we applied each. We substituted NONEMPTYCROSSJOIN() for the original CROSSJOIN() to make significant performance gains, and then provided further enhancement by leveraging the set count parameter in the NONEMPTYCROSSJOIN() function. Throughout our practice exercise we explained the results we obtained from the steps we took to accomplish the solution.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved