MDX Essentials: Basic Set Functions: The CrossJoin() Function

Monday Apr 12th 2004 by William Pearson
Share:

MSAS Architect Bill Pearson introduces the CrossJoin() function in a hands-on tutorial. Learn how to return the cross product of two sets as a single set.

About the Series ...

This is the eighteenth article of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements needed for getting the most out of the lessons included, please see the first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

What We Accomplished in our Last Article

In the last article of the series, Basic Numeric Functions: The Count() Function, we took a break from purely set-related functions to focus on the numerical Count() function, as it is applied to sets. We introduced the Count() function, discussing its purpose, to return the number of cells in a specified set. We then exposed options within the syntax for overriding the default behavior of the function with regard to its handling of empty cells.

Along with an introduction to the purpose of the Count() function, we examined the syntaxes surrounding the function. We next undertook illustrative examples where we used Count(), both with and without the EXCLUDEEMPTY keyword, to meet a hypothetical business need. Finally, throughout the steps of our examples, we discussed the results we obtained using MDX.

As a part of the practice exercises in our last article, we performed a query using the CrossJoin() function. Our purpose was to use CrossJoin() as a "proofing" procedure, to ensure the accuracy and completeness of the results we obtained within our practice with the Count() function. As part of this exercise, we were afforded a preview of the CrossJoin() function, upon which we will focus this article.

Introduction

In this lesson, we will expose another highly popular function in the MDX toolset, the CrossJoin() function. The general purpose of the CrossJoin() function is to allow us to produce all combinations of two sets. A primary driver for wanting to do this is to "collapse" two (or perhaps more) dimensions onto a single axis.

Uses of the CrossJoin() function can be quite sophisticated, and, as we shall learn, its use should be tempered with an understanding of its potential impact upon performance. We will introduce the function, commenting upon its operation and touching upon performance considerations at a general level, and then we will:

  • Examine the syntax surrounding the function;
  • Undertake illustrative examples of the uses of the function in practice exercises;
  • Briefly discuss of the results datasets we obtain in the practice examples.

The CrossJoin() Function

According to the Analysis Services Books Online, the CrossJoin() function "returns the cross product of two sets." CrossJoin() is particularly handy as it allows us to prepare data for presentation in a matrix format - the need to return "all possible combinations" of the members / tuples of two or more sets is common.

CrossJoin() is powerful, as it allows us, in effect, to nest one dimension inside another, but in a direct manner that enables the avoidance of a great deal of extra coding that might be required to obtain the same result through other means. A sample CrossJoin() result set, used together with the NON EMPTY keyword (discussed later in this article) in our last lesson as a preview, appears in Illustration 1.

Click for larger image

Illustration 1: Sample Result Dataset of a CrossJoin() Function

We will examine the syntax for the CrossJoin() function, mention a means for removing empty rows / columns from the results dataset it retrieves, and then undertake practice examples built around hypothetical business needs that illustrate logical uses for the function. We will extend the examples to perform exclusion of empty combinations in a query we construct, to afford us a scenario for examining an approach in that regard.

Discussion

As we have already discovered, the CrossJoin() function returns a cross product of the members of two different sets. The result of the use of such a cross product is the placement of two dimensions on the same axis, be it columns or rows, where CrossJoin() creates a new set made up of all possible combinations (after the manner of a "Cartesian product") of the original sets' members.

CrossJoin() handles only two dimensions at a time, so if we need to perform a cross product across more than two dimensions, we must nest the functions to realize our aims (we will see an example of this in our practice exercises). Furthermore, the two sets upon which we are performing a cross join must exist in separate dimensions. We can suppress empty columns / rows for the results dataset with the inclusion of NON EMPTY before the CrossJoin() function, which effectively transforms it to NonEmptyCrossJoin(), a function in its own right.

Because the CrossJoin() function returns a set that consists of the matches between every member of the first set and each member of the second set, it is important to realize that large datasets can be returned, and that performance can suffer as a result. If, for instance, we cross join two sets containing 200 members each, our results dataset will contain 40,000 tuples.

For many cubes, the CrossJoin() function returns datasets with empty measures associated with a significant number of the resulting tuples. We can often experience a performance boost with a NonEmptyCrossJoin(), whose action is to eliminate the tuples with which no data is associated from the results dataset. NonEmptyCrossJoin() also provides other features that can facilitate more efficient processing, as we shall see in a subsequent article that is devoted to NonEmptyCrossJoin().

Let's look at some syntax illustrations to further clarify the operation of CrossJoin().

Syntax

Syntactically, the sets between which we wish to return a cross product are specified within the parentheses to the right of CrossJoin. The syntax is shown in the following string.

CrossJoin(<<Set1>>, <<Set2>>)

Contrary to a couple of texts I have come across, the order in which the sets are placed in the CrossJoin() function is significant. Placing the sets in different orders often results in a materially different results dataset. The order of tuples in the resulting set depends on the order of +Set1; and +Set2;, and the order of the members within each set; the order should be considered in the light of its effects upon the presentation of final results.

An alternative syntax simply places an asterisk ("*") between the sets, as shown in the following string:

<<Set1>> * <<Set2>>

The alternate syntax is an extension to MDX in OLE DB 2.0 and greater, housed within MSSQL Server 2000 Analysis Service.

The following example expression illustrates a use of the Crossjoin() function. Let's say that a group of FoodMart information consumers, whose data is housed within the Warehouse cube, wish to see the total Units Shipped by U.S. Warehouse-State for each quarter of 1998. Furthermore, the consumers wish to see the Units Shipped in a single column in the report, and the Quarter / Warehouse-State information in the row axis of the presentation.

The basic CrossJoin() function, which would define the row axis of the result dataset, would be constructed as follows:

CROSSJOIN(

   {[Time].[Year].[1998].Children},
   {[Warehouse].[Country].[USA].Children}

This is simply expressing that we wish to display the "Warehouse-State by Quarter" in the row axis, and, assuming that the Units Shipped measure was specified for the column axis, our result dataset would be identical to that shown in Illustration 2.


Illustration 2: Results Dataset, Syntax Sample, CrossJoin() Function in Row Axis

The results appear to be as expected from a numerical standpoint, as well. Our expectation would be to see a total of twelve combinations, every possible combination of 1998 Quarters and the U.S. Warehouse-States, produced by the cross product of 1998's four quarters (the first of the sets in the syntax sample above), times the number of Warehouse-States (three). The twelve combinations are indeed the result, as we see in the illustration above.

NOTE: For information surrounding the .Children function, see MDX Member Functions: The "Family" Functions.

We will practice the use of the CrossJoin() function in the section that follows. Moreover, we will explore the use of the function when we need to create multiple cross products (that is, cross joining more than two sets). Finally, we will practice the use of NON EMPTY to activate the concept of its use in eliminating empty rows / columns from our presentation of CrossJoin() results. In each case, we will discuss briefly the results we obtain to reinforce our overall understanding of the CrossJoin() function.

Practice

The Basics

To reinforce our understanding of the basics we have covered so far, we will use the CrossJoin() function in a manner that illustrates its operation in the simplest of scenarios: We will construct a simple select query that creates an axis using CrossJoin().

As we have in many of the preceding articles of the series, will use the MDX Sample Application as a platform from which to construct and execute the MDX we examine, and to view the results datasets we obtain.

1.  Start the MDX Sample Application.

2.  Clear the top area (the Query pane) of any queries or remnants that might appear.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the Warehouse cube in the Cube drop-down list box.

Let's assume for our practice example that we have received a call from corporate Finance needing some statistics for an upcoming presentation to the stockholders. The Finance information consumers wish to know Warehouse Sales, Cost, and Profit balances for operating years 1997 and 1998, by U. S. States of Store operation.

To rephrase, the objective will be to present three annual measures, Warehouse Sales, Cost and Profit for each of the years 1997 and 1998, broken out by U. S. Store-State. While we can accomplish this by cross joining the Year and State dimensions on either the column or the row axis, the row axis will provide us a more compact presentation. We can easily see that three measures across the columns will result in a narrower matrix than to use the six combinations (two Years times three U.S. States of Store operation) that we would expect to be retrieved by an alternative arrangement of the proposed CrossJoin() function.

Let's construct a simple query, therefore, to return the measures as columns and the cross-joined sets as rows.

5.  Type the following query into the Query pane:


-- MXAS18-1, Basic CrossJoin() Query
SELECT
  { [Measures].[Warehouse Sales], [Measures].[Warehouse Cost],    
      [Measures].[Warehouse Profit]} ON COLUMNS,

   CROSSJOIN (
       {[Time].[Year].Members},
       {[Store].[Store Country].[USA].Children} ) ON ROWS
FROM
   [Warehouse]

6.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 3 appears.


Illustration 3: Result Dataset - Basic CrossJoin() Example

We see the Warehouse Sales, Cost and Profit measures populating the columns across, and the cross-product combinations of Year (from the Time dimension, naturally) and Store-States (from the Store dimension) appearing on the row axis.

7.  Select File -> Save As, name the file MDX18-1, and place it in a meaningful location.

8.  Leave the query open for the next section.

Multiple Cross Products

Next, let's explore the use of a CrossJoin() function in a scenario where a business need exists to cross join more than two sets. We will say, as an illustrative example, that we have been requested to present the measures we retrieved earlier, Warehouse Warehouse Sales, Cost and Profit, once again for operating years 1997 and 1998, by U. S. States of Store operation. The difference this time, however, is that the information consumers in Finance have requested that we present the existing row information by Store Type.

To rephrase, the objective will be to present the three annual measures, Warehouse Sales, Cost and Profit for each of the Store Types (there are six, including the Headquarters classification, which appears to amount to a "non-store" group), by years 1997 and 1998, broken out by U. S. Store-State. Again, a quick analysis of the multiples lead us to conclude that performing our cross joins on the row axis will provide us a more compact presentation. (Six Store Types times two Years times three U.S. States of Store operation is 36 possible combinations, as opposed to the three columns required by the measures, as we placed them in the columns before).

Let's construct the query to return the measures as columns and the cross-joined sets as rows. We can begin with our initial query.

1.  Within the query we have saved as MDX18-1, replace the top comment line of the query with the following:

-- MXAS18-2, Nested CrossJoin() Query

2.  Save the query as MDX18-2, to prevent damaging MDX18-1.

3.  After the following line:


{[Measures].[Warehouse Sales], [Measures].[Warehouse Cost], 
    [Measures].[Warehouse Profit]} ON COLUMNS,

And before the CrossJoin() expression already in place (shown next):


CROSSJOIN(
   {[Time].[Year].Members},
   {[Store].[Store Country].[USA].Children})) ON ROWS

Insert the following expression:


CROSSJOIN(
   {[Store Type].[Store Type].Members},

4.  Insert a close parenthesis ")" just before the ON ROWS keyword.

(This serves to close the "outer" CrossJoin(), within which our original CrossJoin() function is to be "nested.")

The Query pane appears as shown in Illustration 4, with the inserted coding circled in red.


Illustration 4: The Query with Modifications Circled

NOTE: Don't forget the extra closing parenthesis symbol ")".

5.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 5 appears.


Illustration 5: Result Dataset - Nested CrossJoin() Query

6.  Re-save the file as MDX18-2.

7.  Leave the query open for the next section.

We have achieved our objective of nesting the first function within the second, and thereby in cross-joining three sets from three separate dimensions. We note that there are numerous blank cells; it is easy to see that the reason for many of the "empties" is the fact that, as we have mentioned, the Headquarters Store Type is not a standard operating classification, as opposed to the other Store Types, which actually generate revenues, operating costs, and, therefore, profits / losses.

We could use a Filter() function to assist us, or we could simply stipulate the Store Types that we wish to show individually in our query, versus using the .Members function, but both these approaches would constitute more complication - particularly since the first would still leave some "blanks" even after removing Headquarters from the equation, and necessitate additional coding.

NOTE: See Basic Set Functions: The Filter() Function in this series for a detailed look at the Filter() function, and MDX Members: Introducing Members and Member for a hands-on introduction to .Members.

This offers us an opportunity to examine one final facet of the use of the CrossJoin() function, however, as a quick means to our end. We will make use of the NonEmptyCrossJoin() function to suppress empty rows and columns in our result dataset.

Excluding Empty Columns and Rows

While we will merely use it to practice suppressing empty rows and columns in a small practice example, NonEmptyCrossJoin() is a function in its own right in the MDX collection. Where we will simply insert the words Non Empty in front of an existing CrossJoin() function to see the results, the NonEmptyCrossJoin() function will warrant more in-depth examination. We will devote an article to it exclusively in the near future.

For now, let's simply explore a way we can use the operation of the function to clean up our result dataset a bit. We will reuse the query from our last step, as it presents an ideal scenario from which to bring about the suppression of empty columns and rows.

8.  Within the query we have saved as MDX18-2, insert the following:

NON EMPTY

Just after the ON COLUMNS keyword in the query. That is, place it between:


ON COLUMNS,
and 
CROSSJOIN(
   {[Store Type].[Store Type].Members},
   CROSSJOIN(
   {[Time].[Year].Members},
   {[Store].[Store Country].[USA].Children})
       ) ON ROWS

9.  Replace the top comment line of the query with the following:

-- MDX18-3:  Nested CrossJoin() Query  w NON EMPTY

The Query pane appears as shown in Illustration 6.


Illustration 6: The Query with Modifications Circled

10.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 7 appears.


Illustration 7: Result Dataset - Empty Rows and Columns Eliminated

The results are compacted - the empty rows and columns have been eliminated.

11.  Save the file as MDX18-3.

We have reached our objectives, both in meeting the consumer requirements and in suppressing the empty rows and columns to make the presentation of our dataset more compact. As we have stated, we will return to the NonEmptyCrossJoin() in a later article, where we can devote more time to its additional characteristics and considerations.

Summary ...

In this lesson, we explored the highly popular CrossJoin() function. We discussed the general purpose of the function, to allow us to produce all possible combinations of two sets, and cited its common use to "collapse" two or more dimensions onto a single axis for purposes of presenting multidimensional data in a two-dimensional matrix.

We introduced CrossJoin(), commenting upon its operation and touching upon performance considerations at a general level. Next, we examined the syntax surrounding the function. We then began practice exercises to illustrate the operation of the function, within hypothetical scenarios where we described a business need, and then set about solving the need with a straightforward MDX query that exploited the CrossJoin() function. Finally, we briefly discussed the results datasets we obtained in the practice examples.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved