MDX Essentials: Basic Set Functions: The Union() Function

Monday Nov 10th 2003 by William Pearson
Share:

Combine two sets with the valuable Union() function. Join Bill Pearson in a hands on session that centers around Union() and its alternates.

"So we grew together,
Like to a double cherry, seeming parted,
But yet an union in partition"
William Shakespeare,
A Midsummer Night's Dream, Act III, Sc. 2

About the Series ...

This is the thirteenth article of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements needed for getting the most out of the lessons included, please see the first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

What We Accomplished in our Last Article

In the last article of the series, Basic Set Functions: The Order() Function, we explored the versatile, Order() function, which we find to be useful in many queries - queries that rank from the simplest to the most advanced. The Order() function provides the sorting capabilities we need within MDX; its support of the two general types of sort order, hierarchized and nonhierarchized, make it an important part of our analysis toolsets.

In addition to discussing the purpose of the Order() function, we examined the syntax surrounding its uses, and illustrated its application in practice exercises, providing hands-on exposure to both hierarchized and nonhierarchized order types. Finally, we discussed the results we obtained in each exercise, remarking on the distinguishing characteristics of each.

Introduction

In this lesson, we will focus our attention on another commonly used MDX tool, the Union() function. The rather straightforward purpose of the Union() function is to combine two sets into one, but the direct and indirect uses to which the function can be put are legion. Suffice it to say that Union() provides important capabilities within MDX, and is thus another valuable part of our analysis toolsets.

Along with an introduction to the Union() function, this lesson will include:

  • an examination of the primary and alternate syntaxes surrounding the function;
  • illustrative examples of the uses of the function in practice exercises;
  • a brief discussion of the MDX results we obtain in the practice examples.

The Union() Function

The Union() function, according to the Analysis Services Books Online, "returns the union of two sets, eliminating duplicates by default." The function provides, moreover, a means of retaining duplicates for those scenarios where this might be desirable. As we shall see, alternative ways of accomplishing unions of sets, both with and without duplicates, are possible in Analysis Services using its extensions to OLE DB MDX.

We will examine the syntax for the Union() function in general, building to the union of two sets after creating an individual query for each, to show that the Union() function does, indeed, generate the results we might expect within the context of an illustrative business need. Next, our practice example will be modified to introduce the possibility of duplication within a Union() function, so that we can examine the means that MDX affords us for managing duplication. Finally, we will present alternative syntax where appropriate, illustrating its use in practice examples that parallel uses for the primary syntax we present, so as to make the comparisons more meaningful.

This will give us a richer understanding of the options we have within the Union() function, together with a feel for the similarities, as well as for the differences, of the results those options afford.

Discussion

Union() allows us to combine sets. The primary syntax we use to take advantage of Union() affords us an optional flag to force the presentation of duplicates that might occur within the sets we are subjecting to the Union() function. The default is the elimination of duplicates, through the removal of duplicate members from the end ("tail") of the unioned set, as we shall see.

Analysis Services also presents a couple of alternate syntax structures that achieve the effects of Union(), and that carry implicit handling of duplicates within their definitions. As we mentioned earlier, the alternate syntax structures comprise extensions to MDX in OLE DB, and are thus relevant specifically to Analysis Services implementations.

The Union() function acts as a vehicle to allow us to meet myriad common and uncommon business needs, including the obvious need to bring together two sets into a single set, for presentation, computation, and perhaps other purposes. Additional uses I have seen include forced elimination of duplicates between two sets, in scenarios where this default behavior can accomplish an elimination where other attempts fail, or are more costly with regard to operating overhead and so forth.

Let's look at some syntax illustrations to further clarify the operation of Union().

Syntax

Syntactically, the sets to be combined in the Union() function are placed within the parentheses to the right of Union, and separated by a comma. The optional ALL flag overrides the default behavior of the function with regard to duplicates, and allows duplicates to remain in the set that is produced. The syntax is shown in the following string:

Union(<<Set1>>, <<Set2>>[, ALL])

Alternate forms of the function also exist. These are shown in the following strings:

Alternate Syntax 1:

{<<Set1>>, <<Set2>>}

Alternate Syntax 2:

<<Set1>> + <<Set 2>>

Remarks

The Union() function (using either primary or alternative syntax) returns the combination of two sets' data. Use of the ALL flag in the primary syntax allows duplicate members to remain within the newly produced set, as we shall see in a step in the practice example.

We can also combine sets via one of the alternate syntaxes shown above. We can accomplish the union by enclosing the sets in a list-like manner, separating them with a comma; additionally, we can simply place a plus ("+") operator between the sets we intend to join.

According to the SQL Server 2000 Books Online, the use of either alternative approach is equivalent to the primary approach with an ALL flag in place; that is, anytime one of the alternate syntaxes is chosen, duplicates within the newly created set are retained in the set. (My experience appears to differ: I find that, for one of the alternate syntaxes, duplicates are not retained. We shall see an example of this in the practice set of the next section.)

The following example expression illustrates a use of the primary Union() function. Suppose we are asked by a group of FoodMart information consumers to present total Warehouse Sales for the state of Washington, by child city, together with the total sales of one city in Oregon, Portland, for reasons known only to management. We might approach this need with an expression similar to this:

UNION(

{[Store].[All Stores].[USA].[WA].Children},

{[Store].[All Stores].[USA].[OR].[Portland]})

This expression in a proper query, for the measure Warehouse Sales, would result in the return of the set depicted in Table 1.

Warehouse

Sales

Bellingham

11,509.54

Bremerton

13.530.31

Seattle

921.39

Spokane

981.81

Tacoma

2,294.52

Walla Walla

3,249.29

Yakima

4,454.60

Portland

12,335.21



Table 1: Results of a Union, Selecting Warehouse Sales as the Measure

In the expression above, we use the Union() function, in combination with the .Children member function (See MDX Member Functions: The "Family" Functions for a tutorial on this and related family functions) to enumerate the Washington, and Oregon (in this case, one specific city, Portland) child cities.

We will practice the use of the Union() function in the section that follows. Moreover, we will explore the use of the alternate syntaxes we have discussed, after the steps of our practice with the primary syntax, to consolidate our focus and activate the concepts that parallel those of the primary syntax.

Practice

The Basics

Let's reinforce our understanding of the basics we have covered so far, and extend those concepts to the "alternate sisters" of the primary Union() syntax. We will use the Union() function in a manner that illustrates its operation in a multi-step example that builds from a pair of simple select queries, to a combination of the two in a Union() illustration. From this point, we will investigate how to obtain similar results with the alternate syntaxes. We will call upon the MDX Sample Application again, as our tool for constructing and executing the MDX we examine, and for viewing the result datasets we obtain.

1.  Start the MDX Sample Application.

2.  Clear the top area (the Query pane) of any queries or remnants that might appear.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the Warehouse cube in the Cube drop-down list box.

Let's assume for our practice example that we have been asked to supply data for a similar need as that we instanced in the example used in the syntax section. To reiterate, we are asked to present total Warehouse Sales, a value that is captured monthly within the FoodMart organization and which is stored in the Warehouse cube, for the state of Washington, by child city, together with the total sales of one city in Oregon, Portland. We will begin by composing a simple query to select the sales figure for the Washington cites, followed by a query to select the same information for the single Oregon city. We will then combine the two with the Union() function and its alternates.

5.  Type the following query into the Query pane:

-- MDX13-1:  Tutorial Query Step 1
SELECT
{[Measures].[Warehouse Sales]} ON COLUMNS,
{[Store].[All Stores].[USA].[WA].Children} ON ROWS
FROM Warehouse

The purpose of this query is to simply generate the "first half" of the pair of sets that we intend to join together in the Union() query that will follow.

6.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 1 appears.


Illustration 1: Result Dataset - Initial Set to be "Unionized"

We see the total Warehouse Sales for Washington returned (actually for all years in the cube, as we do not specify that we want further detail).

7.  Select File -> Save As, name the file MDX13-1, and place it in a meaningful location.

8.  Select File --> New from the main menu.

A new, blank Query pane appears.

9.  Type the following query into the Query pane:

-- MDX13-2:  Tutorial Query Step 2

SELECT

{[Measures].[Warehouse Sales]} ON COLUMNS,

{[Store].[All Stores].[USA].[OR].[Portland]} ON ROWS

FROM Warehouse

The purpose of this query is to simply generate the "second half" of the pair of sets that we intend to join together in the Union() query that we will construct next.

10.  Execute the query by clicking the Run Query button in the toolbar.

Analysis Services populates the Results pane, and the dataset shown in Illustration 2 appears.


Illustration 2: Result Dataset - The "Second Half" of the Pair of Candidate Sets

11.  Save the file as MDX13-2, and place it in a meaningful location.

We have now seen the data populating the two sets that we will next combine via the Union() function. This should make the operation of the function clearer in our next step.

12.  Select File --> New from the main menu.

A new, blank query pane appears.

13.  Type the following query into the Query pane:

-- MDX13-3:  Tutorial Query Step 3

SELECT

{[Measures].[Warehouse Sales]} ON COLUMNS,

UNION(

{[Store].[All Stores].[USA].[WA].Children},

{[Store].[All Stores].[USA].[OR].[Portland]})

   ON ROWS

FROM Warehouse

The purpose of this query is to combine the pair of sets we have examined individually, using the Union() function.

14.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 3 appears.


Illustration 3: Our First Union() Function Results

At last we see the effects of the Union() function using the primary syntax. Note that the single Oregon city appears along with the Washington child cites, just as the information consumers have stated that they wanted. This simple example makes clear the operation of the Union() function in the context of its primary syntax.

15.  Save the file as MDX13-3.

Next, we will examine the operation of the function from the standpoint of how it handles duplicate members in the sets it is being asked to combine.

Dealing with Duplication

Let's establish a scenario whereby we can explore the handling of duplicates by the Union() function. We will accomplish this by creating a query whose objective is to combine two sets that we know to contain a duplicate member, and will thus perhaps step out of the realm of a credible request from our information consumers. This is to make the operation of the function clear, however, and we can certainly rely upon the fact that there are many real-world scenarios where dealing with duplicates is a fact of life.

1.  Select File --> New from the main menu.

The blank query pane appears.

2.  Type the following query into the Query pane:

-- MDX13-4:  Tutorial Query Step 4

SELECT

{[Measures].[Warehouse Sales]} ON COLUMNS,

UNION(

{[Store].[All Stores].[USA].[WA].Children},

{[Store].[All Stores].[USA].[WA].[Bellingham]})

ON ROWS

FROM Warehouse

The Union() function is called into action this time to perform a combination between two sets that we know to contain an identical member; we are asking that the complete set of the Washington city-children, which includes (see Step 1 of the practice exercise above) the city of Bellingham, be combined with the set of the single city of Bellingham. The purpose, again, is to illustrate clearly the default behavior of the Union() function, primary syntax, with regard to the handling of such duplicates.

3.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 4 appears.


Illustration 4: Union() Function Results - Duplicates Eliminated

4.  Save the file as MDX13-4.

The behavior of the Union() function is demonstrated in its elimination of the duplicate Bellingham member - a characteristic that is the default of the primary syntax with which we have been working. We note, too, that the duplicate is deleted from the "tail"; the last of the duplicate pair is the one that is "dropped."

We can prove this to ourselves by overriding the default behavior, and using the ALL flag to allow the duplicate to be retained. To do so, we will make an adjustment to the query we created in Step 4 above.

5.  Insert a comma, and then the word "ALL" into the second set specification within the Union() function, that is:

{[Store].[All Stores].[USA].[WA].[Bellingham]})  

between the right-most curly brace and the closing parenthesis, as shown:

{[Store].[All Stores].[USA].[WA].[Bellingham]},ALL)

6.  Modify the comment line to read:

-- MDX13-5:  Tutorial Query Step 5

The Query pane appears as shown in Illustration 5.


Illustration 5: The Query with Modifications Circled

7.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 6 appears. We see the "tail" instance of the duplicate Bellingham set is retained, thanks to the insertion of the ALL flag into our Union() function.


Illustration 6: Union() Function Results - Duplicates Retained via the ALL Flag

8.  Save the file as MDX13-5.

And so we see how to retain duplicates in a simple example that illustrates the influence of the ALL flag upon the Union() function. Let's turn now to the alternate syntaxes, both to examine them in general as options we might use, and to explore their implicit default behavior with regard to handling duplicates.

9.  Select File --> New from the main menu of the Sample Application.

A blank query pane appears.

10.  Type the following query into the Query pane:

-- MDX13-6:  Tutorial Query Step 6

SELECT

{[Measures].[Warehouse Sales]} ON COLUMNS,

({[Store].[All Stores].[USA].[WA].Children} +

{[Store].[All Stores].[USA].[WA].[Bellingham]})

ON ROWS

FROM Warehouse

The Union() function is called into action this time using one of two alternate syntaxes. In this approach, we use the "+" operator to effect the Union() function, with a noteworthy characteristic of this arrangement being that the UNION keyword is not used at all. Let's take a look at the results we obtain through the use of this alternate approach.

11.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 7 appears.


Illustration 8: Union() Function Results - Alternate Using "+" Operator

12.  Save the file as MDX13-6.

Although the Books Online state that "Duplicated members are always retained when an alternate syntax is used," we see clearly in the foregoing results that the duplicate has been eliminated, just as it was using the primary syntax (default scenario.) Now let's attempt the same operation with the other alternative option in a final query.

13.  Select File --> New from the main menu.

A blank query pane appears.

14.  Type the following query into the Query pane:

-- MDX13-7:  Tutorial Query Step 7

SELECT

{[Measures].[Warehouse Sales]} ON
COLUMNS,

{{[Store].[All Stores].[USA].[WA].Children},

{[Store].[All Stores].[USA].[WA].[Bellingham]}}

ON ROWS

FROM Warehouse 

The Union() function is enacted in this query using the second of two alternate syntaxes. In this approach, we simply list the sets inside the braces, separated by a comma. Again, we note that the UNION keyword is not used at all. Let's see how duplicates are handled when performing a union via this alternate option.

15.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 8 appears.


Illustration 8: Union() Function Results - Alternate Using the "List" Approach

16.  Save the file as MDX13-7.

In this case, we note that the duplicate, tail instance of Bellingham is retained, much as it was when using the ALL flag within the primary syntax example above.

In conclusion, we can see that the Union() function, both in its primary and alternate syntax forms, provides us the significant capability of performing combinations of sets. Our selection of an alternate syntax, or of the use of the optional ALL flag in the primary syntax, can also afford us flexibility in the handling of duplicates in queries we construct that use this function.

Summary ...

In this lesson, we explored the commonly used Union() function, whose purpose it is to combine two sets into one. We discussed the Union() function in general, and emphasized its provision for important capabilities within MDX, and its value within our analysis toolsets.

In addition to discussing the purpose and operation of the Union() function, we examined both primary and alternate syntaxes involved in its use, illustrating the application of each in practice exercises. Focusing on the treatment of duplicates in each of the syntactical approaches, we discussed the results we obtained in each exercise, remarking on distinguishing characteristics of each.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved