# MDX Essentials: Basic Set Functions: Subset Functions: The Subset() Function

Monday Jul 12th 2004 by William Pearson

Return a subset based upon a specified beginning point within a set. MSAS Architect Bill Pearson introduces the Subset() function, in the last of three articles surrounding Subset functions.

This is the twenty-first article of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements needed for getting the most out of the lessons included, please see the first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

### What We Accomplished in our Last Article

In the last article of the series, Subset Functions: The Tail() Function, we continued our group of three articles surrounding functions whose primary purpose is to perform operations on subsets. We introduced the Tail() function, with which we can return a subset from the end of a set. We commented upon the operation of the function, and then examined its syntax. Next, we undertook practice examples with the function, based upon hypothetical business requirements, following the approach we have used throughout the series.

In our practice set, we intentionally replicated the requirements we had simulated in working with the Head() function in the article that preceded it, so as to compare the Tail() and Head() functions, and to note their similarities in operation, as well as to contrast the results datasets they returned. Throughout the practice examples, we briefly discussed the results datasets we obtained with regard to the Tail() function, together with other surrounding considerations.

### Introduction

In this lesson, we will conclude our "triptych" of articles exposing set functions that deal specifically with subsets. As we have noted, each function returns a subset of a larger set, as part of its operation. We began the subset functions articles with an examination of the Head() function, then explored Tail() in the last. As we mentioned in our last session, these three functions have much in common in the context of usage and operation; covering them in close proximity allows us to more finely distinguish among them, as well as to become aware of their similarities, and to better exploit the attributes we can leverage to meet specific business needs.

In this article, we will introduce the Subset() function and survey its use. The general purpose of the Subset() function is to return a subset of tuples from a specified set. We will first comment upon the operation of Subset(), and then we will:

• Examine the syntax surrounding the function;
• Undertake illustrative examples of the uses of the function in practice exercises;
• Briefly discuss the results datasets we obtain in the practice examples.

### The Subset() Function

According to the Analysis Services Books Online, the Subset() function "returns «Count» tuples from «Set» as a set, starting at position «Start»." Once we recover from the seemingly redundant explanation that is, in fact, a pretty clear representation of the operation of the Subset() function, we can see that Subset() works a little like the substring functionality that appears in various programming environments, query languages and other places. We are focusing on tuples and their positions relative to each other, as opposed to characters, but the similarities in concept are perhaps easy to recognize.

As we shall see, the order of the set elements remains intact within the operation of the function. We control the "range" of the function by providing a count, similar to the way we control the "reach" we obtain in other MDX functions - and similar to the way we use the numeric expression in the Head() and Tail() functions that we explored in our previous two articles. The difference is that our "starting point" is not fixed at either the left/beginning or right/ending "side" of the set, as it is for the Head() and Tail() functions, respectively (which behave a bit like the LEFT and RIGHT string functions, we might note, in the string-based analogy we cited earlier). We can tell Subset() with which exact position to begin its work, and the number of elements to capture, by providing the associated «Start» and «Count» specifications.

We will examine the syntax for the Subset() function, then look at its behavior based upon different «Start» and «Count» input we might provide. Next, we will undertake practice examples constructed to support hypothetical business needs that illustrate uses for the function. This will allow us to activate what we explore in the Discussion and Syntax sections, by getting some hands-on exposure in creating expressions that leverage the function.

#### Discussion

To restate our initial explanation of its operation, the Subset() function iterates through the elements of the specified set and constructs a set by adding the members in the directed range to the new set. The Subset() function starts at a point, or an index («Start» in the syntax model we show in the Syntax section below) that we designate within a set. The function acts to return a range of m tuples from a specified set. We specify m via the «Count» input we provide. The function "counts over" this number of members, "lassoing" them into selection for the new set it creates.

In a manner dissimilar to what we saw for the Head() and Tail() functions in the two immediately previous articles, Subset() manages the absence of a specified numeric expression for «Count» by "defaulting" to include all elements from the «Start» position to the end of the set. (Recall that the Head() and Tail() functions handled the absence of a specified numeric expression by substituting "1" as the range of elements "over" from the beginning and end of the specified set, respectively.)

Let's look at some syntax illustrations to further clarify the operation of Subset().

#### Syntax

Syntactically, the set upon which we seek to perform the Subset operation is specified within the parentheses to the right of Subset, just as we saw with the Head() and Tail() functions in our previous articles. The syntax is shown in the following string.

```
Subset(«Set», «Start» [, «Count»])
```

We follow «Set», the set specification, with a comma, which is followed by «Start», the starting position for the operation. «Start» is, in turn, optionally followed by a comma and «Count», the count of members in the selection range. As we have mentioned, the omission of the count value means that the function simply selects all tuples from «Start» to the end of the set. «Start» is zero-based: in specifying it, "0" represents the first member in the set, "1" the second, and so forth.

Within a scenario where the specified «Count» is greater than the number of tuples that remain in the set from the «Start» position onward, all of the tuples from «Start» to the end of the set are returned. Moreover, the input of a number less than 1 as the «Count» results in an empty set (indicated, for example, by a message in the MDX Sample Application that, because "the cellset ... contains no positions," it is unable to display a results dataset).
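To make these rules concrete, the sketch below applies them to a set of four Quarters. It assumes, purely for illustration, that `[1998].Children` resolves to Q1 through Q4 of the FoodMart Time dimension; the results noted in the comments follow from the rules stated above rather than from a captured run.

```
-- «Count» omitted: everything from position 1 to the end (Q2, Q3, Q4)
Subset([1998].Children, 1)

-- «Count» larger than what remains: position 2 to the end (Q3, Q4)
Subset([1998].Children, 2, 10)

-- «Count» less than 1: an empty set
Subset([1998].Children, 1, 0)
```

Each expression is a set expression only; it would need to be placed on an axis of a SELECT query to be executed.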

The following example expression illustrates the use of the Subset() function, within a context similar to that of an expression we used in discussing the syntax of the Head() and Tail() functions in the immediately preceding two articles. This will illustrate the similarities in the construction of the functions, while exposing the differences in the datasets that they return.

Let's say, again, that a group of corporate-level information consumers within the FoodMart organization wish to see the total Profits by Warehouse-Country for the last three Quarters of 1998. While we could easily accomplish this with the Tail() function, whose specialty is, after all, returning the "last of" anything, we can obtain the same results with the Subset() function.

The basic Subset() function, which would specify the "last three Quarters" (the "children" of year 1998) portion of the required result dataset, would be constructed as follows:

`Subset([1998].Children, 1, 3)`

This expression would be equivalent to the expression from our last article, Tail([1998].Children, 3), and would return an identical result dataset. Assuming that we placed the Subset() function above within the column axis definition of a query, and the Warehouse-Country information defined the row axis, our returned dataset would resemble that shown in Table 1.

| | Q2 | Q3 | Q4 |
|---|---|---|---|
| Canada | 4,949.88 | 4,196.32 | 3,645.54 |
| Mexico | 19,625.45 | 16,477.01 | 14,509.69 |
| USA | 26,093.90 | 24,912.75 | 29,348.79 |

Table 1: Results Dataset, with Subset() Defining Columns
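A complete query of the shape just described might look like the following sketch. The cube name `[Warehouse]`, the `[Warehouse].[All Warehouses]` member, and the `[Measures].[Warehouse Profit]` measure are assumptions based on the FoodMart 2000 sample database, not names taken from this article; adjust them to match the cube in use.

```
-- Hypothetical query: object names assume the FoodMart 2000 Warehouse sample cube
SELECT
Subset([Time].[1998].Children, 1, 3) ON COLUMNS,
{[Warehouse].[All Warehouses].Children} ON ROWS
FROM
[Warehouse]
WHERE ([Measures].[Warehouse Profit])
```

Placing the measure in the slicer keeps both axes free for the Quarters and the Warehouse countries.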

Just as we saw with the Tail() function in our previous session, Subset() has the effect of compactly expressing that we wish to display the Quarters as shown. The set from which we derive the subset is the Quarters of 1998, in their natural order. The "starting point" is Q2 (position "1" under the zero-based «Start» value, as Q1 occupies position "0"), and the selection extends from there for a "distance" of three elements.

The primary difference between the two functions, as we can readily see, is that the Subset() function can be used a bit more flexibly. It allows us to specify a "starting point" anywhere in a given set, together with a "range" of selection, as opposed to the same selection capability, with a fixed starting point at the beginning or end of the set, that we obtain using the Head() and Tail() functions, respectively.
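The comparison can be sketched with the same four Quarters, again assuming that `[1998].Children` yields Q1 through Q4. The first pair of expressions selects the same tuples; the last selects a middle range that neither Head() nor Tail() can express alone.

```
-- First two Quarters (Q1, Q2), expressed two ways
Head([1998].Children, 2)
Subset([1998].Children, 0, 2)

-- A middle range (Q2, Q3), which needs an explicit starting position
Subset([1998].Children, 1, 2)
```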

As was the case with the Tail() and Head() functions, Subset() can be particularly useful in working with the Time dimension. Moreover, the same efficiencies we saw with the other subset functions can be obtained when Subset() is used in conjunction with "family" functions, as with the .Children function above. More compact, reusable coding is often the result.

NOTE: For information surrounding the .Children function, see MDX Member Functions: The "Family" Functions.

We will practice the use of the Subset() function in the section that follows.

#### Practice

Preparation

To reinforce our understanding of the basics we have covered so far, we will use the Subset() function in a manner that illustrates its operation. We will do so in a simple scenario that places Subset() within the context of meeting a business need.

To begin, we will construct a SELECT query with a clearly defined set, then put Subset() to use in limiting that set to meet an illustrative need for a group of hypothetical information consumers. The intent is, of course, to demonstrate the operation of the Subset() function in a straightforward manner.

Let's return to the MDX Sample Application as a platform from which to construct and execute the MDX we examine, and to view the results datasets we obtain.

1.  Start the MDX Sample Application.

2.  Clear the top area (the Query pane) of any queries or remnants that might appear.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the Sales cube in the Cube drop-down list box.

Let's assume, for our practice example, that we have received a call from the Marketing department of the FoodMart organization, requesting some information surrounding sales promotions that have been conducted. The Marketing information consumers specifically wish to know the Unit Sales figures attributed to each of the promotions, broken out by gender of the purchasers, from which to derive a recurring report that is more filtered.

To rephrase, the objective will be to present a single measure, Unit Sales, for "all time" within the context of the FoodMart Sales cube. (For our exercise, the cube can be assumed to represent the current year-plus activity of the organization.) We wish to return data showing Unit Sales broken out by male and female purchasers, for each of the promotions that we have conducted within the time frame represented by the Sales cube. It is from the results dataset that is returned that the consumers want to narrow their request, once they get a look at overall figures, to a compact, recurring report.

Let's construct a simple query, therefore, to return the Unit Sales information, presented by gender (as columns) and the promotion name (as rows).

5.  Type the following query into the Query pane:

```
-- MDX021-1, Preparation for Use of Subset() Function in a Basic Query

SELECT
{[Gender].Members} ON COLUMNS,
{[Promotions].[Promotion Name].Members} ON ROWS
FROM
[Sales]
WHERE ([Measures].[Unit Sales])
```

6.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 1 appears.

Illustration 1: Result Dataset Preparation for Use of Subset() Function

We see Male, Female, and All Gender populating the columns across, and the Promotion Name (from the Promotions dimension) appearing on the row axis.

7.  Select File -> Save As, name the file MDX021-1, and place it in a meaningful location.

8.  Leave the query open for the next section.

Next, let's say that our information consumers are provided with the somewhat raw Promotion-by-Gender metrics we have generated. They state that they need the data in a slightly different presentation, before determining the thresholds for the ultimate recurring report.

The department has recently decided to emphasize its focus on the purchasing activities of female purchasers, while perusing the corresponding activities of male purchasers, in an attempt to identify patterns. More specifically, they want the same information that we have provided, but sorted by Unit Sales values, from highest sales promotion to lowest, from the perspective of female shoppers.

We can accomplish this re-sort using the Order() function that we explored in Basic Set Functions: The Order() Function, as we shall see in the following steps.

9.  Within the query we have saved as MDX021-1, replace the top comment line of the query with the following:

`-- MDX021-2, Preparation for Use of Subset() Function -Ordered Query`

10.  Save the query as MDX021-2, to prevent damaging MDX021-1.

11.  Change the following line of the query (the rows axis definition):

`{[Promotions].[Promotion Name].Members} ON ROWS`

to the following:

```
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)} ON ROWS
```

12.  Remove the following line (the slicer at the bottom) from the MDX query:

`WHERE ([Measures].[Unit Sales])`

The Query pane appears as shown in Illustration 2.

Illustration 2: The Query with Ordering Enhancement
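Assembled from steps 9 through 12, the query in the pane should read approximately as follows (line breaks may differ from the illustration):

```
-- MDX021-2, Preparation for Use of Subset() Function -Ordered Query

SELECT
{[Gender].Members} ON COLUMNS,
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)} ON ROWS
FROM
[Sales]
```

With the slicer removed, the Unit Sales measure is supplied through the tuple inside Order(), and the cells themselves rely on the cube's default measure, which is assumed here to be Unit Sales.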

13.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset depicted in Illustration 3 appears.

Illustration 3: Result Dataset - Ordered Core Query

14.  Re-save the file as MDX021-2.

15.  Leave the query open for the next step.

We have used the Order() function, with the BDESC keyword in place, to obtain the sorted core dataset that the Marketing department wants to see. This allows the information consumers to narrow even further their requirements for a recurring report on the promotions activity by gender. In our next section, we will use the Subset() function to provide for these narrowed, more informed requirements.

NOTE: For details concerning our use of the Order() function above, see my article Basic Set Functions: The Order() Function.

Limiting the Initial Dataset with the Subset() Function

Having provided the Marketing team with a "big picture" idea of promotions activity from the Sales cube, we have equipped them to ask for data within a narrower scope, to eliminate outliers such as promotions that fall below thresholds of interest for various reasons. For purposes of our practice example, we will say that the Marketing information consumers respond to our sorted results dataset within a short period, as we expected. They request that we provide the report, exactly as it currently appears, on a monthly basis, with two changes: the No Promotion group is to be excluded (it is of little value in the current context of specific promotion analysis), and only the top twelve promotions (on the basis of female patronage) are to be presented in the recurring report.

There are numerous ways to approach this with MDX functions, but we know that Subset() will handle the requirement, particularly in a scenario where we have a sort in place for the dimension member under examination, females.

Let's use the Subset() function to meet the business requirement with precision.

1.  Within the query we have saved as MDX021-2, replace the top comment line of the query with the following:

`-- MDX021-3, Use of Subset() Function within the Ordered Query`

2.  Save the query as MDX021-3.

3.  Within the query, click to the far right of "ON COLUMNS," in the following line:

`{[Gender].Members} ON COLUMNS,`

4.  Press the Enter key a couple of times to create space between the line and the line that follows it.

5.  Type the following into the new line:

`SUBSET(`

6.  Place the cursor to the immediate right of the right curly brace ("}") in the following line of the query:

`[Measures].[Unit Sales]), BDESC)} ON ROWS`

7.  Type a comma (" , "), a space, and then the following:

`1, 12)`

then another space.

The Query pane appears as shown in Illustration 4.

Illustration 4: The Query with Subset() Function in Place
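Assembled from steps 1 through 7, the query in the pane should read approximately as follows (spacing may differ from the illustration):

```
-- MDX021-3, Use of Subset() Function within the Ordered Query

SELECT
{[Gender].Members} ON COLUMNS,

SUBSET(
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)}, 1, 12) ON ROWS
FROM
[Sales]
```

The ordered set becomes the «Set» argument of Subset(), so the positions counted by «Start» and «Count» are positions in the sorted order, not in the natural order of the Promotions dimension.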

Note that we set "1" as «Start», because, conveniently enough, the No Promotion line item sorts to the top of our ordered set and therefore occupies the "0" position, which we wish to exclude anyway, based upon the request of the Marketing consumers who have defined the business requirement. We set "12" as the «Count», because the same information consumers have requested that we provide the metrics for the range of the top twelve promotions in the final version of this recurring report.

8.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 5 appears.

Illustration 5: Result Dataset - The Subset() Function in Action

9.  Re-save the file as MDX021-3.

We have thus provided the Marketing department with the requested analytical data. Because we have built in, via the Order() function, the automatic sorting on the criteria requested, we can be confident that any future generation of the data via this query will provide the appropriate selection, together with the order that reflects the sort of the core dataset. Should the consumers return with a request to change the number of promotions to which they want to narrow their focus, we can accomplish this with a simple adjustment to the «Count» specification within the Subset() function we have placed into our query.
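As a sketch of such an adjustment, suppose (hypothetically) that the consumers later ask for only the top five promotions. Only the «Count» value in the rows axis definition would change:

```
SUBSET(
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)}, 1, 5) ON ROWS
```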

10.  Close the Sample Application when ready.

### Summary ...

This article served as the conclusion of a group of three articles surrounding subset-related functions. We introduced the Subset() function, whose general purpose is to return a specified number of elements in a set, beginning at a point in the set that we designate via the «Start» value, and extending for a range of «Count» tuples. We commented upon the operation of the function, and then examined its syntax.

We undertook a multi-step practice example whereby we created a core query, then limited the results that the query returned through the use of the Subset() function, within the context of meeting an illustrative business requirement. We demonstrated the manner in which the Subset() function uses the «Start» and «Count» values we input to generate the precise results that we wish to obtain. We briefly discussed the results dataset we obtained with the Subset() function, together with other surrounding considerations. Throughout our examination of the Subset() function, we compared and contrasted the Subset() and the Head() and Tail() functions, from the perspective of usage and operation, in order to finely distinguish among them for the particular characteristics we need to meet specific business needs.

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.
