MDX Set Functions: The Distinct() Function

Monday Jun 5th 2006 by William Pearson

Remove duplicate tuples from a set with the DISTINCT() function. Architect Bill Pearson leads hands-on practice with this basic, but useful, MDX Set function.

About the Series ...

This article is a member of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each session progressively adding features and capabilities designed to meet specific real-world needs.

Virtually all of the MDX we have constructed in earlier articles can now be used in the SQL Server Management Studio, SQL Server Business Intelligence Studio, and various other areas within the Microsoft integrated Business Intelligence solution, and much of what we construct going forward can be executed in the Analysis Services 2000 MDX Sample Application (assuming connection to an appropriate Analysis Services data source). MDX as a language continues to evolve and expand: we will focus on many new features in articles to come, while still continuing to examine business uses of MDX in general. The use of MDX to meet the real-world needs of our business environments will continue to be my primary concentration within the MDX Essentials series.

For more information about the series in general, as well as the software and systems requirements for getting the most out of its member lessons, please see Set Functions: The DrillDownMember() Function, where important information is detailed regarding the applications, samples and other components required to complete our practice exercises.

Overview

Microsoft Analysis Services ("Analysis Services"), as most of us know, leads the enterprise business intelligence arena with its rich set of analytical and reporting tools. Within the sphere of analysis and reporting with OLAP data sources, most of these tools rely upon functions based in the MDX query language. MDX is integrated not only within Analysis Services, but also throughout the entire Microsoft integrated Business Intelligence solution, in applications that include MSSQL Server, Analysis Services and Reporting Services, and that extend throughout Microsoft Office and other applications. This integration provides a distinct advantage for users of the platform over those who are limited to the offerings of the expensive, once-dominant enterprise BI solutions (few of which even accommodate direct editing of MDX within their "drag and drop" interfaces), and, particularly in the case of numerical and set functions, allows for easy, consistent application of built-in logic.

In this article, we will extend our examination of MDX functions to concentrate upon the basic, but useful, Distinct() function. We will discuss the straightforward purpose of the function, to return a set without duplicates from a set we specify within the function, as well as the manner in which the function manages to do this.

Along with an introduction to the Distinct() function, this lesson will include:

  • an examination of the syntax comprising the function;
  • illustrative examples of uses of the function in practice exercises;
  • a brief discussion of the MDX results obtained within each of the practice examples.

The Distinct() Function

Introduction

According to the Books Online, the Distinct() function "returns a set, removing duplicate tuples from a specified set." Moreover, the Books Online state that, in cases where the Distinct() function finds duplicate tuples within the specified Set Expression, only the first instance of the duplicate tuple is retained within the returned results dataset.

Although Distinct() eliminates duplicate tuples within the specified Set Expression, the function leaves the order of the original set intact. Distinct() is useful in many applications, and, as is the case with most MDX functions, pairing it with other MDX functions can help us to leverage its power even further.

We will examine in detail the syntax for the Distinct() function after our customary overview in the Discussion section that follows. Following that, we will conduct practice examples within a couple of scenarios, constructed to support hypothetical business needs that illustrate uses for the function. This will afford us an opportunity to explore some of the delivery options that Distinct() can offer the knowledgeable user. Hands-on practice with Distinct(), where we will create queries that employ the function, will help us to activate what we have learned in the Discussion and Syntax sections.

Discussion

To restate our initial explanation of its operation, the Distinct() function removes duplicates that occur within a specified Set. If the specified Set contains duplicates, all except the first instance of each duplicated tuple are discarded; that is, duplicates are removed from the tail of the Set. The first instance (or only instance, if there are no duplicates) is returned within a Set that is ordered just as the Set specified within the function. (As we might expect, specifying an empty Set within the Distinct() function returns an empty Set.)
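As a minimal sketch of this rule, the following query repeats the Bikes category within an explicit set. The Clothing member and the use of the Reseller Sales Amount measure are assumptions chosen for illustration against the Adventure Works sample cube; they are not part of the practice exercises.

```mdx
-- Illustrative sketch: an explicit set that lists the Bikes member twice
SELECT
   {[Measures].[Reseller Sales Amount]} ON AXIS(0),
   DISTINCT(
      {[Product].[Product Categories].[Category].[Bikes],
       [Product].[Product Categories].[Category].[Clothing],
       [Product].[Product Categories].[Category].[Bikes]}
   ) ON AXIS(1)
FROM
   [Adventure Works]
```

Under the rule described above, we would expect two rows, Bikes followed by Clothing. The trailing Bikes tuple is discarded, and the order of the surviving tuples is unchanged.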

Let's look at syntax specifics to further clarify the operation of Distinct().

Syntax

Syntactically, any time we employ the Distinct() function to return the distinct tuples of a specified set, we specify the Set Expression within the parentheses to the right of the Distinct keyword. The general syntax is shown in the following string:

Distinct(Set_Expression)

Let's take a look at an illustration. The following snippet employs the Distinct() function:

DISTINCT(
   {[Geography].[Geography].[State-Province].[Georgia].CHILDREN,
    [Geography].[Geography].[City].[Atlanta],
    [Geography].[Geography].[City].[McDonough]}
) ON AXIS(1)

This rows-axis specification, within a query executed against the Adventure Works sample cube that, say, specified the Reseller Sales Amount measure on the columns ( Axis(0) ), and which contained a Calendar Year 2004 slicer, would produce a results dataset similar to that partially depicted in Illustration 1.


Illustration 1: Results Dataset – Distinct() Function with Specified Set Containing Duplicates

In the example dataset, we see that the Cities of the State of Georgia appear in the order in which they would have appeared had we simply defined the row axis as [Geography].[Geography].[State-Province].[Georgia].CHILDREN. We have intentionally specified duplicates (the Cities of Atlanta and McDonough) within our query to illustrate the fact that the first instance of the duplicated Cities is retained, and the second instance discarded. This illustrates the manner in which duplicates are removed from the tail of the set within the results dataset.
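For readers who wish to run the snippet, a complete query along the lines described might look like the following sketch. The measure and the slicer follow the description above. The exact unique name of the Calendar Year 2004 member is an assumption, modeled on the slicer syntax used in the practice queries of this article.

```mdx
-- Sketch of a complete query built around the rows-axis snippet
SELECT
   {[Measures].[Reseller Sales Amount]} ON AXIS(0),
   DISTINCT(
      {[Geography].[Geography].[State-Province].[Georgia].CHILDREN,
       [Geography].[Geography].[City].[Atlanta],
       [Geography].[Geography].[City].[McDonough]}
   ) ON AXIS(1)
FROM
   [Adventure Works]
WHERE
   -- Assumed member name for the Calendar Year 2004 slicer
   ([Date].[Calendar].[Calendar Year].[CY 2004])
```

Because Atlanta and McDonough already appear among the children of Georgia, the two explicitly listed members are the later instances, and they are the ones Distinct() discards.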

Practice

Preparation: Access SQL Server Management Studio

To reinforce our understanding of the basics we have covered so far, we will use the Distinct() function in a couple of queries that illustrate its operation. We will do so in simple scenarios that place Distinct() within the context of meeting basic requirements similar to those we might encounter within our respective daily environments. The intent, as in all the practice sessions of this series, is to demonstrate the operation of the Distinct() function in a straightforward, memorable manner.

We will turn to the SQL Server Management Studio as a platform from which to construct and execute the MDX we examine, and to view the results datasets we obtain.

1.  Click the Start button.

2.  Select Microsoft SQL Server 2005 within the Program group of the menu.

3.  Click SQL Server Management Studio, as shown in Illustration 2.


Illustration 2: Opening SQL Server Management Studio

The Connect to Server dialog appears, after the brief Management Studio splash screen.

4.  Select Analysis Services in the Server type selector.

5.  Type / select the server name (server name / instance, if appropriate) in the Server name selector.

6.  Supply authentication information, as required in your own environment.

7.  Click the Connect button to connect with the specified Analysis Services server.

The SQL Server Management Studio opens.

8.  In the Object Explorer pane (it appears by default on the left side of the Studio), expand the Databases folder (click the "+" sign to its immediate left), appearing underneath the Analysis Server with which we are working.

The Databases folder opens, exposing the detected Analysis Services database(s), as depicted in Illustration 3.


Illustration 3: Exposing the Analysis Services Databases in the Object Explorer ...

NOTE: The Analysis Services databases that appear will depend upon the activities that have taken place in your own environment, and will likely differ from those shown in Illustration 3 above. For purposes of this practice session, the Adventure Works DW database must be present. If this is not the case, consult the Books Online for the installation / connection procedures, and complete these procedures before continuing.

9.  Expand the Adventure Works DW database.

The Database expands, exposing the folders for the various objects housed within the Analysis Services database, as shown in Illustration 4.


Illustration 4: Exposing the Object Folders in the Database ...

10.  Expand the Cubes folder within the Adventure Works DW database.

The Cubes folder opens. We see two cubes, the first of which, Adventure Works, is the sample cube with which we will be conducting our practice exercises. The cubes appear as depicted in Illustration 5.


Illustration 5: The Cubes Appear ...

11.  Click the Adventure Works cube to select it.

12.  Click the New Query button just under the main menu, in the upper left corner of the Management Studio, as shown in Illustration 6.


Illustration 6: Click the New Query Button with the Adventure Works Cube Selected

The Metadata pane for the Adventure Works cube appears, along with the Query pane to its right, as depicted in Illustration 7.


Illustration 7: Adventure Works Cube Metadata Appears ...

We will be using the Query pane in the practice session that follows, to construct and execute our MDX queries.

As we discover in articles throughout my Introduction to MSSQL Server Analysis Services series, among my other series at Database Journal, the SQL Server Management Studio provides a point of interface with all server types in the SQL Server family, including Analysis Services, Reporting Services and Integration Services servers, as well as supporting many additional functions. Among those functions, I find the capabilities to easily browse data, and to issue queries, highly convenient. We can accomplish querying in several other ways within the integrated Microsoft BI solution, but this is certainly one of the most direct. For more information on the use of the Query Editor within SQL Server Management Studio for issuing MDX queries within the practice exercises of the MDX Essentials series, see Set Functions: The DrillDownMember() Function. (Articles within my other series explore other capabilities and features of the SQL Server Management Studio, as well as the SQL Server Business Intelligence Studio.)

Procedure: Satisfy Business Requirements with MDX

Let's assume, for purposes of our practice example, that we have received a request from representatives of our client, the Adventure Works organization. As we have noted in other articles of the series, the Reporting department, a group of client-facing authors and developers, often requests assistance such as this. As a part of our relationship with Adventure Works, as well as with other clients, we provide on-site augmentation for business requirements gathering and training; we perform workshops, in many cases, that illustrate approaches to meeting specific needs. These combined development workshops / "train the trainer" events have worked well in the past for all concerned.

As usual, the authors and developers in the group are aware that the particular need that they are currently expressing will manifest itself in recurring situations as they work to meet the daily requirements of the Adventure Works information consumers. This particular request for assistance involves scenarios where they feel that the Distinct() function might be highly useful.

In a brief discussion with members of the Reporting department, we learn that the need has arisen, in crafting MDX queries for various analysis and reporting needs, for an understanding of the Distinct() function. Among the drivers for this requirement is the fact that, in several recent instances, inexperienced report developers have executed new queries only to find that, while the results datasets seem to be correct from other perspectives, the members displayed on the row and / or column axes have contained duplicates. This situation is undesirable in most cases (for example, when the MDX query has been created to support a report picklist).

After gaining an understanding of the need, we explain to the developers the reasons that this might have occurred. We also ascertain that the requirement, at least at present, does not include a need to perform distinct counts, for which we state that different remedies exist. We then agree to demonstrate the use of an MDX set function, Distinct(), to eliminate the duplicates in the returned datasets.

NOTE: For detailed articles surrounding distinct counts in Analysis Services 2000, see Distinct Count Basics: Two Perspectives and Manage Distinct Count with a Virtual Cube, both members of my Introduction to MSSQL Server Analysis Services series.

We convince the authors that they might best become familiar with the Distinct() function by examining a couple of cases in which we induce, and thus expect, duplication similar to that which they have described in the results datasets. Once we have created a scenario where duplication is clearly in evidence, we will exploit Distinct() to eliminate duplication, not only to demonstrate the application and effectiveness of the function, but also to illustrate the rule to which Distinct() adheres in selecting the tuples for elimination. The client representatives with whom we are working agree that, by creating a duplication scenario as an initial step, we can more effectively demonstrate examples of the straightforward operation of Distinct() within a meaningful context.

Procedure: Use the Distinct() Function to Eliminate Duplication in MDX Query Results Datasets

Let's construct a simple query to provide the "starting point" for our subsequent work with the Distinct() function. Our intent here, again, is to return a results dataset that contains duplication, and then to demonstrate the removal of duplicates, using Distinct() within the original query.

1.  Type (or cut and paste) the following query into the Query pane:


-- MDX044-001-1 Simple "Duplication" Scenario
SELECT 
   {[Measures].[Internet Sales Amount], [Measures].[Internet Order Count]} 
      ON AXIS (0),
   NON EMPTY
      {[Product].[Product Categories].[Category].[Bikes].CHILDREN, 
         DESCENDANTS([Product].[Product Categories].[Category].[Bikes], 
         
             [Product].[Product Name], SELF_BEFORE_AFTER)} ON AXIS (1)
FROM
   [Adventure Works]
WHERE 
               ([Date].[Calendar].[Calendar Year].[CY 2003])

The Query pane appears, with our input, as shown in Illustration 8.


Illustration 8: Our Initial Query in the Query Pane ...

The above query sets the stage for "applying distinction" (or, in other words, for the "elimination of duplication"). We have a simple case, within the row axis, from which a redundant tuple is, unsurprisingly, generated.

2.  Execute the query by clicking the Execute button in the toolbar, as depicted in Illustration 9.


Illustration 9: Click Execute to Run the Query...

The Results pane is populated by Analysis Services, and the dataset, partially shown in Illustration 10, appears.


Illustration 10: Results Dataset (Partial View) – Initial "Duplication" Scenario

In the partial view of the returned dataset, duplication appears within the top rows – as we see circled in the illustration above. We could, of course, easily modify the query in other ways to remove the duplicate, but our focus here is the use of Distinct() to accomplish this action.
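As a hedged sketch of one such alternative (not the approach this lesson pursues), we might simply drop the CHILDREN term. Descendants() with the SELF_BEFORE_AFTER flag already returns the subcategories that lie between the Bikes category and the product level, so each member would then be supplied only once.

```mdx
-- Sketch of one alternative: omit the redundant CHILDREN term
SELECT 
   {[Measures].[Internet Sales Amount], [Measures].[Internet Order Count]} 
      ON AXIS (0),
   NON EMPTY
      {DESCENDANTS([Product].[Product Categories].[Category].[Bikes], 
          [Product].[Product Name], SELF_BEFORE_AFTER)} ON AXIS (1)
FROM
   [Adventure Works]
WHERE 
   ([Date].[Calendar].[Calendar Year].[CY 2003])
```

The row order of this version may differ from that of a query that wraps the original set in Distinct(). Here the subcategories arrive in the order Descendants() returns them, rather than at the head of the set.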

3.  Select File -> Save MDXQuery1.mdx As ..., name the file MDX044-001-1.mdx, and place it in a meaningful location.

4.  Leave the query open for the next step.

Our developer / author colleagues express satisfaction with the contextual backdrop we have established for introducing the Distinct() function. We will undertake using the function in our next steps, first with the foregoing example, and then within a "fresh" query we will construct.

5.  Replace the comment line in query MDX044-001-1 with the following:

-- MDX044-001-2 Using DISTINCT() to Remove Duplication

6.  Select File -> Save MDX044-001-1.mdx As ..., name the file MDX044-001-2.mdx, and place it in the same location as its predecessor, to protect the former query.

7.  Place the cursor to the right of the NON EMPTY keyword on the fifth row of the query.

8.  Press the Enter key twice to create a new line between the line of the query on which we have placed the cursor and the line that currently follows it, namely:

{[Product].[Product Categories].[Category].[Bikes].CHILDREN,

9.  Type the following syntax into the new row:

 DISTINCT(

10.  On what is now the ninth row, place the cursor to the immediate right of the following:

 [Product].[Product Name], SELF_BEFORE_AFTER)}

11.  Add another right parenthesis ( ")" ) to the right of the existing right curly brace, which immediately precedes "ON AXIS (1)" within the row.

The effect, of course, is to enclose the set specified on the rows axis within the newly added Distinct() function. Once we have accomplished this simple modification, the Query pane appears as depicted in Illustration 11.


Illustration 11: "Adjusted" Query in the Query Pane (Modifications Circled)
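For reference, the query that results from steps 5 through 11 should read approximately as follows. This listing is a reconstruction from the steps themselves; its whitespace and line breaks are assumptions and may differ from what appears in the Query pane.

```mdx
-- Adjusted query (saved as MDX044-001-2.mdx): DISTINCT() wraps the rows-axis set
SELECT 
   {[Measures].[Internet Sales Amount], [Measures].[Internet Order Count]} 
      ON AXIS (0),
   NON EMPTY
      DISTINCT(
      {[Product].[Product Categories].[Category].[Bikes].CHILDREN, 
         DESCENDANTS([Product].[Product Categories].[Category].[Bikes], 
             [Product].[Product Name], SELF_BEFORE_AFTER)}) ON AXIS (1)
FROM
   [Adventure Works]
WHERE 
   ([Date].[Calendar].[Calendar Year].[CY 2003])
```

The only changes from the original query are the DISTINCT( keyword ahead of the opening curly brace and the extra right parenthesis after the closing curly brace.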

12.  Execute the query by clicking the Execute button in the toolbar, as before.

The Results pane is populated by Analysis Services, and the dataset partially shown in Illustration 12 appears.


Illustration 12: Results Dataset (Partial View) – Distinct() at Work

We see that Distinct() has had the expected effect: It has eliminated the duplicate tuple (labeled "Mountain Bikes") that we saw in the earlier results dataset. Further, we can also discern, because the first Mountain Bikes row is preserved in the dataset whose query has employed the Distinct() function to remove duplicates, that the function has removed "all except the first instance" of the once duplicated tuple, as we might have predicted based upon our explanation of the function in earlier sections.

13.  Select File -> Save MDX044-001-2.mdx to ensure that the file is saved.

The client developers and report authors express satisfaction with the results, and confirm their understanding of the operation of the Distinct() function. They then present an additional case where they wish to employ Distinct() to eliminate duplicates from a slightly more elaborate scenario, which they outline as follows.

Using the Adventure Works cube as a data source, they have already constructed a query which, like the previous query we examined, produces duplicates within its results dataset. This query employs a group of MDX functions to list, for a given individual within the sales organization, the organizational hierarchy within which that individual functions. The hierarchy presented displays employees both above and below the level of the specified employee, along with each employee's order count. The hierarchy, they explain, is defined by "sales activity rollup" relationships, meaning that each level within the sales hierarchy will include totals for the members of the level underneath. The query as it is currently written concerns itself with Calendar Year 2004.

The report authors / developers have a new appreciation for the fact that, given the current core query, the capability to perform ad hoc hierarchy generation, based upon the individual selected at runtime, becomes a matter of parameterizing a key component of the rows-axis specification of the query: the unique name for the employee. Because we have demonstrated to the developers that parameterization of this sort is easily attainable within Reporting Services, assuming that sufficiently sophisticated queries are put in place to support it (within either the Reporting Services or Analysis Services layers), the core query is deemed valuable to the team.

While we will not get into the parameterization aspects of query design to make this happen, we will concentrate upon one undesirable characteristic of the query as it is currently written: the developers tell us that all seems to be functioning properly, except for duplication within the results dataset that the query produces. We will use this scenario as a vehicle to once again illustrate the usefulness of the Distinct() function. Moreover, showing the unintended duplication in the initial scenario, and then applying Distinct() to the query to eliminate the duplicates, will provide a hands-on, "before and after" look at how Distinct() behaves in another example.

To outline the requirement further, the developers present a draft of the intended objective within an MS Excel spreadsheet (with an additional column showing Employee level, for illustration purposes only). The draft is intended to serve as our "confirmation of understanding draft" of the desired dataset, and appears in MS Excel as depicted in Illustration 13.


Illustration 13: "Confirmation Draft" of Intended Hierarchical Results Dataset

Having obtained consensus on the proposed target dataset, we are ready to set about constructing the query.

14.  Select File -> New from the main menu.

15.  Select Query with Current Connection from the cascading menu that appears next, as shown in Illustration 14.


Illustration 14: Create a New Query with the Current Connection ...

A new tab, with a connection to the Adventure Works cube (we can see it listed in the selector of the Metadata pane, once again), appears in the Query pane.

16.  Type (or cut and paste) the following query into the Query pane:


-- MDX044-002-1 Duplication Appears within the Use of Combined Functions
SELECT 
   {[Measures].[Reseller Order Count]} ON AXIS(0),
   HIERARCHIZE( 
      {DESCENDANTS([Employee].[Employees].[Stephen Y. Jiang]), 
   
          ASCENDANTS([Employee].[Employees].[Stephen Y. Jiang])}) 
              DIMENSION PROPERTIES [Employee Department].[Department Name] 
             
                  ON AXIS(1)
                    
FROM
    [Adventure Works]
WHERE
    ([Date].[Calendar Year].[CY 2004]) 

The Query pane appears, with our input, as depicted in Illustration 15.


Illustration 15: Our Initial Query in the Query Pane ...

17.  Execute the query by clicking the Execute button in the toolbar.

The Results pane is, once again, populated by Analysis Services. This time, the dataset shown in Illustration 16 appears.


Illustration 16: Results Dataset – Duplicate in Evidence (Circled)

In the returned dataset, we see that, except for the duplicate seen in the above illustration, the query appears to deliver the intended results.

18.  Select File -> Save MDXQuery2.mdx As ..., name the file MDX044-002-1.mdx, and place it in the same location used to store the earlier queries.

19.  Leave the query open for the next step.

As we can see, the initial query presents a scenario that will serve as a basis from which to employ Distinct() to eliminate duplication. The Distinct() function can be easily added to align the dataset completely with the "intended results" draft that the developers have provided, as we shall see next.

20.  Replace the comment line in query MDX044-002-1 with the following:

-- MDX044-002-2 Using DISTINCT() to Eliminate Duplication

21.  Select File -> Save MDX044-002-1.mdx As..., name the file MDX044-002-2.mdx, and save it with the other queries we have constructed.

22.  Click to the immediate right of HIERARCHIZE( - on the fourth row from the top in the existing query - to place the cursor there.

23.  Press the Enter key on the PC twice, to create a space between the row and the row underneath it.

24.  Type the following syntax into the new row:

 DISTINCT(

25.  On what is now the seventh row, place the cursor to the immediate right of the following:

ASCENDANTS([Employee].[Employees].[Stephen Y. Jiang])})

26.  Add another right parenthesis ( ")" ) to the right of the right-most parenthesis.

The effect, once again, is to enclose the set involved within the Distinct() function. Once we have accomplished this simple modification, the Query pane appears as depicted in Illustration 17.


Illustration 17: "Adjusted" Query in the Query Pane (Modifications Circled)
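For reference, the query that results from steps 20 through 26 should read approximately as follows. This listing is a reconstruction from the steps; its whitespace and line breaks are assumptions and may differ from what appears in the Query pane.

```mdx
-- Adjusted query (saved as MDX044-002-2.mdx): DISTINCT() sits inside HIERARCHIZE()
SELECT 
   {[Measures].[Reseller Order Count]} ON AXIS(0),
   HIERARCHIZE( 
      DISTINCT(
      {DESCENDANTS([Employee].[Employees].[Stephen Y. Jiang]), 
          ASCENDANTS([Employee].[Employees].[Stephen Y. Jiang])})) 
              DIMENSION PROPERTIES [Employee Department].[Department Name] 
                  ON AXIS(1)
FROM
    [Adventure Works]
WHERE
    ([Date].[Calendar Year].[CY 2004]) 
```

Both Descendants() and Ascendants() include the specified member itself, which is why Stephen Y. Jiang appears twice in the combined set. Distinct() removes the second instance before Hierarchize() arranges the remaining members in hierarchical order.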

27.  Execute the query by clicking the Execute button in the toolbar, as before.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 18 appears.


Illustration 18: Results Dataset – Distinct() in Action

Once again, we witness that Distinct() has eliminated the duplicate tuple, leaving only a single occurrence of manager Stephen Y. Jiang, for whom two occurrences appeared in the earlier results dataset. Further, while it might be impossible to ascertain precisely which of the duplicate rows remains, the fact is that the function has removed "all except the first instance" of the once duplicated tuple, as we discussed earlier, and then saw in our first practice example.

28.  Select File -> Save MDX044-002-2.mdx to ensure that the file is saved.

29.  Select File -> Exit to leave the SQL Server Management Studio, when ready.

The client representatives inform us that their immediate goals have been met, and that the examples we have shared have illustrated the principles of operation behind Distinct().

Summary...

In this lesson, we continued our examination of MDX functions to concentrate upon the rudimentary, but useful, Distinct() function. We discussed the purpose of Distinct(), to return a set without duplicates from a set we specify within the function, as well as the manner in which the function manages to return a distinct set.

After introducing the Distinct() function, we examined the syntax with which we employ it. We next undertook illustrative examples whereby we put Distinct() to work, to gain some hands-on practice in its use. Throughout our practice session, we briefly discussed the results datasets we obtained from each of the queries we constructed or modified.

