MDX in Analysis Services: Introducing DISTINCT COUNT

Monday Apr 26th 2004 by William Pearson

Perform distinct counts in your MSAS environment. MSAS Architect Bill Pearson introduces distinct count as a concept, discusses its uses, and leads practice in alternative implementations.

About the Series ...

This is the fourteenth tutorial article of the series, MDX in Analysis Services. The series is designed to provide hands-on application of the fundamentals of MDX from the perspective of MS SQL Server 2000 Analysis Services ("MSAS"); our primary focus is the manipulation of multidimensional data sources, using MDX expressions, in a variety of scenarios designed to meet real-world business intelligence needs.

For more information on the series, as well as the hardware / software requirements to prepare for the tutorials we will undertake, please see the first lesson of this series: MDX Concepts and Navigation.

Note: At the time of writing, Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples. The screenshots that appear in this article were taken on Windows Server 2003, and may differ somewhat from the corresponding views in other operating systems.

Introduction

In our last tutorial, Named Sets in MDX: An Introduction, we introduced named sets in MDX queries, focusing on their creation through use of the WITH clause, to allow us to gain an understanding of the general capabilities of static and dynamic named sets. We introduced the concepts behind named sets, and then examined the MDX syntax required to create them and to specify them for presentation in our results. Next, we discussed the nature of static and dynamic named sets, and then activated what we had learned through an illustrative practice example for each of the two types. Finally, we discussed the results we obtained in each hands-on example, illustrating the value that named sets can offer us.

In this article, we introduce the concept of distinct counts, discussing why they are useful - indeed, often required - in our organizational analysis efforts. Throughout our session, we will describe some of the challenges that are inherent in distinct counts, and then we will undertake practice exercises to illustrate solutions to meet our business needs. As a part of the practical exercises, built around a hypothetical business need, we will provide an introduction to the approach afforded us by the MSAS user interface, and then to an alternative approach we can take using MDX.

"The Need for Distinction"

As anyone in the realm of business intelligence and general analysis has probably come to realize, we often encounter the need to quantify precisely the members of various sets of data. Those of us who have become familiar with MSAS are aware of its capabilities when it comes to categorizing and aggregating data within the hierarchical contexts of dimensions and levels. We can, for the most part, readily tap these capabilities from the user interface that MSAS provides. Through the exploitation of more advanced approaches, including the use of calculated members / measures and MDX (multidimensional expressions), we can extend our analysis even further, and leverage MSAS to reach far more specific objectives.

NOTE: For more information on calculated members, see Calculated Members: Introduction, Calculated Members: Further Considerations and Perspectives, and Calculated Members: Leveraging Member Properties among numerous general references in the Database Journal MDX Essentials series.

One of the basic requirements that come into play, at least in some form, in virtually any analysis scenario, is the need to count the members of a set targeted for analysis. An example might be the need to count the number of products we shipped from a given warehouse, or group of warehouses, to a geographical location or group of stores. This can be accomplished readily enough with the Count() function (see Basic Numeric Functions: The Count() Function for details about using the MDX Count() function).

As many of us know, Count() does a fine job of giving us a total count. The results we would obtain by applying Count() to products, in the scenarios above, would therefore represent the total number of products shipped. What we would not get, and what we might find far more useful in some situations, is a count of the different products that were shipped. Because many products will have been shipped multiple times, Count() includes the same product in its total more than once. To reach our objective of counting different products, then, we need to count each product shipped only once. Counting products multiple times not only misstates the number of different products, but also likely renders averages, and other metrics based upon the count value, meaningless or misleading.
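To make the difference concrete, here is a minimal Python sketch (rather than MDX) built on a handful of made-up shipment records. The warehouses, products, and unit figures are assumptions for illustration only, not FoodMart data. It contrasts a total count with a distinct count, and shows how an average based on each can diverge.

```python
# Hypothetical shipment records: (warehouse, product, units shipped).
shipments = [
    ("Warehouse A", "Cola", 10),
    ("Warehouse A", "Cola", 5),
    ("Warehouse A", "Chips", 8),
    ("Warehouse B", "Cola", 7),
    ("Warehouse B", "Soup", 3),
]

# Total count: every shipment row is counted, so "Cola" contributes three times.
total_count = len(shipments)

# Distinct count: each product is counted once, however often it was shipped.
distinct_count = len({product for _, product, _ in shipments})

total_units = sum(units for _, _, units in shipments)

print(total_count)                   # 5
print(distinct_count)                # 3
print(total_units / total_count)     # 6.6  (average units per shipment)
print(total_units / distinct_count)  # 11.0 (average units per distinct product)
```

The two averages answer different questions, which is why dividing by the wrong count can mislead.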

The word "different" here is easily supplanted by "distinct." Moreover, as many of us know, performing distinct counts has historically presented a challenge in the OLAP world. Let's discuss an example that illustrates the challenge, and then convert that challenge to an opportunity to meet a business need using the distinct count capabilities found within MSAS.
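The article does not spell out why distinct counts are a challenge. One commonly cited reason, offered here as an assumption about what is meant, is that distinct counts are not additive: the distinct counts for individual slices cannot simply be summed to obtain the distinct count for the whole. The Python sketch below illustrates this with hypothetical warehouses and products.

```python
from collections import defaultdict

# Hypothetical (warehouse, product) shipment pairs; not FoodMart data.
shipments = [
    ("Warehouse A", "Cola"),
    ("Warehouse A", "Cola"),
    ("Warehouse A", "Chips"),
    ("Warehouse B", "Cola"),
    ("Warehouse B", "Soup"),
]

# Collect the set of distinct products shipped from each warehouse.
products_by_warehouse = defaultdict(set)
for warehouse, product in shipments:
    products_by_warehouse[warehouse].add(product)

per_warehouse = {w: len(p) for w, p in products_by_warehouse.items()}
sum_of_parts = sum(per_warehouse.values())

# The overall distinct count must be taken over the union of the sets.
overall = len(set().union(*products_by_warehouse.values()))

print(per_warehouse)  # {'Warehouse A': 2, 'Warehouse B': 2}
print(sum_of_parts)   # 4
print(overall)        # 3
```

Because "Cola" ships from both warehouses, summing the per-warehouse figures overstates the overall result by one. A pre-aggregated total cannot be rolled up the way an ordinary sum can.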

Handling Distinct Counts via the MSAS User Interface

Let's take a look at a scenario that illustrates a need for a distinct count, using a hypothetical business need to add practical value. Let's say that a group of information consumers within the FoodMart organization has approached us with a need that they wish to meet within the Warehouse cube. The consumers want to be able to report upon the number of products within various metrics without having to be concerned with an issue they faced with a previous system: "double counting" in many inventory reports that concerned product-related transactions between warehouses and stores.

We might initially attempt to meet the needs of the consumers with somewhat advanced MDX, but let's first try to minimize complication, while heading off many of the issues, with a straightforward approach from within the Cube Editor component of Analysis Manager, the MSAS user interface. This provides all that we need, in many cases. (We will examine an MDX approach in the next section of this article.)

Let's start Analysis Services and proceed with the following steps:

1.  Open Analysis Manager.

2.  Expand the Analysis Servers folder by clicking the "+" sign to its immediate left.

Our server(s) appear (my server, MOTHER1, is depicted in some of the illustrations).

3.  Expand the desired server.

Our database(s) appear, in much the same manner as shown in Illustration 1.


Illustration 1: A Sample Set of Databases Displayed within Analysis Manager

4.  Expand the FoodMart2000 database.

5.  Expand the Cubes folder.

The sample cubes appear, as shown in Illustration 2.


Illustration 2: The Sample Cubes in the FoodMart2000 Database

NOTE: Your databases / cube tree may differ, depending upon the activities you have performed since the installation of MSAS (and the simultaneous creation of the original set of sample cubes). Should you want or need to restore the cubes to their original state, simply restore the database under consideration. For instructions, see the MSSQL Server 2000 Books Online.

6.  Right-click on the Warehouse sample cube.

7.  Select Edit from the context menu that appears, as shown in Illustration 3.


Illustration 3: Select Edit from the Context Menu

The Cube Editor opens. The Schema tab appears as depicted in Illustration 4.


Illustration 4: Cube Editor - Schema Tab for the Warehouse Sample Cube

We will be creating a measure in the Cube Editor to enable us to make our distinct Product counts. Distinct Count can only exist as a measure.

8.  Right-click the Measures folder in the Tree View to the left of the Schema tab.

A single-line context menu appears, as shown in Illustration 5.

9.  Select New Measure from the context menu.


Illustration 5: Select New Measure from the Context Menu

The Insert Measure dialog appears.

10.  Click-select product_id.

The Insert Measure dialog, selected measure circled in red, appears in Illustration 6.


Illustration 6: Select Product_Id from the Insert Measure Dialog

11.  Click OK to accept the selection.

The Insert Measure dialog closes, and we see the new measure appear (default name of Product_Id) in the Measures folder, as depicted in Illustration 7.


Illustration 7: Product_Id Appears in the Measures Folder (Circled)

12.  Click-select product_id in the Measures folder, if required.

13.  If necessary, click the downward arrow beneath the Cube Tree to open the Properties pane.

14.  Click the Basic tab.

15.  Modify the default Name of Product Id to the following:

Product Count

16.  Type the following into the empty Description box, just below the Name box:

Distinct Count - Products

17.  Click the box to the right of the Aggregate Function label, to enable the selector.

18.  Select Distinct Count in the Aggregate Function selector.

The Basic tab of the Properties pane appears as shown in Illustration 8.


Illustration 8: Product Count Measure - Properties Pane - Basic Tab

19.  Click the Data tab as if going to the Data View to perform a routine browse.

A warning briefly appears, indicating that sample data is being generated, and that the cube requires processing, as a result of our modifications. The sample data then appears, along with a static warning below it, to ensure that we are aware that the data is not what it might appear to be, and that the cube must be processed to make updated, actual data available, as partially depicted in Illustration 9.


Illustration 9: Data View (Partial and Compressed) - With "Staleness" Warning at its Foot

Let's process the cube to activate our changes.

20.  Select File --> Save to save the cube in its modified state.

21.  Select Tools --> Process Cube to initialize the processing steps.

A message box appears, stating that the cube has no aggregations, and asking if we wish to design them at this time, as shown in Illustration 10.


Illustration 10: Aggregations Message Box - Just Say "No"

NOTE: The message box may not appear if the cube's aggregations have been altered since its installation as an MSAS sample. In that case, the next dialog appears directly, and the following step can be skipped.

22.  Click No to skip designing aggregations at present.

The Select the Processing Method dialog appears, as depicted in Illustration 11.


Illustration 11: The Select the Processing Method Dialog

Full Processing is the default, and the only, option, as the Warehouse cube has not been processed since the structural change we made to it.

23.  Leaving settings at default, click OK.

Processing begins, and runs rapidly, as evidenced by the Process viewer's presentation of processing log events in real time. The processing cycle ends, and its success is indicated by the appearance of the Processing Completed Successfully message (in green letters) at the bottom of the viewer, as shown in Illustration 12.


Illustration 12: Indication of Successful Processing

24.  Click Close.

We are returned to the Cube Editor. We can now browse the data and see our new Distinct Count measure in action.

25.  Click the Data tab, if necessary.

The Data View refreshes and data appear in the default formation, ready for our manipulation and review. A portion of the Data View, depicting the Warehouse Profit and new Product Count measures, appears in Illustration 13.


Illustration 13: Warehouse Profit and Product Count Measures in the Data View

Now that we have a credible result set with which to compare, let's take a look at replicating the same results using MDX. We can leave the Data View as it is, for easy referral against our next results dataset, which we will generate independently within the MDX Sample Application.

Using MDX to Render Distinct Counts

We now have a set of "answers" that we can attempt to replicate in direct MDX. Let's initialize the MDX Sample Application, as a platform from which to perform our practice exercises, taking the following steps:

1.  Start the MDX Sample Application.

We are initially greeted by the Connect dialog, shown in Illustration 14.


Illustration 14: The Connect Dialog for the MDX Sample Application

The illustration above depicts the name of my server, MOTHER1, and properly indicates that we will be connecting via the MSOLAP provider (the default).

2.  Click OK.

The MDX Sample Application window appears.

3.  Click File --> New.

A blank Query pane appears.

4.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

5.  Select the Warehouse cube in the Cube drop-down list box.

The MDX Sample Application window should resemble that depicted in Illustration 15, complete with the information from the Warehouse cube displaying in the Metadata tree (left section of the Metadata pane).


Illustration 15: The MDX Sample Application Window (Compressed View)

We will begin creating our query, with a focus on returning results in the same general formation as the Data View we left in the Cube Editor. We will retrieve the Warehouse Profit and Product Count measures, as pictured in Illustration 13 above. Next, we will attempt to add a calculated measure that we craft directly in MDX, to replicate the distinct count information we obtained with the Product Count measure that we created in Analysis Manager earlier.

1.  Create the following new query:


-- MXAS14- 1 Initial Attempt at Distinction
WITH MEMBER 
   [MEASURES].[ProdCount] 
AS  
   'DISTINCTCOUNT({[Product].MEMBERS})'
SELECT
   { [MEASURES].[Warehouse Profit], [MEASURES].[Product Count],
     [MEASURES].[ProdCount] } ON COLUMNS,
   {[Product].CHILDREN} ON ROWS
FROM 
   [Warehouse]

The above represents an attempt to meet the information consumers' objectives with what appears to be the straightforward use of the DISTINCTCOUNT() function. This might represent an approach that seems intuitive to a practitioner who has given up on the handful of non-working or nebulous examples that can be found on the web (which happen to be about all we seem to have as a basis for learning MDX, in many instances). While it ultimately fails to provide the desired solution, as we shall see, it should not be surprising that we might attempt this, given the definition in the Books Online, not to mention the words used in the name of the function itself. (Most will agree, also, that it is better to attempt it now than when under the gun of an employer or a hurried client.)

The calculated member ProdCount embodies the function. I named it ProdCount to distinguish it from Product Count, the measure we created within the user interface in the earlier section, which I have also decided to present within the results dataset for comparison purposes. Warehouse Profit is also presented to align with our Data View as we left it in the last section.

2.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 16.


Illustration 16: The Results Dataset - DISTINCTCOUNT() Approach

3.  Save the query as MXAS14-1.

It does not require a huge leap of logic to conclude that the ProdCount calculated measure is generating a transaction count, which is probably correctly "distinct," within its own (actual) meaning, but not at all what the information consumers have requested in our practice example.

Bruised and humiliated (albeit briefly), let's resort to another, more cumbersome approach, one that at least yields the distinct product counts.

4.  Create the following new query:


-- MXAS14- 2  Distinction at its Finest 
WITH MEMBER
   [MEASURES].[CalcCount]
AS
   'COUNT(CROSSJOIN({[MEASURES].[Warehouse Profit]},
      DESCENDANTS([Product].CURRENTMEMBER, [Product].[Product Name])), EXCLUDEEMPTY)'
SELECT
   { [MEASURES].[Warehouse Profit], [MEASURES].[Product Count],
     [MEASURES].[CalcCount] } ON COLUMNS,
   [Product].CHILDREN ON ROWS
FROM
   [Warehouse]

The next attempt at distinction is embodied by the calculated measure CalcCount, named, again, simply as a means of distinguishing it from the measure we created in the Cube Editor and which we include once again for comparison purposes.

The above approach may not have been the initial impulse that many of us had in tackling what seemed to be a straightforward replication of the Data View we saw earlier. What we are doing, in short, with the CrossJoin() function is marrying the Warehouse Profit values with the products, and returning (thanks to EXCLUDEEMPTY) a count of the non-empty pairings. The Descendants() function builds in flexibility, allowing us to apply the logic equally well to a group of products as to the full set of products. The key to this is the selection of the current member's descendants, adding the "relativity" that so pointedly underscores the power of the .CurrentMember function.
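The logic of CalcCount can be modeled outside of MSAS. The Python sketch below is only an analogy, under the assumption that an empty cell can be represented by a missing dictionary entry. The product families, product names, and profit values are hypothetical, and calc_count is an illustrative helper rather than anything MSAS provides.

```python
# Hypothetical slice of a product hierarchy: family -> leaf-level product names.
product_names = {
    "Drink": ["Cola", "Juice", "Water"],
    "Food": ["Chips", "Soup"],
}

# Hypothetical Warehouse Profit cells. A product missing from this dictionary
# stands in for an empty cell (no warehouse activity recorded for it).
warehouse_profit = {"Cola": 120.0, "Water": 35.5, "Chips": 80.25}


def calc_count(family):
    """Count the leaf-level products under `family` that have a profit value."""
    # Analogue of CrossJoin(): pair the single measure with each descendant.
    pairings = [("Warehouse Profit", name) for name in product_names[family]]
    # Analogue of EXCLUDEEMPTY: keep only pairings whose cell holds a value.
    non_empty = [p for p in pairings if warehouse_profit.get(p[1]) is not None]
    return len(non_empty)


# Analogue of [Product].CHILDREN on rows: evaluate once per "current member".
counts = {family: calc_count(family) for family in product_names}
print(counts)  # {'Drink': 2, 'Food': 1}
```

Each product can appear in at most one pairing, so the count of non-empty pairings equals the number of distinct products with activity under the current member. "Juice" and "Soup" are excluded because their cells are empty.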

5.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 17.


Illustration 17: The Results Dataset - Distinction Attained

6.  Save the query as MXAS14-2.

The values for the new measure are in alignment with those of the measure we created in the Cube Editor.

NOTE: For a detailed introduction to most of the above functions, see the Database Journal MDX Essentials Series index page.

7.  Exit the MDX Sample Application and Analysis Manager when ready.

Summary and Conclusion ...

In this lesson, we introduced the concept of distinct counts, discussing why they are often a requirement in our analysis efforts and those of the information consumers whom we support. In our introduction, and throughout our examination of the MDX syntax we explored to achieve our illustrative ends, we highlighted the challenges that are inherent in distinct counts. We performed practice exercises to illustrate solutions for hypothetical business needs that called upon the use of distinct count capability, obtaining exposure to the options afforded us by the MSAS user interface, as well as the MDX syntax involved in the alternative solutions that we proposed.

In future articles, we will examine the performance considerations inherent in the production of distinct counts, as well as options that are available to tune our efforts for more efficient operation. The need for distinct counts is a fact of business life, and mastery of the costs and results of this vital capability represents a unique opportunity to add another tool to our MSAS skill sets.

