Introduction to MSSQL Server 2000 Analysis Services: Distinct Count Basics: Two Perspectives

Monday Jan 10th 2005 by William Pearson

MSAS Architect Bill Pearson explores DISTINCT COUNT concepts, then leads practice in putting these concepts to work from the tandem perspectives of Analysis Manager and MDX.

About the Series ...

This article is a member of the series Introduction to MSSQL Server 2000 Analysis Services. The series is designed to provide hands-on application of the fundamentals of MS SQL Server 2000 Analysis Services, with each installment progressively adding features and techniques designed to meet specific real-world needs. For more information on the series, as well as the hardware/software requirements to prepare for the exercises we will undertake, please see my initial article, Creating Our First Cube.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples. Images are from a Windows 2003 Server environment, upon which I have also implemented MS Office 2003, but the steps performed in the articles, together with the views that result, will be quite similar within any environment that supports MSSQL Server 2000 and MSSQL Server 2000 Analysis Services ("Analysis Services" or "MSAS"). The same is generally true, except where differences are specifically noted, when MS Office 2000 and above are used in the environment, in cases where MS Office components are presented in the article.


In this article, we will explore distinct counts, discussing why they are useful (and often required) within the design of any robust analysis effort. Throughout this session, we will describe some of the challenges that are inherent in distinct counts, and then we will undertake practice exercises to illustrate solutions to meet example business requirements. As a part of the practical exercises, built around a hypothetical business need, we will provide an approach afforded us by the MSAS user interface, and then we will offer an alternative approach using MDX.

We will revisit DISTINCT COUNT at various points in subsequent articles in our series, examining specifics with regard to appropriate use, and details of optimization within the perspective under examination in the article concerned. In this article, we will lay the framework for those specific scenarios, and discuss the basics of DISTINCT COUNT, together with considerations that surround its use.

Distinct Count Concepts

Overview and Discussion

Anyone working within the realm of business intelligence and general analysis realizes, in short order, that we often encounter the need to quantify precisely the members of various sets of data. Those of us who have become familiar with MSAS are aware of its capabilities when it comes to categorizing and aggregating data within the hierarchical contexts of dimensions and levels. We can, for the most part, readily tap these capabilities from the user interface that MSAS provides. Through the exploitation of more advanced approaches, including the use of calculated members / measures, and multidimensional expressions ("MDX") in general, we can extend our analysis even further, and leverage MSAS to reach far more specific objectives.

One of the basic requirements that come into play, at least in some form, in many analysis scenarios, is the need to count the members of a set targeted for analysis. An example might be the need to count the number of products we have shipped from a given warehouse, or group of warehouses, to a given geographical location, or a specific group of stores. This can be accomplished readily enough with the Count() function, as most of us are aware.

Count() does a great job of giving us a total count. Of course, the results we would achieve in using Count() with products, in the scenarios above, would represent the total number of products shipped. What we would not get, and what we might find far more useful in some situations, would be a count of the different products that were shipped. Count(), in providing a total number, would also be counting the same products multiple times, because many products will have been shipped more than once. To reach our objective of counting different products, then, we would need to count each different product shipped only once. Counting them multiple times not only misstates the number of different products, but also likely renders averages, and other metrics based upon the count value, meaningless or misleading.
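The difference between a total count and a count of different products can be sketched outside of MSAS. The short Python example below is only an illustration: the shipment rows, warehouse names, and product ids are invented, and the function names are hypothetical rather than anything MSAS provides.

```python
# Hypothetical shipment records: (warehouse, product_id). All values are invented.
shipment_rows = [
    ("Warehouse A", 101),
    ("Warehouse A", 102),
    ("Warehouse A", 101),  # product 101 shipped a second time
    ("Warehouse B", 101),
    ("Warehouse B", 103),
    ("Warehouse B", 103),  # product 103 shipped a second time
]


def total_count(rows):
    """Count every shipment row, as a plain count would."""
    return len(rows)


def distinct_product_count(rows):
    """Count each product once, however many times it was shipped."""
    return len({product_id for _, product_id in rows})


total = total_count(shipment_rows)
distinct = distinct_product_count(shipment_rows)
print(total, distinct)  # prints: 6 3
```

Six shipments were made, but only three different products were involved. Any average built on the figure of six (profit per product, for instance) would be understated by half.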

The word "different" here is easily supplanted by "distinct." Moreover, as many of us are aware, the performance of distinct counts has historically presented a challenge in the OLAP world. Let's discuss an example that illustrates the challenge, and then transform that challenge to an opportunity to meet an illustrative business need, using the distinct count capabilities found within MSAS.
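One commonly cited reason that distinct counts are harder than sums or plain counts is that they are not additive: the distinct count for a parent cannot, in general, be derived by adding up the distinct counts of its children. The Python sketch below illustrates the point with invented data (the warehouse names and product ids are assumptions, not FoodMart values).

```python
from collections import defaultdict

# Hypothetical shipment records: (warehouse, product_id). All values are invented.
records = [
    ("Warehouse A", 101),
    ("Warehouse A", 102),
    ("Warehouse B", 101),  # product 101 also ships from Warehouse B
    ("Warehouse B", 103),
]


def distinct_products_by_warehouse(rows):
    """Return {warehouse: number of different products shipped from it}."""
    products = defaultdict(set)
    for warehouse, product_id in rows:
        products[warehouse].add(product_id)
    return {warehouse: len(ids) for warehouse, ids in products.items()}


per_warehouse = distinct_products_by_warehouse(records)
sum_of_parts = sum(per_warehouse.values())
overall = len({product_id for _, product_id in records})

print(per_warehouse)          # {'Warehouse A': 2, 'Warehouse B': 2}
print(sum_of_parts, overall)  # prints: 4 3
```

Adding the per-warehouse figures gives four, while only three different products exist across both warehouses, because product 101 is counted once per warehouse. A total therefore has to be computed from the underlying detail rather than rolled up from stored subtotals.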

Considerations and Comments

For purposes of this exercise, we will be working with the Warehouse cube, within the FoodMart 2000 MSAS database; these working samples accompany a typical installation of MSAS. If the samples are not installed in, or have been removed from, your environment, they can be obtained from the installation CD, as well as from the Analysis Services section of the Microsoft website. If you prefer not to alter the structure of your sample cubes as they currently exist, make copies of the cube we reference in the article before beginning the practice exercises. For instructions on copying cubes, see the Preparation section of Introduction to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic Balances.

Hands-On Procedure

Let's start with a look at a scenario that illustrates a need for a distinct count, using a hypothetical business need to add practical value. Let's say that a group of information consumers within the FoodMart organization has approached us with a need that they wish to meet within the Warehouse cube. The consumers want to be able to report upon the number of products within various metrics without having to be concerned with an issue they faced with a previous system - a scenario of "double counting" in many inventory reports that concerned product-related transactions between warehouses and stores.

We might initially attempt to meet the needs of the consumers with somewhat advanced MDX, but let's try to minimize complication, while heading off many of the issues, with a straightforward approach from within the Cube Editor component of the MSAS user interface, Analysis Manager, first. This provides all that we need, in many cases. We will examine an MDX approach in the next section of this article.

Working with Distinct Counts in Analysis Manager

Let's start Analysis Services and proceed with the following steps:

1.  Open Analysis Manager.

2.  Expand the Analysis Servers folder by clicking the "+" sign to its immediate left.

Our server(s) appear (my server, MOTHER1, is depicted in some of the illustrations).

3.  Expand the desired server.

Our database(s) appear, in much the same manner as shown in Illustration 1.

Illustration 1: A Sample Set of Databases Displayed within Analysis Manager

4.  Expand the FoodMart2000 database.

5.  Expand the Cubes folder.

The sample cubes appear, as shown in Illustration 2.

Illustration 2: The Sample Cubes in the FoodMart2000 Database

NOTE: Your databases / cube tree may differ, depending upon the activities you have performed since the installation of MSAS (and the simultaneous creation of the original set of sample cubes). Should you want or need to restore the cubes to their original state, simply restore the database under consideration. For instructions, see the MSSQL Server 2000 Books Online.

6.  Right-click on the Warehouse sample cube.

7.  Select Edit from the context menu that appears, as shown in Illustration 3.

Illustration 3: Select Edit from the Context Menu

The Cube Editor opens. The Schema tab appears as depicted in Illustration 4.

Illustration 4: Cube Editor - Schema Tab for the Warehouse Sample Cube

We will be creating a measure in the Cube Editor to enable us to make our distinct Product counts. Distinct Count can only exist within the context of a measure.

8.  Right-click the Measures folder in the Tree View to the left of the Schema tab.

9.  Select New Measure from the context menu.

The single-line context menu appears, as shown in Illustration 5.

Illustration 5: Select New Measure from the Context Menu

The Insert Measure dialog appears.

10.  Click-select product_id.

The Insert Measure dialog, selected measure circled in red, appears in Illustration 6.

Illustration 6: Select Product_Id from the Insert Measure Dialog

11.  Click OK to accept the selection.

The Insert Measure dialog closes, and we see the new measure appear (default name of Product_Id) in the Measures folder, as depicted in Illustration 7.

Illustration 7: Product_Id Appears in the Measures Folder (Circled)

12.  Click-select Product_Id in the Measures folder, if required.

13.  If necessary, click the downward arrow beneath the Cube Tree to open the Properties pane.

14.  Click the Basic tab.

15.  Modify the default Name of Product_Id to the following:

Product Count

16.  Type the following into the empty Description box, just below the Name box:

Distinct Count - Products

17.  Click the box to the right of the Aggregate Function property label (at the bottom of the Basic tab), to enable the selector.

18.  Select Distinct Count in the Aggregate Function selector.

The Basic tab of the Properties pane appears, with our modifications, as shown in Illustration 8.

Illustration 8: Product Count Measure - Properties Pane - Basic Tab

19.  Click the Data tab, as if going to the Data View to perform a routine browse.

A warning briefly appears, indicating that sample data is being generated, and that the cube requires processing, as a result of our modifications. The sample data then appears, along with a static warning below it, as partially depicted in Illustration 9. The warning ensures that we are aware that the data is not what it might appear to be, and that the cube must be processed to make updated, actual data available.

Illustration 9: Data View (Partial and Compressed) - With "Sample Data" Warning at Bottom

Let's process the cube to activate our changes.

20.  Select File --> Save to save the cube in its modified state.

21.  Select Tools --> Process Cube to initialize the processing steps.

A message box appears, stating that the cube has no aggregations, and asking if we wish to design them at this time, as shown in Illustration 10.

Illustration 10: Aggregations Message Box - Just Say "No"

NOTE: The message box may not appear if the cube's aggregations have been altered since its installation as an MSAS sample. In that case, the Select the Processing Method dialog described below appears directly, and the next step can be skipped.

22.  Click No to skip designing aggregations at present.

The Select the Processing Method dialog appears, as depicted in Illustration 11. Full Processing is the default, and only, option, as the Warehouse cube has not been processed since we made the structural change to it.

Illustration 11: The Select the Processing Method Dialog

23.  Leaving settings at default, click OK.

Processing begins, and runs rapidly, as evidenced by the Process viewer's presentation of processing log events in real time. The Processing cycle ends and the success of the evolution is indicated by the appearance of the Processing Completed Successfully message (in green letters) at the bottom of the viewer, as shown in Illustration 12.

Illustration 12: Indication of Successful Processing

24.  Click Close.

We are returned to the Cube Editor. We can now browse the data and see our new Distinct Count measure in action.

25.  Click the Data tab, if necessary.

On the refreshed Data View, data appears in the default formation, ready for our manipulations and review. A portion of the Data View, depicting the Warehouse Profit and new Product Count measures, appears in Illustration 13.

Illustration 13: Warehouse Profit and Product Count Measures in the Data View

Now that we have a credible result set with which to compare, let's take a look at replicating the same results using MDX. We can leave the Data View as it is, for easy referral against our next results dataset, which we will generate independently within the MDX Sample Application.

Rendering Distinct Counts Using MDX

We now have a set of "answers" that we can attempt to replicate in direct MDX. Let's initialize the MDX Sample Application, as a platform from which to perform our practice exercises, taking the following steps:

1.  Start the MDX Sample Application.

We are initially greeted by the Connect dialog, shown in Illustration 14.

Illustration 14: The Connect Dialog for the MDX Sample Application

The illustration above depicts the name of my server, MOTHER1, and properly indicates that we will be connecting via the MSOLAP provider (the default).

2.  Click OK.

The MDX Sample Application window appears.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the Warehouse cube in the Cube drop-down list box.

5.  Click File --> New to open a blank Query pane.

The MDX Sample Application window should resemble that depicted in Illustration 15, complete with the information from the Warehouse cube displaying in the Metadata tree (left section of the Metadata pane).

Illustration 15: The MDX Sample Application Window (Compressed View)

We will begin creating our query with a focus on returning results in the same general formation as the Data View we left in the Cube Editor. We will retrieve the Warehouse Profit and Product Count measures, as pictured in Illustration 13 above. Next, we will attempt to add a calculated measure that we craft directly in MDX, to replicate the distinct count information we obtained with the Product Count measure that we created in Analysis Manager earlier. This will afford us a side-by-side comparison between our "MDX solution" and the "Analysis Manager" approach we took in the last section.

1.  Create the following new query:

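-- ANSYS31-1 Initial Attempt at Distinction
-- (the set passed to DISTINCTCOUNT() below is an assumption; it is not stated in the text)
WITH MEMBER [Measures].[ProdCount] AS 'DISTINCTCOUNT({[Product].MEMBERS})'
SELECT
   { [MEASURES].[Warehouse Profit], [MEASURES].[Product Count],
     [MEASURES].[ProdCount] } ON COLUMNS,
   {[Product].CHILDREN} ON ROWS
FROM [Warehouse]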

The above represents an attempt to meet the information consumers' objectives - with what appears to be the straightforward use of the DISTINCTCOUNT() function. This might seem intuitive to a practitioner who has given up on the handful of non-working or nebulous examples that can be found on the web (which happen to be about all we seem to have as a basis for learning MDX, in many instances). While this approach ultimately fails to provide the desired solution, as we shall see, it should not be surprising that we might attempt it, given the definition in the Books Online, not to mention the words used in the name of the function itself. (Most will agree, also, that it is better to attempt it now than when under the gun of an employer or a hurried client.)

The calculated member ProdCount embodies the function. I named it ProdCount to distinguish it from Product Count, the measure we created within the user interface in the earlier section, which I have also decided to present within the results dataset for comparison purposes. Warehouse Profit is also presented to align with our Data View as we left it in the last section.

2.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 16.

Illustration 16: The Results Dataset - DISTINCTCOUNT() Approach

3.  Save the query as ANSYS31-1.

It doesn't require a huge leap of logic to conclude that the ProdCount calculated measure is generating a transaction count. The count is correctly "distinct," within its own (actual) meaning, but not at all what the information consumers have requested in our practice example.

Having seen why the "intuitive" approach is lacking, let's resort to another, more cumbersome approach, which results in the distinct product values that we seek.

4.  Create the following new query:

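-- ANSYS31-2  Distinction at its Finest
WITH MEMBER [Measures].[CalcCount] AS
   'COUNT(CROSSJOIN({[Measures].[Warehouse Profit]}, DESCENDANTS
    ([Product].CURRENTMEMBER, [Product].[Product Name])), EXCLUDEEMPTY)'
SELECT
   {[MEASURES].[Warehouse Profit], [MEASURES].[Product Count], [MEASURES].[CalcCount] } ON COLUMNS,
   {[Product].CHILDREN} ON ROWS
FROM [Warehouse]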

The above "attempt at distinction" is embodied by the calculated measure CalcCount, named, again, simply as a means of distinguishing it from the measure we created in the Cube Editor, and which we include once again for comparison purposes.

The above approach may not have been the initial impulse that many of us had in tackling what seemed to be a straightforward replication of the Data View we saw earlier. What we are doing, in short, with the CrossJoin() function is marrying the Warehouse Profit values with the products, and returning (thanks to EXCLUDEEMPTY) a count of the non-empty pairings. The Descendants() function builds in flexibility, allowing us to apply the logic equally well to a group of products as to the full set of products. The key to this is the selection of the current member's descendants, adding the "relativity" that so pointedly underscores the power of the .CurrentMember function.
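The logic just described (take the current member's descendants at the product level, pair each with Warehouse Profit, and count only the non-empty pairings) can be mimicked in a few lines of Python. This is a rough analogy under stated assumptions, not a model of how MSAS evaluates the expression: the hierarchy, the product names, the profit figures, and the helper names are all hypothetical, and None stands in for an empty cell.

```python
# Hypothetical product hierarchy: parent -> children. Members without an
# entry are treated as leaf-level products (the "Product Name" level).
HIERARCHY = {
    "All Products": ["Drink", "Food"],
    "Drink": ["Cola", "Juice"],
    "Food": ["Bread", "Cheese", "Soup"],
}

# Hypothetical Warehouse Profit per leaf product; None marks an empty cell.
WAREHOUSE_PROFIT = {
    "Cola": 120.0,
    "Juice": None,
    "Bread": 45.5,
    "Cheese": 80.0,
    "Soup": None,
}


def leaf_descendants(member):
    """Return the leaf-level products under `member` (or the member itself)."""
    children = HIERARCHY.get(member)
    if children is None:
        return [member]
    leaves = []
    for child in children:
        leaves.extend(leaf_descendants(child))
    return leaves


def calc_count(current_member):
    """Pair the measure with each descendant and count the non-empty pairs."""
    pairs = [("Warehouse Profit", product) for product in leaf_descendants(current_member)]
    return sum(1 for _, product in pairs if WAREHOUSE_PROFIT.get(product) is not None)


counts = {member: calc_count(member) for member in ("Drink", "Food", "All Products")}
print(counts)  # {'Drink': 1, 'Food': 2, 'All Products': 3}
```

Because the count is always taken relative to whichever member is passed in, the same function serves a single product family or the whole product dimension, which is the flexibility the current-member reference provides in the MDX version.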

5.  Execute the query using the Run Query button.

The results dataset appears as shown in Illustration 17.

Illustration 17: The Results Dataset - Distinction Attained

6.  Save the query as ANSYS31-2.

The values for the new measure are in alignment with those of the measure we created in the Cube Editor. (All that remains to make the measures identical is the addition of formatting syntax.)

7.  Exit the MDX Sample Application and Analysis Manager when ready.


In this article, we introduced the concept of distinct counts, discussing why they are often a requirement in our multidimensional analysis efforts, and those of the information consumers whom we support. In our introduction and overview, and throughout our examination of the objects and MDX syntax we explored to achieve our illustrative ends, we highlighted some of the challenges that are inherent in distinct counts.

We performed practice exercises to illustrate solutions for hypothetical business needs that called upon the use of a distinct count capability, obtaining exposure to the options afforded us by the MSAS user interface, as well as the MDX syntax involved with using the alternative solutions that we proposed.

We now have a basis in distinct counts that will allow us to examine more detailed nuances surrounding the capability. In subsequent articles, we will examine specific performance considerations inherent in the production of distinct counts, as well as options that are available to tune our efforts for more efficient operation. The need for distinct counts is a fact of business life, and mastery of the costs and results of this vital capability represents a unique opportunity to add another tool to our MSAS skill sets.
