Distinct Counts in Analysis Services 2005

Monday Jun 11th 2007 by William Pearson

Business Intelligence Architect Bill Pearson examines the expanded Distinct Count capabilities that debut with Analysis Services 2005.

About the Series ...

This article is a member of the series Introduction to MSSQL Server Analysis Services. The series is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (“Analysis Services”), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this series.

Introduction

In a couple of earlier articles of this series, Introduction to MSSQL Server 2000 Analysis Services: Distinct Count Basics: Two Perspectives and Introduction to MSSQL Server 2000 Analysis Services: Manage Distinct Count with a Virtual Cube, I introduced the general concept of distinct counts, discussing why they are useful (and often required) within the design of any robust analysis effort. In these and other articles, I described some of the challenges that were inherent in their use in Analysis Services 2000, before undertaking practice exercises to illustrate solutions to meet example business requirements.

We have revisited distinct counts in other articles within both my Introduction to MSSQL Server Analysis Services and MDX Essentials series, examining appropriate use and the details of optimization from the perspective of the article concerned. In this article, we will introduce distinct counts as they are managed in Analysis Services 2005. The capability has been redesigned around the hierarchy and attribute structure that debuts in Analysis Services 2005, and the result is better performance and more flexible deployment within our integrated business intelligence solutions, as many have reported recently in blogs, forums, and other media outlets.

In this article, we will gain some hands-on exposure to distinct counts in Analysis Services 2005. Our examination of the expanded capability will include:

  • A discussion surrounding the general concepts of distinct counts, including why they are useful (and often required) within the design of any robust analysis effort.
  • An examination of some of the challenges inherent in using distinct counts in Analysis Services 2000, and how distinct counts have been redesigned in Analysis Services 2005 to overcome some of these shortcomings of the previous version.
  • Creation of a distinct count measure within a sample cube to demonstrate the ease with which we can add distinct count capabilities to the cubes in our individual business environments.
  • A discussion of other considerations that surround the use of distinct counts.

Distinct Counts in Analysis Services 2005

Anyone working within the realm of business intelligence and general analysis realizes, in short order, that we often encounter the need to quantify precisely the members of various sets of data. Those of us who have become familiar with Analysis Services are aware of its capabilities, when it comes to categorizing and aggregating data within the hierarchical contexts of dimensions and attributes. We can, for the most part, readily tap these capabilities from the user interface that Analysis Services provides. Through the exploitation of more advanced approaches, including the use of calculated members / measures, and multidimensional expressions (“MDX”) in general, we can extend our analysis even further, and leverage Analysis Services to reach far more specific objectives.

One of the basic requirements that comes into play, at least in some form, in many analysis scenarios, is the need to count the members of a set targeted for analysis. An example might be the need to count the number of products we have shipped from a given warehouse, or group of warehouses, to a given geographical location, or to a specific group of stores. This can be accomplished readily enough with the Count() function, as most of us are aware.

Count() does a great job of giving us a total count. Of course, the results we would achieve in using Count() with products, in the scenarios above, would represent the total number of products shipped. What we would not get, and what we might find far more useful in some situations, is a count of the different products that were shipped. Because many products will have been shipped multiple times, Count() would count those products once per shipment. To reach our objective of counting different products, then, we would need to count each different product shipped only once. Counting them multiple times not only misstates the number of different products, but also likely renders averages, and other metrics based upon the count value, meaningless or misleading.
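To make the difference concrete, here is a minimal sketch in Python rather than MDX. The shipment list, the product names, and the unit total are all invented for illustration; none of them come from the Adventure Works sample.

```python
# Hypothetical shipment log: one entry per shipment line (names are invented).
shipments = ["Helmet", "Gloves", "Helmet", "Tire", "Helmet", "Gloves"]
units_shipped = 60  # hypothetical total units across all shipments

total_count = len(shipments)          # every shipment line is counted
distinct_count = len(set(shipments))  # each product is counted only once

# An average "per product" depends entirely on which count is the divisor.
avg_using_total = units_shipped / total_count
avg_using_distinct = units_shipped / distinct_count

print(total_count, distinct_count)
print(avg_using_total, avg_using_distinct)
```

The two averages differ by the same factor as the two counts, which is why a metric built on the wrong count can mislead.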

The word “different” here is easily supplanted by “distinct.” Moreover, as many of us are aware, the performance of distinct counts has historically presented a challenge in the OLAP world. Let’s introduce a simple example that illustrates the challenge, and then transform that challenge to an opportunity to meet an illustrative business need, using the newly expanded distinct count capabilities found within Analysis Services 2005.

Adding a Distinct Count Measure within the Cube

Let’s consider an example of a need for a distinct count, within the familiar context of a sample cube that is available to anyone who has installed Analysis Services 2005. For the purposes of our practice exercise, we will say that client colleagues at the Adventure Works organization have asked that we assist them in working with a cube they have inherited from the original implementation team. (The team has recently turned over the system to local employees and departed the scene.)

The client has asked us to assist with the creation of a distinct count of Products within the existing fact table, where the sales data for the organization is currently captured for reporting and analysis purposes. The client representatives with whom we are working tell us that the measure will be used in various averages and other calculations, among additional possible applications. Moreover, our giving them some guidance in how to accomplish such a distinct count will come in handy as the need arises elsewhere, in other data structures and so forth.

When we examine the schema for the fact table, FactInternetSales, we note that Products are represented in the table with the ProductKey column, as shown in Illustration 1.


Illustration 1: Product Keys in the Fact Table

Because every row of the table contains a Product Key, and because multiple sales of each product occur within the time period represented by the fact table data, it is relatively easy to see that a simple count of the keys would dramatically overstate the total number of distinct, or unique, Products. We explain to our client colleagues that we can create a distinct count measure to handle needs of this sort, and propose to demonstrate the steps in the section that follows.
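The overstatement can be sketched outside Analysis Services with a few lines of Python and an in-memory SQLite table. The table below is a simplified stand-in that borrows only the FactInternetSales and ProductKey names from the text; the second column and every row value are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the fact table; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactInternetSales (SalesOrderNumber TEXT, ProductKey INTEGER)"
)
conn.executemany(
    "INSERT INTO FactInternetSales VALUES (?, ?)",
    [("SO1", 310), ("SO2", 310), ("SO3", 346), ("SO4", 310), ("SO5", 346)],
)

# COUNT counts every row; COUNT(DISTINCT ...) counts each key only once.
row_count, distinct_products = conn.execute(
    "SELECT COUNT(ProductKey), COUNT(DISTINCT ProductKey) FROM FactInternetSales"
).fetchone()
conn.close()

print(row_count, distinct_products)
```

With five sales rows spread over two products, the simple count reports five while the distinct count reports two.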

Procedure: Establish a Distinct Count Capability in Analysis Services

We begin our efforts by opening the cube within a project in the Business Intelligence Development Studio. I like to set up a lab environment for each of my client or research projects where I have both the respective cubes and reports involved with the engagement within an integrated solution in Visual Studio. This ensures ease in testing cube modifications through to the report layer from a single, central location, as well as providing the advantage of effective source control and numerous other conveniences. (In this particular case, I have both a copy of the sample Adventure Works DW and the AdventureWorks Sample Reports projects added into a single solution within the Business Intelligence Development Studio, where I can access all member objects from one point, the Solution Explorer.)

1.  Open the Adventure Works cube from within the Solution Explorer.

2.  Once the Cube Designer opens, select the Cube Structure tab, as required.

3.  Select Cube -> New Measure ... from the main menu, as depicted in Illustration 2.


Illustration 2: Creating a New Measure in the Cube

4.  Select Distinct count in the Usage selector atop the New Measure dialog that appears.

5.  Leaving the Source table selected at the default of FactInternetSales, select ProductKey (in the Source column list) by clicking it, as shown in Illustration 3.


Illustration 3: Settings within the New Measure Dialog

6.  Click OK to accept the selection and create the new measure.

7.  Right-click the new Internet Sales 1 measure group that appears next, and rename it (via the Rename selection on the context menu that appears) to Product Distinct Count. Rename the measure that appears beneath it (expand the measure group, if necessary, to see it) to Distinct Products.

The renamed measure group and its sole member measure appear as depicted in Illustration 4.


Illustration 4: Newly Created Measure with Containing Measure Group ...

As we see through the results of our actions above, Analysis Services has created not only the distinct count measure itself, but it has also constructed a free-standing measure group to house it. Were we to add more distinct count measures, we would see that a separate measure group is created for each measure. Analysis Services allows only one measure with the aggregation function DISTINCT COUNT (a property which we set earlier via the New Measure dialog in our most recent steps above) in any single measure group. Moreover, because of the pronounced differences in how Analysis Services manages distinct count measures, Microsoft recommends that we avoid having a measure with any other aggregation function within a measure group containing a distinct count measure.
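One way to appreciate why a distinct count is handled differently from a Sum or Count measure is that it is not additive: results for parts cannot simply be added to get the result for the whole. The Python sketch below illustrates only that general property, using invented rows; it makes no claim about how Analysis Services implements the measure internally.

```python
# Hypothetical sales rows: (country, product_key). All values are invented.
sales = [
    ("Australia", 310), ("Australia", 346), ("Australia", 310),
    ("Canada", 310), ("Canada", 477),
]

# Collect the distinct product keys sold in each country.
keys_by_country = {}
for country, product_key in sales:
    keys_by_country.setdefault(country, set()).add(product_key)
per_country = {c: len(keys) for c, keys in keys_by_country.items()}

# Adding the per-country results double-counts product 310, which sold in
# both countries, so the overall figure must come from the underlying keys.
sum_of_parts = sum(per_country.values())
true_total = len({product_key for _, product_key in sales})

print(per_country, sum_of_parts, true_total)
```

A sum of order quantities for the same rows could be rolled up from the country subtotals; the distinct count cannot.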

We can examine the properties for the new distinct count measure, and see the aggregation function setting to which I refer, by simply right-clicking the Distinct Products measure and selecting Properties from the context menu that appears. (Assuming default settings, the Properties pane appears within the Business Intelligence Development Studio whenever we select the measure.) The properties are shown in the compressed view of Illustration 5.


Illustration 5: Properties Pane for the Newly Created Distinct Count Measure
(Compressed View of the Studio)

The next step will be to deploy the project, which we will do in preparation for verifying our handiwork via the Cube Browser.

8.  Deploy the Analysis Services project within which you're working.

Verify Operation of the Distinct Count Measure in the Browser

Let’s take a look at our new distinct count measure in the Cube Browser, so as to verify its operation.

1.  Click the Browser tab within the design environment.

2.  Click the Reconnect button in the Browser toolbar.

3.  Expand the Customer dimension in the cube metadata tree, and then expand the Customer Geography hierarchy, if necessary. Drag the Country level therein to the empty data grid, dropping it in the area marked Drop Row Fields Here, as depicted in Illustration 6.


Illustration 6: Adding the Country Level to the Browser Rows Axis ...

4.  Expand the Date dimension in the cube metadata tree, and then expand the Calendar folder, as necessary. Drag the Date.Calendar Year attribute hierarchy into the area marked Drop Column Fields Here.

5.  Expand Measures, and then the Internet Sales Measure Group. Drag the Internet Order Quantity measure to the area of the grid marked Drop Totals or Detail Fields Here.

6.  Again within Measures, expand the new Product Distinct Count Measure Group, if necessary. Drag the new Distinct Products measure to the area of the grid marked Drop Totals or Detail Fields Here, dropping it to the right of the first column containing the Internet Order Quantity measure.

A partial view of our newly assembled and populated browser grid is shown in Illustration 7.


Illustration 7: The Browser with our Viewer Settings (Partial View) ...


And so we see that our distinct count measure appears to operate as expected. Verification is easier when we place a “pure count” beside the distinct count measure: the distinct count will, of course, typically be significantly smaller than the gross count value.
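The same side-by-side comparison can be sketched away from the Cube Browser. The Python example below builds a small, flattened table in SQLite and groups it by country and year, roughly mirroring the rows and columns of the grid above. The table name, column names, and data are hypothetical and are not taken from the Adventure Works database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened sales table; names and rows are invented.
conn.execute(
    "CREATE TABLE sales "
    "(country TEXT, year INTEGER, product_key INTEGER, order_qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Canada", 2003, 310, 1),
        ("Canada", 2003, 310, 1),
        ("Canada", 2003, 346, 1),
        ("Canada", 2004, 310, 1),
        ("France", 2004, 477, 1),
        ("France", 2004, 477, 1),
    ],
)

# One grid cell per country and year: gross quantity beside distinct products.
grid = conn.execute(
    """
    SELECT country, year, SUM(order_qty), COUNT(DISTINCT product_key)
    FROM sales
    GROUP BY country, year
    ORDER BY country, year
    """
).fetchall()
conn.close()

for row in grid:
    print(row)
```

In each cell of this sample the distinct product count is no larger than the order quantity, which is the pattern we look for when checking the measure in the browser.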

We can leverage distinct count measures, and enjoy the multiple benefits we have enumerated, within our own business environments, as easily as we have done in our practice session. Distinct counts rank highly among a host of design improvements in Analysis Services 2005.

Conclusion

In this article, we examined distinct counts in Analysis Services 2005. We reviewed the general concept of distinct counts, discussing why they are useful (and often required) within the design of any robust analysis effort. In other articles, I have described some of the challenges that were inherent in their use in Analysis Services 2000, typically before undertaking a practice exercise whereby we constructed a distinct count measure to meet the business requirements of a hypothetical client.

As a part of introducing distinct counts as they are managed in Analysis Services 2005, we discussed various aspects of the redesign of the capability, based upon the hierarchy and attribute structure that debuts in Analysis Services 2005. We noted that the enhancement of distinct counts results in better performance, and more flexible deployment, within our integrated business intelligence solutions. We contrasted the new capabilities with some of the challenges inherent in using distinct counts in Analysis Services 2000, focusing on how distinct counts have been expanded in Analysis Services 2005 to overcome some of the shortcomings of the previous version.

We then created a distinct count measure within the Adventure Works sample cube to demonstrate the ease with which we can add distinct count capabilities to the cubes in our individual business environments. Finally, we verified the operation of the distinct count measure using the browser within our freshly deployed cube. Throughout the steps of our practice exercise, we highlighted other considerations that surround the efficient use of distinct counts.

