Business Intelligence Architect Bill Pearson examines the
expanded Distinct Count capabilities that debut with Analysis Services 2005.
About the Series ...
This article is a member of the series Introduction to MSSQL Server Analysis
Services. The series is designed to provide hands-on application of
the fundamentals of MS SQL Server Analysis Services (Analysis
Services), with each installment progressively presenting features and
techniques designed to meet specific real-world needs. For more information on
the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools
needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this series.
In a couple of earlier articles of
this series, Introduction to MSSQL Server 2000 Analysis Services:
Distinct Count Basics: Two Perspectives and Introduction to MSSQL Server 2000 Analysis Services:
Manage Distinct Count with a Virtual Cube, I introduced the general concept of distinct
counts, discussing why they are useful (and often required) within the
design of any robust analysis effort. In these and other articles, I described
some of the challenges that were inherent in their use in Analysis Services
2000, before undertaking practice exercises to illustrate solutions to meet
example business requirements.
We have revisited distinct counts at points in other
articles within both my Introduction to MSSQL Server Analysis Services
and MDX Essentials series, examining specifics with regard to
appropriate use, and details of optimization within the perspective under
examination in the article concerned. In this article, we will introduce distinct
counts as they are managed in Analysis Services 2005. The redesign
of the capability based upon the hierarchy and attribute structure that debuts
in Analysis Services 2005 results in much more impressive performance
and flexibility of deployment within our integrated business intelligence
solutions, as many have come to report recently in blogs, forums, and elsewhere.
In this article,
we will gain some hands-on exposure to distinct counts in Analysis
Services 2005. Our examination of the expanded capability will include:
A review of the general concepts of distinct counts, including why they are useful (and often required) within the design of any robust analysis effort.
A discussion of some of the challenges inherent in using distinct counts in Analysis Services 2000, and how distinct counts have been redesigned in Analysis Services 2005 to overcome some of these shortcomings of the previous version.
Creation of a distinct count measure within a sample cube to demonstrate the ease with which we can add distinct count capabilities to the cubes in our individual business environments.
A discussion of other considerations that surround the use of distinct counts.
Distinct Counts in Analysis Services 2005
Anyone working within the
realm of business intelligence and general analysis realizes, in short order,
that we often encounter the need to quantify precisely the members of
various sets of data. Those of us who have become familiar with Analysis
Services are aware of its capabilities, when it comes to categorizing and
aggregating data within the hierarchical contexts of dimensions and attributes.
We can, for the most part, readily tap these capabilities from the user
interface that Analysis Services provides. Through the exploitation of
more advanced approaches, including the use of calculated members / measures,
and multidimensional expressions (MDX) in general, we can extend our analysis
even further, and leverage Analysis Services to reach far more specific
One of the basic
requirements that comes into play, at least in some form, in many analysis
scenarios, is the need to count the members of a set targeted for
analysis. An example might be the need to count the number of products we have
shipped from a given warehouse, or group of warehouses, to a given geographical
location, or to a specific group of stores. This can be accomplished readily
enough with the Count() function, as most of us are aware.
Count() does a great job of giving us a total
count. Of course, the results we would achieve in using Count()
with products, in the scenarios above, would represent total number of
products shipped. What we would not get, and what we might find far more
useful in some situations, would be a count of the different products
that were shipped. Count(), in providing a total number, would also be
providing multiple counts of the same products, because products will
have been shipped multiple times, in many instances. To reach our objective of
counting different products, then, we would need to count each
different product shipped, only once. To count them multiple times not
only misstates the number of different products, but it also likely
renders averages, and other metrics based upon the count value, meaningless or
The word different here
is easily supplanted by distinct. Moreover, as many of us are aware, the
performance of distinct counts has historically presented a challenge in
the OLAP world. Let's introduce a simple example that illustrates the
challenge, and then transform that challenge to an opportunity to meet an
illustrative business need, using the newly expanded distinct count
capabilities found within Analysis Services 2005.
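To make the difference concrete, the following minimal Python sketch contrasts a total count with a distinct count over a handful of shipment records. The warehouse and product names are invented purely for illustration (they are not drawn from any sample database), and the sketch works on an in-memory list rather than on a cube.

```python
# Hypothetical shipment records: one (warehouse, product) pair per shipment.
# All names here are assumptions made for the sake of the example.
shipments = [
    ("Warehouse A", "Widget"),
    ("Warehouse A", "Widget"),
    ("Warehouse A", "Gadget"),
    ("Warehouse B", "Widget"),
    ("Warehouse B", "Sprocket"),
    ("Warehouse B", "Sprocket"),
]

# A total count tallies every shipment row, so repeated products are counted repeatedly.
total_count = len(shipments)

# A distinct count tallies each different product only once.
distinct_products = len({product for _, product in shipments})

# An average based on the distinct count: shipments per different product.
avg_shipments_per_product = total_count / distinct_products

# The same average computed (incorrectly) with the total count as the product count.
misstated_average = total_count / total_count

print(total_count, distinct_products)                 # 6 3
print(avg_shipments_per_product, misstated_average)   # 2.0 1.0
```

In this sketch, using the total count where a distinct count is called for would report six products instead of three, and would halve the resulting average.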
Adding a Distinct Count Measure within the Cube
Let's consider an example of a need for a distinct count, within the familiar context of a sample cube that is available to anyone who has installed Analysis Services 2005. For the purposes of our practice exercise, we will say that client colleagues at the Adventure Works organization have asked that we assist them in working with a cube they have inherited from the original implementation team. (The team has recently turned over the system to local employees and departed the scene.)
The client has asked us to assist with the creation of a distinct count of Products within the existing fact table, where the sales data for the organization is currently captured for reporting and analysis purposes. The client representatives with whom we are working tell us that the measure will be used in various averages and other calculations, among additional possible applications. Moreover, our giving them some guidance in how to accomplish such a distinct count will come in handy as the need arises elsewhere, in other data structures and so forth.
When we examine the schema for the fact table, FactInternetSales, we note that Products are represented in the table with the ProductKey column, as shown in Illustration 1.
Illustration 1: Product Keys in the Fact Table
Because every row of the table contains a Product Key, and because multiple sales of each product occur within the time period represented by the fact table data, it is relatively easy to see that a simple count of the keys would dramatically overstate the total number of distinct, or unique, Products. We explain to our client colleagues that we can create a distinct count measure to handle needs of this sort, and propose to demonstrate the steps in the section that follows.
Procedure: Establish a Distinct Count Capability in Analysis Services
We begin our efforts by opening the cube within a project in the Business Intelligence Development Studio. I like to set up a lab environment for each of my client or research projects where I have both the respective cubes and reports involved with the engagement within an integrated solution in Visual Studio. This ensures ease in testing cube modifications through to the report layer from a single, central location, as well as providing the advantage of effective source control and numerous other conveniences. (In this particular case, I have both a copy of the sample Adventure Works DW and the AdventureWorks Sample Reports projects added into a single solution within the Business Intelligence Development Studio, where I can access all member objects from one point, the Solution Explorer.)
1. Open the Adventure Works cube from within the Solution Explorer.
2. Once the Cube Designer opens, select the Cube Structure tab, as required.
3. Select Cube -> New Measure ... from the main menu, as depicted in Illustration 2.
Illustration 2: Creating a New Measure in the Cube
4. Select Distinct count in the Usage selector atop the New Measure dialog that appears.
5. Leaving the Source table selected at the default of FactInternetSales, select ProductKey (in the Source column list) by clicking it, as shown in Illustration 3.
Illustration 3: Settings within the New Measure Dialog
6. Click OK to accept the selection and create the new measure.
7. Right-click the new Internet Sales 1 measure group that appears next, and rename it (via the Rename selection on the context menu that appears) to Product Distinct Count. Rename the measure that appears beneath it (expand the measure group, if necessary, to see it) to Distinct Products.
The renamed measure group and its sole member measure appear as depicted in Illustration 4.
Illustration 4: Newly Created Measure with Containing Measure Group ...
As we see through the results of our actions above, Analysis Services has created not only the distinct count measure itself, but it has also constructed a free-standing measure group to house it. Were we to add more distinct count measures, we would see that a separate measure group is created for each measure. Analysis Services allows only one measure with the aggregation function DISTINCT COUNT (a property which we set earlier via the New Measure dialog in our most recent steps above) in any single measure group. Moreover, because of the pronounced differences in how Analysis Services manages distinct count measures, Microsoft recommends that we avoid having a measure with any other aggregation function within a measure group containing a distinct count measure.
We can examine the properties for the new distinct count measure, and see the aggregation function setting to which I refer, by simply right-clicking the Distinct Products measure and selecting Properties from the context menu that appears, as shown in the compressed view of Illustration 5. (Assuming default settings, the Properties pane appears within the Business Intelligence Development Studio as soon as we select the measure.)
Illustration 5: Properties Pane for the Newly Created Distinct Count Measure
(Compressed View of the Studio)
The next step will be to deploy the project, which we will do in preparation for verifying our handiwork via the Cube Browser.
8. Deploy the Analysis Services project within which you're working.
Verify Operation of the Distinct Count Measure in the Browser
Let's take a look at our new distinct count measure in the Cube Browser, so as to verify its operation.
1. Click the Browser tab within the design environment.
2. Click the Reconnect button in the Browser toolbar.
3. Expand the Customer dimension in the cube metadata tree, and then expand the Customer Geography hierarchy, if necessary. Drag the Country level therein to the empty data grid, dropping it in the area marked Drop Row Fields Here, as depicted in Illustration 6.
Illustration 6: Adding the Country Level to the Browser Rows Axis ...
4. Expand the Date dimension in the cube metadata tree, and then expand the Calendar folder, as necessary. Drag the Date.Calendar Year attribute hierarchy into the area marked Drop Column Fields Here.
5. Expand Measures, and then the Internet Sales measure group. Drag the Internet Order Quantity measure to the area of the grid marked Drop Totals or Detail Fields Here.
6. Within Measures, expand the new Product Distinct Count measure group, if necessary. Drag the new Distinct Products measure to the area of the grid marked Drop Totals or Detail Fields Here, dropping it to the right of the first column containing the Internet Order Quantity measure.
A partial view of our newly assembled and populated browser grid is shown in Illustration 7.
Illustration 7: The Browser with our Viewer Settings (Partial View) ...
We see that our distinct count measure appears to operate as expected.
Our verification is made easier by comparing a pure count to our distinct
count measure, which will, of course, typically be significantly smaller than
the gross count value.
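One consideration worth keeping in mind when reading a grid such as this one is that a distinct count is not additive: a product sold in two countries counts once in each country's row, but only once in the overall total. The following Python sketch shows the effect on a few invented fact rows. The country names, product keys and quantities are assumptions made for illustration, not values taken from the sample database, and the sketch mimics only the arithmetic, not the way Analysis Services computes it.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, product_key, order_quantity).
fact_rows = [
    ("Australia", 310, 1),
    ("Australia", 310, 1),
    ("Australia", 477, 1),
    ("Canada", 310, 1),
    ("Canada", 528, 1),
    ("Canada", 528, 1),
]

quantity_by_country = defaultdict(int)
products_by_country = defaultdict(set)
for country, product_key, quantity in fact_rows:
    quantity_by_country[country] += quantity        # additive measure
    products_by_country[country].add(product_key)   # keys seen per country

distinct_by_country = {c: len(keys) for c, keys in products_by_country.items()}

# Order quantity rolls up by simple addition across countries.
grand_total_quantity = sum(quantity_by_country.values())

# The distinct count must be recomputed over all rows; summing the
# per-country values would count product 310 twice.
grand_total_distinct = len({product_key for _, product_key, _ in fact_rows})
sum_of_country_distinct = sum(distinct_by_country.values())

print(dict(quantity_by_country))   # {'Australia': 3, 'Canada': 3}
print(distinct_by_country)         # {'Australia': 2, 'Canada': 2}
print(grand_total_quantity, grand_total_distinct, sum_of_country_distinct)  # 6 3 4
```

Here the order quantity totals six for both the rows and the grand total, while the distinct product total is three even though the two country rows add up to four.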
and enjoy the multiple benefits we have enumerated, within our own business
environments, as easily as we have done in our practice session. Distinct
counts rank highly among a host of design improvements in Analysis Services 2005.
In this article, we examined distinct
counts in Analysis Services 2005. We reviewed the general concept
of distinct counts, discussing why they are useful (and often required)
within the design of any robust analysis effort. In other articles, I have described
some of the challenges that were inherent in their use in Analysis Services
2000, typically before undertaking a practice exercise whereby we
constructed a distinct count measure to meet the business requirements
of a hypothetical client.
As a part of introducing distinct counts as they are
managed in Analysis Services 2005, we discussed various aspects of the
redesign of the capability, based upon the hierarchy and attribute structure
that debuts in Analysis Services 2005. We noted that the enhancement of
distinct counts results in much more impressive performance, and
flexibility of deployment, within our integrated business intelligence
solutions. We contrasted the new capabilities with some of the challenges inherent in using distinct
counts in Analysis Services 2000, focusing on how distinct counts
have been expanded in Analysis Services 2005 to overcome some of the shortcomings
of the previous version.
We then created a distinct count measure within the
Adventure Works sample cube to demonstrate the ease with which we can
add distinct count capabilities to the cubes in our individual business
environments. Finally, we verified the operation of the distinct count
measure using the browser within our freshly deployed cube. Throughout the
steps of our practice exercise, we highlighted other considerations that
surround the efficient use of distinct counts.
See All Articles by Columnist William E. Pearson, III