MDX Essentials: Basic Set Functions: The TopCount() Function, Part II

Monday Jun 6th 2005 by William Pearson
Share:

Join MSAS Architect Bill Pearson in an extended examination of the TopCount() function. Discover, through hands-on practice exercises, more sophisticated business uses for TopCount(), in combination with the Generate(), .Item and other functions we have explored within our series.

About the Series ...

This article is a member of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements for getting the most out of the lessons included, please see my first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Current updates are assumed for MSSQL Server, MSSQL Server Analysis Services, and the related Books Online and Samples.

Overview

In this article, we will continue the examination of the TopCount() function that we began in the previous article of the MDX Essentials series, Basic Set Functions: The TopCount() Function, Part I. Before getting "hands-on" with TopCount() in our last session, we introduced the function, stating that its highest value lies in the fact that we can use it as a means for isolating best performers from among potentially large populations of fellow members. We emphasized that this ranking capability is a critical need in many data analysis and decision support scenarios, where we often seek to report upon best performers for various reasons.

We learned that TopCount() facilitates this ranking capability, allowing us to sort on the numeric value expression upon which we wish to focus our analysis. We can direct the function to retrieve the exact number of "top" members we wish to isolate (say, the "top ten," or the "top twenty"), for a "custom-fit" approach, that precisely matches the analysis needs of our own environments.

In this article, we will extend our examination of the TopCount() function to include somewhat more sophisticated uses. We will dive right into a couple of practical scenarios, where we will further evolve our understanding of the operation of TopCount() through:

  • Defining illustrative business needs as posed to us by hypothetical groups of information consumers;
  • Discuss the needs from the perspective of MDX in general, and the TopCount() function in particular;
  • Construct queries to meet the expressed requirements using combinations of TopCount() and other MDX functions we have explored in articles of the series;
  • Discuss the syntax contained within the solutions we construct for the information consumers;
  • Briefly discuss the results datasets we obtain in executing the MDX queries we construct.

To review an introduction to the TopCount() function before beginning this article, together with examples of its use in basic scenarios, see my article Basic Set Functions: The TopCount() Function, Part I, a member of the MDX Essentials series at Database Journal.

Combining TopCount() with other MDX Functions to Add Sophistication

In our introductory article, we examined the syntax for the TopCount() function, and then looked at its behavior, based upon arguments we provided to achieve our ends. We learned that TopCount() sorts a set we specify by another expression we provide within the function, thus breaking the natural hierarchy of the set. In returning "a specified number of items from the topmost members of a specified set, optionally ordering the set first," we noted, the TopCount() function is an excellent example of the potential power of MDX.

We learned that we specify three parameters to TopCount(), a set, a count, and a measure, and that we can return the number of top performers (or "top anything," in effect), based upon our input, using the syntax shown in the following string:

TopCount(<< Set >>, << Count >> [,<< Numeric Expression >>])

Sophisticated results can be generated using TopCount() when we supply well-considered parameters to the <<Set>> and <<Numeric Expression>> arguments, as we shall see in the practice exercises that follow.

Practice

Perform a Topcount() within the Context of Another Dimensional Level

Because we discussed syntax and other considerations that accompany the basic use of Topcount() in our previous article, we will move directly into the practice examples in this article. The examples are a bit more elaborate, and will require somewhat detailed explanations, as they involve the juxtaposition of TopCount() with other MDX functions. Consistent with our routine approach within the MDX Essentials series, we will use TopCount() within the context of meeting illustrative business needs for a group of hypothetical information consumers in the examples that follow.

Let's return to the MDX Sample Application as a platform from which to construct and execute the MDX we examine, and to view the results datasets we obtain.

1.  Start the MDX Sample Application.

2.  Clear the top area (the Query pane) of any queries or remnants that might appear.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the Warehouse cube in the Cube drop-down list box.

Let's assume, for our practice example, that we have been contacted by a group of information consumers within the FoodMart organization requesting some assistance with a new reporting requirement. The group, composed of analysts from the Finance and Accounting departments of the organization, wishes to be able to present 1998 Warehouse Profit for the three Warehouses with the highest Warehouse Profit in each of the States (the group is only concerned with U.S. operations from the perspective of this business need). We are told that the Warehouse cube, which houses the lion's share of the FoodMart Warehouse data, contains the information we need to meet the business requirement.

Working with the consumers, we draft a sketch of the proposed report layout, illustrated in Table 1, to confirm our understanding of the requirement.

State

Warehouse Profit

Warehouse 1

CA

(Highest Profit in CA)

Warehouse 2

CA

(Next Highest Profit in CA)

Warehouse 3

CA

(Third Highest Profit in CA)

Warehouse 1

OR

(Highest Profit in OR)

Warehouse 2

OR

(Next Highest Profit in OR)

Warehouse 3

OR

(Third Highest Profit in OR)

Warehouse 1

WA

(Highest Profit in WA)

Warehouse 2

WA

(Next Highest Profit in WA)

Warehouse 3

WA

(Third Highest Profit in WA)


Table 1: Results Dataset, with TopCount() Defining Columns

The challenge in the requirement we have described lies within the fact that we are working within multiple levels within the Warehouse dimension to produce the requested results. We are, in effect, being asked to classify the three top Warehouse performers by a higher level in the Warehouse dimension (State Province). In order to do this, we will employ the TopCount() function as we have done in previous scenarios, but this time in conjunction with other MDX functions, to present the State information associated with each of the top performing Warehouses.

Let's construct part of the query to get started. First, we will construct the part with which we are familiar, from our previous article at least, the core TopCount() function.

5.  Type the following query into the Query pane:


-- MDX032-1-1, TopCount() and Descendants() Combination to retrieve "3 Most 
--      Profitable Warehouses in 1998"
SELECT
     {[Measures].[Warehouse Profit]}  ON COLUMNS,
     {TOPCOUNT(
          DESCENDANTS(
               [Warehouse].CURRENTMEMBER, [Warehouse].[Warehouse Name]
          ),3,[Measures].[Warehouse Profit])} ON ROWS
FROM
     [WAREHOUSE]
WHERE
      ([Time].[Year].[1998])

6.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 1 appears.


Illustration 1: Results Dataset - Simple Use of the TopCount() Function

As we might have expected, this simple approach hardly achieves our ultimate objectives: it returns the top three Warehouses from the perspective of Warehouse Profit for 1998 - but the top three from the entire FoodMart organization. We will next advance beyond our previous basic queries by employing another MDX function, Descendants(), to supply the <<Set>> argument for the TopCount() function, but have a bit more to do to reach our goal of presenting the top performer Warehouses by U.S. Warehouse State.

NOTE: For information surrounding the Descendants() function, see my article MDX Member Functions: The "Family" Functions, a member of the MDX Essentials series at Database Journal. For an introduction to the .CurrentMember function, see MDX Member Functions: "Relative" Member Functions, an article in the same series.

7.  Select File -> Save As, name the file MDX032-1-1, and place it in a meaningful location.

8.  Leave the query open for the next section.

Next, we need to add logic to generate the same "top three" data, but for an ancestor of Warehouse (a level in the Warehouse hierarchy), the State it inhabits. To do so, we will introduce more MDX functions, as we see in the next step.

9.  Within the query we have saved as MDX032-1-1, replace the top comment line of the query with the following:


-- MDX032-1-2, TopCount(), combined with Ancestor() and .Name  
--   in a calculated member, to provide Warehouse State with each of "Top 3 
--        Most Profitable Warehouses in FoodMart Organization in 1998"

10.  Save the query as MDX032-1-2, to keep MDX032-1-1 intact as a working sample.

11.  Add the following lines to the query, between the top comment line we just replaced, and the SELECT keyword that begins the query:


WITH 
MEMBER 
     [Measures].[State]
AS 
     'ANCESTOR([Warehouse].CURRENTMEMBER, 
          [Warehouse].[State Province]).NAME'

This will create the calculated member State, which will allow us to present the States alongside their respective member Warehouses.

NOTE: For an introduction to the Ancestor() function, see my article MDX Member Functions: The "Family" Functions, a member of the MDX Essentials series at Database Journal.

12.  Modify the ON COLUMNS line of the query, which already contains the Warehouse Profit measure, to contain the new State calculated member we defined above, as follows:

{[Measures].[State], [Measures].[Warehouse Profit]}  ON COLUMNS,

13.  Leave the remainder of the query in its original state.

The Query pane appears as depicted in Illustration 2, with our modifications marked.


Illustration 2: The Query with Added Calculated Member

14.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 3 appears.


Illustration 3: Results Dataset - With Our Modifications

We note that the State to which each of the "top three" warehouses (still the top three for the entire FoodMart organization) belongs appears in the results. Our calculated member has taken us a step closer to realization of the information consumers' requirements.

15.  Re-save the file as MDX032-1-2.

16.  Leave the query open for the next step.

At this point, all that remains is to generate the "top three" data for each of the Warehouse States, limiting the States to the three American States within which FoodMart operates Warehouses. The word "generate" is particularly apt here: we will use the Generate() function in our efforts to return the "top three" from each of the three U.S. States.

17.  Within the query we have saved as MDX032-1-2, replace the top comment line of the query with the following:


-- MDX032-1-3, Addition of Generate() to retrieve 
--   "3 Most Profitable Warehouses in Each U. S. State in 1998"

18.  Save the query as MDX032-1-3, to keep MDX032-1-2 intact as a working sample.

19.  Substitute the following for the entire existing ON ROWS line within the SELECT statement of the query:


{GENERATE([Warehouse].[USA].CHILDREN, 
    TOPCOUNT(
         DESCENDANTS(
              [Warehouse].CURRENTMEMBER, [Warehouse].[Warehouse Name]
                    ),3,[Measures].[Warehouse Profit] ) )} ON ROWS

(In actuality, we are only adding the Generate() statement to the "front end" of the ON ROWS specification, as it already existed from earlier steps, with a closing right parenthesis - ")" - just before the ON ROWS keyword).

The Generate() statement builds a set based upon the top performer Warehouses, as retrieved by the TopCount() function (with Warehouse Profit as the <<Numeric Expression>> argument, and the Descendants() function, once again, specifying the <<Set>> argument as the individual Warehouses), limited to U.S. States (the "children" of the USA level of the Warehouse dimensional hierarchy.

NOTE: For information surrounding the Generate() function, see my MDX Essentials article Set and String Functions: The GENERATE() Function, (where we perform a combination of Generate() and TopCount(), as well). For an introduction to the .Children function, see MDX Member Functions: The "Family" Functions, within the same series.

20.  Leave the remainder of the query in its original state.

The Query pane appears as depicted in Illustration 4, with our addition marked.


Illustration 4: The Query with Substituted ON ROWS Line

21.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 5 appears.


Illustration 5: Results Dataset - After Final Modifications

Finally, we see the presentation as requested by the information consumers: 1998 Warehouse Profit for the three Warehouses with the highest Warehouse Profit for each of the U. S. States within which FoodMart conducts Warehouse operations. (We note that the State of Oregon shows only two Warehouses, but can easily verify that this is because only two Warehouse locations exist within the State, by expanding the Warehouse dimension at the OR level in the tree pane below the Cube selector in the MDX Sample Application, as depicted in Illustration 6.


Illustration 6: Verifying that Only Two Warehouses Exist in Oregon ...

22.  Re-save the file as MDX032-1-3.

Perform a Topcount() within the Context of Another Dimension, and Return a Related String for Presentation Purposes

Our next exercise comes in the form of a request from the same group of information consumers (the Accounting and Finance department at FoodMart), who have expressed satisfaction with our first solution. This request also involves "top counts" of Warehouses, and on the surface may sound simple to many of us. The consumers want a query that will produce a plain list containing the top individual Warehouse, for each of the twelve Months of 1998 (denominated simply in numbers one to twelve, as they appear in the Warehouse cube), this time in terms of number of Units Shipped within that month.

We again draft a sketch of the proposed report layout, to confirm our understanding and to agree fully on the requirement. The sketch appears as illustrated in Table 2.

Largest Shipper

Units Shipped

1

Warehouse Location shipping largest no of units in Month 1

Total No. Units

2

'' '' '' '' '' '' '' '' '' 2

'' '' "

3

'' '' '' '' '' '' '' '' '' 3

'' '' "

4

'' '' '' '' '' '' '' '' '' 4

'' '' "

5

'' '' '' '' '' '' '' '' '' 5

'' '' "

6

'' '' '' '' '' '' '' '' '' 6

'' '' "

7

'' '' '' '' '' '' '' '' '' 7

'' '' "

8

'' '' '' '' '' '' '' '' '' 8

'' '' "

9

'' '' '' '' '' '' '' '' '' 9

'' '' "

10

'' '' '' '' '' '' '' '' '' 10

'' '' "

11

'' '' '' '' '' '' '' '' '' 11

'' '' "

12

'' '' '' '' '' '' '' '' '' 12

'' '' "


Table 2: Results Dataset, with TopCount() Defining Columns

We agree to pursue a solution, and consider the requirement in bite-size pieces. One challenge that stands out immediately is the apparent need to return the Warehouse Name, a text string, from the lowest level of the Warehouse dimension. TopCount() is still at the heart of our contemplated approach, however, with the Units Shipped the measure upon which it is based.

Let's construct part of the query to get started. First, we will construct a query that lists all Warehouses, and respective quantities shipped, for each of the Months in 1998. This serves to build the core ON ROWS specification, the "months" part of which we will use within our final solution. It also serves to present the quantities for each Warehouse, each Month, allowing us to "proof" the accuracy of the results of the next step, where we generate the name of the top Warehouse from the perspective of shipping, for each Month.

1.  Select File --> New to open a blank Query pane.

2.  Type the following query into the Query pane:


-- MDX032-2-1, Shipping Volume Qty by Month, 1998, for Individual U.S.    
    Warehouses"
SELECT 
  {[Measures].[Units Shipped]} ON COLUMNS,
  NONEMPTYCROSSJOIN(
      {DESCENDANTS([Time].[1998], [Time].[Month])},
            {DESCENDANTS( [Warehouse].[All Warehouses].[USA], 
                  [Warehouse].[Warehouse Name] )})
  ON ROWS
FROM 
              [WAREHOUSE]

3.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset partially shown in Illustration 7 appears.


Illustration 7: Result Dataset - Quantities for All U.S. Warehouses, by Month in 1998

4.  Select File -> Save As, name the file MDX032-2-1, and place it in a meaningful location.

5.  Leave the query open for the next section.

At this stage, we can easily discern the largest number for each month within the measure column, which identifies at a glance, of course, the top shipping Warehouse for each month. Next, we will add the MDX required to generate the name of the top shipper for each month, initially placing the name next to the column displaying shipping volumes for each Warehouse, for easy verification of the accuracy of the name produced.

6.  Within the query we have saved as MDX032-2-1, replace the top comment line of the query with the following:


-- MDX032-2-2, Addition of Calculated Member to Produce Name of Warehouse with 
--   Highest 1998 Volume Monthly Shipping

7.  Save the query as MDX032-2-2, to keep MDX032-2-1 intact as a working sample.

8.  Add the following lines to the query, between the top comment line we just replaced, and the SELECT keyword that begins the query:


WITH 
MEMBER
     [Measures].[Largest Shipper]
AS
    'TOPCOUNT( DESCENDANTS( 
            [Warehouse].[All Warehouses].[USA], [Warehouse].
                [Warehouse Name] ), 
                     1,  (Time.CURRENTMEMBER, [Measures].[Units Shipped]) 
                ).ITEM(0).ITEM(0).NAME'

This will create the calculated member Largest Shipper, which will allow us to present a string containing the name of the Warehouse shipping the largest volumes for each month, alongside the member Warehouses' respective volumes.

The calculated member we create with this MDX is a "busy place." It accomplishes the TopCount(), providing as the << Set >> argument a Descendants() function. We saw the TopCount() / Descendants combination in very similar action in the first exercise of the series, and so I won't go into that any further here, other than to point out that the function is structured to return the set of the members of the Warehouse dimension that belong to the level it specifies, the Warehouse Name level. It is from this level that we are able to obtain the string representing the Warehouse Name.

The TopCount() function is provided a << Count >> argument of "1" , obviously, as we seek to return the top Warehouse with regard to the Units Shipped measure, the final argument of the function, << Numeric Expression >>. The interim result of the TopCount() function, therefore, is to return a single-member for each month, which enables us to leverage the .Item function. .Item, in turn allows us to exploit its capability to return the member in a tuple, based upon its position. By using .Item(0), we select the first single-member (the product of the TopCount() of 1) tuple in each Month set. (.Item contains a zero-based specification of position, hence the zero as the position argument.) We then use .Item(0) again, immediately after the first .Item function, to select the first (and only) member in the single-member tuple.

Finally, we use the .Name function to return the name of the member that we have so precisely isolated. This function, combined with the functions just described, result in a calculated member that returns the name of the Warehouse with the highest shipping quantity in a given Month in 1998, which is handily identified by the TopCount() / Descendants() combination we discussed earlier.

NOTE: For an introduction to the .Item function, see my article Basic Member Functions: The .Item Function, another member of the MDX Essentials series at Database Journal.

9.  Modify the ON COLUMNS line of the query, which already contains the Units Shipped measure, to contain the new Largest Shipper calculated member / measure we defined above, as follows:


{[Measures].[Units Shipped], [Measures].[Largest Shipper]} ON COLUMNS,

10.  Leave the remainder of the query in its original state.

The Query pane appears as depicted in Illustration 8, with our modifications marked.


Illustration 8: The Query with Added Calculated Member -

11.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the resulting dataset, partially shown in Illustration 9, appears.


Illustration 9: Result Dataset - New Calculated Member Appears (Partial View)

We note that the Warehouse Names of the locations with the highest shipping volumes appear. The presence of the months of 1998, which populate the rows already, result in the evaluation of the Largest Shipper calculated measure for each month respectively, giving us an outcome that meets the expressed requirements of the information consumers.

12.  Re-save the file as MDX032-2-2.

13.  Leave the query open for the next step.

Using the results from our latest query execution, we can easily verify that the calculated member is producing the results that its name purports. An example of such verification appears in Illustration 10, where we see that the top shipper Warehouse Name returned by our calculated member reflects the name of the Warehouse that displays the highest value in the monthly Units Shipped measure at its left.


Illustration 10: Verifying the Largest Shipper Name Returned as Correct (Partial View)

All that remains, to align the results dataset with the presentation requirements specified by the information consumers, is to remove the Units Shipped measure. Recall that we added this measure earlier simply to allow us to verify the correct operation of the Largest Shipper calculated member.

14.  Within the query we have saved as MDX032-2-2, replace the top comment line of the query with the following:


-- MDX032-2-3, Alignment of Results Dataset to the Business Requirements

15.  Save the query as MDX032-2-3, to keep MDX032-2-2 intact as a working sample.

16.  Remove the Units Shipped measure ([Measures].[Units Shipped]), from the existing ON ROWS line within the SELECT statement of the query, so that the line appears simply as:

  { [Measures].[Largest Shipper]} ON COLUMNS,

17.  Remove NONEMPTYCROSSJOIN, and the left parenthesis to its right, on the line below the row we modified in the last step (part of the ON ROWS specification). The item to be removed appears as:

NONEMPTYCROSSJOIN(

18.  Remove the following, the second Descendants() function, from the ON ROWS definition, as well:

{DESCENDANTS( 

      [Warehouse].[All Warehouses].[USA], [Warehouse].

            [Warehouse Name] )})

19.  Remove the comma (",") that appears to the right of this remaining portion of the ON ROWS line:

[Time].[1998], [Time].[Month] )},

To summarize the final set of modifications, the query appears, before our modifications, as follows, with syntax to be deleted enclosed in rectangles:

-- MDX032-2, Addition of Calculated Member to Produce Name of Warehouse with 

      Highest 1998 Volume Monthly Shipping

WITH 

MEMBER

     [Measures].[Largest Shipper]

AS
 
    'TOPCOUNT( DESCENDANTS( 

            [Warehouse].[All Warehouses].[USA], [Warehouse].[Warehouse Name] ), 

                1,  (Time.CURRENTMEMBER, [Measures].[Units Shipped]) 
            ).ITEM(0).ITEM(0).NAME'

SELECT 

  { [Measures].[Units Shipped], [Measures].[Largest Shipper]} ON COLUMNS,

  NONEMPTYCROSSJOIN(

{DESCENDANTS( [Time].[1998], [Time].[Month] )} , {DESCENDANTS( [Warehouse].[All Warehouses].[USA], [Warehouse].[Warehouse Name] )})
ON ROWS FROM [WAREHOUSE]

20.  Leave the remainder of the query in its original state.

The Query pane appears as depicted in Illustration 11, with deletions having been made.


Illustration 11: The Query, after Final Modifications

21.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 12 appears.


Illustration 12: Result Dataset - Reflecting the Business Requirement

22.  Re-save the file as MDX032-2-3.

23.  Exit the MDX Sample Application when ready.

Summary ...

This article served as an extension of our previous introductory article, Basic Set Functions: The TopCount() Function, Part I. We briefly reviewed the operation of the TopCount() function, and then extended our examination of the TopCount() function to include somewhat more sophisticated uses.

We defined illustrative business needs as posed to us by hypothetical groups of information consumers, and, after discussing the needs in general, with regard to challenges inherent to meeting them with the TopCount() function in particular, we constructed queries to meet the expressed requirements using combinations of TopCount() and other MDX functions we have explored in articles of the MDX Essentials series. In both practice examples, we discussed the syntax contained within the solutions we constructed for the information consumers, as well as the datasets we obtained in employing TopCount() and other functions, together with other surrounding considerations.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved