MDX Essentials: Numeric Functions: Introduction to the AVG() Function

Monday Sep 13th 2004 by William Pearson
Share:

Compute the average of the values populating a set with the useful AVG() function. Join MSAS Architect Bill Pearson in a practical exploration of using the AVG() function to meet examples of simple and sophisticated business needs.

About the Series ...

This article is a member of the series MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs. For more information about the series in general, as well as the software and systems requirements needed for getting the most out of the lessons included, please see the first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 / 3a updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

Overview

In this lesson, we will introduce a commonly used numeric function in the MDX toolset, the AVG() function. The general purpose of the AVG() function, as we shall discover, is to return the average of the tuples occupying a set. We will consider elementary uses of the function in this article, and then explore more sophisticated uses in subsequent articles. For now, we will build a foundation in the basics.

The AVG() function can be leveraged in activities that range from the simple to the complex, as is the case with many other MDX functions. We will introduce the function, commenting upon its operation and touching upon variations at a general level, and then we will:

  • Examine the syntax surrounding the function;
  • Undertake illustrative examples of the uses of the function in practice exercises;
  • Briefly discuss the results datasets we obtain in the practice examples.

The AVG() Function

Introduction

According to the Analysis Services Books Online, the AVG() function "returns the average value of a numeric expression evaluated over a set." The function uses a numeric expression to indicate the base value for which the average will be calculated. (An example of the numeric expression might be the measure Warehouse Sales in the FoodMart 2000 Warehouse cube).

The AVG() function ignores empty values found within the cells that are associated with the specified set. Its behavior with regard to empty cells can be circumvented at least a couple of ways, as we shall see in the next section.

After discussing its operation in the next section, we will examine the syntax for the AVG() function. Next, we will undertake practice examples constructed to support hypothetical business needs that illustrate uses for the function. This will allow us to activate what we explore in the Discussion and Syntax sections, by getting some hands-on exposure in creating expressions that leverage the function.

Discussion

To reword our initial explanation of its operation, the AVG() function computes the average of the non-empty values populating the cells of the set specified within the function. Mechanically, this means that the total value (the sum) of the cells inhabiting the set is divided by the number (or count) of populated cells.

A key concept here is that a behind-the-scenes count of the populated cells is taking place for use as the "divisor" in the calculation. Empty cells are not included in the divisor. In cases where we wish to count the empty cells, as well, we can force the inclusion of these cells by employing the COALESCEEMPTY() function. Another approach might be to work around the normal exclusion of empty cells by simply approaching the computation of the average in a different manner, such as by taking the results obtained by subjecting the set to the SUM() function, which we would then divide by the results returned by subjecting the same set to an appropriate COUNT() function.

NOTE: For more information on the SUM() function, see my Database Journal articles Mastering Time: Period - to - Date Aggregations, and Calculated Members: Leveraging Member Properties (in the MDX in Analysis Services series), both of which contain references to the function. In addition, for a detailed look at the COUNT() function, see my article Basic Numeric Functions: The Count() Function, in the MDX Essentials series, also at Database Journal.

Subjecting an empty set to the AVG() function returns the same result we obtain within a scenario where we divide by zero (commonly denoted by "1.INF" appearing within the affected parts of the dataset).

Let's look at some syntax illustrations to further clarify the operation of AVG().

Syntax

Syntactically, the set we use to specify the range of cells for which we wish to calculate the average is placed within the parentheses to the right of AVG, and separated by a comma from the numeric expression we have described.

The syntax is shown in the following string:

Avg(Set, [, Numeric Expression])

The following example expression illustrates a use of the AVG() function. Let's say that information consumers from the FoodMart Logistics department, whose data is housed within the Warehouse cube, come to us with a straightforward request: The consumers wish to see the total national averages for Units Shipped for each of the Category groups composing our Non-Consumable products line. They wish the totals for the member Product Categories to be displayed, with the averages for the three countries in which FoodMart operates, Canada, Mexico and the USA, to be presented side by side.

The basic AVG() function involved, within a core query that presents the information in the manner requested, would be constructed as follows:

WITH
MEMBER
    [Store].[Nat'l Avg]  
AS 
   'AVG(  { [Store].[Store Country].Members}, [Measures].[Units Shipped])'
SELECT
{ [Store].[Store Country].Members, [Store].[Nat'l Avg] } ON COLUMNS,
    {[Product].[Product Family].[Non-Consumable].Children} ON ROWS
FROM 
   [Warehouse]
WHERE [Measures].[Units Shipped]

Our query is simply expressing that we wish to retrieve the "average total units shipped for each of our Non-Consumable product categories, by country of store operation." We are ignoring time - the consumers are aware that the cube contains data from two years, and want the information to be based upon the cube as a whole.

The query we construct returns a dataset similar to that depicted in Table 1.

Canada

Mexico

USA

National Avg

Carousel

 

427

427

Checkout

 

1,445

1,445

Health & Hygiene

 

11,513

11,513

Household

 

20,765

20,765

Periodicals

 

3,515

3,515

Table 1: Results of a Simple AVG() Function, with Empty Cells in the Specified Set

The above example serves to illustrate the treatment, within the AVG() function, of empty cells within the specified set. The results demonstrate clearly that the function excludes empty cells in calculating the average we see.

We will practice the use of the AVG() function in the section that follows. We will start with a relatively simple scenario, and then construct a second, slightly more sophisticated query.

Practice

Preparation

To reinforce our understanding of the basics we have covered so far, we will use the AVG() function in a manner that illustrates its operation. We will do so in illustrative scenarios that place AVG() within the context of meeting a business need.

To begin, we will construct a SELECT query with a clearly defined calculated member that contains the AVG() function, and then present the returned dataset, via the core query, in a way that meets the business needs of a group of hypothetical information consumers. The intent is, of course, to demonstrate the operation of the AVG() function in a straightforward manner.

Let's return to the MDX Sample Application as a platform from which to construct and execute the MDX we examine, and to view the results datasets we obtain.

1.  Start the MDX Sample Application.

2.  Clear the top area (the Query pane) of any queries or remnants that might appear.

3.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4.  Select the Warehouse cube in the Cube drop-down list box.

Let's assume, for our practice example, that we have received a call from the Finance department of the FoodMart organization, requesting some information surrounding Warehouse Sales for a given time frame upon which they plan to perform some basic analysis. The Finance information consumers specifically wish to know the average total Warehouse Sales figure by Store Type, for the entire FoodMart organization, and for the time periods included in the current Warehouse cube. In addition, the consumers would like to see the average totals broken out by Product Family, ideally "across the x-axis," as they describe it, to make the performance of the various family types readily obvious in relation to each other.

We will meet the business need expressed by the Finance group by composing a query using a calculated member that contains the AVG() function. We will then present the returned data with a basic query that defines the presentation as requested, with the requested averages appearing in the appropriate arrangement.

5.  Type the following query into the Query pane:

-- MDX023-1 Basic Use of AVG() Function
WITH
MEMBER 
   [Product].[AverageSales] AS 'Avg({[Product].[Product Family].Members}, 
  [Measures].[Warehouse Sales])'
SELECT
   { [Product].[Product Family].Members, [Product].[AverageSales]} ON COLUMNS,
   NON EMPTY { [Store Type].Members } on rows
FROM 
   [Warehouse]
WHERE
    ([Measures].[Warehouse Sales])  

6.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 1 appears.


Illustration 1: Result Dataset - Basic Use of the AVG() Function

The above presentation meets the Finance group's requirements, permitting them to analyze average Warehouse Sales by Product Family component, for each of the Store Type groups. In addition, the column to the far right also allows them to see a simple combined average for all Products. (We will examine the generation of weighted averages in a subsequent article).

7.  Select File ` Save As, name the file MDX023-1, and place it in a meaningful location.

8.  Leave the query open for the next section.

We provide the results we have assembled to the information consumers, who are delighted with the presentation. The following day, we receive a call asking for more information in a somewhat similar, but a little more sophisticated vein. This time, the Finance group wants us to generate averages for two additional measures, Unit Shipped and Warehouse Profits, with the same row axis of Store Type groups. But they throw us another complication with their next stipulation: They want the total averages for products that number among the top ten individual product performers.

We are amused, once again, at the way that our consumers rapidly refine their requirements from the most basic requirements to gradually more elegant analysis. Seeing what they can have only leads to more in-depth requirements. The beautiful thing is that, the more we have met such needs, the more we can offer from the very start, and offer solutions within a preemptive, pilot development that gets our audiences where they need to be far faster than they could have imagined. Everyone benefits with well-founded business intelligence!

We take the following actions to meet the new consumer request:

9.  Select File ` New to open a blank query:

10.  Type the following query into the Query pane:

-- MDX023-2 More Sophisticated Example of Avg() Function in Use
WITH
SET
   [Leading10Products] 
AS
   'TopCount ([Product].[Product Name].Members, 10, [Measures].[Warehouse Sales])'
MEMBER
    [Product].[OverLeadingAvg] AS 'Avg ([Leading10Products])'
SELECT
   { [Measures].[Warehouse Sales], [Measures].[Units Shipped], 
   [Measures].[Warehouse Profit] } ON COLUMNS,
   NON EMPTY { [Store Type].Members } ON ROWS
FROM 
   [Warehouse]
WHERE 
   ([Product].[OverLeadingAvg])

11.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 2 appears.


Illustration2: Result Dataset - More Sophisticated Use of the AVG() Function

The above presentation meets the Finance group's requirements, permitting them to perform the analysis they have envisioned. In effect, the results dataset is comprised of three measures that are broken down by six Store Types (including the All Store Types level). Each of the measures is averaged for each of the ten highest performing Products within the context of Warehouse Sales.

Another nuance in our solution is the absence of the numeric expression component in the AVG() function within our second calculated member. The function interprets this to mean that we are specifying the current members of the dimension and measure that belong to the set we have created just above. (Creation of a set whose purpose is to act as a proxy within a subsequent calculation, or other manipulation, in such a manner is an excellent way to achieve many such ends.)

Finally, we employ the calculated member as the slicer via the WHERE clause. We thus generate averages for each measure across each of the ten leading products, coming to an "aggregate average," as it were, as shown in Illustration 2 above.

12.  Select File ` Save As, name the file MDX023-2, and place it in a meaningful location.

13.  Select File -` Exit when ready to close the Sample Application.

Summary ...

In this lesson, we explored the MDX AVG() function. We commented generally upon the purpose of the function, to return the average value of a numeric expression as assessed over a specified set, and overviewed its operation. We examined the syntax employed in using AVG(), and then undertook illustrative practice examples based upon providing solutions for the expressed business needs of a hypothetical group of information consumers. Finally, we concluded each illustrative scenario with a brief discussion of the results datasets we obtained.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved