MDX Essentials: Set and String Functions: The GENERATE() Function

Monday Mar 7th 2005 by William Pearson
Share:

Apply a set to each member of another set, and join the resulting sets; or return a concatenated string to nest with a set to meet reporting needs. MSAS Architect Bill Pearson explores the set and string versions of the powerful GENERATE() function, and leads practice in its operation.

About the Series ...

This article is a member of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements for getting the most out of the lessons included, please see my first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

Overview

In this lesson, we will introduce a function that provides us the ability to derive a set by systematically applying a set expression to each of the members of a set we define. GENERATE() is a potent function in our MDX toolset, in that it enables us to select precisely only certain members of a dimension level. It effectively operates upon two sets to create a new set, based upon the members of a second set that are also in a primary set.

We will consider elementary uses of the GENERATE() function in this article, and then explore more sophisticated uses in subsequent articles. As with the Basic Functions articles within this series, our objective is to build a foundation in the rudiments of the function, from which we can expand to more sophisticated exploitation in subsequent articles.

As we have noted the case to be with many other MDX functions, the GENERATE() function can be leveraged to perform tasks that range from the simple to the sophisticated. For that matter, in the limited body of knowledge that is commonly available surrounding MDX in general, at this writing, hands-on references to GENERATE() remain elusive, with the few documents that discuss the function describing it as "complex." We will introduce the function, commenting upon its operation and touching upon uses at a general level, and then we will:

  • Examine the syntax surrounding the function;
  • Undertake illustrative examples of the use of the function, in a couple of practice exercises;
  • Briefly discuss the results datasets we obtain in the practice examples.

The GENERATE() Function

Introduction

The GENERATE() function, according to the Analysis Services Books Online, "applies a set to each member of another set and joins the resulting sets by union." The Books Online goes on to say that GENERATE() "alternatively, returns a concatenated string created by evaluating a string expression over a set." We will examine the way the function accomplishes these combinations, eliminating duplicates automatically (while allowing us the option to include them, if necessary), in the sections that follow.

We will examine the syntax for the GENERATE() function in general, building to operations upon sets in practice exercises, within which we will meet a hypothetical business need. In this way, we will be able to clearly see that the GENERATE() function does, in fact, produce the results we might expect. Our objective is to gain a richer understanding of the capabilities found within the GENERATE() function, together with a feel for its similarities to the CROSSJOIN() function, which we have discussed in Basic Set Functions: The CROSSJOIN() Function, as well as our previous article, The CROSSJOIN() Function: Breaking Bottlenecks.

Discussion

The GENERATE() function comes in two "models:" The Books Online refer to these as a "set version" and a "string version." In the set version, the function generates a results set based upon the application of a specified secondary set (which often itself contains a function) to a specified primary set. As we shall see, the function conducts itself much like the CROSSJOIN() function in many cases - most notably when the secondary set is composed of a more or less fixed group of members. We will also see that the power of GENERATE() is leveraged significantly when we go beyond a relatively fixed set of members in the secondary set, and construct the secondary set via an expression that specifies the primary set's current member.

The string version provides for the concatenation of a string expression (substituted in the place of the secondary set appearing in the set version) with each element appearing in the primary set. A delimiter can be specified to separate the elements, as well, if this is useful to the end result (as it proves to be in a practice example we undertake in a later section).

Let's look at some syntax illustrations to further clarify the operation of GENERATE().

Syntax

The set version of GENERATE() resembles closely the string version, with regard to syntax. In the former, the primary and secondary sets upon which the operation of the function is to be performed are placed within the parentheses to the right of GENERATE. The set version applies Set2 to each member of Set1, performing a union of the resulting sets. We can direct that duplicates in the results are retained by specifying ALL, but the default behavior is to eliminate duplicates. The syntax is shown in the following string:

GENERATE( <<Set1>>, <<Set2>>[, ALL] )

The string version of GENERATE() appears as follows:

GENERATE( <<Set>>, <<String Expression>>[, <<Delimiter>>])

Iterating through each member of the set specified in <<Set>> above, this version of the function evaluates the specified <<String Expression>> against the respective member and returns a concatenation between the two in each case. The member and the evaluated <<String Expression>> can be delimited in the return string with the optionally supplied <<Delimiter>>, should we desire separation of the two components in the concatenated string that is returned.

The following simple example illustrates conceptually the operation of the GENERATE() function, set version (by far the more useful and pervasive version). It also shows that, within the context of simpler requirements, we can often obtain the same results with a seemingly less complex approach. The example then illustrates a more elaborate scenario, where the GENERATE() approach is certainly more efficient.

NOTE: We will be doing a practice exercise in subsequent sections, but if you want to "test drive" the below samples, the syntax will work if it is cut and pasted, or typed, into the MDX Sample Application. I have often found "fragments" in discussions such as this less than useful, when one is trying to learn new techniques, and so forth. The fact that it is easier for the author makes the practice commonplace, but it is one of many aggravating aspects of technical publishing that I hope to continue to avoid).

Let's say we have a requirement to return the top three cities in the states of California and Washington with regard to Units Shipped, one of several measures stored within the sample FoodMart Warehouse cube that accompanies an Analysis Services installation. We can achieve our objectives by employing the set version of the GENERATE() function as follows:

SELECT
   {[Measures].[Units Shipped]} ON COLUMNS,
   {GENERATE
      ({[Store].[All Stores].[USA].[CA], 
        [Store].[All Stores].[USA].[WA]},
           TOPCOUNT(DESCENDANTS([Store].Currentmember, 
              [Store].[Store City]),
                 3, [Measures].[Units Shipped]))} ON ROWS
FROM 
   [WAREHOUSE]

The query results would appear as depicted in Table 1.

 

Units Shipped

Los Angeles

24,587

San Diego

23,835

Beverly Hills

10,759

Tacoma

32,411

Seattle

24,110

Bremerton

22,734


Table 1: Results of the GENERATE() Function, Selecting Units Shipped as the Measure

We can obtain identical results with the following query:

SELECT
   {[Measures].[Units Shipped]} ON COLUMNS,
   {TOPCOUNT(
      {[Store].[All Stores].[USA].[CA].Children}, 
          3, [Measures].[Units Shipped]),
             TOPCOUNT(
                {[Store].[All Stores].[USA].[WA].Children}, 
                   3, [Measures].[Units Shipped])} ON ROWS
FROM 
   [WAREHOUSE]

The second query may seem more intuitive to many of us, and certainly presents indirect insight into the operation of the GENERATE() function. Intuitive or not, however, the GENERATE() function can certainly be the compact alternative in more elaborate uses. Consider the following query:

SELECT

   {[Measures].[Units Shipped]} ON COLUMNS,

   {GENERATE

      ({[Warehouse].[City].Members}, 

         TOPCOUNT(DESCENDANTS([Warehouse].Currentmember,

            [Warehouse].[Warehouse Name]),

               1, [Measures].[Units Shipped]))} ON ROWS

FROM 

  [WAREHOUSE]

WHERE 

  ([Time].[1998])

The query results would appear as shown in Table 2.

 

Units Shipped

Bellmont Distributing

22,988

Rose Food Warehousing

10,355

Freeman And Co.

10,707

Derby and Hunt

23,925

Salka Warehousing

24,884

Focus, Inc.

2,189

Jamison, Inc.

21,664

Bastani and Sons

7,304

Anderson Warehousing

23,699

Worthington Food Products

10,045

Big Quality Warehouse

10,115

Artesia Warehousing, Inc.

24,714

Jorgensen Service Storage

19,483

Food Service Storage, Inc.

1,814

Quality Distribution, Inc.

26,569

Treehouse Distribution

32,409

Foster Products

1.949

Destination, Inc.

7,512

Quality Warehousing and Trucking

29,041

Jones International

5,668

Jose Garcia, Inc.

31,221

Valdez Warehousing

2,353

Maddock Stored Foods

10,097


Table 2: Results of the Second Query Example Containing the GENERATE() Function

The above example, where our query is retrieving the Warehouse in each individual Warehouse City, together with the largest quantity of Units Shipped, represents a scenario where the GENERATE() approach is more concise than alternative approaches. To achieve the same result, we would be forced to employ TOPCOUNT() for each Warehouse City present in the Warehouse cube. This would be cumbersome, at best, and result in a far lengthier query than the compact query we can achieve using GENERATE().

We will activate the concepts involved in the foregoing discussions by practicing the use of the GENERATE() function in the section that follows. As part of our practice, we will undertake examples with each of the set and string versions of the function.

Practice

Preparation

To reinforce our understanding of the basics we have covered so far, we will use the GENERATE() function in a manner that illustrates its operation within a couple of examples. In both scenarios, we will create a dataset containing information that might be useful in a business scenario. In each case, we will outline a brief business requirement as we might receive it from a hypothetical group of information consumers. This will perhaps help to reinforce the concepts in our minds, where they can be triggered upon meeting a similar situation in our respective business environments.

We will call upon the MDX Sample Application again, as our tool to construct and execute the MDX we examine, and to view the results datasets we obtain.

1.  Start the MDX Sample Application.

2.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

3.  Select the WAREHOUSE cube in the Cube drop-down list box.

4.  Clear the top area (the Query pane) of any queries or remnants that might appear.

Procedure - GENERATE(): Set Version

Let's assume, for our initial practice example, that we have received a request from a group of information consumers in the Finance department of the FoodMart organization. The consumers have asked for support in the presentation of some data, housed within the Warehouse sample cube, regarding Warehouse Sales for 1997. The consumers wish to know, for each U. S. State in which FoodMart operates Stores (California, Oregon and Washington), to which five Products we can attribute the greatest Warehouse Sales.

We immediately recognize the opportunity to employ the set version of the GENERATE() function, and compose a simple query to return the desired dataset.

5.  Type the following query into the Query pane:


-- MDX29-01:  Use of GENERATE() Function - SET VERSION
SELECT
    {[Measures].[Warehouse Sales]} ON COLUMNS,
    GENERATE([All Stores].[USA].Children, CROSSJOIN({[Store].CurrentMember}, 
        TOPCOUNT( [Product].[Product Name].Members, 5, [Store].CurrentMember) 
            )) ON ROWS
FROM 
    [Warehouse]
WHERE
    ([Time].[Year].[1997])

The reasoning behind the query is as follows: The set version of the GENERATE() function will evaluate the secondary set of the function (represented by TOPCOUNT( [Product].[Product Name].Members, 5, [Store].CurrentMember, or the top five Products with regard to 1997 Warehouse Sales) separately for each member of the primary set (represented by [All Stores].[USA].Children, the set of the U. S. States in which FoodMart operates). We can appreciate the fact that the independent evaluation of "top five" for each State is important, because the top selling products will likely vary between States.

6.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset depicted in Illustration 1 appears.


Illustration 1: Results Dataset, GENERATE() Function, Set Version

We see there are a total of three Store States (California, Oregon and Washington), each of which appears crossjoined with its five top selling Products. In effect, GENERATE() has performed a union between each State and the independent results of its respective TOPCOUNT() evaluation.

7.  Select File -> Save As... from the main menu.

8.  Name the file MDX29-01, and place it in a meaningful location.

Now let's get some practice with the string version of the GENERATE() function.

Procedure - GENERATE(): String Version

Let's assume for our second practice example that we have received a request from a group of information consumers in the Marketing unit of the FoodMart organization. The group needs to meet a basic presentation requirement, with the data under consideration once again being housed within the Warehouse sample cube. The consumers wish to present a simple matrix dataset, and display total FoodMart Warehouse Sales for the children of the Drink and Food Product Families only (the children of the Product Family level of the Product dimension, we recall, are themselves Product Departments) .

The information consumers also inform us that they would like to make a "cosmetic addition" to the core presentation. They would like to see a column called "Macro Class" (their internal term for groups of Product Families) added to the layout, between the row axis containing the Product Family names, and the Warehouse Sales measure. Within this column, they would like text displayed to reinforce the fact that the Sales information in the dataset relates to "Drink or Food" families only, so that no confusion can erupt among the intended audience: The intent of the report is to present focused data about consumable products only. Finally, the information consumers want the data restricted to Warehouse Sales for 1997.

We recognize an opportunity to use the string version of the GENERATE() function, and compose a simple query to return the desired dataset.

9.  Select File -> New from the main menu.

10.  Type the following query into the Query pane:


-- MDX29-02:  Use of GENERATE() Function - STRING VERSION
WITH 
MEMBER
    [Measures].[Macro Class]
AS
    'GENERATE( {[Product].[Product Family].[Drink], 
      [Product].[Product Family].[Food]}, Product.CurrentMember.Name, " or ")'
SELECT
    { [Measures].[Macro Class], [Measures].[Warehouse Sales] } ON COLUMNS,

    { [Product].[Product Family].[Drink].Children, 
          [Product].[Product Family].[Food].Children} ON ROWS
FROM
    [WAREHOUSE]
WHERE
     ([Time].[Year].[1997])

The reasoning behind the query is as follows: The string version of the GENERATE() function will evaluate the specified <<String Expression>>, represented by Product.CurrentMember.Name in the above query, against the members of the primary set, {Product.[Product Family].[Drink], Product.[Product Family].[Food]}. We employ the optional delimiter of "or," to place it between the Drink and Food family names, which will evaluate into the string we intend to create to nest against the Product Families in the row axis, as we shall see.

11.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 2 appears.


Illustration 2: Results Dataset using the String Version of GENERATE()

Our dataset displays the row axis as required, and the new string, with which the Product Family children are concatenated, reveals their membership in the "Drink or Food" group, in accordance with the request of the information consumers. Again, the "or" delimiter (optional) provides the separation of the two components in the concatenated string that is returned.

We thus meet the business needs of the Marketing information consumers through the use of the string version of the GENERATE() function.

12.  Select File -> Save As... from the main menu.

13.  Name the file MDX29-02, and place it in a meaningful location.

14.  Exit the MDX Sample Application when ready

Summary ...

In this lesson, we will introduced the GENERATE() function, which can be properly classed, depending upon the syntax we employ within it, as either a set or a string function. We focused primarily upon the set verision, as this is the more powerful use, and because it is more prevalent in the business environment, but we performed a practice example with each version to gain some hands-on exposure to the possibilities. We prepared for our practice sessions with a discussion of the purposes and operation of the function, focusing upon the syntax involved in reaching our objectives from within both set and string contexts.

In addition to discussing the purpose and operation of the GENERATE() function, we mentioned the elimination of duplicates by the function, as well as the option to bypass that process, if need be. We then practiced the uses of the function within a multi-step exercise, using both set and string versions to meet the expressed business needs of two groups of hypothetical information consumers. Throughout the steps of the practice exercise, we discussed the results we obtained with each step's execution, remarking on the distinguishing characteristics of each.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved