MDX Essentials: Basic Set Functions: The EXTRACT() Function

Monday Oct 4th 2004 by William Pearson
Share:

Return a fresh set of tuples from specified dimension elements using MDX. In this article, author Bill Pearson explores the EXTRACT() function, and leads practice in its operation.

About the Series ...

This is the twenty-fourth article of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each tutorial progressively adding features designed to meet specific real-world needs.

For more information about the series in general, as well as the software and systems requirements needed for getting the most out of the lessons included, please see the first article, MDX at First Glance: Introduction to MDX Essentials.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples.

Overview

In this lesson, we will introduce a function that operates upon a specified set, extracting the dimensions of that set as we dictate. As most references point out, by way of explanation, the EXTRACT() function works in a manner that is the opposite of the CROSSJOIN() function (for detailed information on the CROSSJOIN() function, see my Database Journal article MDX Essentials: Basic Set Functions: The CrossJoin() Function ). We will consider elementary uses of the EXTRACT() function in this article, and then explore more sophisticated uses in subsequent articles. As with other Basic Functions articles within this series, our objective is to build a foundation in the rudiments of the function, from which we can expand to more sophisticated exploitation in subsequent articles.

As we have noted the case to be with many other MDX functions, the EXTRACT() function can be leveraged to perform tasks that range from the simple to the sophisticated. We will introduce the function, commenting upon its operation and touching upon uses at a general level, and then we will:

  • Examine the syntax surrounding the function;
  • Undertake an illustrative example of the use of the function, in a multi-step practice exercise;
  • Briefly discuss the results datasets we obtain in the practice examples.

The Extract()Function

Introduction

The EXTRACT() function, according to the Analysis Services Books Online, "Returns a set of tuples from extracted dimension elements." We will examine the function's manner of accomplishing these extractions, which, as we shall see, eliminates duplicates automatically, in the sections that follow.

We will examine the syntax for the EXTRACT() function in general, building to the extraction of a set of tuples from a dimension we specify in the function, from a set we construct as a part of preparation for a practice exercise, with which we will meet a hypothetical business need. In this way, we will be able to clearly see that the EXTRACT () function does, in fact, generate the results we might expect. Our objective is to gain a richer understanding of the capabilities found within the EXTRACT () function, together with a feel for the "oppositeness" that it maintains with the CROSSJOIN() function that we discussed in Basic Set Functions: The CrossJoin() Function.

Discussion

EXTRACT() allows us to return a set from an initial set we specify, with the returned set composed of tuples from specified dimensional components. As we have stated, EXTRACT() acts in a manner opposite to CROSSJOIN(). In addition, the function always removes duplicates, so, with EXTRACT(), we have no concern for flags or the conscious management of duplicates.

Let's look at some syntax illustrations to further clarify the operation of EXTRACT().

Syntax

Syntactically, the set from which the specified dimensional members are to be extracted within the EXTRACT() function is placed within the parentheses to the right of EXTRACT. The dimension(s) are separated by comma(s). The syntax is shown in the following string:

EXTRACT( <<Set>>, <<Dimension>>[, <<Dimension>>...] )

The members of the dimension(s) specified in the function are extracted into fresh tuples, as we shall see in a step in the practice example. No duplicates are allowed to remain in the set that is produced.

The following simple example illustrates conceptually the operation of the EXTRACT() function. (We will be doing a practice exercise in subsequent sections, but if you want to "test drive" a sample, you can certainly cut and paste, or type, the below into the MDX Sample Application).

We will extract the tuples comprising the Position dimension from a set as shown in the following working query, which we can apply to the sample HR cube.

SELECT
   {[Measures].[Count]} ON COLUMNS, 
EXTRACT(
    {([Position].[All Position].[Store Management], [Store].
        [All Stores].[Canada]),
           ([Position].[All Position].[Store Temp Staff], 
               [Store].[All Stores].[Mexico]),
                   ([Position].[All Position].[Store Management], 
                       [Store].[All Stores].[USA])}, Position)  
                           ON ROWS
FROM
   [HR]

This query, for the measure Count (of employees), would result in the extraction of a set similar to that depicted in Table 1.

Count

Store Management

648

Store Temp Staff

1,680

Table 1: Results of an EXTRACT() Operation, Selecting Employee Count as the Measure

Note that, although the Store Management level appears twice in the original set (for each of countries of Canada and USA), there are no duplicates in the returned dataset.

We will activate the concepts involved in the foregoing discussions by practicing the use of the EXTRACT() function in the section that follows.

Practice

Preparation

To reinforce our understanding of the basics we have covered so far, we will use the EXTRACT() function in a manner that illustrates its operation within a multi-step example. We will first create a set that resembles information that might be useful in a business scenario. We will then explore the use of the EXTRACT() function to return fresh tuples from the original set, based upon a dimension that we specify.

We will call upon the MDX Sample Application again, as our tool to construct and execute the MDX we examine, and to view the results datasets we obtain.

1. Start the MDX Sample Application.

2. Clear the top area (the Query pane) of any queries or remnants that might appear.

3. Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

4. Select the WAREHOUSE cube in the Cube drop-down list box.

Let's assume for our practice example that we have received a request from a group of information consumers in the Planning unit of the FoodMart2000 organization for support in the presentation of some data, housed within the Warehouse sample cube, regarding Warehouse Sales for 1998. The consumers wish to know to which five Store Cities, in descending order of sales, we can attribute the sales of the Products with the highest Warehouse Sales.

We will begin by composing a simple query to prepare the original set, upon which we will use the EXTRACT() function in our next section. My objective at the same time is to illustrate use of the function to meet the hypothetical business requirement we have outlined as a means of reinforcing the concepts in our minds where they can be triggered upon meeting a similar situation in our respective business environments.

First, we will use a CROSSJOIN() function to build a "starter" set, which we will then chisel down to a result dataset that matches the needs of the intended audience.

5. Type the following query into the Query pane:

-- MDX24-1:  Tutorial Query Step 1
SELECT 
    {[Measures].[Warehouse Sales]} ON COLUMNS,
    CROSSJOIN ({[Food]}, {[Store].[Store City].Members}) ON ROWS
FROM 
   [Warehouse]
WHERE 
   ([Time].[1998])

The purpose of this stage of the query is to simply crossjoin Food products with the membership of the Store Cities, all within operating year 1998.

6. Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated by Analysis Services, and the dataset shown in Illustration 1 appears.


Illustration 1: Initial Results Dataset

We see there are a total of twenty-four Store Cities (one of which actually has no Warehouse Sales activity) that appear crossjoined with Food products.

7. Select File --> Save As..., name the file MDX24-1, and place it in a meaningful location. Leave the file open for the next step.

8. Modify the comment line to read:

-- MDX24-2:  Tutorial Query Step 2

9. Save the file as MDX24-2, to protect MDX24-1.

Now we will take another preparatory step and filter out the Store City with no Warehouse Sales. To do this, we will employ the NOT ISEMPTY keywords, as follows.

10.  Insert the Filter() function, as shown:

     FILTER(

between the third line of the query (counting the comment line, and shown below):

  {[Measures].[Warehouse Sales]} ON COLUMNS,

and the fourth line of the query (beginning with CROSSJOIN, as follows):

        CROSSJOIN ({[Food]}, {[Store].[Store City].Members}) ON ROWS

11.  Remove the ON ROWS keyword from the above line, replacing ON ROWs with a comma (,).

12.  Insert the following line immediately below the line that once contained ON ROWS,

             NOT ISEMPTY([Measures]. [Warehouse Sales]))  ON ROWS

The Query pane appears as shown in Illustration 2, with changes circled in red.


Illustration 2: Modified Query in Query Pane (Compressed View)

13.  Execute the query by clicking the Run Query button in the toolbar, once again.

The Results pane is populated once more, and the dataset depicted in Illustration 3 appears.


Illustration 3: Results Dataset, Filtering Out Empties

We see there are a total of twenty-three Store Cities, now that the "empty" Store City of Alameda is filtered out.

14.  Select File --> Save to save the query as MDX24-2. Leave the file open for the next step.

15.  Modify the comment line to read:

-- MDX24-3:  Tutorial Query Step 3

16.  Save the file as MDX24-3, to protect MDX24-2.

We have completed our preparatory steps, and now have a realistic dataset with which to work. We will put the EXTRACT() function to use within our query in the following section.

Procedure

We are now ready to apply the EXTRACT() function to the general population of Product-Store City combinations that we have created with the CROSSJOIN() function in our preparatory steps. We effectively have a set that contains all 1998 Store City Warehouse Sales, filtered, of course, for the empties. To meet the expressed needs of the information consumers with which we are working, we now are faced with extracting fresh Store City tuples, and then presenting the five Store Cities whose products have experienced the highest Warehouse Sales in the 1998 time frame.

1. Insert the EXTRACT() function, as shown:

     EXTRACT(

between the third line of the query (counting the comment line, once again, and shown below):

{[Measures].[Warehouse Sales]} ON COLUMNS,

and the fourth line of the query (which we just added in the last section, as follows):

               FILTER(

2. Insert the following into the line beginning with NOT ISEMPTY , which we also added in the last section, between the last parentheses in the line (just before ON ROWS) and ON ROWS:

                           , [Store]) 

The complete line, together with our addition, should appear as follows:

            NOT ISEMPTY([Measures]. [Warehouse Sales])), [Store])  ON ROWS

With the addition of the EXTRACT() function, together with our specification of the Store dimension in the line ending in ON ROWS, we are, in effect, saying "return a dataset composed of tuples from the Store dimension only." This again generates fresh tuples of only the Store Cities - through the specification of the Store dimension, and the implicit elimination of the Food component from the original tuples.

The Query pane appears as shown in Illustration 4, with additions circled in red.


Illustration 4: Modified Query in Query Pane (Compressed View)

3.  Execute the query by clicking the Run Query button in the toolbar

The Results pane is populated, and the dataset shown in Illustration 5 appears.


Illustration 5: Results Dataset, With EXTRACT() in Place

We see there are still a total of twenty-three Store Cities; the difference now is that the Food component has vanished. (Actually, the Store City tuples have been created anew, without the Food component that came along with the CROSSJOIN() function earlier).

4.  Select File --> Save to save the query as MDX24-3. Leave the file open for the next step.

5.  Modify the comment line to read:

-- MDX24-4:  Tutorial Query Step 4

6.  Save the file as MDX24-4, to protect MDX24-3.

We have extracted the Store dimension members from the crossjoin-produced original set, and thus are nigh unto meeting the needs of the information consumers. The remaining consideration, to present the five Store Cities whose products have experienced the highest Warehouse Sales, is only a single function away from realization. Let's add the appropriate TOPCOUNT() function and give the users precisely what they have requested.

7.  Insert the TOPCOUNT() function, as shown:

     TOPCOUNT(

Once again, between the third line of the query (shown below):

  {[Measures].[Warehouse Sales]} ON COLUMNS,

And the fourth line of the query (which we just added in the last section, as follows):

               EXTRACT(

8. Insert a comma (,) after [Store]) in the ON ROWS line of the query.

9. Insert the following into the same line, just behind the comma we inserted in the last step, and just before ON ROWS:

5, [Measures].[Warehouse Sales] ) ON ROWS

The complete line, together with our addition, should appear as follows:

NOT ISEMPTY([Measures]. [Warehouse Sales])), [Store]), 
    5, [Measures].[Warehouse Sales] ) ON ROWS

With the addition of the TOPCOUNT() function, together with our specifications of the "5" and the measure Warehouse Sales, in the line ending in ON ROWS, we are saying "return the Store Cities with the top five Warehouse Sales values."

The Query pane appears as depicted in Illustration 6, with additions circled in red.


Illustration 6: Finalized Query in Query Pane (Compressed View)

10.  Execute the query by clicking the Run Query button in the toolbar.

The Results pane is populated, and the dataset shown in Illustration 7 appears.


Illustration 7: Results Dataset, Final Enhancements in Place

We now see the Store Cities, in descending order, with the top five Warehouse Sales values.

11.  Select File --> Save to save the query as MDX24-4.

12.  Exit the Sample Application when ready.

In the forgoing manner we meet an expressed business requirement in a simple example that illustrates the operation of the EXTRACT() function.

Summary ...

In this article, we explored the useful EXTRACT() function, whose purpose is to return a set from an initial set we specify, much in a manner that is opposite to the behavior of the CROSSJOIN() function. The returned set is composed of tuples from dimension(s) that we specify as arguments within the function. We discussed the EXTRACT() function in general, and emphasized its capabilities within MDX, and its usefulness within our analysis toolsets.

In addition to discussing the purpose and operation of the EXTRACT() function, we mentioned the elimination of duplicates by the function. We practiced the use of the function within a multi-step exercise, building an original set in preparation to meet a hypothetical business need. Throughout the steps of the practice exercise, we discussed the results we obtained with each step's execution, remarking on the distinguishing characteristics of each.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved