Introduction to MSSQL Server Analysis Services: Point-and-Click Cube Schema Simplification

Monday Mar 14th 2005 by William Pearson
Share:

Simplify an Analysis Services cube schema with point-and-click simplicity, eliminating joins between dimension and fact tables. MSAS Architect Bill Pearson leads a hands-on introduction to using this option to significantly reduce cube processing time.

About the Series ...

This article is a member of the series Introduction to MSSQL Server 2000 Analysis Services. The series is designed to provide hands-on application of the fundamentals of MS SQL Server 2000 Analysis Services, with each installment progressively adding features and techniques designed to meet specific real - world needs. For more information on the series, as well as the hardware / software requirements to prepare for the exercises we will undertake, please see my initial article, Creating Our First Cube.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples. Images are from a Windows 2003 Server environment, upon which I have also implemented MS Office 2003, but the steps performed in the articles, together with the views that result, will be quite similar within any environment that supports MSSQL Server 2000 and MSSQL Server 2000 Analysis Services ("Analysis Services" or "MSAS"). The same is generally true, except where differences are specifically noted, when MS Office 2000 and above are used in the environment, in cases where MS Office components are presented in the article.

Introduction

In this article, we will explore another tool that MSAS offers for the enhancement of cube processing, the Optimize Schema option. Optimize Schema attempts to identify unnecessary joins between our fact and dimension tables, and then to remove them. In many cases, the tool works effectively to accomplish this, leading to a significant reduction in a cube's processing time. Elimination of the joins means more rapid resolution of MSAS' queries to the relational database, upon which our cube is dependent as a data source. This, in turn, means that data is pulled into Analysis Services in less time, contributing to a more rapid cube build overall.

The operation of the Optimize Schema option takes advantage of a common circumstance within the construction of many star or snowflake schemas: the foreign key that serves as the basis of a join between the fact table and a given dimension table is identical to the member key itself. When this is the case, MSAS can eliminate the join, and source the member key directly from the fact table, instead of relying upon a join to the dimension table to obtain the key.

In this article, we will examine the use of Optimize Schema in making our cubes process faster. To accomplish this objective, we will:

  • Overview the Optimize Schema option;
  • Create a copy of the Warehouse sample cube for use in our practice session;
  • Prepare the cube copy further by processing;
  • Discuss the Member Key Column property, and examine existing settings within our practice cube;
  • Discuss possible considerations in determining the appropriateness of the use of the Optimize Schema option in our respective business environments;
  • Perform a practice exercise within which we employ the Optimize Schema option;
  • Examine some of the effects of using Optimize Schema.

Simplify Schemas for Faster Cube Processing

Overview and Discussion

Cube processing performance is often a high visibility subject to our clients and employees. There are many means and considerations in optimizing build times, but few are as straightforward as the use of Optimize Schema, which quickly and "automatically" makes modifications that can provide palpable improvements, in many cases. Manual intervention can become necessary, in some instances, to achieve the intended objectives of the option for all dimensions, but the concepts behind the steps are identical.

The way this all works is, simply, that the Member Key Column property, aligned with the lowest level of a shared dimension in the cube, is pointed to a column in the fact table instead of a column in a dimension table. When this can be accomplished, the need for joins is eliminated in the SQL query that MSAS executes against the relational data source to populate the cube. As we might expect, such a circumstance imparts multiple benefits, including reductions in query complexity, reductions in the amount of data accessed in the relational data source, and reductions in network traffic between the Analysis Server and the relational data source.

Several conditions must exist for schema optimization (via Optimize Schema, or its manual equivalent) to be effective. For any dimension under consideration, these conditions are as follows:

  • The dimension must be a shared dimension;
  • The dimension must have been processed at least once prior to selecting the Optimize Schema option;
  • The lowest dimension level must be physically represented in the cube (it can be hidden, but not disabled);
  • The Member Key Column property for the lowest dimension level must contain the key that relates the fact table to the dimension table;
  • The Member Key Column property for the lowest dimension level must be the only key that is needed to relate the fact and dimension tables;
  • The Member Key Column property for the lowest dimension level must be unique.

Multiple considerations can arise in altering the schema of a cube through the Optimize Schema option. These include scenarios where our cube design is dependent upon joins between the dimension and fact tables, as a mechanism for forcing the exclusion of rows of the fact table from the cube that is to be generated. As is no doubt apparent to most of us, removal of all joins through Optimize Schema, or any other process will mean a corresponding removal of the filtering mechanism, and the resulting selection of all rows in the fact table.

Moreover, it is important to keep in mind that the Optimize Schema process, in its elimination of existing joins, can alter our selection of available tables in subsequent attempts to specify drillthrough options. This, like all the other considerations, needs to be evaluated as a part of planning to use the Optimize Schema option, or to pursue other steps to replicate its action, to enhance cube processing performance.

NOTE: When manually optimizing a cube schema, it is important to remember that no warning or error dialogs will appear if the incorrect column is selected in the respective Member Key Column property. If incorrect choices are made, incorrect values may be returned, or cube processing may fail altogether. If the optimized dimension is used as a slicer within a cube partitioning strategy, processing failures are likely, as well.

Within the "manual" scenario, consideration needs to be given to examining several variables before attempting to perform schema optimization. Consult the Microsoft Developer Network (MSDN), the Books Online, and other reliable documentation before attempting this on any but development systems - and with a backup in place before beginning in every case.

To confirm our understanding of the steps we will undertake, together with the impact of these steps, we will perform "before and after" examinations of the cube of a hypothetical client, particularly focusing on the settings of the Member Key Column property at the lowest level that we have discussed. The changes that occur to this property as a result of our Optimize Schema efforts will make the process more transparent. For purposes of our practice session, we will create a copy of the targeted cube, as we have in various articles within our Database Journal series'. We will then process the clone cube to "register" it with Analysis Services, before beginning our examination of Optimize Schema within our practice exercise.

Considerations and Comments

For purposes of this exercise, we will be working with the Warehouse cube, within the FoodMart 2000 MSAS database; these working samples accompany a typical installation of MSAS. If the samples are not installed in, or have been removed from, your environment, they can be obtained from the installation CD, from the Analysis Services section of the Microsoft website, and perhaps elsewhere. If you prefer not to alter the structure of your sample cubes, as they currently exist, make copies of the cube we reference in the article before beginning the practice exercises. For instructions on copying cubes, see the Preparation section of Introduction to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic Balances.

Hands-On Procedure

We will begin with a scenario that illustrates a requirement for a schema simplification, using a hypothetical business need to add practical value. Let's say that our client, the FoodMart organization, has approached us with an enhancement request it wishes to meet, from its Warehouse cube users. We are told that the cube was developed by an energetic young professional who has since moved on. The client is happy with the work in general, but in light of several "discoveries" they have made since the developer left, they suspect that this cube, as well as several others, might experience enhanced performance if they subjected its design to a "second set of eyes."

When we hear that the initial developer had no formal training in MSAS or even OLAP, except for constantly referencing rudimentary text she had obtained from a popular online auction site, we propose an examination of the overall setup, including the schema design in Analysis Manager for the cube. To prepare, we will create a copy of the original Warehouse cube, to preserve the original in its current state, as well as to allow access to the original cube in the meantime from enterprise users.

Preparation

Create a Copy Cube

Let's get started by creating a copy of the Warehouse sample cube, which, along with the FoodMart database that contains it, accompanies an MSAS installation. This will allow us to keep the original sample cube intact for other uses.

1.  Open Analysis Manager, beginning at the Start menu.

2.  Expand the Analysis Servers folder by clicking the "+" sign to its immediate left.

Our server(s) appear.

3.  Expand the desired server.

Our database(s) appear, in much the same manner as shown in Illustration 1.


Illustration 1: Example Databases Displayed within Analysis Manager

4.  Expand the FoodMart 2000 database.

5.  Expand the Cubes folder.

The sample cubes appear, as shown in Illustration 2.


Illustration 2: The Sample Cubes in the FoodMart 2000 Database

NOTE: Your databases / cube tree will differ, depending upon the activities you have performed since the installation of MSAS (and the simultaneous creation of the original set of sample cubes). Should you want or need to restore the cubes to their original state, simply restore the database under consideration. For instructions, see the MSSQL Server 2000 Books Online.

6.  Right-click on the Warehouse sample cube.

Again, we are making a copy of the Warehouse cube, because our lesson will involve making changes to the cube we use within the practice example. As we have noted, working with the copy will allow us to maintain our existing sample cube in its current condition, and available to other users.

7.  Select Copy from the context menu that appears.

8.  Right-click on the Cubes folder.

9.  Select Paste from the context menu that appears.

The Duplicate Name dialog appears.

As noted in previous articles, we cannot have two cubes of the same name in a given MSAS database.

10.  Type the following into the Name box of the Duplicate Name dialog:

OPTIMAL SCHEMA

The Duplicate Name dialog appears, with our modification, as depicted in Illustration 3.


Illustration 3: The Duplicate Name Dialog, with New Name

TIP: As I have mentioned elsewhere in this and other series, the foregoing is also an excellent way of renaming a cube (a "rename" capability is not available here, as it is in many Windows applications). Simply create a duplicate, give it the name to which you wish to rename the old cube, and then delete the old cube, as appropriate. This also works for MSAS databases, dimensions and other objects.

11.  Click OK to apply the name change.

The new cube, OPTIMAL SCHEMA, appears in the cube tree, among those already in place. We now have a copy of the Warehouse cube, within which we can perform the steps of our practice exercise. Let's process the new cube to "register" it with Analysis Services, and to ensure that we are all in a "processed" state.



Process the Copy Cube


1.  Right-click the new OPTIMAL SCHEMA cube.

2.  Select Process... from the context menu that appears, as depicted in Illustration 4.




Illustration 4: Select Process... from the Context Menu

The Process a Cube dialog appears, as depicted in Illustration 5, with the processing method defaulted to Full Process (as this is the first time the cube has been processed).




Illustration 5: Full Process Selected in the Process a Cube Dialog

3.  Click OK to begin processing.

Processing begins. The Process viewer displays various logged events, then presents a green Processing completed successfully message, as shown in Illustration 6.


Illustration 6: Indication of Successful Processing Appears (Compact View)

If we examine the SQL generated in processing the cube (by highlighting either of the rows beginning with the word "Partition," in the Process viewer, then clicking View Details), we will see that the joins under consideration are in effect.

4.  Click Close to dismiss the viewer.

We will next examine representative properties of the OPTIMAL SCHEMA cube, as they appear in the design left by the departing developer, before undertaking the process for selecting the Optimize Schema option to make the cube process faster.

Review Select Property Settings

Before we pursue schema optimization further, let's take a look at the initial settings of the Member Key Column property in our copy of the Warehouse cube, to get a feel for how the initial developer "pointed" it. We have already discussed the fact that the Optimize Schema option changes this property. We can verify that fact with ease by examining the setting before and after the selection of the option.

1.  Right-click the OPTIMAL SCHEMA cube in the cube tree.

2.  Select Edit from the context menu that appears.

The Cube Editor opens.

3.  Expand the Dimensions folder in the Tree pane, by clicking the "+" sign to its left, as necessary.

4.  Expand the Store dimension in the Tree pane.

5.  Click the lowest level in the hierarchy, the Store Name level, as depicted in Illustration 7.


Illustration 7: Levels Appear in the Expanded Store Dimension

6.  With the Basic tab selected in the Properties pane, below the Tree pane (click Properties beneath the tree pane if the Properties pane does not appear), examine the Member Key Column property setting, as shown in Illustration 8.


Illustration 8: Member Key Column Property - Basic Tab of Store Dimension Property Pane

We note that the Member Key Column is tied to the store_id column of the store table. This is not an unusual approach when one is first developing cubes, particularly with a wizard, and optimization has not become a focus. We can view this column, also, with a glance at the store dimensional table, which we can easily see on the Schema tab, as depicted in Illustration 9.


Illustration 9: The Member Key Column - In the Dimensional Table, As Well as the Fact Table

We also notice that the same column, store_id, exists in the fact table of the same schema. It is upon this key that the dimensional (store) and fact (inventory_fact_1997) tables in the star schema are joined.

If we look at the Member Key Column property for the lowest levels of the Warehouse and Product dimensions, Warehouse Name and Product Name, respectively, we see a similar occurrence - each is pointed to a column with a similar name in the associated dimension table (warehouse_id and product_id, respectively), which has an identical counterpart in the fact table

If, however, we examine the same property in the Time dimension, we see a bit of a difference.

7.  Expand the Time dimension in the Tree pane.

8.  Click the lowest level in the hierarchy, the Month level, as shown in Illustration 10.


Illustration 10: The Month Level in the Time Dimension

With the Basic tab selected in the Properties pane, we can examine the Member Key Column property setting, as we did for the Store dimension earlier. A glance to the schema tab on the right reveals a difference, in this case, between this key and the two joined keys in the dimension and fact tables: We note that the Member Key Column for the lowest level of the Time dimension, Month, is set to the month_of_year column of the time_by_day table.

The Schema tab reveals that the Member Key Column for Month, month_of_year, is not the key with which the time_by_day table is joined to the inventory_fact_1997 table. The common key in the schema is time_id, as depicted in Illustration 11.


Illustration 11: Key Joining Fact and Dimensional Tables Differs from Member Key Column

(Make a mental note of the difference in this case; we will revisit it shortly.)

We have examined the Member Key Column property settings, as the original developer left them, for the standard dimensions within our practice cube. Let's move next to optimize the schema and return to review these settings, to reinforce our understanding of the workings of the process.

Procedure

The process of selecting the Optimize Schema option is about as straightforward as it gets. The ease with which the option can be applied is a major enticement to its use, for anyone wishing to enhance cube processing performance quickly, and with minimal effort. Let's take the plunge, and then study the results by returning to the settings we have just examined.

1.  Click Tools in the main menu of the Cube Editor.

2.  Click Optimize Schema from the menu that appears, as shown in Illustration 12.

Click for larger image

Illustration 12: Select Tools -> Optimize Schema

The Cube Schema Optimization message box appears, as depicted in Illustration 13.

Click for larger image

Illustration 13: The Cube Schema Optimization Dialog - Details of the Optimization

We can ascertain, through the information provided on the Cube Schema Optimization message box, that the Member Key Column properties for the Store, Warehouse and Product dimensions have been optimized. The initial settings, which the original developer pointed to the keys in the respective dimension tables, have now been redirected to their (identical) counterparts in the fact table, inventory_fact_1997.

We also note the statement that the Time dimension cannot be optimized through the mechanism of the option. This is because of the disparity between the Time key column identifiers shared in the join between the Time dimension table and the fact table, and the Member Key Column property for the lowest-level Time dimension member, Month, which we mentioned earlier.

We are also informed that private and virtual dimensions cannot be subjected to the optimization process. This behavior is consistent with the set of requirements that MSAS imposes on dimensions that are candidates for the Optimize Schema option, as we learned in the Overview and Discussion section above.

3.  Click OK to close the Cube Schema Optimization message box.

Confirm Results

We will take a look at the Member Key Column property settings once again, and see the physical changes are as expected in our copy of the Warehouse cube. To do this from our current position within the Cube Editor, we will take the following steps:

1.  Expand the Dimensions folder, as required, in the Tree pane, by clicking the "+" sign to its left, if necessary.

2.  Expand the Store dimension in the Tree pane, if required.

3.  Click the lowest level in the hierarchy, the Store Name level, as we did in our earlier examination of the associated Member Key Column property.

4.  With the Basic tab selected in the Properties pane, below the Tree pane, examine the Member Key Column property setting, as shown in Illustration 14.


Illustration 14: Member Key Column Property - Basic Tab of Store Dimension Property Pane

We note that the Member Key Column is tied to the store_id column of the inventory_fact_1997 table, just as we might have expected. We can also examine the Member Key Column property for the lowest levels of the Warehouse and Product dimensions, Warehouse Name and Product Name, respectively, to see that an identical modification has occurred: each is pointed to the column in the inventory_fact_1997 table that corresponds to the column (from the respective dimension table) that occupied it before we selected Optimize Schema (warehouse_id and product_id, respectively). An examination of the same property in the Time dimension, of course, makes it clear that no modification in the Member Key Column property has occurred, as we have discussed above.

To summarize the process we have witnessed within our practice cube, when we selected the Optimize Schema option, each dimension was evaluated to determine its compliance with the conditions for optimization. For each of the dimensions that met the required conditions, the Member Key Column property of the lowest level of the dimension was changed to reference the foreign key in the fact table, instead of the key in the dimension table to which it had previously referred. Going forward, the new reference directs MSAS to use the key from the fact table during processing, instead of issuing queries that join the dimension to the fact table in the relational database. The removal of joins results in reduced processing time. In many cases, the Optimize Schema option alone can enhance the cube processing cycle significantly.

Conclusion

In this article, we examined the use of the Optimize Schema option within the Cube Editor, as a means of making our cubes process faster. As in the other articles of this series, after introducing our topic, we described a scenario whereby we were called upon to meet the business needs of a hypothetical client. We then prepared for our practice exercise by creating and processing a copy of the Warehouse sample cube.

We next discussed the Member Key Column property of our cube, and examined the existing, pre-optimized settings within our practice cube. Moreover, we relayed important considerations to ponder in determining the appropriateness of our use of the Optimize Schema option in our respective business environments. We instituted the Optimize Schema option within the Cube Editor, and noted the information that MSAS provided via the resulting message box, informing us of the modifications that it had made in the Member Key Column property of the applicable dimensions. Finally, we examined and discussed the immediate effects of the Optimize Schema option within the structure of our cube.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved