Simplify an Analysis Services cube schema
with point-and-click simplicity, eliminating joins between dimension and fact
tables. MSAS Architect Bill Pearson leads a hands-on introduction to using this
option to significantly reduce cube processing time.
About the Series ...
article is a member of the series Introduction to MSSQL Server 2000 Analysis
Services. The series is designed to provide hands-on application of
the fundamentals of MS SQL Server 2000 Analysis Services, with each
installment progressively adding features and techniques designed to meet
specific real - world needs. For more information on the series, as well as
the hardware / software requirements to prepare for the exercises we
will undertake, please see my initial article, Creating Our First Cube.
Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL
Server 2000 Analysis Services, and the related Books Online
and Samples. Images are from a Windows 2003 Server
environment, upon which I have also implemented MS Office 2003, but the
steps performed in the articles, together with the views that result, will be
quite similar within any environment that supports MSSQL Server 2000 and MSSQL Server 2000 Analysis Services ("Analysis
Services" or "MSAS"). The same is generally true,
except where differences are specifically noted, when MS Office 2000 and
above are used in the environment, in cases where MS Office components
are presented in the article.
In this article, we will explore another tool that MSAS
offers for the enhancement of cube processing, the Optimize Schema
option. Optimize Schema attempts to identify unnecessary joins between
our fact and dimension tables, and then to remove them. In many cases, the tool
works effectively to accomplish this, leading to a significant reduction in a
cube's processing time. Elimination of the joins means more rapid resolution
of MSAS' queries to the relational database, upon which our cube is dependent
as a data source. This, in turn, means that data is pulled into Analysis
Services in less time, contributing to a more rapid cube build overall.
operation of the Optimize Schema option takes advantage of a common
circumstance within the construction of many star or snowflake schemas: the foreign
key that serves as the basis of a join between the fact table and a given
dimension table is identical to the member key itself. When this is the
case, MSAS can eliminate the join, and source the member key directly from the
fact table, instead of relying upon a join to the dimension table to obtain the
In this article, we will examine the use of Optimize
Schema in making our cubes process faster. To accomplish this objective, we
Overview the Optimize
Create a copy
of the Warehouse sample cube for use in our practice session;
cube copy further by processing;
Discuss the Member
Key Column property, and examine existing settings within our practice cube;
possible considerations in determining the appropriateness of the use of the Optimize
Schema option in our respective business environments;
practice exercise within which we employ the Optimize Schema option;
of the effects of using Optimize Schema.
Simplify Schemas for Faster Cube Processing
Overview and Discussion
processing performance is often a high visibility subject to our clients and
employees. There are many means and considerations in optimizing build times,
but few are as straightforward as the use of Optimize Schema, which
quickly and "automatically" makes modifications that can provide
palpable improvements, in many cases. Manual intervention can become necessary,
in some instances, to achieve the intended objectives of the option for all
dimensions, but the concepts behind the steps are identical.
way this all works is, simply, that the Member Key Column property,
aligned with the lowest level of a shared dimension in the cube, is
pointed to a column in the fact table instead of a column in a dimension
table. When this can be accomplished, the need for joins is eliminated in
the SQL query that MSAS executes against the relational data source to populate
the cube. As we might expect, such a circumstance imparts multiple benefits,
including reductions in query complexity, reductions in the amount of data
accessed in the relational data source, and reductions in network traffic
between the Analysis Server and the relational data source.
conditions must exist for schema optimization (via Optimize Schema, or its
manual equivalent) to be effective. For any dimension under consideration, these conditions are as follows:
must be a shared dimension;
must have been processed at least once prior to selecting the Optimize
dimension level must be physically represented in the cube (it can
be hidden, but not disabled);
Key Column property for the lowest dimension level must contain the key
that relates the fact table to the dimension table;
Key Column property for the lowest dimension level must be the only key that
is needed to relate the fact and dimension tables;
Key Column property for the lowest dimension level must be unique.
Multiple considerations can arise in altering the schema of a
cube through the Optimize Schema option. These include scenarios where
our cube design is dependent upon joins between the dimension and fact tables,
as a mechanism for forcing the exclusion of rows of the fact table from the
cube that is to be generated. As is no doubt apparent to most of us, removal
of all joins through Optimize Schema, or any other process will mean a
corresponding removal of the filtering mechanism, and the resulting selection
of all rows in the fact table.
Moreover, it is important to keep in mind that the Optimize
Schema process, in its elimination of existing joins, can alter our
selection of available tables in subsequent attempts to specify drillthrough
options. This, like all the other considerations, needs to be evaluated as
a part of planning to use the Optimize Schema option, or to pursue other
steps to replicate its action, to enhance cube processing performance.
NOTE: When manually optimizing a
cube schema, it is important to remember that no warning or error dialogs will
appear if the incorrect column is selected in the respective Member Key
Column property. If incorrect choices are made, incorrect values may be
returned, or cube processing may fail altogether. If the optimized dimension
is used as a slicer within a cube partitioning strategy, processing failures
are likely, as well.
Within the "manual"
scenario, consideration needs to be given to examining several variables before
attempting to perform schema optimization. Consult the Microsoft
Developer Network (MSDN), the Books Online, and other
reliable documentation before attempting this on any but development systems -
and with a backup in place before beginning in every case.
To confirm our
understanding of the steps we will undertake, together with the impact of these
steps, we will perform "before and after" examinations of the cube of
a hypothetical client, particularly focusing on the settings of the Member Key Column property at the lowest level that
we have discussed. The changes that occur to this property as a result of our Optimize
Schema efforts will make the process more transparent. For purposes of our practice session, we will create
a copy of the targeted cube, as we have in various articles within our Database
Journal series'. We will then process the clone cube to "register"
it with Analysis Services, before beginning our examination of Optimize
Schema within our practice exercise.
Considerations and Comments
For purposes of this
exercise, we will be working with the Warehouse cube, within the FoodMart
2000 MSAS database; these working samples accompany a typical installation
of MSAS. If the samples are not installed in, or have been removed from, your
environment, they can be obtained from the installation CD, from the Analysis
Services section of the Microsoft website, and perhaps elsewhere. If you
prefer not to alter the structure of your sample cubes, as they currently
exist, make copies of the cube we reference in the article before beginning the
practice exercises. For instructions on copying cubes, see the Preparation
section of Introduction
to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic
We will begin with a
scenario that illustrates a requirement for a schema simplification,
using a hypothetical business need to add practical value. Let's say that our
client, the FoodMart organization, has approached us with an enhancement
request it wishes to meet, from its Warehouse cube users. We are told
that the cube was developed by an energetic young professional who has since
moved on. The client is happy with the work in general, but in light of
several "discoveries" they have made since the developer left, they
suspect that this cube, as well as several others, might experience enhanced
performance if they subjected its design to a "second set of eyes."
When we hear that the
initial developer had no formal training in MSAS or even OLAP, except for
constantly referencing rudimentary text she had obtained from a popular online
auction site, we propose an examination of the overall setup, including the
schema design in Analysis Manager for the cube. To prepare, we will create a
copy of the original Warehouse cube, to preserve the original in its
current state, as well as to allow access to the original cube in the meantime
from enterprise users.
Create a Copy Cube
Let's get started by
creating a copy of the Warehouse sample cube, which, along with the FoodMart
database that contains it, accompanies an MSAS installation. This will allow us
to keep the original sample cube intact for other uses.
Manager, beginning at the Start menu.
Expand the Analysis
Servers folder by clicking the "+" sign to its immediate
in much the same manner as shown in Illustration 1.
Illustration 1: Example
Databases Displayed within Analysis Manager
Expand the FoodMart
Expand the Cubes
sample cubes appear,
as shown in Illustration 2.
Illustration 2: The
Sample Cubes in the FoodMart 2000 Database
NOTE: Your databases / cube tree will differ, depending upon
the activities you have performed since the installation of MSAS (and the
simultaneous creation of the original set of sample cubes). Should you want or
need to restore the cubes to their original state, simply restore the database
under consideration. For instructions, see the MSSQL Server 2000 Books
the Warehouse sample cube.
are making a copy of the Warehouse cube, because our lesson will involve
making changes to the cube we use within the practice example. As we have
noted, working with the copy will allow us to maintain our existing sample cube
in its current condition, and available to other users.
from the context menu that appears.
the Cubes folder.
from the context menu that appears.
Name dialog appears.
in previous articles, we cannot have two cubes of the same name in a given MSAS
following into the Name box of the Duplicate Name dialog:
Name dialog appears, with our modification, as depicted in Illustration
Illustration 3: The
Duplicate Name Dialog, with New Name
TIP: As I have mentioned elsewhere in
this and other series, the foregoing is also an excellent way of renaming
a cube (a "rename" capability is not available here, as it is in
many Windows applications). Simply create a duplicate, give it the name to
which you wish to rename the old cube, and then delete the old cube, as
appropriate. This also works for MSAS databases, dimensions and other objects.
to apply the name change.
cube, OPTIMAL SCHEMA, appears in the cube tree, among those already in
place. We now have a copy of the Warehouse cube, within which we can
perform the steps of our practice exercise. Let's process the new cube to "register"
it with Analysis Services, and to ensure that we are all in a "processed" state.
Process the Copy Cube
the new OPTIMAL
from the context menu that appears, as depicted in Illustration 4.
Illustration 4: Select
Process... from the Context Menu
a Cube dialog appears, as depicted in Illustration 5, with the processing method
defaulted to Full Process (as this is the first time the cube has been
Illustration 5: Full
Process Selected in the Process a Cube Dialog
to begin processing.
begins. The Process viewer displays various logged events, then presents
a green Processing completed successfully message, as shown in Illustration 6.
Indication of Successful Processing Appears (Compact View)
examine the SQL generated in processing the cube (by highlighting either of the
rows beginning with the word "Partition," in the Process
viewer, then clicking View Details), we will see that the joins under
consideration are in effect.
to dismiss the viewer.
next examine representative properties of the OPTIMAL SCHEMA cube, as they appear in the design
left by the departing developer, before undertaking the process for selecting
the Optimize Schema option to make the cube process faster.
Review Select Property Settings
Before we pursue schema
optimization further, let's take a look at the initial settings of the Member
Key Column property in our copy of the Warehouse cube, to get a feel
for how the initial developer "pointed" it. We have already
discussed the fact that the Optimize Schema option changes this property.
We can verify that fact with ease by examining the setting before and after the
selection of the option.
OPTIMAL SCHEMA cube in the cube tree.
from the context menu that appears.
The Cube Editor
Expand the Dimensions
folder in the Tree pane, by clicking the "+" sign to
its left, as necessary.
Expand the Store
dimension in the Tree pane.
lowest level in the hierarchy, the Store Name level, as depicted in Illustration
Illustration 7: Levels
Appear in the Expanded Store Dimension
With the Basic
tab selected in the Properties pane, below the Tree pane
(click Properties beneath the tree pane if the Properties pane
does not appear), examine the Member Key Column property setting, as
shown in Illustration 8.
Illustration 8: Member
Key Column Property - Basic Tab of Store Dimension Property Pane
We note that the Member
Key Column is tied to the store_id column of the store table.
This is not an unusual approach when one is first developing cubes,
particularly with a wizard, and optimization has not become a focus. We can
view this column, also, with a glance at the store dimensional table,
which we can easily see on the Schema tab, as depicted in Illustration
Illustration 9: The
Member Key Column - In the Dimensional Table, As Well as the Fact Table
notice that the same column, store_id, exists in the fact table
of the same schema. It is upon this key that the dimensional (store)
and fact (inventory_fact_1997) tables in the star schema are joined.
If we look at the Member
Key Column property for the lowest levels of the Warehouse and Product
dimensions, Warehouse Name and Product Name, respectively, we see
a similar occurrence - each is pointed to a column with a similar name in the
associated dimension table (warehouse_id and product_id,
respectively), which has an identical counterpart in the fact table
If, however, we examine
the same property in the Time dimension, we see a bit of a difference.
Expand the Time
dimension in the Tree pane.
lowest level in the hierarchy, the Month level, as shown in Illustration
Illustration 10: The
Month Level in the Time Dimension
With the Basic tab
selected in the Properties pane, we can examine the Member Key Column
property setting, as we did for the Store dimension earlier. A glance to
the schema tab on the right reveals a difference, in this case, between this
key and the two joined keys in the dimension and fact tables: We note that the Member
Key Column for the lowest level of the Time dimension, Month,
is set to the month_of_year column of the time_by_day table.
The Schema tab
reveals that the Member Key Column for Month, month_of_year,
is not the key with which the time_by_day table is joined to the inventory_fact_1997
table. The common key in the schema is time_id, as depicted in Illustration
Illustration 11: Key
Joining Fact and Dimensional Tables Differs from Member Key Column
(Make a mental note of
the difference in this case; we will revisit it shortly.)
We have examined the Member
Key Column property settings, as the original developer left them, for the
standard dimensions within our practice cube. Let's move next to optimize the
schema and return to review these settings, to reinforce our understanding of
the workings of the process.
The process of selecting the Optimize Schema option is about as straightforward as it gets. The ease with which the option can be applied is a major enticement to its use, for anyone wishing to enhance cube processing performance quickly, and with minimal effort. Let's take the plunge, and then study the results by returning to the settings we have just examined.
1. Click Tools in the main menu of the Cube Editor.
2. Click Optimize Schema from the menu that appears, as shown in Illustration 12.
The Cube Schema Optimization message box appears, as depicted in Illustration 13.
We can ascertain, through the information provided on the Cube Schema Optimization message box, that the Member Key Column properties for the Store, Warehouse and Product dimensions have been optimized. The initial settings, which the original developer pointed to the keys in the respective dimension tables, have now been redirected to their (identical) counterparts in the fact table, inventory_fact_1997.
We also note the statement that the Time dimension cannot be optimized through the mechanism of the option. This is because of the disparity between the Time key column identifiers shared in the join between the Time dimension table and the fact table, and the Member Key Column property for the lowest-level Time dimension member, Month, which we mentioned earlier.
We are also informed that private and virtual dimensions cannot be subjected to the optimization process. This behavior is consistent with the set of requirements that MSAS imposes on dimensions that are candidates for the Optimize Schema option, as we learned in the Overview and Discussion section above.
3. Click OK to close the Cube Schema Optimization message box.
We will take a look at the Member Key Column property settings once again, and see the physical changes are as expected in our copy of the Warehouse cube. To do this from our current position within the Cube Editor, we will take the following steps:
1. Expand the Dimensions folder, as required, in the Tree pane, by clicking the "+" sign to its left, if necessary.
2. Expand the Store dimension in the Tree pane, if required.
3. Click the lowest level in the hierarchy, the Store Name level, as we did in our earlier examination of the associated Member Key Column property.
4. With the Basic tab selected in the Properties pane, below the Tree pane, examine the Member Key Column property setting, as shown in Illustration 14.
Illustration 14: Member Key Column Property - Basic Tab of Store Dimension Property Pane
We note that the Member Key Column is tied to the store_id column of the inventory_fact_1997 table, just as we might have expected. We can also examine the Member Key Column property for the lowest levels of the Warehouse and Product dimensions, Warehouse Name and Product Name, respectively, to see that an identical modification has occurred: each is pointed to the column in the inventory_fact_1997 table that corresponds to the column (from the respective dimension table) that occupied it before we selected Optimize Schema (warehouse_id and product_id, respectively). An examination of the same property in the Time dimension, of course, makes it clear that no modification in the Member Key Column property has occurred, as we have discussed above.
To summarize the process we have witnessed within our practice cube, when we selected the Optimize Schema option, each dimension was evaluated to determine its compliance with the conditions for optimization. For each of the dimensions that met the required conditions, the Member Key Column property of the lowest level of the dimension was changed to reference the foreign key in the fact table, instead of the key in the dimension table to which it had previously referred. Going forward, the new reference directs MSAS to use the key from the fact table during processing, instead of issuing queries that join the dimension to the fact table in the relational database. The removal of joins results in reduced processing time. In many cases, the Optimize Schema option alone can enhance the cube processing cycle significantly.
In this article, we examined the use of the Optimize Schema option within the Cube Editor, as a means of making our cubes process faster. As in the other articles of this series, after introducing our topic, we described a scenario whereby we were called upon to meet the business needs of a hypothetical client. We then prepared for our practice exercise by creating and processing a copy of the Warehouse sample cube.
We next discussed the Member Key Column property of our cube, and examined the existing, pre-optimized settings within our practice cube. Moreover, we relayed important considerations to ponder in determining the appropriateness of our use of the Optimize Schema option in our respective business environments. We instituted the Optimize Schema option within the Cube Editor, and noted the information that MSAS provided via the resulting message box, informing us of the modifications that it had made in the Member Key Column property of the applicable dimensions. Finally, we examined and discussed the immediate effects of the Optimize Schema option within the structure of our cube.
» See All Articles by Columnist William E. Pearson, III
Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.