Introduction to MSSQL Server 2000 Analysis Services: Another Approach to Local Cube Design and Creation

Monday Jun 21st 2004 by William Pearson

Design and create local cubes directly from a relational data source. Join MSAS Architect Bill Pearson in a hands-on introduction to another approach to the creation of local cubes, and to the integration of MS Office and MSAS.

About the Series ...

This is the twenty-fourth article of the series, Introduction to MSSQL Server 2000 Analysis Services. As I stated in the first article, Creating Our First Cube, the primary focus of this series is an introduction to the practical creation and manipulation of multidimensional OLAP cubes. The series is designed to provide hands-on application of the fundamentals of MS SQL Server 2000 Analysis Services, with each installment progressively adding features and techniques designed to meet specific real-world needs. For more information on the series, as well as the hardware / software requirements to prepare for the exercises we will undertake, please see my initial article, Creating Our First Cube.

Note: Service Pack 3 updates are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis Services, and the related Books Online and Samples. Images are from a Windows 2003 Server environment, upon which I have also implemented MS Office 2003, but the steps performed in the articles, together with the views that result, will be quite similar within any environment that supports MSSQL Server 2000 and MSSQL Server 2000 Analysis Services ("MSAS") (and uses MS Office 2000 and above, in cases where MS Office components are presented in the article).

Introduction

In our last article, Introduction to Local Cubes, we ventured beyond earlier topics surrounding the retrieval and reporting of data from a server-based MSAS cube, and transitioned into the realm of remote, independent OLAP data source design and creation. We explored approaches to creating local cubes within MS Office, discussing many of the foundational concepts behind the architecture of multidimensional data sources, and their creation from an integrated MS Office client application, Excel. As a part of a hands-on practice exercise, we then created a local cube from an existing Excel PivotTable report, sourced initially from an MSAS server-based cube.

We explored many practical aspects of putting the functionality to work immediately, discussing ways that local (or "offline") cubes can meet the business requirements of distributed information consumers, and add value to the organization in general. Throughout the hands-on practice exercise we performed, in creating a local cube from an existing server-based cube, we commented upon the results we obtained, to reinforce our understanding of the concepts involved.

In this article, we will explore a second approach to the creation of a local cube. While we will rely again upon the PivotTable report as our design and development tool, this time we will focus more on the use of Microsoft Query ("MS Query"), and begin with a relational database instead of an MSAS server-based cube. We will discuss advantages in taking this approach and situations for which it is especially useful. As with the prior article, Introduction to Local Cubes, the intent of this article is to offer options for more independence from the perspective of the information consumer, as well as to make the fruits of MSAS OLAP available to enterprise team members through the conduits of the applications that are pervasive in the desktop population we find in business today.

In this lesson, we will:

  • Discuss the creation of a local cube from a relational data source;
  • Discuss scenarios where starting with a relational source might be advantageous;
  • Discuss how the creation of a local cube from a relational data source can be used to complement an MSAS implementation;
  • Derive a subset of relational data as the basis of our local cube with the Query Wizard;
  • Introduce the OLAP Cube Wizard, and complete design of our local cube;
  • Discuss the results obtained through the various steps of the cube development process in our practice exercises.

Another Approach to Local Cube Design and Creation

Our second approach to creating a local cube returns us to Microsoft Query, to which we were briefly introduced in Introduction to Local Cubes, during the database and connection phases of our PivotTable report setup. We will use Microsoft Query to create a subset of a relational data source, and then we will finish the cube creation process using the OLAP Cube Wizard.

Start with a Relational Data Subset

Many situations arise where we have neither a server-based cube nor an Analysis Server available, but in which we can access a relational data source. For example, we may become involved in the early phases of transitioning an enterprise to the use of local cubes for remote users (say a team of sales people or managers who need analytical capabilities on their laptops). We may need to prototype an eventual solution by outfitting a representative test group with basic cubes, in order to obtain useful feedback regarding the contemplated design for a server-based generation and distribution environment. After obtaining the specifics from the information consumers, we can more effectively place the final design in production, and allow the server to generate the cubes in automated fashion for field deployment.

The capability to design and create cubes directly from a relational data source provides an excellent opportunity to perform "proof of concept" exercises with a reality-based model - a model that will often "prompt" consumers to specify their requirements more clearly. Users that are unfamiliar with OLAP and its uses can see the model in action, and can relate it to their day-to-day needs, and signal suggested improvements from the standpoint of both the overall cube structure (including the dimensions, measures and other components that are critical to effective design), and interface and usability considerations of a more mechanical nature.

Furthermore, local cubes offer the capability of "parallel testing" proposed model changes (without actually making them to a given production cube), keeping the production cube build process operating in an insulated manner. Such "paralleling" is useful both as a troubleshooting and continuous improvement process. For example, many clients with whom I have worked in the past have asked me to investigate issues with cube size and performance. The cube under consideration was often the production model, so we were not afforded the luxury of simply taking it offline to modify and tune its build cycle.

Using a "zero based" approach to constructing a new cube (a rapid process using a local prototype), based upon the currently desired reporting output of the production cube, many issues were subsequently resolved through a close investigation of what data in the cube was actually being used in the field; redundancies and abandoned elements often came to light through a rapid redesign that was based on current needs. Moreover, examination of the evolutionary "layers" (which were gradually added onto the original cube structure as consumer needs became known), at which summarization was being attempted, often led to the derivation of a creation cycle that was more finely attuned to the effective use of preprocessing and aggregation at the RDBMS level, where appropriate. In situations such as these, and in many others, the generation of a local cube from the relational source tables can be highly useful to the organization.

As we have mentioned, we begin the direct-from-relational creation of a cube by first deriving a subset of the relational data. Preparation, as we shall see, involves 1) setup of a data source for the relational database, and then 2) creation of the query that specifies the selection criteria for the subset. A query tool is needed to precisely identify the parameters of that subset, and once again, MS Query provides a straightforward means of creating this subset (or rowset, as it is called when using the SQL-based MS Query tool).

MS Query allows us to perform joins between the dimension and fact tables, to create calculated columns, and to otherwise prepare a "virtual warehouse / mart" on a miniature scale from which to build a cube. After we establish the rowset, we use the OLAP Cube Wizard to design the cube and its member objects (including measures, dimensions, hierarchical levels and so forth).
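To make the idea of a rowset more concrete, here is a minimal sketch in Python, using the standard library's sqlite3 module as a stand-in for the Access database and MS Query. The table and column names mirror FoodMart names that appear later in this lesson, but the sample rows, their values, and the store_profit calculated column are invented purely for illustration.

```python
import sqlite3

# In-memory database standing in for the relational source (an assumption;
# the lesson itself uses the FoodMart 2000 Access database through MS Query).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_city TEXT);
    CREATE TABLE sales_fact_1998 (store_id INTEGER, store_sales REAL, store_cost REAL);
    INSERT INTO store VALUES (1, 'Seattle'), (2, 'Portland');
    INSERT INTO sales_fact_1998 VALUES (1, 10.0, 4.0), (1, 5.0, 2.0), (2, 8.0, 3.0);
""")

# Join the fact table to a dimension table and add a calculated column,
# producing one flat rowset of the kind a cube can be built from.
rowset = conn.execute("""
    SELECT s.store_city,
           f.store_sales,
           f.store_cost,
           f.store_sales - f.store_cost AS store_profit
    FROM sales_fact_1998 AS f
    JOIN store AS s ON s.store_id = f.store_id
    ORDER BY s.store_city, f.store_sales
""").fetchall()

for row in rowset:
    print(row)
```

Each output row carries both descriptive fields (store_city) and numeric fields (store_sales, store_cost, store_profit), which is the flat shape the OLAP Cube Wizard later organizes into dimensions and data fields.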

Starting with the Relational Data Source

Let's get started with defining the rowset upon which the cube will be based. We will keep the structure fairly simple to allow us to focus on design and creation concepts as we proceed.

1.  Open a new Microsoft Excel ("Excel") worksheet.

2.  Select Data from the top menu.

3.  Select Import External Data from the dropdown menu.

4.  Click New Database Query from the cascading menu, as shown in Illustration 1.


Illustration 1: Initializing the Database (Rowset) Query

MS Query is initialized, and the Choose Data Source dialog appears.

5.  On the Database tab of the dialog (the default when the dialog appears), choose Microsoft Access Database.

We leave the Query Wizard enabled, via the default setting on the dialog - the checkbox - to assist in guiding us through the design process.

The Choose Data Source dialog appears, with our selection, as depicted in Illustration 2.


Illustration 2: The Choose Data Source Dialog for Microsoft Query

6.  Click OK.

The Select Database dialog appears, along with a Connecting to Data Source progress dialog, as shown in Illustration 3.


Illustration 3: The Select Database Dialog

7.  Find and select the FoodMart 2000 Access database (FoodMart 2000.mdb), which is typically installed in the Samples folder of the Microsoft Analysis Services directory under Program Files on the drive upon which the Typical installation took place (in my case, for example, the .mdb is located at D:\Program Files\Microsoft Analysis Services\Samples).

The Select Database dialog, with our selection, appears as depicted in Illustration 4:


Illustration 4: The Select Database Dialog

8.  Click OK.

The Query Wizard - Choose Columns dialog appears.

9.  Select the tables listed below, from the Available Tables and Columns box on the left, by selecting each and clicking the > button, to move the respective columns to the right.

  • product
  • product_class
  • region
  • sales_fact_1998
  • store
  • time_by_day

The Query Wizard - Choose Columns dialog appears, as partially shown in Illustration 5.


Illustration 5: The Query Wizard - Choose Columns Dialog (Partial View)

While in a real-world scenario we would probably expand each table and select only the specific columns we needed for our cube, we have selected each of the tables in its entirety at this point, to make the process quicker for our lesson.

10.  Click Next.

The Query Wizard - Filter Data dialog appears, as shown in Illustration 6.


Illustration 6: The Query Wizard - Filter Data Dialog

We will simply click Next to skip this step, again noting the importance of filtering the data in a real-world scenario, to keep cube size minimal when the relational data source is large.
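As a rough illustration of what a filter buys us, the following Python sketch (again using sqlite3 as a stand-in for the relational source) runs the same query with and without a filter on store_country. The rows and the choice of 'USA' as the filter value are invented for the example; in MS Query itself, the equivalent criterion would be set in the Filter Data dialog.

```python
import sqlite3

# Tiny invented data set; the names follow FoodMart, the values do not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_country TEXT);
    CREATE TABLE sales_fact_1998 (store_id INTEGER, unit_sales REAL);
    INSERT INTO store VALUES (1, 'USA'), (2, 'Canada'), (3, 'Mexico');
    INSERT INTO sales_fact_1998 VALUES (1, 3), (1, 2), (2, 4), (3, 1), (3, 5);
""")

base_query = """
    SELECT s.store_country, f.unit_sales
    FROM sales_fact_1998 AS f
    JOIN store AS s ON s.store_id = f.store_id
"""

# Without a filter, every fact row flows into the rowset (and the cube).
unfiltered = conn.execute(base_query).fetchall()

# With a filter, only the rows the consumers actually need are extracted.
filtered = conn.execute(
    base_query + " WHERE s.store_country = ?", ("USA",)
).fetchall()

print(len(unfiltered), "rows unfiltered;", len(filtered), "rows filtered")
```

Fewer rows in the rowset means less data for the cube to store, which is the point of filtering when the relational source is large.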

11.  Click Next.

The Query Wizard - Sort Order dialog appears, as shown in Illustration 7.


Illustration 7: The Query Wizard - Sort Order Dialog

Again, we will pass on sorting the data; we will not be working with the end product in this lesson. This, too, would likely be handled differently in most business environments.

12.  Click Next.

The Query Wizard - Finish dialog appears.

13.  Click Save Query, and name the query LocalCubeRelational, placing it in a convenient directory (I accepted the default).

The Save As dialog resembles that depicted in Illustration 8.


Illustration 8: Saving the Underlying Relational Query

14.  Click Save.

We are returned to the Query Wizard - Finish dialog.

15.  Click the Create an OLAP Cube from this Query radio button to select it.

The Query Wizard - Finish dialog appears, as shown in Illustration 9.


Illustration 9: The Query Wizard - Finish Dialog

16.  Click Finish.

The colorful Welcome dialog of the OLAP Cube Wizard is launched, based upon our selection in the Finish dialog. It appears as shown in Illustration 10, awaiting instructions as to how we want it to create the cube, for which we have established a source definition in MS Query.


Illustration 10: The OLAP Cube Wizard Welcome Dialog Appears

Cube Design and Creation with the OLAP Cube Wizard

We now enter the cube design and creation phase of our second approach for creating a local cube. Our next steps will focus upon organization of the external data we have defined for extraction, and the manner in which we want it to summarize, and to appear for analysis and reporting. As we have learned, many reporting options exist, including PivotTable reports (see Reporting Options for Analysis Services Cubes: MS Excel 2002 of this series for basic creation and use), PivotTable Lists (see Reporting Options for Analysis Services Cubes: MS FrontPage 2002 for an introduction to this option), PivotChart reports, and others. Many options exist, as well, outside of the MS Office suite, as various third-party reporting tools can access the OLAP Cube that we will generate (see my Database Journal articles index for other reporting options).

The OLAP Cube Wizard allows us to begin with the output (a flat series of records) of the query we have designed in MS Query, and to then apply a hierarchical organization to the fields. It also allows us to define the summary values we want to calculate for optimal reporting purposes. In addition to summarized values, our cube will contain descriptive facts surrounding those values.

The values to be summarized, or measures as we know them from other OLAP scenarios, are called data fields within the context of the OLAP Cube Wizard. The descriptive facts, such as the date and location of a transaction, are organized into the hierarchical levels of detail that we know as dimensions.

The successful definition of the dimensions and their associated levels depends upon determining the kinds of categories that the information consumers employ (or want to be able to employ) when they analyze the data in reports and browsers. We can organize data fields and dimensions to endow organizational reports with high-level summaries (such as total costs worldwide, or at country or regional levels), while also enabling the presentation of lower-level details, filtered for a myriad of criteria (such as locations or areas of management responsibility where costs are particularly high, or, alternatively, well controlled and minimal).
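The relationship between data fields and dimension levels can be sketched in a few lines of Python. This is only an illustration of the summarizing idea, not of how the wizard or the PivotTable Service works internally; the rows and the summarize helper are hypothetical.

```python
from collections import defaultdict

# Invented rowset: three descriptive (dimension) fields and one data field.
rows = [
    {"store_country": "USA", "store_state": "WA", "store_city": "Seattle", "store_cost": 4.0},
    {"store_country": "USA", "store_state": "WA", "store_city": "Tacoma", "store_cost": 2.0},
    {"store_country": "USA", "store_state": "OR", "store_city": "Portland", "store_cost": 3.0},
    {"store_country": "Canada", "store_state": "BC", "store_city": "Vancouver", "store_cost": 5.0},
]

def summarize(rows, levels, data_field):
    """Sum data_field for each distinct combination of the given levels."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key] += row[data_field]
    return dict(totals)

# High-level summary: total costs per country.
by_country = summarize(rows, ["store_country"], "store_cost")

# Lower-level detail: total costs per country and state.
by_state = summarize(rows, ["store_country", "store_state"], "store_cost")

print(by_country)
print(by_state)
```

The same data field (store_cost) is summed in both calls; only the list of levels changes, which is what lets one cube serve both the summary and the detail views.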

As we have discussed, the local cube design and creation process is easy, flexible and, best of all (from the perspective of "proof of concept" and other prototyping exercises) fast. After we create and view reports based upon a new version of a local cube, we can return immediately to the OLAP Cube Wizard to make changes to adjust for consumer suggestions and comments regarding usability and performance, as well as to test ideas we formulate on an ad hoc basis. A local cube means isolation of the development process and uninterrupted operation of any production cubes that we have in place. It also means ultimate portability and convenience, both in the design phase and in a distributed production scenario.

Let's begin exploring the processes involved in working with the OLAP Cube Wizard to create our local cube. The steps consist of the following:

  • Defining the data fields
  • Defining the dimensions and levels
  • Selecting the type of cube

17.  Click Next at the Welcome screen for the OLAP Cube Wizard.

The OLAP Cube Wizard - Step 1 of 3 dialog appears, as partially shown in Illustration 11.


Illustration 11: The OLAP Cube Wizard - Step 1 of 3 Dialog (Partial View)

The Step 1 of 3 dialog is a great example of how the wizard makes design straightforward and rapid - assuming planning (based upon a solid understanding of the business requirements of the information consumers) has taken place before we embark upon cube design. We simply select from a list the data source fields that we wish to present, and how we wish to summarize each of those fields, in an efficient and easy-to-use screen.

In this step, it is important that we decide which of our source data fields it makes sense to use as data fields. Data fields contain values (that is, they are measures) that we want to summarize, such as store costs for which information consumers have a need for totals. The wizard requires that we select at least one field to be a data field.

When the dialog initially appears, the wizard has several boxes checked already. These are selected by the well-meaning (but not necessarily correct) wizard, based upon its conclusion that these fields appear to contain measure-like data. It "proposes" them, as a result, for selection in this step. It is critical to verify whether the wizard's proposals are correct, and to make any changes to fit our business requirements. The fields that we leave unchecked in this step will comprise the set of available dimension fields in Step 2, from which we will select and organize those we need to design our dimension hierarchy structures.
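To see why the proposals need review, consider the following Python sketch. The wizard's actual selection logic is not described in this article, so the "every numeric field is a candidate" rule below is purely an assumption for illustration; the sample row is invented, with field names borrowed from FoodMart.

```python
# Hypothetical sample row; field names follow FoodMart, values are invented.
sample_row = {
    "store_id": 6,
    "the_year": 1998,
    "store_city": "Seattle",
    "store_sales": 10.5,
    "store_cost": 4.2,
    "unit_sales": 3.0,
}

def propose_data_fields(row):
    """Naive rule (an assumption, not the wizard's documented logic):
    treat every numeric field as a candidate data field."""
    return [
        name
        for name, value in row.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]

proposed = propose_data_fields(sample_row)

# The fields we actually want summarized, per Table 1 of this lesson.
wanted = ["store_sales", "store_cost", "unit_sales"]

# Numeric keys and date parts are proposed too, and must be cleared by hand.
to_clear = [name for name in proposed if name not in wanted]
print(to_clear)
```

Under this rule, identifiers such as store_id and date parts such as the_year look "measure-like" simply because they are numbers, even though summing them is meaningless; they belong among the dimension fields instead.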

18.  Fill out the Step 1 of 3 dialog, ensuring that only the settings in Table 1 below exist (clearing any unwanted checkboxes).

Source Field     Summarize By     Data Field Name
store_sales      Sum              Store Sales
store_cost       Sum              Store Costs
unit_sales       Sum              Store Unit Sales

Table 1: Initial Measures List with Suggested New Names

In the above settings, we made minor modifications to the field names, as we might to fit terminology that exists in current reports, and so forth.

The OLAP Cube Wizard - Step 1 of 3 dialog now partially appears, with all relevant selections displayed, as shown in Illustration 12.


Illustration 12: Step 1 of 3 Dialog (Partial View) with Our Selections

19.  Click Next.

The OLAP Cube Wizard - Step 2 of 3 dialog appears.

In this step, we organize the descriptive data into dimensions, each of which can be used as a field in any report we generate from our cube. The organization of the fields in levels of detail that we design at this stage should allow information consumers to select the level of detail to view, starting with high-level summaries, drilling to details and zooming back to summaries as appropriate to meet their reporting needs.
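Drilling down and zooming back up can be pictured as moving along a path through the levels of a dimension. The Python sketch below illustrates this for a Year / Quarter / Month hierarchy; the rows, values, and the drill helper are hypothetical, and the code is not a description of how PivotTable reports implement drilling.

```python
from collections import defaultdict

# Invented rows: (Year, Quarter, Month, store_sales).
rows = [
    (1998, "Q1", "January", 10.0),
    (1998, "Q1", "February", 6.0),
    (1998, "Q2", "April", 7.0),
]

def drill(rows, path):
    """Total store_sales for the children of the member identified by path.

    An empty path returns the top level (years); a path of (year,) returns
    that year's quarters; a path of (year, quarter) returns its months.
    """
    depth = len(path)
    totals = defaultdict(float)
    for *levels, sales in rows:
        if tuple(levels[:depth]) == tuple(path):
            totals[levels[depth]] += sales
    return dict(totals)

print(drill(rows, ()))            # high-level summary by year
print(drill(rows, (1998,)))       # drill into 1998: quarters
print(drill(rows, (1998, "Q1")))  # drill into Q1: months
```

Shortening the path again is the "zoom back" direction: the same rows are simply summarized at a higher level.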

The wizard requires that we designate at least one dimension for a cube. We can designate fields that provide isolated facts, and do not belong in any particular hierarchy, such as the Store Type in our example, as dimensions with a single level. Rather obviously, our cube will be more useful for reports if we design some of the fields, as levels, to "roll up" to higher levels and dimensions.

To create a level within a dimension, drag each field from the Source Fields list onto an existing dimension or level in the Dimensions box, as shown in the following steps. To rename a selection, simply right-click and select Rename from the shortcut menu that appears. (The "click label and wait" routine also enables the direct typing of changes.)

20.  Move the selections shown below in Table 2 from the Source fields ("Table Name" in Table 2) list on the left to the appropriate position in the Dimensions list on the right. (To correctly place the dimensions / levels under the dimensions, use the "template" guide that automatically adjusts itself to remain at the bottom of the existing Dimension list, for each new dimension created.)

21.  Rename each "Table Name" selection, with the suggested "New Name" below it, as shown in Table 2. (See Illustration 13, below Table 2, to clarify any confusion as to placement).

              Dimension           Level1              Level2                  Level3
Table Name    product_category    product_category    product_subcategory     product_name
New Name      Product             Category            Subcategory             Product Name
Table Name    sales_country       sales_country       sales_state_province    sales_city
New Name      Sales               Sales Country       Sales State             Sales City
Table Name    store_country       store_country       store_state             store_city
New Name      Store               Store Country       Store State             Store City
Table Name    the_year            the_year            quarter                 the_month
New Name      Time                Year                Quarter                 Month

Table 2: Initial Dimensions List with Suggested New Names

The OLAP Cube Wizard - Step 2 of 3 dialog now resembles that partially illustrated in Illustration 13.


Illustration 13: Partial View of Step 2 of 3 Dialog with Our Selections

22.  Click Next.

The OLAP Cube Wizard - Step 3 of 3 dialog appears.

23.  Select the radio button with the caption Save a cube file containing all data for the cube by clicking it, if necessary.

24.  Select a convenient location in which to save the cube file. (I left mine at default.)

25.  Click Save, as required.

The OLAP Cube Wizard - Step 3 of 3 dialog now appears similar to Illustration 14 below.


Illustration 14: Step 3 of 3 Dialog with Cube Type Selection and File Name / Location

With another selection in this dialog, we decide what kind of cube we want to create. Our choice here depends on several factors in our operating environment, including the amount of data our cube will contain; the type and complexity of reports we plan to create based upon our cube; and the memory, disk space and other resources on our systems (as well as those of the systems upon which the information consumers will be interacting with our cube). Experience and planning will be our best guides to selecting the correct options, as we develop larger and more complex cubes for various organizational needs.

The Save a cube file containing all data for the cube option creates a separate cube file on our PCs, retrieving all the data we have designated within our cube design, and storing it in this file. This selection is not appropriate in all situations, but might represent a good choice when:

  • We are constructing and generating a cube for frequently changing interactive reports;
  • The amount of disk space used by the file is not a limiting concern;
  • We want to store the cube on a network server as a standalone source that information consumers can access to create their own reports (an alternative, but often innovative, use for a local cube). Local cubes also make great training sources for fledgling report writers.

A local cube file can act as an intermediate source of data from the original relational database that excludes source data to which we might want to prevent access. A local cube file can also provide a snapshot of some or all of the source database to facilitate offline access and analysis, either for a consumer community (for instance, on an isolated, separate network), or for a sole remote consumer or consumers.

As in most scenarios involving variable processing and storage, resource and speed tradeoffs are a factor to consider. With the Save a cube file containing all data for the cube option, we can expect more time and resources to be necessary for the initial creation of the cube; but read operations, such as opening and modifying reports, will likely be faster (although cube size is a factor to consider in read speeds). In addition, the fact that the cubes we generate are self-contained is often a deciding factor for the selection of the Save a cube file containing all data for the cube option.

The sheer amount of the data we include in the cube, together with the number of dimensions and levels we attach to the model, are key factors in predicting ultimate cube size. Flatter hierarchies and filtered data selections are considerations in reducing cube size, as are other Cube Type options that can be selected on the Step 3 of 3 dialog. A close study of the options and prudent design of the cube, combined with testing in an appropriate development environment (and well facilitated by the ability to create and modify local cubes quickly and easily, as I have emphasized) will contribute heavily to efficient cube generation, delivery and overall operations.
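A back-of-the-envelope calculation shows why dimensions and levels matter so much. The Python sketch below computes a crude upper bound on the number of lowest-level cells as the product of the lowest-level member counts of each dimension. The member counts are invented, and the bound says nothing about the actual size of a .cub file, which depends on sparsity and storage details not covered here.

```python
from math import prod

# Hypothetical counts of lowest-level members per dimension (invented numbers).
leaf_members = {"Product": 1500, "Sales": 100, "Store": 25, "Time": 12}

def max_leaf_cells(members):
    """Crude upper bound: one cell per combination of lowest-level members."""
    return prod(members.values())

full = max_leaf_cells(leaf_members)

# A flatter Product hierarchy: stop at a (hypothetical) 100 subcategories
# instead of carrying 1500 individual product names.
flatter = max_leaf_cells({**leaf_members, "Product": 100})

print(f"full design:    {full:,} potential cells")
print(f"flatter design: {flatter:,} potential cells")
```

With these made-up numbers, dropping a single level from one dimension reduces the bound by a factor of fifteen, which is why flatter hierarchies and filtered selections are worth considering early in design.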

For more information regarding the details of the various choices that appear on this dialog, as well as optimization techniques for cube building in general, see the Books Online that are installed with the Typical MSSQL Server 2000 / Analysis Services installation, or which can be accessed on the installation CDs or on the Microsoft MSSQL Server website.

26.  Click Finish.

The Save As dialog appears, prompting us to name the definition file of the OLAP query (.oqy).

27.  Name the file LocalCube_Relational, and navigate to store it in a convenient place. (I again left mine at default.)

The Save As dialog appears as shown below.


Illustration 15: The Save As Dialog with Location Indicated for the New Definition File

MS Query prompts us to save the cube definition (.oqy) file, which is separate from the cube file that we create to store data. We can reuse the .oqy file in Excel for report creation or for other possible purposes later. When we chose the Save a cube file containing all data for the cube option at the last dialog, a file with a .cub extension was "scheduled" to be created, in a location to be specified. The .cub file actually contains the data for the cube, and is not created immediately when we click Finish. In our case, it is created when we save the cube definition as a file; once MS Query creates the OLAP query file, it hands off instructions to the PivotTable Service to use the newly saved definition to kick off creation of the local cube.

To modify our initial cube design, we have only to open the .oqy file to initialize the OLAP Cube Wizard once more. For more details regarding the .oqy files, see the MS Query Online Help.

28.  Click the Save button on the Save As dialog.

The .oqy file is quickly saved, and the Creating Offline Cube dialog appears, and remains until the cube is created, confirming that the build is taking place.

We are returned to Microsoft Excel, where we are greeted by the PivotTable and PivotChart Wizard - Step 3 of 3 dialog.

29.  Click the top, left corner cell (Sheet1!$A$1) to place the new PivotTable on the current worksheet.

30.  Click Finish.

We see the standard PivotTable "map" appear, signaling that 1) we have a connection to the new cube and 2) that the cube is ready to be reported upon using standard PivotTable report procedures (see Reporting Options for Analysis Services Cubes: MS FrontPage 2002 for a step-by-step tutorial on PivotTable report navigation and use). The PivotTable report appears as shown in Illustration 16 below.


Illustration 16: The PivotTable Report - Ready for Reporting Action

The cube is now ready for reporting, and, from the perspective of a reporting application, is in many ways identical to a server-generated cube.

31.  Save the Excel worksheet as desired.

Keep in mind that we can call Microsoft Query at any time to rapidly edit the initial .oqy file, so as to make modifications based upon the results obtained in reporting efforts, or as new requirements arise from the information consumers who test the new cube design in development. A quick rebuild of the cube will implant any changes, giving us the opportunity to immediately test for desired results again.

Summary ...

In this article, we continued our exploration of local cubes, using MS Office components to access the realm of OLAP data source design and creation. We created a local cube from an existing Excel PivotTable report in our previous session, and then we performed a step-by-step exploration of a more design-oriented route in this lesson, using the OLAP Cube Wizard, to accomplish cube creation in a more flexible manner.

We drilled into many practical aspects of putting the functionality to work immediately, discussing ways that local (or offline) cubes can meet the business requirements of distributed information consumers (a topic we first introduced in the previous article, Introduction to Local Cubes), and can add value to the organization in general. Throughout our hands-on practice session of creating a local cube from a relational data source, we discussed the advantages of using our local cube design process as a flexible and portable means of rapidly prototyping enterprise OLAP data sources.

» See All Articles by Columnist William E. Pearson, III

