Attribute Discretization: Using the "Clusters" Method

Tuesday Jun 23rd 2009 by William Pearson
Share:

Join BI Architect Bill Pearson as he leads hands-on practice with the “Clusters” discretization method in Analysis Services. In this article we continue to explore attribute discretization, as a part of an extended examination of the dimensional model lying at the heart of the integrated Microsoft Business Intelligence solution.

This article continues my exploration of attribute discretization, a capability in Analysis Services that allows us to group members of an attribute into a number of member groups. Our concentration here will be to get some exposure to the pre-defined “Clustered” discretization method, one of three such pre-defined methods supported by Analysis Services, through hands-on application of the method to a representative dimension attribute within our sample UDM.

This article continues the overview of Attribute Discretization in Analysis Services begun in Introduction to Attribute Discretization, and continued in Attribute Discretization: Using the Automatic Method and Attribute Discretization: Using the “Equal Areas” Method . Both this article and its predecessor extend the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. After taking up various additional components of the dimensional model in subsequent articles, we performed hands-on exploration of the general characteristics and purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V. We then fixed our focus upon the properties underlying attributes, extending our overview into attribute member Keys, Names, Values and Relationships within several subsequent articles.

Note: For more information about my Introduction to MSSQL Server Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis Services Series” that follows the conclusion of this article.

Introduction

In Introduction to Attribute Discretization, Attribute Discretization: Using the Automatic Method, and Attribute Discretization: Using the Equal Areas Method, I summarized preceding articles within the current subseries, consisting of a general introduction to the dimensional model. I noted the wide acceptance of the dimensional model as the preferred structure for presenting quantitative and other organizational data to information consumers. The articles of the series then undertook an examination of dimensions, the analytical “perspectives” upon which the dimensional model relies in meeting the primary objectives of business intelligence, including its capacity to support:

  • the presentation of relevant and accurate information representing business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly and easily, and obtain rapid results datasets.

We extended our examination of dimensions into a couple of detailed articles. These articles, Dimensional Model Components: Dimensions Parts I and II, emphasized that dimensions, which represent the perspectives of a business or other operation, and reflect the intuitive ways that information consumers need to query and view data, form the foundation of the dimensional model. We noted that each dimension within our model contains one or more hierarchies. (As we learn in other articles of this series, two types of hierarchies exist within Analysis Services: attribute hierarchies and user - sometimes called “multi-level” - hierarchies.)

We next introduced dimension attributes within the subseries, and conducted an extensive overview of their nature, properties, and detailed settings in Dimensional Attributes: Introduction and Overview Parts I through V. We noted that attributes help us to define with specificity what dimensions cannot define by themselves. Moreover, we learned that attributes are collected within a database dimension, where we can access them to help us to specify the coordinates required to define cube space.

Throughout the current subseries, I have emphasized that dimensions and dimension attributes should support the way that management and information consumers of a given organization describe the events and results of the business operations of the entity. Because we maintain dimension and related attribute information within the database underlying our Analysis Services implementation, we can support business intelligence for our clients and employers even when these details are not captured within the system where transaction processing takes place. Within the analysis and reporting capabilities we supply in this manner, dimensions and attributes are useful for aggregation, filtering, labeling, and other purposes.

Having covered the general characteristics and purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V, we fixed our focus upon the properties underlying them, based upon the examination of representative attributes within our sample cube. We then continued our extended examination of attributes to yet another important component we had touched upon earlier, the attribute member Key, with which we gained some hands-on exposure in practice sessions that followed our coverage of the concepts. In Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys, we explored the concepts of simple and composite keys, narrowing our examination in Part I to the former, where we reviewed the Properties associated with a simple key, based upon the examination of a representative dimension attribute within our sample UDM. In Part II, we revisited the differences between simple and composite keys, and explained in more detail why composite keys are sometimes required to uniquely identify attribute members. We then reviewed the properties associated with a composite key, based upon the examination of another representative dimension attribute within our sample UDM.

In Attribute Member Names, we examined the attribute member Name property, which we had briefly introduced in Dimensional Attributes: Introduction and Overview Part V. We shed some light on how attribute member Name might most appropriately be used without degrading system performance or creating other unexpected or undesirable results. We then examined the “sister” attribute member Value property (which we introduced along with attribute member Name in Dimensional Attributes: Introduction and Overview Part V) in Attribute Member Values in Analysis Services. As we did in our overview of attribute member Name, we examined the details of Value. Our concentration was also similarly upon its appropriate use in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner, without the unwanted effects of system performance degradation, and other unexpected or undesirable results, that can accompany the uninformed use of the property.

In Introduction to Attribute Relationships in MSSQL Server Analysis Services, we examined yet another part of the conceptual model, Attribute Relationships. In this introduction, we discussed several best practices and design, and other, considerations involved in their use, with a focus upon the general exploitation of attribute relationships in providing support, once again, for the selection and delivery of enterprise data. In the subsequent two related articles, Attribute Relationships: Settings and Properties and More Exposure to Settings and Properties in Analysis Services Attribute Relationships, we examined attribute relationships in a manner similar to previous articles within this subseries, concentrating in detail upon the properties that underlay them.

With the next article, Introduction to Attribute Discretization, we introduced a capability in Analysis Services – to which we refer as attribute discretization - that allows us to group members of an attribute into a number of member groups. We discussed design, and other, considerations involved in the discretization of attributes, and touched upon best practices surrounding the use of this capability.

In Attribute Discretization: Using the Automatic Method, we introduced the first of multiple pre-defined discretization methods supported within the Analysis Services UDM. We discussed the options that are available, focusing upon the employment of the Automatic discretization method within the sample cube, to meet the business requirements of a hypothetical client. We then began our practice session with an inspection of the contiguous members of a select attribute hierarchy, noting the absence of grouping and discussing shortcomings of this default arrangement. Next, we enabled the Automatic discretization method within the dimension attribute Properties pane, and then reprocessed the sample cube with which we were working to enact the new Automatic discretization of the select attribute members. Finally, we performed further inspections of the members of the attribute hierarchy involved in the request for assistance by our hypothetical client, noting the new, more intuitive grouping established by the newly enacted Automatic discretization method.

Finally, in last month’s article, Attribute Discretization: Using the Equal Areas Method, we introduced the second of the pre-defined discretization methods supported within the Analysis Services UDM. We discussed the options that are available with this particular approach, as we did in the article previous for the Automatic method, focusing upon the employment of the Equal Areas discretization method, again within the sample cube, to meet the business requirements of a hypothetical client. We then began our practice session with an inspection, via the browser in the Dimension Designer, of the contiguous members of another select attribute hierarchy, noting the absence of grouping and discussing shortcomings of this default arrangement. Next, we enabled the Equal Areas discretization method within the dimension attribute Properties pane, and again reprocessed the sample cube with which we were working to enact the new Equal Areas discretization of the select attribute members. Finally, we performed another inspection, via the Dimension Designer and Cube Designer browsers, of the members of the attribute hierarchy involved in the request for assistance by our hypothetical client, noting the new, more intuitive grouping established by the newly enacted Equal Areas discretization method.

In this article, we will gain some hands-on exposure to setting up yet another of the discretization methods supported by Analysis Services. We will first briefly review the options that are available (referencing their coverage in other articles, where applicable), and then work with Clusters discretization in the sample cube. (In individual articles designed specifically for the purpose, we will examine the setup of other discretization options, in a manner similar to previous articles within this subseries, gaining hand-on exposure to the use of those options in individual practice scenarios.)

Our examination will include:

  • A brief review of attribute discretization in Analysis Services, potential benefits that accrue from discretization in our UDMs, and how the process can help us to meet the primary objectives of business intelligence.
  • A brief overview of the multiple pre-defined discretization processes supported within the Analysis Services UDM.
  • Examination, via the browser in the Dimension Designer, of the pre-existing members of a select attribute hierarchy, noting the absence of grouping and discussing shortcomings of this default arrangement.
  • Enablement of the Clusters discretization method within the dimension attribute Properties pane.
  • Reprocessing the cube to enact the new Clusters discretization of the select attribute members.
  • Another examination, via the browsers in both the Dimension Designer and the Cube Designer, of the members of a select attribute hierarchy, noting the new, more intuitive grouping established by the newly enacted Clusters discretization method.
  • Backward- and forward-looking references to previous and subsequent articles, respectively within our series, wherein we perform detailed examinations surrounding other details of discretization, as supported within the Analysis Services UDM.

Using the Clusters Attribute Discretization Method

As we learned in Introduction to Attribute Discretization, whenever we work with attributes, we can expect to encounter two general types of values, discrete and contiguous. Discrete values stand apart distinctly, and have clearly defined logical “boundaries” between themselves. Citing the Gender attribute, within the Customer dimension of the Adventure Works sample UDM, wherein the attribute is considered to have only one of two discrete values, female or male, we noted that possible values are naturally discrete for the lion’s share of attributes occurring in the business world.

In contrast to discrete values, we noted that contiguous values do not stand apart distinctly, but flow along, as if in a continuous line. Moreover, we discussed the fact that contiguous values, especially within large populations, can have very large numbers of possible values, and that information consumers can find it difficult to work effectively and efficiently within such wide ranges of values. As an example, we cited the Sick Leave Hours attribute, within the Employee dimension of the Adventure Works sample UDM, an attribute which could have a wide range of possible values, depending upon how many employees are involved, whether there are limits on how many sick leave days they can accumulate, and considerations of this nature. (The member values are based directly upon the values contained within the SickLeaveHours column of the DimEmployee table - with many of the values shared among multiple employees.) The sheer number of values might make working with them cumbersome for information consumers, if they are simply made available in their existing state.

As we discussed in Introduction to Attribute Discretization, discretization can help us to make it easier for information consumers to work with large numbers of possible attribute member values. As discretization creates a manageable number of groups of attribute values that are clearly separated by boundaries, we can thereby group contiguous values into sets of discrete values, via a system-generated collection of consecutive dimension members known as member groups.

We also discussed, in Introduction to Attribute Discretization, that once the discretization process groups the attribute members into the member groups, the member groups are then housed within a level within the dimensional hierarchy. (A given level within a dimensional hierarchy can contain either members or member groups, but not both.) When information consumers browse a level that contains member groups, they see the names and cell values of the member groups. The members generated by Analysis Services to support member groups are called grouping members, and they look like ordinary members.

Analysis Services affords us flexibility in methods of attribute discretization, based upon algorithms of varying complexity. The different methods of discretization all have the same function – to group contiguous values into sets of discrete values. (Analysis Services can also effectively discretize large numbers of discrete values, such as Social Security numbers.) The methods simply manage grouping via different approaches.

Beyond the Dimension Designer, Analysis Services also supports user-defined discretization, via data definition language (DDL), should the “out – of – the – box” approaches not meet the business needs of our local environments. Moreover, we can alternatively implement custom discretization via the underlying data warehouse, using views at the relational level; named calculations in the data source view; calculated members in Analysis Services; or via other approaches.

Analysis Services supports four DiscretizationMethod property settings, which include three pre-defined discretization methods:

  • None (default): Analysis Services performs no grouping, and simply displays the attribute members.
  • Automatic: Analysis Services selects the method that best represents the data: either the EqualAreas method or the Clusters method.
  • EqualAreas: Analysis Services attempts to divide the members in the attribute into groups that contain an equal number of members.
  • Clusters: Analysis Services groups members by performing single-dimensional clustering on the input values by using the K-Means algorithm. It uses Gaussian distributions. This can only be used for numeric columns.

The DiscretizationMethod property determines whether Analysis Services is to create groupings, and then determines the type of grouping that is performed. As we learned in Introduction to Attribute Discretization, Analysis Services does not perform any groupings by default (the default setting for the DiscretizationMethod property is “None”). When we enable Clusters grouping, as we shall see in the practice session that follows, we direct Analysis Services (via the DiscretizationMethod property) to group members by performing single-dimensional clustering on the input values. We then process the affected dimension / cube and, according to the SQL Server 2005 Books Online, the associated Analysis Services algorithm divides the data into groups by sampling a random, 1000-row subset of the data (the “training data”), initializing to a number of random points, and then running several iterations of the Microsoft Clustering algorithm using the Expectation Maximization (EM) clustering method. The Clusters method is particularly useful because it works on any distribution curve. A disadvantage, however, lies in the fact that this discretization method requires more processing time than the other methods. Another limitation is that the Clusters method, as we have noted, can only be applied to numeric data columns.

NOTE: If sampling by the algorithm is undesirable, we can use the Equal Areas discretization method. See Attribute Discretization: Using the Equal Areas Method for details.

As we shall see, and as we have seen with the other two discretization methods in the articles I published to introduce each, once we specify a grouping method, we next specify the number of groups, by using the DiscretizationBucketCount property (its default value is zero).

As we have noted throughout my MSSQL Server Analysis Services column, as well as throughout my other Database Journal series’, one of the most important objectives in building a high performance Analysis Services solution is an efficient and effective dimension design. The identification of opportunities where we can effectively use attribute discretization, and the effective design and placement of the member groups that we generate thereby, can mean the provision of a much more consumer-friendly interface for our clients and employers. We will gain hands - on exposure to the Clusters method of attribute discretization in the practice session that follows. Before we get started working within a sample cube clone, we will need to prepare the local environment for the practice session. We will take steps to accomplish this within the section that follows.

Preparation: Locate and Open the Sample Basic UDM Created Earlier

In Dimensional Model Components: Dimensions Part I, we created a sample basic Analysis Services database within which to perform the steps of the practice sessions we set out to undertake in the various articles of this subseries. Once we had ascertained that the new practice database was in place, and once we had renamed it to ANSYS065_Basic AS DB, we began our examination of dimension properties. We continued with our examination of attributes within the same practice environment, which we will now access (as we did within the earlier articles of this subseries)) by taking the following steps within the SQL Server Business Intelligence Development Studio.

NOTE: Please access the Analysis Services database which we prepared in Dimensional Model Components: Dimensions Part I (and have used in subsequent articles) before proceeding with this article. If you have not completed the preparation to which I refer, or if you cannot locate / access the Analysis Services database with which we worked in the referenced previous articles, please consider taking the preparation steps provided in Dimensional Model Components: Dimensions Part I before continuing, and prospectively saving the objects with which you work, so as to avoid the need to repeat the preparation process we have already undertaken for subsequent related articles within this subseries.

1.  Click Start.

2.  Navigate to, and click, the SQL Server Business Intelligence Development Studio, as appropriate.

We briefly see a splash page that lists the components installed on the PC, and then Visual Studio .NET 2005 opens at the Start page.

3.  Close the Start page, if desired.

4.  Select File -à Open from the main menu.

5.  Click Analysis Services Database ... from the cascading menu, as shown in Illustration 1.

Opening the Analysis Services Database ...
Illustration 1: Opening the Analysis Services Database ...

The Connect to Database dialog appears.

6.  Ensuring that the Connect to existing database radio button atop the dialog is selected, type the Analysis Server name into the Server input box (also near the top of the dialog).

7.  Using the selector just beneath, labeled Database, select ANSYS065_Basic AS DB, as depicted in Illustration 2.

Selecting the Basic Analysis Services Database
Illustration 2: Selecting the Basic Analysis Services Database ...

8.  Leaving other settings on the dialog at default, click OK.

SQL Server Business Intelligence Development Studio briefly reads the database from the Analysis Server, and then we see the Solution Explorer populated with the database objects. Having overviewed attribute discretization in this and the previous article, we will now get some hands-on exposure to the use of Automatic attribute discretization for the members of a representative dimension attribute within our practice UDM.

Procedure: Employ the Clusters Discretization Method for Attribute Members in Analysis Services

When we browse a cube, we typically dimension the members of one attribute hierarchy by the members of another attribute hierarchy. For example, we might group customer sales by product purchased, by customer geography (state, province or country), or by customer gender. However, with certain types of attributes, it is useful to have Analysis Services automatically create groupings of attribute members, particularly when large numbers of contiguous values are involved, based upon the distribution of the members within an attribute hierarchy.

As an example, let’s assume that we have been approached by representatives of our hypothetical client, the Adventure Works organization. These representatives, members of the Human Resources department of the business, tell us that one of the existing attributes of the Employee dimension, Sick Leave Hours, does not serve them well in browses and reports, as the many different values are simply listed, the members being derived from the unique Sick Leave Hours values in the SickLeaveHours column of the DimEmployee table that underlies the Analysis Services layer of the business intelligence solution. (The developer of the cube, who has suddenly, and without warning, returned to his home country, is no longer available to assist the HR department in modifying the Sick Leave Hours presentation into a more useful format.)

We listen to the description of the problem, and then suggest grouping the Sick Leave Hours information via the pre-defined Clusters discretization method offered within Analysis Services. As is often the case, we suggest this as a starter approach, so that our client colleagues can see how grouping the values in this manner might make the data more meaningful in browsing / reporting. We tell them that, once we do this, information consumers who browse the Sick Leave Hours attribute hierarchy will see the names / values of the groups instead of the members themselves. This limits the number of levels that are presented to users, which can be less confusing, and more useful for analysis. (We inform the client representatives, too, that other methods of discretization can be leveraged if the Clusters method proves less than ideal – we are simply suggesting the Clusters method as a good starting point to illustrate an approach to meeting the end objective – an approach that we can easily “tweak” once our colleagues understand the general concept of discretization, and can then make intelligent choices with regard to specific nuances that they might deem desirable.

Browse the Existing, Ungrouped Members within the Attribute Hierarchy

We will begin our practice with the Clusters discretization method within the Sick Leave Hours attribute hierarchy of the Employee dimension.

1.  Within the Solution Explorer, right-click the Employee dimension (expand the Dimensions folder as necessary).

2.  Click Open on the context menu that appears, as shown in Illustration 3.

Opening the Dimension via the Dimension Designer
Illustration 3: Opening the Dimension via the Dimension Designer ...

The tabs of the Dimension Designer open. Let’s look at the current state of the attribute under consideration, Sick Leave Hours.

3.  Click the Browser tab.

4.  Select Sick Leave Hours within the dropdown Hierarchy selector atop the Browse tab, as depicted in Illustration 4.

Select the Sick Leave Hours Hierarchy to Browse
Illustration 4: Select the Sick Leave Hours Hierarchy to Browse ...

5.  Expand the Employees level by clicking the “+” sign that appears to its immediate left.

The attributes belonging to the Sick Leave Hours attribute hierarchy appear as partially shown in Illustration 5.

The Member Attributes, Sick Leave Hours Attribute Hierarchy of the Employee Dimension
Illustration 5: The Member Attributes, Sick Leave Hours Attribute Hierarchy of the Employee Dimension

We note that many contiguous values appear. It is easy to see how grouping these values into logical “buckets” might make the data easier for information consumers to analyze.

Add Clusters Discretization within the Attribute Hierarchy

We will enact the Clusters discretization method from the Dimension Structure tab of the Employee dimension.

1.  Click the Dimension Structure tab.

2.  Within the Attributes pane, select Sick Leave Hours (the sixth attribute above the bottom of the pane).

The Properties window for the attribute appears (by default in the bottom right corner of the design environment), as partially depicted in Illustration 6.

The Properties Window for the Sick Leave Hours Attribute
Illustration 6: The Properties Window for the Sick Leave Hours Attribute

3.  In the Properties window, click the setting box to the right of the DiscretizationMethod property (which currently contains the word “None”), to enable the selector on the right edge of the box.

4.  Select Clusters within the selector, as shown in Illustration 7.

Changing the DiscretizationMethod Property Setting to “Clusters” ...
Illustration 7: Changing the DiscretizationMethod Property Setting to “Clusters” ...

The DiscretizationMethod property specifies the method used to group the members of the owner attribute. Once we tell Analysis Services the method of discretization, we can tell it how many “buckets” to create for purposes of grouping the attribute members.

5.  In the setting box to the right of the DiscretizationBucketCount property (which currently contains the “0”), replace the value for the DiscretizationBucketCount property with the number 10.

The DiscretizationBucketCount property specifies the number of buckets in which to discretize members of the attribute to which the property belongs. The default setting is “0”. When we leave the DiscretizationBucketCount property at default, Analysis Services generates the number of groups independently, after sampling the underlying data involved.

The affected portion of the Properties window for the Sick Leave Hours attribute hierarchy appears as depicted (modifications surrounded by the red box) in Illustration 8.

Properties Window for the Sick Leave Hours Attribute Hierarchy, with Modifications
Illustration 8: Properties Window for the Sick Leave Hours Attribute Hierarchy, with Modifications

We are now ready to process the Analysis Services database, and then to examine the results of our handiwork with the browser, as before.

Process the Analysis Services Database to Update Structure for Our Discretization Settings

We will process the Analysis Services database within which we have been working, and then browse the Sick Leave Hours attribute members to confirm that we have provided preliminary answers to the client’s stated requirements.

1.  Right-click the Analysis Services project atop the tree in the Solution Explorer.

2.  Select Process ... from the context menu that appears, as shown in Illustration 9.

Select Process ... to Process the Analysis Services Project
Illustration 9: Select Process ... to Process the Analysis Services Project

3.  Click Yes on the dialog telling us that the server content appears out of date, and asking if we would like to build and deploy the project first, which appears as depicted in Illustration 10.

Click Yes to Build and Deploy the Project First ...
Illustration 10: Click Yes to Build and Deploy the Project First ...

Information updates on the server, and then the Process Database dialog appears, as shown in Illustration 11.

The Process Database Dialog Appears ...
Illustration 11: The Process Database Dialog Appears ...

4.  Click the Run button on the dialog.

The Process Progress viewer appears, and generates periodic status updates for various processing events. When processing is complete, we see a Process succeeded message appear in the Status bar in the lower part of the viewer, as depicted in Illustration 12.

Process Succeeded Massage Appears ...
Illustration 12: Process Succeeded Massage Appears ...

5.  Click the Close button on the Process Progress viewer to dismiss the viewer.

6.  Click the Close button on the Process Database dialog to dismiss the dialog.

We are now ready to return to the Dimension Designer browser to examine the results of our handiwork.

Browse the Newly Discretized Attribute with the Dimension Browser

1.  Click the Browser tab in the Dimension Designer.

2.  Ensure that Sick Leave Hours remains selected in the Hierarchy selector atop the Browser tab.

3.  Click the Reconnect button atop the tab, as shown in Illustration 13.

Partial Browser View - before Reconnecting
Illustration 13: Partial Browser View - before Reconnecting

The browser details update, and (assuming the All Employees level remains expanded in the browser), we see ten groups appear, as depicted in Illustration 14.

The Discretized Attribute Member’s Groups
Illustration 14: The Discretized Attribute Member’s Groups

We see the groups appear, as expected. Ten groups have been created, based upon the algorithm selected by the Clusters discretization method.

Browse the Newly Discretized Attribute with the Cube Browser

Let’s go one step further and examine the results of our selection of the Clusters discretization method from another practical perspective, that of the cube browser. This will give us an appreciation for the improvements seen by the information consumers in querying / analyzing from the affected data.

1.  Within the Solution Explorer, once again, right-click the Basic cube (expand the Cubes folder as necessary).

2.  Click Open on the context menu that appears, as shown in Illustration 15.

Opening the Cube via the Cube Designer ...
Illustration 15: Opening the Cube via the Cube Designer ...

The tabs of the Cube Designer open and we arrive, by default, at the Cube Structure tab.

3.  Click the Browser tab.

4.  Click the Reconnect button atop the tab.

5.  In the Metadata pane, expand the Employee dimension by clicking the “+” sign to its immediate left.

6.  Expand the newly exposed Organization folder.

7.  Right-click the Sick Leave Hours attribute within the expanded Organization folder.

8.  Select Add to Row Area from the context menu that appears, as depicted in Illustration 16.

Adding the Sick Leave Hours Attribute to the Browser Row Area ...
Illustration 16: Adding the Sick Leave Hours Attribute to the Browser Row Area ...

We see all ten Sick Leave Hours buckets appear in the rows of the browser pane.

9.  Click and drag the Employee Name attribute to the immediate right of the physical column containing the newly placed Sick Leave Hours buckets (a line will form at the drop point), juxtaposing the Employee Names on rows to the immediate right of the Sick Leave Hours, as shown in Illustration 17.

Juxtaposing the Employee Name alongside the Sick Leave Hours in Rows
Illustration 16: Juxtaposing the Employee Name alongside the Sick Leave Hours in Rows

All ten Sick Leave Hours buckets continue to appear in the rows of the browser pane – with “+” sign “expand” buttons appearing to the immediate left of the bucket labels.

10.  Expand the 68-80 Sick Leave Hours buckets by clicking the “+” sign to its immediate left of each label.

The Sick Leave Hours bucket expands, revealing lists of the employee members whose total Sick Leave Hours place them within the respective buckets in which they appear, as partially depicted in Illustration 18.

Select Sick Leave Hours Buckets, Expanded to Show Membership
Illustration 18: Select Sick Leave Hours Buckets, Expanded to Show Membership

In what is but one example of how we can use the Sick Leave Hours buckets, we can see lists of employees that have total Sick Leave Hours on the books corresponding to each of the ten buckets we have created via Clusters discretization. We tell our client colleagues that they might use this arrangement to do far more than present lists of the members of the various strata. They might also flesh out the browser with other dimension members (such as Calendar Years, etc.), drop in various measures (such as Reseller Sales and the like), and perform analysis (as another simple example) upon employee balances of “unused sick leave” when viewed within the perspective of the sales figures attributed to those employees, and so forth.

Having demonstrated the potential effects that we can achieve using Clusters discretization, we turn the development environment over to the client representatives with which we have worked. Our colleagues express satisfaction with our efforts, and state that they grasp the concepts adequately to apply Clusters discretization to other contiguous attributes within their cubes.

11.  Experiment further within the browser, as desired.

12.  Select File -à Exit to leave the design environment, when ready, and to close the Business Intelligence Development Studio.

Conclusion

In this article, we continued our exploration of attributes in Analysis Services, this time with the objective of introducing, and gaining some hands-on exposure to setting up, one of the multiple pre-defined discretization methods supported within the Analysis Services UDM. We first discussed the options that are available, and then chose to work with Clusters discretization in the sample cube, to meet the business requirements of a hypothetical client. (We noted that, in individual articles designed specifically for the purpose, we will examine the setup of other discretization options, in a manner similar to the one we took here, gaining hands-on exposure to the use of those options in individual practice scenarios.)

Our examination included a brief, general review of attribute discretization in Analysis Services, potential benefits that accrue from discretization in our UDMs, and how the process can help us to meet the primary objectives of business intelligence. We performed an overview of the multiple pre-defined discretization processes supported within the Analysis Services UDM. We then began our practice session with an inspection, via the browser in the Dimension Designer, of the contiguous members of a select attribute hierarchy, noting the absence of grouping and discussing shortcomings of this default arrangement.

Next, we enabled the Clusters discretization method within the dimension attribute Properties pane. We then reprocessed the sample cube with which we were working to enact the new Clusters discretization of the select attribute members. Finally, we performed further inspections, via the Dimension Designer and Cube Designer browsers, of the members of the attribute hierarchy involved in the request for assistance by our hypothetical client, noting the new, more intuitive grouping established by the newly enacted Clusters discretization method.

About the MSSQL Server Analysis Services Series

This article is a member of the series Introduction to MSSQL Server Analysis Services. The series is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (“Analysis Services”), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this series.

» See All Articles by Columnist William E. Pearson, III

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved