Join BI Architect Bill Pearson as he continues his overview of Attribute
Discretization, within his extended examination of the dimensional model lying at the heart of the integrated Microsoft Business Intelligence solution. In this article, we begin a hands-on overview of the use of the Automatic discretization method.
This article continues the overview of Attribute
Discretization in Analysis
Services begun in Introduction
to Attribute Discretization. Both this article and its predecessor extend the
examination of the
dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. After taking
up various additional components of the dimensional model in subsequent
articles, we performed hands-on exploration of the general characteristics and
purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V. We then fixed our focus upon
the properties underlying attributes, extending our overview into attribute
member Keys, Names and Values within several subsequent articles.
article continues the focus upon attribute discretization, a capability in
Analysis Services that allows us to group members of an attribute
into a number of member groups. Our concentration here will be to get
some exposure to the pre-defined Automatic discretization method, one of
three such pre-defined methods supported by Analysis Services, through hands-on
application of the method to a representative dimension attribute within our
Note: For more information about my Introduction to
MSSQL Server Analysis Services column in general, see the section
entitled About the MSSQL Server Analysis Services Series that follows the
conclusion of this article.
In Introduction to Attribute Discretization, I
summarized the articles preceding it within the current subseries, surrounding a
general introduction to the dimensional model. I noted the wide acceptance
of the dimensional model as the preferred structure for presenting quantitative
and other organizational data to information consumers. The articles of the
series then undertook an examination of dimensions, the analytical
perspectives upon which the dimensional model relies in meeting the primary
objectives of business intelligence, including its capacity to support:
presentation of relevant and accurate information representing business
operations and events;
the rapid and
accurate return of query results;
dice query creation and modification;
wherein information consumers can pose questions quickly and easily, and
achieve rapid results datasets.
extended our examination of dimensions into a couple of detailed
articles. These articles, Dimensional Model Components: Dimensions Parts I and
II, emphasized that dimensions,
which represent the perspectives of a business or other operation, and
reflect the intuitive ways that information consumers need to query and view
data, form the foundation of the dimensional model. We noted that each dimension
within our model contains one or more hierarchies. (As we learn in other
articles of this series, two types of hierarchies exist within Analysis
Services: attribute hierarchies and user - sometimes called multi-level - hierarchies.)
next introduced dimension attributes within the subseries, and conducted an
extensive overview of their nature, properties, and detailed settings in Dimensional Attributes: Introduction and Overview Parts
through V. We noted that attributes help us to define with specificity
what dimensions cannot define by themselves. Moreover, we learned that attributes
are collected within a database dimension, where we can access them to help us to
specify the coordinates required to define cube space.
the current subseries, I have emphasized that dimensions and dimension attributes
should support the way that management and information consumers of a given
organization describe the events and results of the business operations of the
entity. Because we maintain dimension and related attribute information within
the database underlying our Analysis Services implementation, we can support
business intelligence for our clients and employers even when these details are
not captured within the system where transaction processing takes place.
Within the analysis and reporting capabilities we supply in this manner, dimensions
and attributes are useful for aggregation, filtering, labeling, and other
Having covered the general characteristics and purposes of attributes
in Dimensional Attributes: Introduction
and Overview Parts I through V,
we fixed our focus upon the properties underlying them, based upon the
examination of representative attributes within our sample cube. We then
continued our extended examination of attributes to yet another important
component we had touched upon earlier, the attribute member Key, with which we
gained some hands-on exposure in practice sessions that followed our coverage
of the concepts. In Attribute Member Keys Pt I:
Introduction and Simple Keys and Attribute Member Keys Pt II:
Composite Keys, we explored the
concepts of simple and composite keys, narrowing our examination in Part I
to the former, where we reviewed the Properties associated with a simple key,
based upon the examination of a representative dimension attribute within our
sample UDM. In Part II, we revisited the differences between simple and
composite keys, and explained in more detail why composite keys are sometimes
required to uniquely identify attribute members. We then reviewed the properties
associated with a composite key, based upon the examination of another representative
dimension attribute within our sample UDM.
Member Names, we examined the attribute member Name property,
which we had briefly introduced in Dimensional
Attributes: Introduction and Overview Part V. We shed some light on
how attribute member Name might most appropriately be used without degrading
system performance or creating other unexpected or undesirable results. We then
examined the sister attribute member Value property (which we introduced
along with attribute member Name in Dimensional
Attributes: Introduction and Overview Part V) in Attribute
Member Values in Analysis Services. As we did in our overview of attribute
member Name, we examined the details of Value. Our concentration was also
similarly upon its appropriate use in providing support for the selection and
delivery of enterprise data in a more focused and consumer-friendly manner,
without the unwanted effects of system performance degradation, and other
unexpected or undesirable results, that can accompany the uninformed use of the
In Introduction to Attribute Relationships in MSSQL Server
Analysis Services, we examined yet another part of the conceptual model, Attribute
Relationships. In this introduction, we discussed several best practices and
design, and other, considerations involved in their use, with a focus upon the
general exploitation of attribute relationships in providing support, once
again, for the selection and delivery of enterprise data. In the subsequent
two related articles, Attribute
Relationships: Settings and Properties
Exposure to Settings and Properties in Analysis Services Attribute
Relationships, we examined attribute relationships in a manner
similar to previous articles within this subseries, concentrating in detail
upon the properties that underlay them.
Finally, in Introduction
to Attribute Discretization, we introduced a capability in Analysis
Services to which we refer as attribute discretization - that allows
us to group members of an attribute into a number of member groups. We discussed design, and other, considerations
involved in the discretization of attributes, and touched upon best practices surrounding
the use of this capability. Our focus was upon the general exploitation of discretization
in providing support for the selection and delivery of enterprise data.
In this article, we will gain some hands-on exposure to
setting up one of the multiple pre-defined discretization processes supported
within the Analysis Services UDM. We will first discuss the options that are
available, and then work with Automatic discretization
in the sample cube. (In individual articles designed specifically for the
purpose, we will examine the setup of other discretization options, in a manner
similar to previous articles within this subseries, gaining hand-on exposure to
the use of those options in individual practice scenarios.)
Our examination will include:
A brief review
of attribute discretization in Analysis Services, potential benefits that
accrue from discretization in our UDMs, and how the process can help us to meet
the primary objectives of business intelligence.
overview of the multiple pre-defined discretization processes supported within
the Analysis Services UDM.
via the browser in the Dimension Designer, of the pre-existing members of a
select attribute hierarchy, noting the absence of grouping and discussing
shortcomings of this default arrangement.
the Automatic discretization method within the dimension attribute Properties
the cube to enact the new Automatic discretization of the select attribute
examination, via the browsers in both the Dimension Designer and the Cube
Designer, of the members of a select attribute hierarchy, noting the new, more
intuitive grouping established by the newly enacted Automatic discretization method.
references to subsequent articles within our series, where we will perform
detailed examinations surrounding other discretization methods supported within
the Analysis Services UDM.
Using the Automatic Attribute Discretization Method
As we learned in Introduction to Attribute Discretization, whenever
we work with attributes, we can expect to encounter two general types of
values, discrete and contiguous. Discrete values stand apart
distinctly, and have clearly defined logical boundaries between themselves. Citing
the Gender attribute, within the Customer dimension of the Adventure Works
sample UDM, wherein the attribute is considered to have only one of two discrete values, female or male,
we noted that possible values are naturally discrete for the lions share of attributes
occurring in the business world.
In contrast to discrete values, we noted that contiguous
values do not stand apart distinctly, but flow along, as if in a continuous
line. Moreover, we discussed the fact that contiguous values, especially within
large populations, can have very large numbers of possible values, and that information
consumers can find it difficult to work effectively and efficiently within such
wide ranges of values. As an example, we cited the Vacation Hours attribute, within
the Employee dimension of the Adventure Works sample UDM, an attribute which
could have a wide range of possible values, depending upon how many employees
are involved, whether there are limits on how many vacation days they can
accumulate, and considerations of this nature. (The member values are based
directly upon the values contained within the VacationHours column of the
DimEmployee table - with many of the values shared among multiple employees.)
The sheer number of values might make working with them cumbersome for
information consumers, if they are simply made available in their existing
As we discussed in Introduction to Attribute Discretization, discretization
can help us to make it easier for information consumers to work with large
numbers of possible attribute member values. As discretization creates a
manageable number of groups of attribute values that are clearly separated by
boundaries, we can thereby group contiguous values into sets of discrete
values, via a system-generated collection of consecutive dimension members
known as member groups.
We also discussed, in Introduction to Attribute
Discretization, that once the discretization process groups the attribute members into
the member groups, the member groups are then housed within a level within the dimensional
hierarchy. (A given level within a dimensional hierarchy can contain either members
or member groups, but not both.) When information consumers browse a level
that contains member groups, they see the names and cell values of the member
groups. The members generated by Analysis Services to support member groups are
called grouping members, and they look like ordinary members.
Analysis Services affords us flexibility in methods of attribute
discretization, based upon algorithms of varying complexity. The different
methods of discretization all have the same function to group contiguous
values into sets of discrete values. They simply manage grouping via different
approaches. Beyond the Dimension Designer, Analysis Services also
supports user-defined discretization, via data definition language (DDL),
should the out of the box approaches not meet the business needs of our
local environments. Moreover, we can alternatively implement custom discretization
via the underlying data warehouse, using views at the relational level; named
calculations in the data source view; calculated members in Analysis Services;
or via other approaches.
Analysis Services supports four DiscretizationMethod
property settings, which include three pre-defined discretization methods:
None (default): Analysis Services
performs no grouping, and simply displays the attribute members.
Automatic: Analysis Services selects the
method that best represents the data: either the EqualAreas method or the Clusters
EqualAreas: Analysis Services attempts to
divide the members in the attribute into groups that contain an equal number of
Clusters: Analysis Services attempts to
divide the members in the attribute into groups that contain an equal number of
members. (Per the Books Online, this method is useful because it works
on any distribution curve, but is more expensive in terms of processing time.)
The DiscretizationMethod property determines
whether Analysis Services is to create groupings, and then determines the type of
grouping that is performed. As we learned in Introduction
to Attribute Discretization, Analysis Services does not perform any
groupings by default (the default setting for the DiscretizationMethod property
is None). When we enable Automatic grouping, as we shall see in the practice
session that follows, we direct Analysis Services to automatically determine
the best grouping method based upon the structure of the attribute for which we
are performing discretization. We then process the affected dimension / cube and
Analysis Services creates group ranges, and then distributes the total
population of attribute members appropriately across those groups. As we shall
see, once we specify a grouping method, we next specify the number of groups,
by using the DiscretizationBucketCount property (its default value is
have noted throughout my MSSQL Server Analysis Services column, as well
as throughout my other Database Journal series, one of the most
important objectives in building a high performance Analysis Services solution
is an efficient and effective dimension design. The identification of
opportunities where we can effectively use attribute discretization, and the
effective design and placement of the member groups that we generate thereby,
can mean the provision of a much more consumer-friendly interface for our
clients and employers. We
will gain hands - on exposure to the Automatic method of attribute discretization in the
practice session that follows. Before we get started working within a sample cube clone, we will
need to prepare the local environment for the practice session. We will take
steps to accomplish this within the section that follows.
Preparation: Locate and Open the Sample
Basic UDM Created Earlier
In Dimensional Model
Components: Dimensions Part I, we created a sample basic Analysis Services database within which to
perform the steps of the practice sessions we set out to undertake in the
various articles of this subseries. Once we had ascertained that the new
practice database appeared to be in place, and once we had renamed it to ANSYS065_Basic
AS DB, we began our examination of dimension properties. We continued with our
examination of attributes within the same practice environment, which we will
now access (as we did within the earlier articles of this subseries)) by taking
the following steps within the SQL Server Business Intelligence Development
NOTE: Please access the Analysis
Services database which we prepared in Dimensional Model Components: Dimensions Part I (and have used in subsequent
articles) before proceeding with this article. If you have not completed the
preparation to which I refer, or if you cannot locate / access the Analysis
Services database with which we worked in the referenced previous articles,
please consider taking the preparation steps provided in Dimensional Model Components: Dimensions Part I before continuing, and
prospectively saving the objects with which you work, so as to avoid the need
to repeat the preparation process we have already undertaken for subsequent
related articles within this subseries.
and click, the SQL
Server Business Intelligence Development Studio, as appropriate.
briefly see a splash page that lists the components installed on the PC, and
then Visual Studio .NET 2005 opens at the Start page.
Close the Start
page, if desired.
Select File -> Open from the main menu.
Services Database ... from the cascading menu, as shown in Illustration 1.
Illustration 1: Opening the Analysis Services Database ...
to Database dialog appears.
the Connect to existing database radio button atop the dialog is
selected, type the Analysis Server name into the Server input box (also near
the top of the dialog).
selector just beneath, labeled Database, select ANSYS065_Basic AS DB, as
depicted in Illustration
Illustration 2: Selecting the Basic Analysis Services Database ...
settings on the dialog at default, click OK.
Server Business Intelligence Development Studio briefly reads the database from
the Analysis Server, and then we see the Solution Explorer populated with the
database objects. Having overviewed attribute discretization in this and the
previous article, we will now get some hands-on exposure to the use of Automatic
attribute discretization for the members of a representative dimension attribute
within our practice UDM.
Procedure: Employ the Automatic Discretization
Method for Attribute Members in Analysis Services
When we browse a cube, we typically dimension the members
of one attribute hierarchy by the members of another attribute hierarchy. For
example, we might group customer sales by product purchased, by customer geography
(state, province or country), or by customer gender. However, with certain
types of attributes, it is useful to have Analysis Services automatically
create groupings of attribute members, particularly when large numbers of contiguous
values are involved, based upon the distribution of the members within an
As an example, lets assume that we have been approached
by representatives of our hypothetical client, the Adventure Works
organization. These representatives, members of the Human Resources department
of the business, tell us that one of the existing attributes of the Employee dimension,
Vacation Hours, does not serve them well in browses and reports, as the many
different values are simply listed, the members being derived from the unique Vacation
Hours values in the VacationHours column of the DimEmployee table that
underlies the Analysis Services layer of the business intelligence solution. (The
developer of the cube, having returned to his home country, is no longer
available to assist the HR department in modifying the Vacation Hours
presentation into a more useful format.)
We listen to the description of the problem, and then
suggest grouping the Vacation Hours information via the pre-defined Automatic
discretization method offered within Analysis Services. We suggest this as a
starter approach, so that our client colleagues can see how grouping the values
in this manner might make the data more meaningful in browsing / reporting. We
tell them that, once we do this, information consumers who browse the Vacation
Hours attribute hierarchy will see the names / values of the groups instead of
the members themselves. This limits the number of levels that are presented to
users, which can be less confusing, and more useful for analysis. (We inform
the client representatives, too, that other methods of discretization can be
leveraged if the Automatic method proves less than ideal we are simply
suggesting the Automatic method as a good starting point to illustrate an approach
to meeting the end objective an approach that we can easily tweak once our
colleagues understand the general concept of discretization, and can then make
intelligent choices with regard to specific nuances that they might deem desirable.
Browse the Existing, Ungrouped Members
within the Attribute Hierarchy
will begin our practice with the Automatic discretization method within the Vacation
Hours attribute hierarchy of the Employee dimension.
Within the Solution
Explorer, right-click the Employee dimension (expand the Dimensions
folder as necessary).
Click Open on
the context menu that appears, as shown in Illustration 3.
Illustration 3: Opening the Dimension via the Dimension Designer ...
of the Dimension Designer open. Lets look at the current state of the
attribute under consideration, Vacation Hours.
Click the Browser
Hours within the dropdown Hierarchy selector atop the Browse tab, as depicted in Illustration 4.
Illustration 4: Select the Vacation Hours Hierarchy to Browse ...
The attributes belonging to the Vacation Hours attribute hierarchy appear as partially shown in Illustration
Illustration 5: The Member Attributes, Vacation Hours Attribute Hierarchy of the Employee Dimension
note that many contiguous values appear. It is easy to see how grouping these
values into logical buckets might make the data easier for information
consumers to analyze.
Add Automatic Discretization within the
will enact the Automatic discretization method from the Dimension Structure tab
of the Employee dimension.
Click the Dimension
Within the Attributes
pane, select Vacation Hours (the bottom attribute hierarchy in the pane).
The Properties window for the attribute appears (by default
in the bottom right corner of the design environment), as depicted in Illustration 6.
Illustration 6: The Properties Window for the Vacation Hours Attribute
3. In the Properties window, click
the setting box to the right of the DiscretizationMethod property (which
currently contains the word None), to enable the selector on the right edge
of the box.
4. Select Automatic within the
selector, as shown in Illustration 7.
Illustration 7: Changing the DiscretizationMethod Property Setting to Automatic ...
The DiscretizationMethod property specifies the method
used to group the members of the owner attribute. Once we tell Analysis Services
the method of discretization, we can tell it how many buckets to create for
purposes of grouping the attribute members.
5. In the setting box to the right of
the DiscretizationBucketCount property (which currently contains the 0), replace the value for the DiscretizationBucketCount property with
the number 10.
The DiscretizationBucketCount property specifies the
number of buckets in which to discretize members of the attribute to which the
property belongs. The default setting is 0. When we leave the DiscretizationBucketCount property at default, Analysis
Services generates the number of groups independently, after sampling the underlying
The affected portion of
the Properties window for the Vacation Hours attribute hierarchy appears as depicted
(modifications surrounded by the red box) in Illustration 8.
Illustration 8: Properties Window for the Vacation Hours Attribute Hierarchy, with Modifications
We are now ready to process the Analysis Services database,
and then to examine the results of our handiwork with the browser, as before.
Process the Analysis Services Database
to Update Structure for Our Discretization Settings
We will process the Analysis Services database within
which we have been working, and then browse the Vacation Hours attribute
members to confirm that we have provided preliminary answers to the clients
the Analysis Services project atop the tree in the Solution Explorer.
... from the context menu that appears, as shown in Illustration 9.
Illustration 9: Select Process ... to Process the Analysis Services Project
Click Yes on
the dialog asking if you would like to save changes, which appears as depicted
in Illustration 10.
Illustration 10: Click Yes to Save All Changes before Processing
Information updates on the server, and then the Process
Database dialog appears, as shown in Illustration 11.
Illustration 11: The Process Database Dialog Appears ...
Click the Run button
on the dialog.
The Process Progress viewer
appears, and generates periodic status updates for various processing events.
When processing is complete, we see a Process succeeded message appear in the Status
bar in the lower part of the viewer, as depicted in Illustration 12.
Illustration 12: Process Succeeded Massage Appears ...
Click the Close
button on the Process Progress viewer to dismiss the viewer.
Click the Close
button on the Process Database dialog to dismiss the dialog.
We are now ready to return to the Dimension Designer
browser to examine the results of our handiwork.
Browse the Newly Discretized Attribute
with the Dimension Browser
Click the Browser
tab in the Dimension Designer.
Ensure that Vacation
Hours remains selected in the Hierarchy selector atop the Browser tab.
Click the Reconnect
button atop the tab, as shown in Illustration 13.
Illustration 13: Partial Browser View - before Reconnecting
The browser details update, and (assuming the All
Employees level remains expanded in the browser), we see ten groups appear, as
depicted in Illustration 14.
Illustration 14: The Discretized Attribute Members Groups
the groups appear, as expected. Ten groups have been created, based upon the algorithm
selected by the Automatic discretization method.
Browse the Newly Discretized Attribute
with the Cube Browser
one step further and examine the results of our selection of the Automatic discretization method
from another practical perspective, that of the cube browser. This will give
us an appreciation for the improvements seen by the information consumers in
querying / analyzing from the affected data.
Within the Solution
Explorer, once again, right-click the Basic cube (expand the Cubes
folder as necessary).
Click Open on
the context menu that appears, as shown in Illustration 15.
Illustration 15: Opening the Dimension via the Dimension Designer ...
tabs of the Cube Designer open, and we arrive, by default, at the Cube
Click the Browser
Click the Reconnect
button atop the tab.
In the Metadata
pane, expand the Employee dimension by clicking the + sign to its immediate
newly exposed Organization folder.
the Vacation Hours attribute within the expanded Organization folder.
Select Add to
Row Area from the context menu that appears, as depicted in Illustration 16.
Illustration 16: Adding the Vacation Hours Attribute to the Browser Row Area ...
all ten Vacation Hours buckets appear in the rows of the browser pane.
Click and drag
the Employee Name attribute to the immediate right of the physical column
containing the newly placed Vacation Hours buckets (a line will form at the
drop point), juxtaposing the Employee Names on rows to the immediate right of
the Vacation Hours, as shown in Illustration 17.
Illustration 16: Juxtaposing the Employee Name alongside the Vacation Hours in Rows ...
All ten Vacation
Hours buckets continue to appear in the rows of the browser pane with +
sign expand buttons appearing to the immediate left of the bucket labels.
Expand the 70-79
and the 90-99 Vacation Hours buckets by clicking the + sign to its immediate
left of each label.
The two Vacation Hours buckets expand, revealing lists of
the employee members whose total Vacation Hours place them within the
respective buckets in which they appear, as partially depicted in Illustration 18.
Illustration 18: Select Vacation Hours Buckets, Expanded to Show Membership
In what is but one example of how we can use the Vacation
Hours buckets, we can see lists of employees that have Vacation Hours on the
books for each of the ten buckets we have created via Automatic discretization.
We tell our client colleagues that they might use this arrangement to do far
more than present lists of the members of the various strata. They might also
flesh out the browser with other dimension members (such as Calendar Years,
etc.), drop in various measures (such as Reseller Sales and the like), and
perform analysis (as another simple example) upon employee balances of unused
vacation when viewed within the perspective of the sales figures attributed to
Having demonstrated the potential effects that we can
achieve using Automatic discretization, we turn the development environment
over to the client representatives with which we have worked. Our colleagues
express satisfaction with our efforts, and state that they grasp the concepts
adequately to apply Automatic discretization to other contiguous attributes
within their cube.
further within the browser, as desired.
Select File -> Exit to leave the design environment, when ready,
and to close the Business
Intelligence Development Studio.
In this article, we continued our exploration of attributes
in Analysis Services, this time with the objective of introducing, and gaining
some hands-on exposure to setting up, one of the multiple pre-defined discretization
methods supported within the Analysis Services UDM. We first discussed the
options that are available, and then chose to work with
Automatic discretization in the sample cube, to meet the business requirements
of a hypothetical client. (We noted that, in individual articles designed
specifically for the purpose, we will examine the setup of other discretization
options, in a manner similar to the one we took here, gaining hands-on exposure
to the use of those options in individual practice scenarios.)
Our examination included a brief,
general review of attribute discretization in Analysis Services, potential
benefits that accrue from discretization in our UDMs, and how the process can
help us to meet the primary objectives of business intelligence. We performed an overview of the multiple pre-defined discretization
processes supported within the Analysis Services UDM. We then began our
practice session with an inspection, via the browser in the Dimension Designer,
of the contiguous members of a select attribute hierarchy, noting the absence
of grouping and discussing shortcomings of this default arrangement.
Next, we enabled the Automatic discretization method
within the dimension attribute Properties pane. We then reprocessed the sample
cube with which we were working to enact the new Automatic discretization of
the select attribute members. Finally, we performed further inspections, via
the Dimension Designer and Cube Designer browsers, of the members of the attribute
hierarchy involved in the request for assistance by our hypothetical client,
noting the new, more intuitive grouping established by the newly enacted Automatic
About the MSSQL Server Analysis Services Series
is a member of the series
Introduction to MSSQL
Server Analysis Services.
The series is designed to provide hands-on application of the fundamentals of MS
SQL Server Analysis Services (Analysis Services), with each installment
progressively presenting features and techniques designed to meet specific
real-world needs. For more information on the series, please see my initial
article, Creating Our First Cube. For the software components, samples and tools
needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this
See All Articles by Columnist William E. Pearson, III