Join BI Architect Bill Pearson as he introduces Attribute
Relationships into his extended examination of the dimensional model within the
integrated Microsoft Business Intelligence solution.
This article introduces Attribute Relationships in Analysis Services, and extends the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. Here, we will focus upon the purpose of attribute relationships, which define the possible associations between attributes. Moreover, we will discuss why these relationships are important, and how they allow us to define the properties of association that a given attribute has with other attributes.
Note: For more information about my Introduction to MSSQL Server Analysis Services column in general, see the section entitled About the MSSQL Server
Analysis Services Series that follows the conclusion of this article.
Introduction
In Dimensional Model Components: Dimensions Parts I and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. We then began our examination of dimensions, the analytical perspectives upon which the dimensional model relies in meeting the primary objectives of business intelligence, including its capacity to support:
- the presentation of relevant and accurate information representing business operations and events;
- the rapid and accurate return of query results;
- slice and dice query creation and modification;
- an environment wherein information consumers can pose questions quickly and easily, and achieve rapid results datasets.
We learned, in Dimensional Model Components: Dimensions Parts I and II, that dimensions form the foundation of the dimensional model. They represent the perspectives of a business or other operation, and reflect the intuitive ways that information consumers need to query and view data. We noted that we might consider dimensions as nouns that take part in, are acted upon by, or are otherwise associated with, the verbs (or actions / transactions undertaken by the business) that are represented by the facts or measures contained within our business intelligence systems.
We discovered in earlier articles that, within the Analysis Services model, database dimensions underlie all other dimensions, whose added properties distinguish them from the database dimensions they reference, within the model. Each dimension within our model contains one or more hierarchies. As we learn in other articles of this series, two types of hierarchies exist within Analysis Services: attribute hierarchies and user (sometimes called multi-level) hierarchies. For purposes of this article, the term attribute means the same thing as attribute hierarchy. (We examine user hierarchies, to which we will simply refer as hierarchies, in other articles specifically devoted to the topic.)
To extend the metaphor we used earlier in describing dimensions as nouns and measures as verbs, we might consider attributes as somewhat similar to adjectives. That is, attributes help us to define with specificity what dimensions cannot define by themselves. Dimensions alone are like lines in geometry: they don't define area within multidimensional space, nor do they themselves even define the hierarchies that they contain. A database dimension is a collection of related objects called attributes, which we use to specify the coordinates required to define cube space.
Within the table underlying a given dimension (assuming a more-or-less typical star schema database) are individual rows supporting each of the members of the associated dimension. Each row contains the set of attributes that identify, describe, and otherwise define and classify the member upon whose row they reside. For instance, a member of the Patient dimension, within an Analysis Services implementation for a healthcare provider, might contain information such as patient name, patient ID, gender, age group, race, and other attributes. Some of these attributes might relate to each other hierarchically, and, as we shall see in other articles of this subseries (as well as within other of my articles), multiple hierarchies of this sort are common in real-world dimensions.
Dimensions and dimension attributes should support the way that management and information consumers of a given organization describe the events and results of the business operations of the entity. Because we maintain dimension and related attribute information within the database underlying our Analysis Services implementation, we can support business intelligence for our clients and employers even when these details are not captured within the system where transaction processing takes place. Within the analysis and reporting capabilities we supply in this manner, dimensions and attributes are useful for aggregation, filtering, labeling, and other purposes.
Having covered the general characteristics and purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V, we fixed our focus upon the properties underlying them, based upon the examination of a representative attribute within our sample cube. We then continued our extended examination of attributes to yet another important component we had touched upon earlier, the attribute member Key, with which we gained some hands-on exposure in practice sessions that followed our coverage of the concepts. In Attribute Member Keys Pt I: Introduction and Simple Keys and Attribute Member Keys Pt II: Composite Keys, we introduced attribute member Keys in detail, continuing our recent group of articles focusing upon dimensional model components, with an objective of discussing the associated concepts, and of providing hands-on exposure to the properties supporting them.
As a part of our exploration of attribute member Keys, we first discussed the three attribute usage types that we can define within a containing dimension. We then narrowed our focus to the Key attribute usage type (a focus that we developed, as we have noted, throughout Attribute Member Keys Pt I: Introduction and Simple Keys and Attribute Member Keys Pt II: Composite Keys), discussing its role in meeting our business intelligence needs. We next followed with a discussion of the nature and uses of the attribute Key from a technical perspective, including its purpose within a containing dimension within Analysis Services.
In Attribute Member Keys Pt I: Introduction and Simple Keys and Attribute Member Keys Pt II: Composite Keys, we introduced the concepts of simple and composite keys, narrowing our exploration in Part I to the former, where we reviewed the Properties associated with a simple key, based upon the examination of a representative dimension attribute, Geography, within our sample UDM. In Part II, we revisited the differences between simple and composite keys, and explained in more detail why composite keys are sometimes required to uniquely identify attribute members. We then reviewed the properties associated with a composite key, based upon the examination of a representative dimension attribute, Date, also within our sample UDM.
In Attribute Member Names, we examined the attribute member Name property, which we had briefly introduced in Dimensional Attributes: Introduction and Overview Part V. We examined the details of the attribute member Name, and shed some light on how they might most appropriately be used without degrading system performance or creating other unexpected or undesirable results. Finally, we examined the sister attribute member Value property (which we introduced along with attribute member Name in Dimensional Attributes: Introduction and Overview Part V) in Attribute Member Values in Analysis Services. As we did in our overview of attribute member Name, we examined the details of Value. Our concentration was also similarly upon its appropriate use in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner, without the unwanted effects of system performance degradation, and other unexpected or undesirable results, that can accompany the uninformed use of the property.
In this article, we will examine another part of the conceptual model, Attribute Relationships. In this introduction, we will discuss several best practices and design, and other, considerations involved in their use. Our focus will be upon the general exploitation of attribute relationships in providing support for the selection and delivery of enterprise data. (In our next article, we will examine attribute relationships in a manner similar to previous articles within this subseries, concentrating in detail upon the properties that underlie them.)
Our examination will include:
- A review of the nature of an attribute relationship, its purpose within its containing dimension within Analysis Services, and its role in helping to meet the primary objectives of business intelligence.
- A general overview of attribute relationships, and a discussion of their default configurations, and how we can improve upon those configurations to enhance overall processing performance.
- A discussion of other best practices and general considerations surrounding attribute relationships.
- A look forward to the article that follows within our series, where we will perform a detailed examination of the properties underlying attribute relationships, along with a review of the respective settings associated with each property, based upon a representative dimension attribute within our sample UDM.
Attribute Relationships
As we
have learned, attributes serve as the foundation for our dimensions
and cubes. Each dimension contains a set of attributes,
the organization of which is based upon attribute relationships.
We create attribute relationships as a part of creating a given dimension.
For each table that we include within a dimensional structure, an attribute
relationship is established to associate the key attribute to other attributes
from the table.
Multiple
advantages accrue from the knowledgeable use of attribute relationships,
including:
-
Enhancements
in dimension, partition, and query processing, based upon
a reduction in the total amount of memory required for dimension processing.
-
Enhanced query
performance due to more rapid storage access, as well as more
optimized execution plans within Analysis Services.
-
Enhanced
performance because of the selection of more effective aggregates via
the Analysis Server aggregation design algorithms (assuming that user-defined
hierarchies have been defined along relationship paths).
One of
the most important objectives in building a high performance Analysis
Services solution is an efficient and effective dimension design.
The identification of attribute relationships, and the effective use of hierarchies
(which we examine in other articles of this subseries) represent the two most
important considerations underlying the optimization of dimension
design.
Overview
An Analysis Services dimension typically
has, as an underlying source, a relational data warehouse dimension table. And
within relational data warehouses, each dimension usually contains a primary
key, attributes, and, in some instances, foreign key
relationships to other tables. The attributes provide descriptive
context to the primary key and, from a relational perspective, each attribute
has either a many-to-one relationship with the primary key (this
is, by far, most frequently the case) or a one-to-one relationship with the
primary key. The attribute relationships existing within the
Product dimension of the Adventure Works DW sample database
(available to anyone installing Analysis Services) appear as depicted in
Illustration 1.
Illustration 1:
Attribute Relationships: Product Dimension Primary Key
Some attributes also have relationships
with other attributes. For example, in the Product dimension
we have cited, the Subcategory attribute has a many-to-one relationship
with the Category attribute, as shown in Illustration 2.
Illustration 2:
Attribute Relationships for Subcategory in Adventure Works DW
A grasp of the concept of relationships
within Analysis Services is as important to the OLAP developer /
administrator as an understanding of the dependencies among fields in
relational databases are to their own respective developers and
administrators. To allow it to create useful aggregations, to
accurately and completely aggregate data, and to efficiently store and retrieve
information, the relationships between attributes must make sense to Analysis
Services. This is the purpose of attribute relationships: to
define and describe the relationship between a given pair of attributes.
To optimize our dimension designs, we must first understand how our attributes
are related to each other, and then we must take steps to let Analysis
Services know, via attribute relationships, what those relationships
are.
When we initially create a dimension
and add attributes, Analysis Services automatically defines
the relationship between the primary key attribute and all other attributes
as many - to one. While this default scenario is not inaccurate, on
the surface, it is likely to be suboptimal from the perspective of Analysis
Services performance, as it requires the server to perform extra work to
determine that some combinations do not exist, among other activities. With
the default scenario in place, anytime a query to the data source includes an attribute
within the dimension, Analysis Services will summarize data from
the primary key and then group it by the attribute under
consideration. This means that if we choose to present our data by a given attribute
at one point in time, individual member keys are grouped on the fly by the
same attribute. If we next request summarized data for another attribute,
then the entire on-the-fly grouping is performed once again for the attribute
requested. Unsurprisingly, inefficiencies can mount in cases where the totals
of a given attribute might be easily derived from the totals of another attribute.
An example would be a case where we
request a total for a couple of attributes (such as City and State-Province
within the Customer dimension of the Adventure Works sample cube)
where City rolls up into the State-Province attribute of the
same dimension: leaving the attribute relationships at default of many
- to one would mean that Analysis Services would summarize data from
the primary key (CustomerKey in the underlying DimCustomer
table), and then perform an onthefly grouping by the City
attribute. To deliver the second total for State-Province, Analysis
Services will perform a separate, onthefly grouping for the State-Province
attribute. The processing inefficiencies are obvious, since each State-Provinces total can be derived (more quickly) from the totals of
the Cities underlying it. (The same scenario applies to the rollup
relationship between the State-Province and Country attributes.)
We can see the attribute
relationships within the Customer dimension that we have
discussed in our geographical attribute examples, appearing as depicted in Illustration
3.
Illustration 3: Some Attribute
Relationships for Geographical Attributes within the Customer Dimension in
Adventure Works DW
The default design has other
inherent inefficiencies. Since Analysis Services cannot identify which attribute
combinations naturally exist in the dimension, the server must use the fact
data to identify meaningful member combinations, as we intimated earlier.
As an illustration, at query time, if an information consumer requests data by Country
and State-Province, Analysis Services will be forced to
perform additional processing to ascertain, for example, that the combination
of Country: France and State-Province: Arizona does not exist.
We can turn to the browser for the
Customer dimension within the Adventure Works cube to see the
structure underlying the aforementioned example attributes in our Customer
geography example, as shown in Illustration 4.
Illustration 4: Using
the Browser for the Customer Dimension to See the Results of the Underlying
Attribute Relationships
Here we can see that the Analysis
Server has used the attribute relationships depicted in Illustration
3 to align the appropriate State-Provinces to their
respective Countries, and therefore to make the example we have offered,
Country: France and State-Province: Arizona, a physically invalid
combination.
Even though, in the example above,
Customer Key is no longer directly related to Country and State-Province,
it remains indirectly related to these attributes through chained
attribute relationships. More specifically, Customer Key is
related to Country using the chain of attribute relationships that
link Customer Key to Postal Code, Postal Code to City,
City to State-Province, and State-Province to Country.
Such a chain of attribute relationships is also known as a cascading
attribute relationship.
Cascading attribute
relationships enable better
performance decisions within Analysis Services with regard to several
factors, including (as we have intimated earlier) aggregation design, data
storage, data retrieval, and MDX calculations. Moreover,
attribute relationships also assist us in enforcing dimension
security and in associating measure group data with non-primary key
granularity attributes. (For example, if we have a measure group
that contains Internet and other - Sales data by Customer
Key and forecast data by State-Province, the forecast measure
group will only know how to roll up data from State-Province to Country
if attribute relationships exist only between State-Province and Country.)
As we have noted in Attribute Member Keys Pt I: Introduction and
Simple Keys and Attribute Member Keys Pt II: Composite Keys,
among other articles of this series, the KeyColumns
property consists of a source column or combination of source columns
(known as a collection) that uniquely identifies the members for a given
attribute. Once we define relationships among our
attributes, the importance of the KeyColumns property becomes even more
prominent. We must ensure that the KeyColumns property of each attribute
in the dimension uniquely identifies the respective attribute
member. If the KeyColumns property does not uniquely
identify each member, duplicates encountered during processing (which
are ignored by default), result in incorrect data rollups.
It is important to be
aware, also, that if the attribute relationship has a default type
of Flexible, Analysis Services will not notify us when it
encounters duplicates, and will assign all of the duplicate members
incorrectly. (We will examine the relationship types further within the
hands-on practice section that appears in our next article.)
As we define new attribute relationships,
one important consideration we should keep in mind, from the tandem
perspectives of performance and data correctness, is the removal of any redundant
relationships. As an example, when the relationships shown in Illustration
3 were established, the Customer Key no longer required the
(default) direct relationship to City or State-Province. These attribute
relationships would have best been removed at this point.
NOTE: To help us to identify redundant attribute
relationships, the Business Intelligence Development Studio provides
a warning message, underneath the three panes of the Dimension Structure
tab of the dimension, to alert us to any existing redundancies. The Studio,
however, does not require us to eliminate the redundancy. It is a best
practice to always manually remove each redundant relationship. (Once we
have removed the redundancy, the warning disappears.)
Best Practices and Other Considerations Surrounding Attribute Relationships
Best practices dictate that, in designing attribute relationships, we 1) create the most efficient dimension model that 2) best represents the semantics of the business that it represents. Perhaps the most critical element of success we can seek in doing so is a deep familiarity with the organizations data, together with an extensive understanding of the business requirements that the data must support. Defining attribute relationships incorrectly can cause invalid query results, among other maladies, within our Analysis Services databases.
Anytime the data underlying our Analysis Services UDM support the definition of unique attribute relationships, we should use the Attribute Relationships tab of Dimension Designer to define these unique relationships.
NOTE: We will examine detailed settings that support existing attribute relationships within the sample Adventure Works cube in the hands-on practice section of the article that follows this one.
Any attribute that has an outbound relationship must have a unique key relative to its related attribute. In other words, a member in a source attribute must identify one - and only one - member in a related attribute. For example, consider the relationship, City -> State-Province that we used in examples earlier: in this relationship, the source attribute is City and the related attribute is State. The source attribute is the many side and the related side is the one side of the many-to-one relationship. The key for the source attribute would be City + State. A visual example how this key is established in the Customer dimension of the Adventure Works cube is depicted in Illustration 5.
Illustration 5: Establishing a Unique Key for the City Attribute, Relative to the State-Province Attribute
We noted earlier that attributes within a given dimension are always related either directly or indirectly to the key attribute. We also noted that, when we define a dimension based on a star schema (again, typically a scenario where all dimension attributes are derived from the same relational table), an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. Moreover, when we define a dimension based upon a snowflake schema (a scenario where dimension attributes are derived from multiple related tables), an attribute relationship is automatically defined as follows:
- Between the key attribute and each non-key attribute bound to columns in the main dimension table.
- Between the key attribute and the attribute associated with the foreign key in the secondary table that links the underlying dimension tables.
- Between the attribute associated with the foreign key in the secondary table and each non-key attribute bound to columns from the secondary table.
We noted earlier that we might want to modify these default attribute relationships, and cited but one example. Reasons might include the following:
- To define a natural hierarchy;
- To define a custom sort order; or
- To define dimension granularity based upon a non-key attribute.
Natural Hierarchy Relationships
A hierarchy is known as a natural hierarchy when each attribute included in the user-defined hierarchy has a one-to-many relationship with the attribute immediately below it. For example, consider a Customer dimension (in this case, a slimmed down version of the Customer dimension table that underlies the sample Adventure Works cube), based upon a relational source table with these columns:
- CustomerKey
- CustomerName
- Age
- Gender
- Email
- City
- Country
- Region
The corresponding Analysis Services dimension has seven attributes:
- Customer (based on CustomerKey, with CustomerName supplying member names)
- Age
- Gender
- Email
- City
- Region
- Country
Relationships representing natural hierarchies are enforced by creating an attribute relationship between the attribute for a level, and the attribute for the level there under. This specifies a natural relationship and potential aggregation within Analysis Services. In our slim version of the Customer dimension, a natural hierarchy exists for the Country, Region, City, and Customer attributes. The natural hierarchy for {Country, Region, City, Customer} is described by adding the following attribute relationships:
- The Country attribute has an attribute relationship to the Region attribute.
- The Region attribute has an attribute relationship to the City attribute.
- The City attribute has an attribute relationship to the Customer attribute.
As we see in other articles of this series, we are not limited to working with natural hierarchies in Analysis Services. We can also create navigational structures via user-defined hierarchies (which represent ad hoc, reporting hierarchies) whose bases reside in logical hierarchical relationships versus associations that exist in the data.
For example, we might create a user-defined hierarchy based on {Age, Gender}. The ultimate information consumer would not see any difference in how the natural and user-defined hierarchies behave, although performance benefits would accrue to the natural hierarchy in terms of aggregating and indexing structures (invisible to the consumer, in this case) based upon the natural associations within the source data.
We will examine the many properties, and their settings that we use in constructing attribute relationships in next months column, where we will gain hands-on exposure to these in a working environment.
Conclusion
In this article we introduced attribute relationships in Analysis Services, extending the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. In a manner similar to previous articles within this subseries, we overviewed the general concepts involved and looked ahead to our next article, where we will perform a hands-on, detailed examination of the properties that support attribute relationships existing within a working sample environment. Our focus, we stated, was upon the appropriate use of attribute relationships in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner, without the unwanted effects of system performance degradation, and other unexpected or undesirable results that can accompany the uninformed use of this component of the conceptual model.
Our introduction included a review of the nature of attribute relationships, and their purposes / uses within a containing dimension within Analysis Services, as well as their role in helping to meet the primary objectives of business intelligence. We performed an overview of attribute relationships, and a discussion of their default configurations, and how we can improve upon those configurations to enhance overall processing performance. We then moved into a discussion of other best practices and general considerations surrounding attribute relationships. Finally, we looked ahead to the next article of this series, where we will examine the individual properties underlying attribute relationships, and conduct a review of the respective settings associated with each property, basing our exercise upon the review of a representative dimension attribute within our sample UDM.
About the MSSQL Server Analysis Services Series
This article is a member of the series Introduction to MSSQL Server Analysis Services. The monthly column is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (Analysis Services), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this series.
» See All Articles by Columnist William E. Pearson, III
Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.