Introduction to Attribute Relationships in MSSQL Server Analysis Services

Tuesday Dec 23rd 2008 by William Pearson
Share:

Join BI Architect Bill Pearson as he introduces Attribute Relationships into his extended examination of the dimensional model within the integrated Microsoft Business Intelligence solution.

This article introduces Attribute Relationships in Analysis Services, and extends the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. Here, we will focus upon the purpose of attribute relationships, which define the possible associations between attributes. Moreover, we will discuss why these relationships are important, and how they allow us to define the properties of association that a given attribute has with other attributes.

Note: For more information about my Introduction to MSSQL Server Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis Services Series” that follows the conclusion of this article.

Introduction

In Dimensional Model Components: Dimensions Parts I and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. We then began our examination of dimensions, the analytical “perspectives” upon which the dimensional model relies in meeting the primary objectives of business intelligence, including its capacity to support:

  • the presentation of relevant and accurate information representing business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly and easily, and achieve rapid results datasets.

We learned, in Dimensional Model Components: Dimensions Parts I and II, that dimensions form the foundation of the dimensional model. They represent the perspectives of a business or other operation, and reflect the intuitive ways that information consumers need to query and view data. We noted that we might consider dimensions as nouns that take part in, are acted upon by, or are otherwise associated with, the verbs (or actions / transactions undertaken by the business) that are represented by the facts or measures contained within our business intelligence systems.

We discovered in earlier articles that, within the Analysis Services model, database dimensions underlie all other dimensions, whose added properties distinguish them from the database dimensions they reference, within the model. Each dimension within our model contains one or more hierarchies. As we learn in other articles of this series, two types of hierarchies exist within Analysis Services: attribute hierarchies and user (sometimes called “multi-level”) hierarchies. For purposes of this article, the term “attribute” means the same thing as “attribute hierarchy”. (We examine user hierarchies, to which we will simply refer as “hierarchies,” in other articles specifically devoted to the topic.)

To extend the metaphor we used earlier in describing dimensions as nouns and measures as verbs, we might consider attributes as somewhat similar to adjectives. That is, attributes help us to define with specificity what dimensions cannot define by themselves. Dimensions alone are like lines in geometry: they don't define “area” within multidimensional space, nor do they themselves even define the hierarchies that they contain. A database dimension is a collection of related objects called attributes, which we use to specify the coordinates required to define cube space.

Within the table underlying a given dimension (assuming a more-or-less typical star schema database) are individual rows supporting each of the members of the associated dimension. Each row contains the set of attributes that identify, describe, and otherwise define and classify the member upon whose row they reside. For instance, a member of the Patient dimension, within an Analysis Services implementation for a healthcare provider, might contain information such as patient name, patient ID, gender, age group, race, and other attributes. Some of these attributes might relate to each other hierarchically, and, as we shall see in other articles of this subseries (as well as within other of my articles), multiple hierarchies of this sort are common in real-world dimensions.

Dimensions and dimension attributes should support the way that management and information consumers of a given organization describe the events and results of the business operations of the entity. Because we maintain dimension and related attribute information within the database underlying our Analysis Services implementation, we can support business intelligence for our clients and employers even when these details are not captured within the system where transaction processing takes place. Within the analysis and reporting capabilities we supply in this manner, dimensions and attributes are useful for aggregation, filtering, labeling, and other purposes.

Having covered the general characteristics and purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V, we fixed our focus upon the properties underlying them, based upon the examination of a representative attribute within our sample cube. We then continued our extended examination of attributes to yet another important component we had touched upon earlier, the attribute member Key, with which we gained some hands-on exposure in practice sessions that followed our coverage of the concepts. In Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys, we introduced attribute member Keys in detail, continuing our recent group of articles focusing upon dimensional model components, with an objective of discussing the associated concepts, and of providing hands-on exposure to the properties supporting them.

As a part of our exploration of attribute member Keys, we first discussed the three attribute usage types that we can define within a containing dimension. We then narrowed our focus to the Key attribute usage type (a focus that we developed, as we have noted, throughout Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys), discussing its role in meeting our business intelligence needs. We next followed with a discussion of the nature and uses of the attribute Key from a technical perspective, including its purpose within a containing dimension within Analysis Services.

In Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys, we introduced the concepts of simple and composite keys, narrowing our exploration in Part I to the former, where we reviewed the Properties associated with a simple key, based upon the examination of a representative dimension attribute, Geography, within our sample UDM. In Part II, we revisited the differences between simple and composite keys, and explained in more detail why composite keys are sometimes required to uniquely identify attribute members. We then reviewed the properties associated with a composite key, based upon the examination of a representative dimension attribute, Date, also within our sample UDM.

In Attribute Member Names, we examined the attribute member Name property, which we had briefly introduced in Dimensional Attributes: Introduction and Overview Part V. We examined the details of the attribute member Name, and shed some light on how they might most appropriately be used without degrading system performance or creating other unexpected or undesirable results. Finally, we examined the “sister” attribute member Value property (which we introduced along with attribute member Name in Dimensional Attributes: Introduction and Overview Part V) in Attribute Member Values in Analysis Services. As we did in our overview of attribute member Name, we examined the details of Value. Our concentration was also similarly upon its appropriate use in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner, without the unwanted effects of system performance degradation, and other unexpected or undesirable results, that can accompany the uninformed use of the property.

In this article, we will examine another part of the conceptual model, Attribute Relationships. In this introduction, we will discuss several best practices and design, and other, considerations involved in their use. Our focus will be upon the general exploitation of attribute relationships in providing support for the selection and delivery of enterprise data. (In our next article, we will examine attribute relationships in a manner similar to previous articles within this subseries, concentrating in detail upon the properties that underlie them.)

Our examination will include:

  • A review of the nature of an attribute relationship, its purpose within its containing dimension within Analysis Services, and its role in helping to meet the primary objectives of business intelligence.
  • A general overview of attribute relationships, and a discussion of their default configurations, and how we can improve upon those configurations to enhance overall processing performance.
  • A discussion of other best practices and general considerations surrounding attribute relationships.
  • A look forward to the article that follows within our series, where we will perform a detailed examination of the properties underlying attribute relationships, along with a review of the respective settings associated with each property, based upon a representative dimension attribute within our sample UDM.

Attribute Relationships

As we have learned, attributes serve as the foundation for our dimensions and cubes. Each dimension contains a set of attributes, the organization of which is based upon attribute relationships. We create attribute relationships as a part of creating a given dimension. For each table that we include within a dimensional structure, an attribute relationship is established to associate the key attribute to other attributes from the table.

Multiple advantages accrue from the knowledgeable use of attribute relationships, including:

  • Enhancements in dimension, partition, and query processing, based upon a reduction in the total amount of memory required for dimension processing.
  • Enhanced query performance due to more rapid storage access, as well as more optimized execution plans within Analysis Services.
  • Enhanced performance because of the selection of more effective aggregates via the Analysis Server aggregation design algorithms (assuming that user-defined hierarchies have been defined along relationship paths).

One of the most important objectives in building a high performance Analysis Services solution is an efficient and effective dimension design. The identification of attribute relationships, and the effective use of hierarchies (which we examine in other articles of this subseries) represent the two most important considerations underlying the optimization of dimension design.

Overview

An Analysis Services dimension typically has, as an underlying source, a relational data warehouse dimension table. And within relational data warehouses, each dimension usually contains a primary key, attributes, and, in some instances, foreign key relationships to other tables. The attributes provide descriptive context to the primary key and, from a relational perspective, each attribute has either a many-to-one relationship with the primary key (this is, by far, most frequently the case) or a one-to-one relationship with the primary key. The attribute relationships existing within the Product dimension of the Adventure Works DW sample database (available to anyone installing Analysis Services) appear as depicted in Illustration 1.

Attribute Relationships: Product Dimension Primary Key
Illustration 1: Attribute Relationships: Product Dimension Primary Key

Some attributes also have relationships with other attributes. For example, in the Product dimension we have cited, the Subcategory attribute has a many-to-one relationship with the Category attribute, as shown in Illustration 2.

Attribute Relationships for Subcategory in Adventure Works DW
Illustration 2: Attribute Relationships for Subcategory in Adventure Works DW

A grasp of the concept of relationships within Analysis Services is as important to the OLAP developer / administrator as an understanding of the dependencies among fields in relational databases are to their own respective developers and administrators. To allow it to create useful aggregations, to accurately and completely aggregate data, and to efficiently store and retrieve information, the relationships between attributes must “make sense” to Analysis Services. This is the purpose of attribute relationships: to define and describe the relationship between a given pair of attributes. To optimize our dimension designs, we must first understand how our attributes are related to each other, and then we must take steps to let Analysis Services know, via attribute relationships, what those relationships are.

When we initially create a dimension and add attributes, Analysis Services automatically defines the relationship between the primary key attribute and all other attributes as “many - to – one.” While this default scenario is not “inaccurate,” on the surface, it is likely to be suboptimal from the perspective of Analysis Services performance, as it requires the server to perform extra work to determine that some combinations do not exist, among other activities. With the default scenario in place, anytime a query to the data source includes an attribute within the dimension, Analysis Services will summarize data from the primary key and then group it by the attribute under consideration. This means that if we choose to present our data by a given attribute at one point in time, individual member keys are grouped “on the fly” by the same attribute. If we next request summarized data for another attribute, then the entire “on-the-fly” grouping is performed once again for the attribute requested. Unsurprisingly, inefficiencies can mount in cases where the totals of a given attribute might be easily derived from the totals of another attribute.

An example would be a case where we request a total for a couple of attributes (such as City and State-Province within the Customer dimension of the Adventure Works sample cube) where City “rolls up” into the State-Province attribute of the same dimension: leaving the attribute relationships at default of “many - to – one” would mean that Analysis Services would summarize data from the primary key (CustomerKey in the underlying DimCustomer table), and then perform an “on–the–fly” grouping by the City attribute. To deliver the second total for State-Province, Analysis Services will perform a separate, “on–the–fly” grouping for the State-Province attribute. The processing inefficiencies are obvious, since each State-Province’s total can be derived (more quickly) from the totals of the Cities underlying it. (The same scenario applies to the “rollup” relationship between the State-Province and Country attributes.)

We can see the attribute relationships within the Customer dimension that we have discussed in our geographical attribute examples, appearing as depicted in Illustration 3.

Some Attribute Relationships for Geographical Attributes within the Customer Dimension in Adventure Works DW
Illustration 3: Some Attribute Relationships for Geographical Attributes within the Customer Dimension in Adventure Works DW

The default design has other inherent inefficiencies. Since Analysis Services cannot identify which attribute combinations naturally exist in the dimension, the server must use the fact data to identify meaningful member combinations, as we intimated earlier. As an illustration, at query time, if an information consumer requests data by Country and State-Province, Analysis Services will be forced to perform additional processing to ascertain, for example, that the combination of Country: France and State-Province: Arizona does not exist.

We can turn to the browser for the Customer dimension within the Adventure Works cube to see the structure underlying the aforementioned example attributes in our Customer geography example, as shown in Illustration 4.

Using the Browser for the Customer Dimension to See the Results of the Underlying Attribute Relationships
Illustration 4: Using the Browser for the Customer Dimension to See the Results of the Underlying Attribute Relationships

Here we can see that the Analysis Server has used the attribute relationships depicted in Illustration 3 to align the appropriate State-Provinces to their respective Countries, and therefore to make the example we have offered, Country: France and State-Province: Arizona, a physically invalid combination.

Even though, in the example above, Customer Key is no longer directly related to Country and State-Province, it remains indirectly related to these attributes through chained attribute relationships. More specifically, Customer Key is related to Country using the chain of attribute relationships that link Customer Key to Postal Code, Postal Code to City, City to State-Province, and State-Province to Country. Such a chain of attribute relationships is also known as a cascading attribute relationship.

Cascading attribute relationships enable better performance “decisions” within Analysis Services with regard to several factors, including (as we have intimated earlier) aggregation design, data storage, data retrieval, and MDX calculations. Moreover, attribute relationships also assist us in enforcing dimension security and in associating measure group data with non-primary key granularity attributes. (For example, if we have a measure group that contains Internet – and other - Sales data by Customer Key and forecast data by State-Province, the forecast measure group will only “know how” to roll up data from State-Province to Country if attribute relationships exist only between State-Province and Country.)

As we have noted in Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys, among other articles of this series, the KeyColumns property consists of a source column or combination of source columns (known as a collection) that uniquely identifies the members for a given attribute. Once we define relationships among our attributes, the importance of the KeyColumns property becomes even more prominent. We must ensure that the KeyColumns property of each attribute in the dimension uniquely identifies the respective attribute member. If the KeyColumns property does not uniquely identify each member, duplicates encountered during processing (which are ignored by default), result in incorrect data rollups.

It is important to be aware, also, that if the attribute relationship has a default type of Flexible, Analysis Services will not notify us when it encounters duplicates, and will assign all of the duplicate members incorrectly. (We will examine the relationship types further within the hands-on practice section that appears in our next article.)

As we define new attribute relationships, one important consideration we should keep in mind, from the tandem perspectives of performance and data correctness, is the removal of any redundant relationships. As an example, when the relationships shown in Illustration 3 were established, the Customer Key no longer required the (default) direct relationship to City or State-Province. These attribute relationships would have best been removed at this point.

NOTE: To help us to identify redundant attribute relationships, the Business Intelligence Development Studio provides a warning message, underneath the three panes of the Dimension Structure tab of the dimension, to alert us to any existing redundancies. The Studio, however, does not require us to eliminate the redundancy. It is a best practice to always manually remove each redundant relationship. (Once we have removed the redundancy, the warning disappears.)

Best Practices and Other Considerations Surrounding Attribute Relationships

Best practices dictate that, in designing attribute relationships, we 1) create the most efficient dimension model that 2) best represents the semantics of the business that it represents. Perhaps the most critical element of success we can seek in doing so is a deep familiarity with the organization’s data, together with an extensive understanding of the business requirements that the data must support. Defining attribute relationships incorrectly can cause invalid query results, among other maladies, within our Analysis Services databases.

Anytime the data underlying our Analysis Services UDM support the definition of unique attribute relationships, we should use the Attribute Relationships tab of Dimension Designer to define these unique relationships.

NOTE: We will examine detailed settings that support existing attribute relationships within the sample Adventure Works cube in the hands-on practice section of the article that follows this one.

Any attribute that has an “outbound” relationship must have a unique key relative to its related attribute. In other words, a member in a source attribute must identify one - and only one - member in a related attribute. For example, consider the relationship, City -> State-Province that we used in examples earlier: in this relationship, the source attribute is City and the related attribute is State. The source attribute is the “many” side and the related side is the “one” side of the many-to-one relationship. The key for the source attribute would be City + State. A visual example how this key is established in the Customer dimension of the Adventure Works cube is depicted in Illustration 5.

Establishing a Unique Key for the City Attribute, Relative to the State-Province Attribute
Illustration 5: Establishing a Unique Key for the City Attribute, Relative to the State-Province Attribute

We noted earlier that attributes within a given dimension are always related either directly or indirectly to the key attribute. We also noted that, when we define a dimension based on a star schema (again, typically a scenario where all dimension attributes are derived from the same relational table), an attribute relationship is automatically defined between the key attribute and each non-key attribute of the dimension. Moreover, when we define a dimension based upon a snowflake schema (a scenario where dimension attributes are derived from multiple related tables), an attribute relationship is automatically defined as follows:

  • Between the key attribute and each non-key attribute bound to columns in the main dimension table.
  • Between the key attribute and the attribute associated with the foreign key in the secondary table that links the underlying dimension tables.
  • Between the attribute associated with the foreign key in the secondary table and each non-key attribute bound to columns from the secondary table.

We noted earlier that we might want to modify these default attribute relationships, and cited but one example. Reasons might include the following:

  • To define a natural hierarchy;
  • To define a custom sort order; or
  • To define dimension granularity based upon a non-key attribute.

Natural Hierarchy Relationships

A hierarchy is known as a “natural hierarchy when each attribute included in the user-defined hierarchy has a one-to-many relationship with the attribute immediately below it. For example, consider a Customer dimension (in this case, a “slimmed down” version of the Customer dimension table that underlies the sample Adventure Works cube), based upon a relational source table with these columns:

  • CustomerKey
  • CustomerName
  • Age
  • Gender
  • Email
  • City
  • Country
  • Region

The corresponding Analysis Services dimension has seven attributes:

  • Customer (based on CustomerKey, with CustomerName supplying member names)
  • Age
  • Gender
  • Email
  • City
  • Region
  • Country

Relationships representing natural hierarchies are enforced by creating an attribute relationship between the attribute for a level, and the attribute for the level there under. This specifies a natural relationship and potential aggregation within Analysis Services. In our “slim version” of the Customer dimension, a natural hierarchy exists for the Country, Region, City, and Customer attributes. The natural hierarchy for {Country, Region, City, Customer} is described by adding the following attribute relationships:

  • The Country attribute has an attribute relationship to the Region attribute.
  • The Region attribute has an attribute relationship to the City attribute.
  • The City attribute has an attribute relationship to the Customer attribute.

As we see in other articles of this series, we are not limited to working with natural hierarchies in Analysis Services. We can also create navigational structures via user-defined hierarchies (which represent ad hoc, reporting hierarchies) whose bases reside in logical hierarchical relationships versus associations that exist in the data.

For example, we might create a user-defined hierarchy based on {Age, Gender}. The ultimate information consumer would not see any difference in how the natural and user-defined hierarchies behave, although performance benefits would accrue to the natural hierarchy in terms of aggregating and indexing structures (invisible to the consumer, in this case) based upon the natural associations within the source data.

We will examine the many properties, and their settings that we use in constructing attribute relationships in next month’s column, where we will gain hands-on exposure to these in a working environment.

Conclusion

In this article we introduced attribute relationships in Analysis Services, extending the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. In a manner similar to previous articles within this subseries, we overviewed the general concepts involved and looked ahead to our next article, where we will perform a hands-on, detailed examination of the properties that support attribute relationships existing within a working sample environment. Our focus, we stated, was upon the appropriate use of attribute relationships in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner, without the unwanted effects of system performance degradation, and other unexpected or undesirable results that can accompany the uninformed use of this component of the conceptual model.

Our introduction included a review of the nature of attribute relationships, and their purposes / uses within a containing dimension within Analysis Services, as well as their role in helping to meet the primary objectives of business intelligence. We performed an overview of attribute relationships, and a discussion of their default configurations, and how we can improve upon those configurations to enhance overall processing performance. We then moved into a discussion of other best practices and general considerations surrounding attribute relationships. Finally, we looked ahead to the next article of this series, where we will examine the individual properties underlying attribute relationships, and conduct a review of the respective settings associated with each property, basing our exercise upon the review of a representative dimension attribute within our sample UDM.

About the MSSQL Server Analysis Services Series

This article is a member of the series Introduction to MSSQL Server Analysis Services. The monthly column is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (“Analysis Services”), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this series.

» See All Articles by Columnist William E. Pearson, III

Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved