Redundancy, caused by integrating multiple definitions of master data across different lines of business into a data warehouse can cause reports to depict inaccurate results. Master Data Management, introduced in SQL Server 2008 R2, overcomes this problem, allowing you to create, organize and manage master data centrally.
Real world example
We have to set up an enterprise data warehouse for a client, but the problem is, this client has several applications running across different functions of the business. Each of these applications has their own definition of master data. Now, because of this duplication of master date, if we integrate these master data into a data warehouse and generate reports, it would not give accurate results. For example, consider a product called "Mount Bike 6 Gear" it has a different definition with a different name within different applications viz. Sales, Marketing, Inventory etc.
Now we integrate these records in the data warehouse, even though each of the above records represents the same product, the report result would be scattered all over the report because of their different definitions and the interpretation might be wrong if it is based on this incorrect report result.
Multiple definitions of master data across different lines of business would not pose a problem as long as we do analytics or reporting on each of this application data separately. However, when we need to integrate these data into a data warehouse, this redundancy or duplicity of data would cause the report to depict incorrect results. This problem can be overcome with the help of Master Data Management, which allows you to create, organize and manage master data centrally. SQL Server 2008 R2 introduces the Master Data Management platform, which is built on SQL Server and WCF (Windows Communication Foundation). However, before we dig into MDM in detail, that let me explain some of the terminologies related to it.
Defining different terminologies…
Master data represents business entities, for example people (customers, vendors, employees, etc.), places (departments, locations, etc.), products (products, parts, etc.), abstracts (Time, etc.) etc. Master data is non-transactional data, which references the transactional data. If we talk of analogy in OLAP, master data is contained by the dimensions of the cube.
Master Data Management
Master Data Management is a set of tools, processes and policies, which allow an enterprise to create, organize and manage correct and consistent master data centrally. You can learn more about Master Data Management at Wikipedia.
Gearing up for Master Data Services…
Master Data Services (MDS) of SQL Server 2008 R2 provides an extensible platform for Master Data Management, which is built on top of SQL Server 2008 R2 Database Engine and WCF (Windows Communication Foundation). It allows you to create a centralized hub for master data and does transaction logging for all the changes made to these master data, for example what changes have been done, who made the changes, when the changes were made, what was the value before the change and what is value after change. This transaction logging allows user to revert to old values if changes are not required anymore. MDS also allows you to define multiple versions of your master data. For example, you can have one committed version that will be available for consumption by downstream applications and at the same time, you can have an open version of the same master data to do any changes to it. This ensures that downstream applications get consistent data because no changes are permitted on the committed version of master data and at the same time, you are able to do changes to the open version of the master data. A version could be either in open (available for changes), locked (locked for validating master data against defined business rules) and committed (no changes are permitted and hence this version is available for reporting and analytics) version.
Master Data Services allows role-based security for all of its objects as well as for data that it contains. For example, you can have four different roles and each role might have access to data belonging to either of south, north, east and west regions, etc.
You can define different types of business/validation rules and can ensure the master data that MDS contains is validated against these rules. For example, the product about which I talked before, i.e. Mount Bike 6 Gear has been recently launched and because of the high demand for this product, I want to ensure that DaysToManufacture should not be more than 10. MDS allows defining this kind of validation rules and flags the records if they violate any of these defined rules. You can also set up to launch a SharePoint workflow or send notification to the intended audience if these validations fail.
Understanding Master Data Services Architecture…
Master Data Services (MDS) architecture can be divided into three different layer stacks. On the bottom of this architecture stack is DB Layer, which is built on SQL Server 2008 R2 database, which stores all the versions of master data, meta data, system wide setting, security related information and transaction logging. The Service Layer sits on top of the DB Layer and has all of the business logic of MDS. It provides WCF API to programmatically access all of the features of MDS; it also provides an interface for importing master data in MDS through the MDS staging area and for exporting master data to downstream applications for consumption via subscription views. On top of this architecture is the data stewardship portal called Master Data Manager, which provides an ASP.NET UI to access all the features of MDS. Through this UI, you can create different MDS objects, add data to it, define business rules, specify workflow notification, email notification, etc.
Figure 1 - Master Data Services Layers
Master Data Services - Components...
When you start working with Master Data Services, the first object that you need to create is a model. A model is kind of a container that contains all the Master Data Services objects created. Talking of an analogy with a relational database, it's very much like a database or a schema, whereas in an analogy with SQL Server Analysis Services it's like a cube that contains the other objects.
A model contains at least one entity. An entity contains zero or more members with similar attributes. Talking of an analogy with a relational database it's very much like a table whereas in an analogy with SQL Server Analysis Services it's like a dimension; for example Employees, Locations, Customers, etc.
An individual data item or an instance of an entity is called a member. For example, "Employees" is an entity and each employee (John, Steve, Paula, etc.) belonging to this entity is an instance of this entity or member of this entity.
As I said before all of the members of an entity share the same set of attributes. Talking of an analogy with a relational database a member is very much like a row of a table whereas in an analogy with SQL Server Analysis Services it's like a member of a dimension.
A property or description of a member is called an attribute. For example, in "Employees" entity EmpID, EmpName, Address, DOB, DOJ are some of the attributes. Talking of an analogy with a relational database an attribute is very much like a column of a table whereas in an analogy with SQL Server Analysis Services it's like an attribute of a dimension.
An entity of an MDS might contain tens or even hundreds of different attributes but not all of these attributes would be required by all of the reporting or analytical applications. Hence, MDS allows you to create a group of logical related attributes, which make more sense to a specific reporting or analytical application. Because Attribute Group is a securable object, it simplifies and makes the management and security administration of an entities' attributes easy.
MDS allows you to organize members in tree-based structure, which is called Hierarchy, to view it in different ways. For example, each ProductCategory contains several ProductSubCategory and each ProductSubCategory contains several Products. If any member is modified, all of the related hierarchies are updated to reflect the change. A hierarchy is called an explicit hierarchy if all of the members belong to a single entity; it is called derived hierarchy if members come from multiple related entities (domain-based attribute relationships).
Alternatively you can create a collection (a group of explicit hierarchies, collection members and even collections) if you want different groupings of your members for analytics and reporting purposes instead of creating a complete hierarchy.
MDS allows you to create business rules, which are nothing but declarative conditions or actions on a member, which triggers if invalid data gets in. For example, new products being launched that you don't want DaysToManufature to be more than 10. If any record comes with greater than 10, you should be notified and the record should be flagged invalid.
In this article, we saw what master data is and how Master Data Management helps to better create, organize and manage master data. Then we had a look on Master Data Services, which is available with SQL Server 2008 R2, a Master Data Management platform, its architecture and different components of it. In the next article, I am going to show you how to create different MDS objects.
MSDN: Master Data Services