OLAP and Data Warehousing - Data Warehousing Solution Architecture (Part 2)

Wednesday Aug 14th 2002 by Alexzander Nepomnjashiy

Alexzander Nepomnjashiy continues his series on OLAP and Data Warehousing with a focus on Online Analysis Processing (OLAP) and Data Mining technologies.

Analysis is usually accomplished with tools that are built on the base of attendant technologies. Let's briefly describe two of the most popular technologies - Online Analysis Processing (OLAP - the online operating analysis) and Data Mining (mining for hidden information dependences and tendencies).


The resources of the operating analysis are based on the concept of a multivariate data presentation. Really, each numeric value, inclusive in storage, have up to ten attributes (for example, quantity of sales by a certain manager in a certain region for a certain date). Thus, we can consider that we deal with multivariate data structures - multivariate cubes. So, based on this allowance, all OLAP software are build.

OLAP systems usually have two distinctive abilities:

  • The analysis is carried out above multivariate data structures: OLAP -- cubes (the "cube" metaphor provides a new approach to visualizing how data are organized. A cube gives the impression of multiple dimensions)

  • There is, built-in into OLAP software, the ability to navigate on OLAP-cubes -- (so-called OLAP -- manipulation)

OLAP systems are usually realized based on (department) data marts, rather than (enterprise-wide) data warehouses -- because of the many possible attributes. The multivariate structure (OLAP-cube) is created on the base of the relational DBMS with the additional access interface (MRDB - MultiRelational Database), or with usage of a special multivariate DBMS (MDDB - MultiDimensional Database).

The multivariate data model allows for storage of measurable values (or measures) in the uniform data object with parameters, defining their values or arguments. Most of the measurements (parameters from which the measures depend) are inhomogeneous, for example if we speak about the specific product, it is also possible to speak about some product groups and / or product types. Thus, measurement "product" can have a naming hierarchy: «product types» -> «product group» -> «product». The data analysis and/or the decision maker, infrequently need to work with numeric values specific to a concrete product, so «product group» or even «product types» levels are sufficient. For fast access support to this high-level information, data warehouse as a rule contains this information already computed. The process of such calculation is called aggregation and the computed values themselves, aggregates.

For data warehouses/data marts creation, ordinary relational databases frequently will be used. This case storage database scheme is organized as «starflake» (a schema is called a star schema if all the dimension tables can be joined directly to the fact table) or «snowflake» (a schema is called a snowflake schema if one or more dimension tables do not join directly to the fact table but must join through other dimension tables). For example, in the case of a «star» database scheme organization, there is a fact table (measures table), which contains one or several numeric columns that measure business performance (for example: revenue, costs, number of products sold, number of shipments), and a number of dimension tables, which describe the facts (for example; if our fact is the number of products sold, the dimensions might be time, location and/or region and product).

Data Mining

Data Mining is the detection process for hidden tendencies, trends and cross-correlations in DW-storage data with the use of artificial intellect tools and software and/or statistical analysis. The data mining technology literally emerged over the last few years and is now on top of its development. The given technology includes a wide spectrum of methodologies from statistical analysis methods to neuron networks usage.

The main purpose of data mining is to detect dependencies and tendencies that are inconspicuous at first (human) view and to analyze and monitor these tendencies with a focused attention. Thus, data mining resources and tools allow us not only to parse the data, but also to present it in a way that the tendencies and dependencies are obviously monitored on an ongoing basis.

In Summary

The crucial terms for DW-project are a data warehouse, a data mart, data warehousing, and data mining.

A data warehouse is the cohesive data model that defines the central data repository for an organization. An important point is that we don't define a warehouse in terms of the number of databases. Instead, we consider it a complete, integrated data model of the enterprise, regardless of how or where the data is stored.

A data mart is a repository containing data specific to a particular business group (company departments, for example) in an enterprise. All data in a data mart derives from the data warehouse, and all data relates directly to the enterprise-wide data model. Often, data marts contain summarized or aggregated data that the user community can easily consume.
Finally, data warehousing is the process of managing the data warehouses and data marts. This process includes all the ongoing support needs of the refresh cycle, database maintenance, and continual refinements to the underlying data model.

Data mining is a technology that applies sophisticated and complex algorithms to analyze data and expose significant information for analysis by decision makers. Whereas OLAP organizes data in a model suited for exploration by analysts, data mining performs analysis on data and provides the results to decision makers. Thus, OLAP supports model-driven analysis and data mining supports data-driven analysis.

Related Article: Introduction to Analysis Services: Creating Your First Cube from William E. Pearson

See All Articles by Columnist Alexzander Nepomnjashiy

Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved