OLAP and Data Warehousing - Data Warehousing Technology Review (Part 2)

Wednesday Jul 3rd 2002 by Alexzander Nepomnjashiy

Alexzander Nepomnjashiy continues his analysis of OLAP and Data Warehousing with Part Two of a review of the technology behind and the construction of Data Warehouses.

Welcome to the follow-up of the first part in Data Warehousing Technology Review. Let's jump right in.

So, the main idea underlying Data Warehousing is that carrying out the analysis directly on OLTP-systems is ineffective and undesirable. Instead, the idea is to extract the necessary data from several OLTP-systems, convert it, and then place it in a single data source -- a Data Warehouse. This process is also known as the ETL (Extract Transform Load) process.

During the ETL process, the data is:

  • Cleared (unnecessary or housekeeping information is eliminated);
  • Aggregated (the sum and average values are calculated);
  • Transformed (data type conversion and storage  structures reorganization are executed);
  • Unified (from various data sources); and
  • Synchronized (resulting in one instant of time).

Data Warehouse is a ready platform for DSS (Decision Support Systems) and OLAP systems construction as it contains data having the following properties:

  • Internal integrity
  • Although the data is preempted from several independent operating data sources (OLTP-systems), they should be joined by unified naming laws, values measurement ways, and common attribute sets. It has the highest value in those organizations where a number of various systems are exploited and/or similar data is presented in different ways (for example, logical value «True» may be stored as 1,-1, True or T. Other variants of this include the same metric within several systems being named differently - for example «balance» and «totals ».)

    At the stage of storage data loading, the above mentioned mismatches should be eliminated (i.e. the data should be checked out, cleared and presented in a uniform view). The Mandatory execution of such a procedure allows us to avoid many potential problems. Also, the analysis of such integrated data can be performed much easier.

  • «Single subject» focus
  • OLTP systems contain gigabytes of data that are rarely interesting for analysis -- addresses, postal codes, record identifiers, etc. Such information is typically not loaded into a warehouse, which limits the considered data spectrum to a necessary minimum.

    The information in DW-storage is organized according to the primary aspects of company business activity (customers, sales, etc.); this distinguishes Data Warehouses from operating databases where the data are organized according to business processes (invoice shipment, calculations with customers, goods shipment, etc.). Subject orientation (due to corresponding data structures) promotes both analysis simplification and faster execution of analytical queries.

  • Historic nature
  • OLTP systems usually cover relatively limited time intervals, whereas Data Warehouses frequently contain data for up to several decades. This makes them an ideal base for revealing trends and long-term tendencies in business.

    From a technical point of view, the historical nature of data means that tables in storage contain a «temporary key», or data are distributed under several tables, each of them referring to a fixed-time interval.

  • «Read-only» availability
  • Depending of business needs, you choose one of three available methods to update data in a Warehouse Full Process - Completely restructures a cube based on its current definition and then its data recalculation; Incremental Update - Adds new data to a partition in the cube and aggregations updating. This method does not process changes to a cube's structure (measures, dimensions) or changes to its existing source data. An incremental update creates a temporary partition from the new data and merges it into an existing partition); and Refresh Data - Clears and reloads a cube's data and recalculates its aggregations. This method is used if the cube's source data has changed but its structure has not.

    The hand-held data modification in storage is practically never performed (actually Microsoft SQL Server 7.0 OLAP Services cubes may be updated by the user (so called «write-back capabilities»), as it results in data integrity violation. Typically, it is not required to minimize the time of DW data loading, so the storage structure may be optimized for predefined queries execution, achieved by relational scheme de-normalization, preliminary data aggregation and, «of most use», indexes creation.

Thus, the data loaded in DW-storage are organized in a completely integrated structure, have natural internal connections, and gain new properties that add them the status of INFORMATION.

See All Articles by Columnist Alexzander Nepomnjashiy

Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved