Building a Data Warehouse Blueprint for Success

Wednesday Jul 28th 2010 by Denise Rogers

One of the most integral components and critical success factors of any enterprise data warehousing initiative is the Solutions Architecture document, a high-level conceptual model of a data warehousing solution. Learn why this collaborative effort that addresses the needs of all major stakeholders, including both the business units and Information Technology (IT), is essential.

One of the most integral components and critical success factors of any enterprise data warehousing initiative is the Solutions Architecture document, a high-level conceptual model of a data warehousing solution. Learn why this collaborative effort that addresses the needs of all major stakeholders, including both the business units and Information Technology (IT), is essential.

One of the most integral components and critical success factors of any Data Warehousing initiative is the Solutions Architecture document.

It is the high-level conceptual model of a data warehousing solution. The solutions architecture creates the roadmap of the possible solution, providing a context for the planning and construction of a comprehensive future state environment. It essentially sets the framework for the development of the data warehousing solution. It is a collaborative document that addresses the needs of all major stakeholders including both the business units and the Information Technology (IT). It transfers all of the ideas, concerns, issues and pain points associated with the current infrastructure and project from business questions to possible technologically based answers. It reinforces the scope of the project.

Therefore, the Solutions Architecture document should not be an academic exercise. It must be a high-level design of the data warehousing solution specific to the project and should include the following items.

ETL Framework

Using a complete set of business requirements, the ETL framework will be used to build solutions to cleanse, standardize and integrate data from disparate sources into the data warehouse and/or operational data store.

Staging Area

The staging area is the information hub primarily used in support of the ETL framework were source data goes through a series of enrichment processes for the purpose of populating the data warehouse and/or the operational data store.

Operational Data Store

The operational data store (ODS) is a set of logically related data structures within a database that primarily includes integrated, subject oriented, volatile, non-historical atomic data used specifically in support of transactional processing and timely operational reporting.

Data Warehouse

The data warehouse is the opposite of the ODS in that it is a database that contains integrated, historical, non-volatile, organizational level, atomic data used primarily in downstream decision support initiatives such as business intelligence.

Data Marts

A data mart is a subject oriented set of organizational level data created to support analytical reporting requirements for a specific business unit.

Business Intelligence

Business Intelligence is a logical grouping of applications and repositories that are specifically architected for information delivery to the business community.

Steps in building a data warehouse solutions architecture document

As the saying goes, "a picture is worth a thousand words", this is a great way to start the process. Once a diagram has been developed, building the outline and related detail become almost organic. This is especially true with the use of industry standard enterprise architectural frameworks.

However, the architects on the project team must choose a format for the solutions architecture document. In my years of doing data warehousing, two formats are quite common. It can be a high-level "how-to" guide with the requisite definitions of each of the data warehouse components. Alternatively, it can be a "roadmap" with definitions of each data warehouse component along with the descriptions of related standards and guidelines that need to be adhered to within the Software Development Life Cycle (SDLC).

The format selection is usually based on the culture and the maturity of the organization and the team must fully understand this aspect of the project in order to make the right selection.

The main thing is that independent of the format chosen, the building of the solutions architecture is a collaborative effort where the major project stakeholders from the executive team, business units and IT must provide input into the document to create a meaningful and usable asset to the organization.

In Today's World

However, the audience for this document has dramatically changed over the years. In the mid to late nineties, this document was created only for the executive and the in-house project teams. Since the 21st century, IT has gone through a technological and cultural evolution of sorts and the ways things are done are quite different now. The solutions architecture document now has a different audience that it speaks to. This audience is a project team that is not quite co-located, meaning that the team now includes the executive team, the in-house project team, the outsourced and the offshore project teams.

Since the architects are the primary authors of this document, it is therefore their responsibility to complete a number of exercises specifically created to understand the organizational communication plans, rules of engagement and project management procedures. It may seem like this is too much of an extra effort, however if the whole idea behind the document is to produce a workable solution that the entire project team thinks is realistic and the executive team has bought into, it's worth the extra hours of time spent!

Don't ever forget that the solutions architecture document must be able to stand with other project assets within the organization and also provide a comprehensive guide to the build out of the data warehousing environment and its related components.

And yes! Of course, I have a couple of stories that that I would like to share that illustrate these points!

The one that did not work

A few years ago, I was assigned to a data warehousing project that would completely modernize an aging infrastructure. My role was that of the data architect, I was responsible for the design of the entire data layer. The project charter and scope had been socialized and accepted. So, my thinking is that the easy part is done now the real work begins. I just did not anticipate how real and tough it would be!

The approach was to have the project manager author all documents produced from the project with the team (including the architects) contribute as required in the project plan.

Well what do you think happened?

The project manager completed the entire document and asked each of the architects to contribute only a diagram and set of definitions for each component within the diagram. During the review with the major stakeholders, I just remembered noticing lots of disinterested looks, few questions being asked except from the IT department who were very angry that they were not included in the creation of the document. The sections of the document related to the data layer (including the data warehouse, staging area, ODS and data marts) was criticized as not being based in reality, not taking the existing standards into consideration. The whole document was dismissed as an academic exercise by the IT review team. Needless to say, that review did not go well and after the meeting was over, I basically walked out of the conference room feeling like I was blindsided by a train that I did not see coming!

So what finally happened to that document? It succeeded in being an item that was checked off on a project plan as being completed. Last I recall, it is now a very expensive doorstop or bookshelf filler in somebody's office! Not a good ending for such an important asset to the project and the organization.

The one that did work

Recently, I have been working on assignment as a data architect within a financial company. This project was scoped to modernize and centralize an IT infrastructure that was extended over the IT and business landscape. Meaning, IT supported some of the database applications and the business units supported others.

My role was to complete sections of the Solutions Architecture document related to the data layer. The approach used here was very different to my prior assignment. The project management team assigned the completion of the Solutions Architecture document to the architecture team!

We agreed that with the completion of each component of the solutions architecture, we as architects would socialize, review, address issues, solicit input and provide clarifications with all major stakeholders on the IT and business teams fully aware that this document would be used by a project team that was not co-located and included offshore members as well.

As we stepped through the process and with every review, it became quite clear that this document had "legs" to stand up with all other assets of the project as well as a common belief being shared that the information being produced will be used extensively as the data warehousing environment is rolled out over time! That's a much better ending than the last one! I can assure you this document will NOT be the dusty pile of papers on any bookshelf anywhere at any time!

Additional Resources

Rensselaer Data Warehouse Project
Data Warehouse Architecture

» See All Articles by Columnist Denise Rogers

Mobile Site | Full Site