There has never been a more busy time for data architects, database administrators and database developers than now. Programs and initiatives require more creative solutions and in the push to implement solutions that cost less and do more, some design fundamentals are being dropped from the development lifecycle. Read on to learn more.
Okay, so here we are in the 21st century, there has never been a more busy time for data architects, database administrators and database developers. There are more and more projects, programs and initiatives underway that require us to deploy more creative solutions. There are so many new and improved software solutions making large amounts of data immediately accessible for cloud computing, federating more data that was once buried within legacy applications and enabling data to become more agile.
However, in the push to implement solutions that cost less and do more, it has been my experience that some design fundamentals were left out of the development lifecycle. More specifically, database design fundamentals. Why? It may be a result of aggressive project time frames, the decrease in performance times for service level agreements or the reduction in the window for batch processing. The root cause is yet to be determined.
The fact is that as projects go through the software development lifecycle (SDLC), there are a number of tasks and activities that we as data architects and database administrators should always incorporate that will ensure database applications are designed within scope, cost and service level agreements for performance. These tasks or activities are integral parts of specific layers of the solutions architecture. What are those layers? The data layer and the application layer. The tasks here can make or break the database application. They are the fundamental building blocks to an effective database design where refactoring and rework is reduced to significantly low levels. These tasks enable the construction phase to be completed timely with little or no guesswork about the design. It should be noted that one of the more important threads throughout the execution of these tasks is documentation. It cannot be emphasized enough how important documenting the design is. This point will be discussed further in this article. So, let's look at some of the tasks that should be incorporated to effectively design a database application system.
The Data Layer
This is traditionally the center of all database design activities. Once the logical data model has been created, the database administrators are engaged to create the physical data model, the database and the related objects. The work that is involved in creating the physical data model is focused primarily on translating the logical data model into an artifact that can be used to generate the database definition language (DDL) and of course the database, taking into account the features and constraints of the DBMS whether it's Oracle, SQL Server or UDB. During the work effort to create the physical data model, the team should be mindful of the project scope and business service level agreements related to performance and accessibility. Where does this information exist? It should be defined in the business requirements and functional specification documents for the project. These documents will define the type of database needed. So it should be pretty clear that the database application being designed is data warehouse, a transactional database or a data mart. Using this set of documentation and periodic reviews with the business analyst(s), the database administrators and application development team should be able to create a physical data model, the database and related objects in phases.
The first phase is to ensure that all existing database standards are applied to the physical data model. This includes naming conventions for columns, primary and foreign key constraints, tables and indexes. The second phase is to review the detailed business requirements and functional specification documents to align the physical data model to data accessibility requirements. For example, if the database application being designed is part of a decision support system or a business intelligence framework, having immediate access to data is a high priority. Elements to be considered should include the use of surrogate keys for uniquely identifying each row of data and natural keys (including composite keys) for the way data is accessed through common business data dependent functions. Also included is the cardinality of the columns that will be a part of the alternate keys. Namely, if a composite key is required, the column with the highest cardinality must be the first column in the index followed by the other columns such that the column with the lowest cardinality is the last column of the index. This ensures that when this specific index is used, that a smaller result set of data is created that is specific to the request instead of a much larger result set with only a few rows qualified for the request. Let's also not forget that during these activities, the metadata repository is being augmented with information generated from the physical data model design effort. The third phase is to organize the tables in the physical model based on performance, scalability and availability requirements. What this means in practice is organizing tables, indexes, tablespaces and containers that possess certain characteristics. At one end of the spectrum, tables that include small data volume, frequent access and low volatility (i.e. updated infrequently) should be grouped together in the same tablespace. In this way, the data accessed in these tables can be brought into and kept in memory or cache thus reducing read I/O. At the other end of the spectrum, tables that will possess large data volumes that must be processed and made available within limited windows of time should be partitioned. This ensures that these tables are segmented into manageable chunks such that the data can be spread across multiple CPUs. The application and data management utilities can now leverage parallel processing and to some extent grid computing to ensure that performance SLAs are met and CPU-intensive operations are completed within scheduled timeframes on large data volumes when needed.
The Application Layer
Before I begin, I need to state that the details discussed here are specific only to good coding practices for SQL. So now that I have cleared that up, let's talk about the components of the application layer that actually touch the data and the related tasks.
Designing the data layer is only part of the overall effort. Remember, the database design is only as good as the application that uses it. Here too the business requirements and functional specification documents provides information that is used to design and build out the application layer. These documents contain the specifications for how the data will be used. Additionally, the design of the application layer should be collaborative. The DBA, application developers and business analyst should be part of the design team for the application layer. The creation of the application layer should employ a top-down approach; the top being the scope of the project, with each lower lever defining the components that will become the integral parts of the application. This is an effective way of defining the tasks of creating an application by deconstructing the business requirements and functional specifications in order to define and create the processes required for the project deliverables. The other set of tools needed is a set of guiding principles that governs application design. The theme for these guiding principles is that the application must be maintainable and efficiently use resources.
Within this framework, the team, mindful of the project scope, should begin by defining the application processes that will be required to provide a particular set of services. Once this is done, the last set of tasks should focus on identifying the data needs for each application process and determine the best way to access and/or process that data. This means effectively using resources to retrieve and process the data. It's about employing best practices for coding SQL statements. For example, defining the specific columns to be used in the "SELECT" statement instead of retrieving the entire row by using "SELECT *" statements (only retrieve what is needed), comparisons in the "WHERE" clause with columns that are of the same data type and definition, limited use of negative logic in the "WHERE" clause.
So now all that all these activities have been completed, let's talk about documentation. In my experience, this is always the weakest area in the design phase of SDLC. It could be that as the deadline for delivery looms, it's often considered as one of the products that is not mandatory and can always be completed at a later date. Here's the thing, each phase of the SDLC creates artifacts and documentation is just that, a project artifact. This documentation pulls together the entire design effort. It aligns the design of the data layer and the application layer to the business requirements and the functional specification documents. Design documentation provides the explanations of decisions made. It is also the forerunner of the production runbook completed in the deployment phase of the project.
The Last Word
These activities are a cohesive set that work towards a common goal. That is to ensure that the solution performs within scope and established service level agreements with maintainability and scalability at its core.