Our final excerpt covers data warehouse with multiple SQL requesters, the human roles in DBMS, when to use SQL, scaling, speed, price, universality, and analytic capabilities, among other areas.
Data Warehouse with Multiple SQL Requesters
Last, let us look at an example that is probably realistic for most businesses. We have one datastore for
the enterprise that consists of several data servers and a mainframe, collectively called a Data
Warehouse or a Distributed Data Processing system. That conglomerate of a datastore receives SQL
requests for internal business operations from a business logic server. The business logic server is
processing operations as directed by employees using PCs and front-end software on their desks.
The company also has a website serviced by a web server. The web server interacts with visitors and
then sends SQL requests to the same enterprise data warehouse. Although there are many brands of
software involved and many platforms they are all able to speak and understand SQL.
We can trace the information flow in this graphic by starting with the PCs on the right. They are
connected to either the Department Server or the Web Server. These servers can then create a SQL
request using their business logic software and pass the request to one or more of the Data Servers. This
is usually done through a high-speed LAN which only connects servers (represented on the far left).
After the SQL Request has been processed the results can return from left to right through the servers
and to the PCs.
(Click image for full size)
The Human Roles in a DBMS
With the increasing size and complexity of computing tasks, IT departments frequently grow to have
several specialists contributing to the database solutions of the company. If you are coming from a
background in desktop databases you may not have a clear idea of the titles and roles of the players.
Obviously, in a small company one person may have more than one of these roles as a SQL
programmer you may even be expected to cover responsibilities beyond just programming.
The DataBase Administrator (DBA) specializes in installing and maintaining the DBMS, including the
physical devices, backups and recoveries. Frequently the DBA is the person responsible for maintaining
the security scheme as well as troubleshooting the DBMS. Note that the DBA is not usually specifically
responsible for any given database. Rather, the DBA keeps the DBMS running as a service for all the
users that have databases on that DBMS. Each DBMS vendor has training and certification programs for
its particular systems.
The Systems Administrator (sometimes called the System Operator or "SysOp") specializes in the
Operating System (OS) and connectivity of the servers, both data servers and others. "SysAdmins"
perform back-ups of the OS and monitor the traffic load between servers. "SysAdmins" are likely also to
be involved in the security settings of servers needed by SQL programmers.
A newer position is the Security Administrator. Valuable corporate data is now exposed through the
web to a degree that would horrify the last generation of IT professionals. As hackers improve their
sophistication and means of collaboration, IT shops have responded by hiring individuals with specific
training in establishing hardware and software safeguards and in monitoring the servers for signs of
intrusion. SQL programmers may have to work closely with a Security Administrator to develop ways
of using data that minimize risk, for example, creating Views rather than permitting full table access (see
A Database Designer creates the overall scheme of the datastore, generally using specialized database
design tools. The designer will not only understand the principles of databases but also the client's
business model. The result from the designer's desk will be a layout of tables, relationships and queries
needed to satisfy the client's needs.
The Database Programmer writes code and interfaces to implement the design. The programmer may
be less familiar with the business rules and more familiar with how to write SQL statements and front
ends to implement the design. Whereas Database Designers are more involved at the start of a project,
programmers are involved for as long as the database is in use and changes and amendments need to be
performed. Many programmers joining a team will have no contact with the designer who left after the
database deployment. On legacy systems it may even be difficult to find any manuals regarding how the
system was designed.
Last, never forget the user who is, directly or indirectly, paying your fee.
When to Use SQL
My students frequently ask, "When should I be using SQL?" They may have created a database in a
lower level desktop system and are considering switching to a DBMS that will require SQL. SQL is a
language, not a software product, so the real question is best phrased as "When do I need to use a
heavy-duty DBMS, which would then require me to start communicating in SQL?" There are several
factors to consider which we'll look at in turn in this section.
Many small sites or applications start in desk top systems such as Access. As they grow, a number of
problems become apparent with the Access JET database engine. The primary problem is that it was
never designed to support many users at once. Database software designed for the desktop generally
fails when more than a few people try to use the data at the same time. Microsoft Access will perform
well if five users are on a LAN. When you get to ten or twenty concurrent users, problems begin.
Obviously a desktop database cannot handle the problems introduced by large numbers of concurrent
hits from a web site.
Second, the Access file-based system becomes fragile when the amount of data starts to climb into the
gigabyte range. Lower scale systems like Access also lack the security that more powerful systems offer.
Access databases, for example, are quite easy to copy and walk away with. The individual installations
can also be readily accessed.
So for these reasons companies move from Access type desk top systems to a more robust DBMS.
Almost all of these heavy duty DBMS rely on SQL as the main form of communication. In summary,
although there is nothing about SQL itself that concerns scaling, there is a need to use SQL in order to
communicate with DBMS that scale well.
When SQL-enabled DBMS first appeared they were slower than previous DBMS. However, as they
have taken over the market during the last ten years, SQL data engines have been the focus of an
intense effort to improve performance. There are heated contests between the major vendors for
bragging rights to the fastest machine with the lowest transaction price. But the intense competition
drives the vendors to produce faster, more robust DBMS that work at lower and lower costs per
When you consider your quest for improved speed, consider that pure speed problems will be evident
at all times, not just at peak usage. They are usually the result of more complex queries, particularly
with the advanced techniques we study later like multiple joins on data that cannot be indexed.
Although SQL itself does not cure speed problems, the implementation of faster DBMS does, and those
faster DBMS will probably require communications in SQL.
It is more expensive to run a server-centric DBMS (that only speaks SQL) than it is to use a desktop
system such as Paradox or Access (for which you do not necessarily need to use SQL). First, the
software is more expensive. Second, in most cases you have to run more expensive operating systems in
order to support the DBMS. Third, you have to tune the OS differently to optimize for a DBMS than for
other applications, so you generally need a server dedicated to the DBMS. Last, you will need personnel
with more expensive qualifications.
However, the price of hardware and software is frequently the smallest item in an IT budget. As your
data center grows, at some point the reliability, performance and standardization benefits of a DBMS
that uses SQL will outweigh the cost.
A note on price you can get started on a SQL-centric DBMS for almost nothing. In fact, we have
included the 120-day trial version of Microsoft SQL Server 2000 with this book. You can download
other trial versions such as Oracle Personal Edition. But remember, that these are trial versions; when
you are ready to build your business on a SQL-centric DBMS you will be paying a considerable amount
more than you did for Access.
An alternative to using SQL statements is to write code in a procedural language like C++. The
problem with this approach is that you are then closely tied to the procedural language, the metadata
and the specific DBMS. If there is a change in the structure of the tables the code must change. If a new
DBMS is installed, most of the code must be revised to mesh with the new DBMS system of pointers,
recordset definitions, etc. But by using SQL statements almost all changes are handled by the DBMS,
behind the scenes from the programmer.
An advanced feature of SQL (which is beyond the scope of this book) is the emergence of analytical
tools that allow managers to extract business knowledge from large amounts of data. These tasks
frequently require the constructions of multi-dimensional aggregates of data. These tools are referred to
by the generic names of Online Analytical Processing (OLAP), Decision Support Systems (DSS) and
Executive Information Systems (EIS). Each vendor then has proprietary trademarks for their version.
All of these tools are only available in full-scale DBMS that require SQL for their means of
When to Skip Moving to a SQL-Centric DBMS
As much as we have talked about the power of SQL, we must be honest in describing those situations
where an alternative system would be better. You are probably better off staying with software that does
not require SQL (like Access) in most of these situations:
- Only one or a few people will use the database at once.
- The data can be stored within 50% of the disk capacity available to you.
- You are comfortable with the amount of media and operator time required by your backup plan.
- You do not expect to significantly expand the quantity of data or number of users.
- You do not plan to make the database available on an Internet site.
- The vast majority of the data you use is in a form that is not relational, such as files in folders.
If you the answer is yes to all of the above then a desktop solution like Microsoft Access will be cheaper
and easier to learn. Most of what you will need to do can be performed through wizards and the drag 'n'
drop interface, thus avoiding the need to invest in the hardware, software and learning for a SQL
As we discussed earlier on in this chapter, you can still use SQL with Access even
though the system does not require it. Hence why we can use a desktop system such as
Microsoft Access in order to learn SQL.
Development of a system in a SQL-centric, heavy-duty DBMS is more time-consuming than in a
drag and drop DBMS like Access. You may consider prototyping in Access to get a set of forms and
reports for the revision and approval process with your clients. Then with a model approved you can
create the final product with far fewer hours spent in coding SQL. You may even find that a lot of
the queries you made in Access can be cut and pasted into your final code.
Structured Query Language, tested and true over many years, is the standard way to communicate with
a database. SQL is almost universally understood by both front ends and back ends. Front ends present
a user interface such as Visual Basic, C++, ASP, and Microsoft Office applications. They have the
ability to connect to a datastore and send it a SQL statement. Back ends are Database Management
Systems such as Oracle, Microsoft SQL Server, Sybase and sometimes, desktop systems like Access.
Back ends can accept a SQL statement and return a result. SQL is also used for communication directly
between different DBMS without a user interface.
SQL is a set-based language rather then a procedural language. The programmer creates a single SQL
statement describing the desired result and sends it to the DBMS. The DBMS then uses internal code to
achieve that result. The internal code will vary among DBMS vendors, but that does not matter to the
SQL programmer using ANSI-SQL. Regardless of the DBMS the result will be the same.
In the next chapter we will discuss the specifics of setting up several back ends and front ends. Then we
study the code to create a connection and transport your SQL statements from front to back. We'll also
cover how to handle the results when they return to the front end.