Part 1 of our series of excerpts from Beginning SQL Programming covers the history of SQL, standards, terminology and the current state of SQL.
Structured Query Language is the industry standard for querying
data held in relational databases. SQL can be used to create and
alter the structure of databases, add new data, and access or modify
existing data. Beginning SQL Programming begins by looking at what
SQL is, then moves on to teach you the fundamentals of ANSI SQL,
using the familiar Northwind database that comes with versions of
Microsoft Access and SQL Server.
The book gives step-by-step instructions, alerts readers to common
pitfalls, tricks and traps, and uses real-world examples, including
two extensive case studies, to create a thorough tutorial. It will
furnish readers with a firm grasp of SQL concepts that you can work
with straight away, as well as provide solid foundations and
challenging ideas with which you can later develop more advanced SQL
Included with this book is a 120-day evaluation enterprise edition
of Microsoft's SQL Server 2000. An end user license agreement is
contained in this licensed software.
What does this book cover?
- Database structure from Tables to Normalisation
- Basic SQL, including DDL, DCL, DML
- The only tutorial that is comprehensive and up-to-date with
coverage of SQL '99
- Tutorial exercises tied into each element of database
structure - as it appears
Who is this book for?
- Novice database programmers and administrators
- Highly suitable for Visual Basic novices
- Highly suitable for web developers who have wandered into the
world of dynamic page creation from databases
Wrox provides on-line
discussion for the above title and surrounding technologies on
Introduction to SQL
The computer industry is criss-crossed with languages and standards, most of which are unintelligible to
each other. Here and there, true standards have emerged, and in these cases it is well worth the time of
any programmer to learn them.
Structured Query Language, or SQL as we commonly call it, has, over
the last ten years, emerged as the standard language for programmers to talk with databases through a
Database Management System (DBMS). Oracle, Microsoft SQL Server, Microsoft Access, IBM's DB2,
Sybase, and virtually every other DBMS sold in the last five years use SQL. Knowledge of SQL is
becoming necessary for almost every IT professional. And as the development of basic web sites
becomes common among non-programmers, a grasp of SQL will help them to integrate data into their
This introductory chapter covers topics that students usually ask in the first hour of classes; essentially,
just what do we mean by SQL? We start by explaining what SQL is and also what it is not, and we
provide a brief overview of the history of the language. We'll also clarify some confusing terminology at
this point. Next we look at the types of system set-up in which SQL may be used; discussing front ends,
back ends and how they connect. We then demonstrate some common implementations. We spend
some time explaining why SQL, as a declarative language, is so different from procedural languages like
Visual Basic, C++ or COBOL. We analyze when to move to using SQL, and finally we examine the
human roles within a large data center. So in this chapter we will:
- Learn exactly what SQL is, and its history
- Understand the term ANSI-SQL
- Learn how SQL is implemented via a connection and how this is configured
- Know when to use SQL as a solution
- Understand the human roles in the data center
What is SQL & What Does It Do?
Structured Query Language (abbreviated SQL, pronounced to rhyme with equal) is a computer language
for communication with databases. The communicating parties are typically a "front end" which sends a
SQL Statement across a connection to a "back end" that holds the data. That statement contains
instructions to create, read, change or delete data. The universal rules of the language have been
established by ANSI (American National Standards Institute); a standards committee composed of
database experts from industry, academia and software vendors. Therefore the SQL language is open,
meaning it is not owned or controlled by any single company.
SQL is a non-proprietary (open) language whose rules have been set by a standards committee.
The strength of SQL is its universal acceptance by database vendors. By learning SQL you have a
language that can be used in Visual Basic or C++ to talk to an Oracle database. You can use SQL in an
ASP page to talk to Microsoft SQL Server. You can send a request for data from IBM's DB2 to a Sybase
datastore. You can even use SQL within Access to describe the items you want to include in a form's list
box. There has been a lot of talk and marketing about "write once, run anywhere" languages like Java.
For database programmers, understanding SQL is the ticket to "learn once, profit anywhere."
SQL has many capabilities, but the most common needs in business are to:
- Read existing data
- Create new records holding data
- Change existing data
- Delete data
SQL contains key words or parts to perform these basic tasks. Learning the basics and embellishments
of those commands will consume most of this book. But before we begin to look at the syntax and lists
of common mistakes, we'll look at some examples of each these operations in the next few paragraphs.
Reading data is the most common task. An ANSI-SQL statement requesting a list of names of all
members of your society that live in New York, can be sent from a Visual Basic application to an Oracle
database. If the database is later changed to IBM's DB2, the SQL statement is still valid. The SQL
language offers many permutations of the request, including the ability to return the names in various
orders, only the first or last few names, a list of names without duplicates and various other requests
where people require specific information from their database.
Records can be created in a datastore using SQL. A form page on a web site can gather information
from a visitor and then put that data into a SQL statement. The SQL statement will instruct the
datastore to insert a new record into a Microsoft SQL Server database. Since SQL is universally
accepted, the same SQL statement could, for example, be used for clerks that create new records from,
say, a Visual Basic application on their local network.
Data can also be changed using SQL. As in the examples above, a front end user interface such as a web
page can accept changes to data and send them via a SQL statement to the datastore. But there does not
have to be direct user interaction. A DB2 database running on an IBM mainframe could have a
procedure to connect to another corporate mainframe running Sybase. The IBM can generate and send
a SQL statement to modify the data in certain records in the Sybase database. Although the systems are
from different vendors and have different ways of storing and using data, they both understand the SQL
Deleting data can be performed using SQL statements. In fact SQL can accommodate very complex sets
of conditions for which records to delete and which to leave intact. Portions of data within a record can
What Does SQL Not Do?
First, SQL is not a program or a development environment such as Access or VB. SQL is a pure
language. There is no front end built into SQL, that is, the language does not have user forms like an
Access application or Visual Basic, and SQL has no intrinsic way to talk with web pages. SQL
statements are mainly generated by a separate front-end product. Many SQL-enabled DBMS do have a
tool that allows you to type a SQL statement and run it against the data. But these tools are only for
design time, not deployment.
Second, SQL does not have a back end. There are no tools intrinsic to the language that can actually
store data. SQL is only a standard means of communicating with software products that can hold data (a
DBMS as we will see later). In other words, the data itself is contained within a DBMS such as Oracle or
SQL Server. You will also need a front end, such as VB or C++. Then you can use SQL as the language
for the front end to send instructions to the DBMS.
To take these first two points together, SQL is a language and not a software product. Consider a
spoken language; the language itself does not contain a speaker or a listener. The language only
contains the vocabulary, grammatical rules and idioms to be used by speakers and listeners. The people
are not part of, or specified in any way, by the language. SQL is like a spoken language in another way.
The listener and speaker can have any mother tongue, as long as they can translate that to SQL. So a
computer can use any operating system and any database software, as long as the software can translate
from its internal language to SQL.
SQL is a language, not a software product. Front-end software interfaces with the
user. Back-end software holds data. SQL is the standard language for the two pieces
of software to communicate with each other.
Third, SQL is not a procedural programming language. We will discuss the concepts of declarative
versus procedural languages shortly. SQL is a set-based language, which communicates in statements
that define an outcome. This is very different to procedural languages that instruct the computer how to
proceed step by step to reach an objective.
Fourth, SQL does not have its own specific development environment. When you work with Access or
Power Builder, for example, you have a highly evolved set of tools for:
- laying out your user interface
- rapid entry of code
- code reuse
But pure SQL does not include any of those tools (although most vendors include some tools in their
products, they are not part of pure SQL). To go back to our spoken language analogy, SQL is the
language. It is not a dictionary, grammar guide, printing press, loudspeaker, postal service, filing cabinet
or any of the other tools we use to work with human languages. In the sense that "English" does not
include these tools, neither does the SQL language.
Last, SQL is not network-aware. In the same sense that written English is not dependent on or aware of
being used in telephones or e-mails, neither is SQL. Whenever a SQL statement is issued there must be
a way for it to be conveyed, or connected, to its destination. Programmers create, maintain and fine-
tune those connections with code in the front and back ends.
A Brief History of SQL
In this chapter we want to emphasize that SQL is both deep and wide. Deep in the sense that it is
implemented at many levels of database communication, from a simple Access form list box right up to
high-volume communications between mainframes. SQL is widely implemented in that almost every
DBMS supports SQL statements for communication. The reason for this level of acceptance is partially
explained by the amount of effort that went into the theory and development of the standards.
The father of relational databases, and thus SQL, is Dr. E.F. "Ted" Codd who worked for IBM. After
Codd described a relational model for databases in 1970, IBM spent a lot of time and money
researching how to implement his ideas. IBM came to market with a product named System/R in 1978.
But other companies had formed and created relational database products before IBM was ready to
release System/R. The first to market was Relational Software's product named Oracle and the second
was Relational Technology's Ingres. IBM then released improved products in 1982 named SQL/DS and
DB2. Oracle (now from Oracle Inc.) and DB2 are still available today in nth generation forms while the
Ingres technology was bought by Computer Associates.
As we said at the beginning, SQL is a standard, open language without corporate ownership. The
commercial acceptance of SQL was precipitated by the formation of SQL Standards committees by the
American National Standards Institute and the International Standards Organization in 1986 and 1987.
Two years later they published a specification known as SQL-89. An improvement and expansion (to
some 600 pages) to the standard gave the world SQL-92. We now have the third generation standard,
SQL 99. The existence of standards is important for the general portability of SQL statements.
Who is ANSI? The American National Standards Institute is an administrator and
coordinator of voluntary systems of standardization for the United States private
sector. About 80 years ago a group of engineering societies and government agencies
formed the institute to enhance the "quality of life by promoting and facilitating
voluntary consensus standards and conformity." Today the Institute represents the
interests of about 1,000 companies, organizations and government agencies. ANSI
does not itself develop standards; rather it facilitates development by establishing
consensus among qualified groups.
So the ANSI-SQL group has published three standards over the years:
- SQL89 (SQL1)
- SQL92 (SQL2)
- SQL99 (SQL3)
The vast majority of the language has not changed through these updates. We can all profit from the
fact that almost all of the code we wrote to SQL standards of 1989 is still perfectly usable. Or in other
words, as a new student of SQL there is over ten years of SQL code out there that needs your expertise
to maintain and expand.
Most DBMS are designed to meet the SQL92 standard. Virtually all of the material in this book was
available in the earlier standards as well. Since many of the advanced features of SQL92 have yet to be
implemented by DBMS vendors, there has been little pressure for a new version of the standard.
Nevertheless a SQL99 standard was developed to address advanced issues in SQL. All of the core
functions of SQL, such as adding, reading and modifying data, are the same. Therefore, the topics in
this book are not affected by the new standard. As of early 2001, no vendor has implemented the
There are three areas where there is current development in SQL standards. First entails improving
Internet access to data, particularly to meet the needs of the emerging XML standards. Second is
integration with Java, either through Sun's Java Database Connectivity (JDBC) or through internal
implementations. Last, the groups that establish SQL standards are considering how to integrate object-
based programming models.
Flavors of SQL
The computer industry (like most industries) both benefits and suffers from standards. We said that SQL
is an open standard, not owned by a company, and the standard comes from ANSI. Therefore the SQL
standard from ANSI is considered the "pure" SQL and called ANSI-SQL.
Two problems emerge to sully this pureness. First is that every DBMS vendor wants to differentiate
their DBMS products. So if you look at the feature set of each DBMS product you see that not only does
the product support ANSI-SQL but it also offers extra features, enhancements or extensions that are
available only from individual vendors. For example, most vendors offer a field type which auto-
increments even though this is not described in the SQL standards. These additions to ANSI-SQL are
generally proprietary and will not work if you try to use them on competitor's SQL products. At the
level we discuss in this book there are only very minor differences between the vendors that we will
note throughout the book.
Many of these features are powerful and robust, but since they vary from vendor to vendor,
programmers should use them with caution. It is always safest to stick with pure SQL whenever
possible; if you stray it should be with full knowledge that you are losing the portability of your
statements (and perhaps even your data).
Such enhancements are not all bad because these extensions are very useful. For example, ANSI-SQL
does not contain an automatic way to assign a serial number to each new record but most DBMS sold
today have added this feature. Since serial numbering is so common programmers are happy to have
the enhancement. However, the method of implementation is not uniform, so code written to get the
serial number from data in one DBMS may not work when used with another vendor's DBMS.
In this book we are helping people that use all of the flavors of SQL. As a beginner
most of what you learn is pure ANSI-SQL and thus is consistent across all vendor's
products. When we talk about vendor-specific features in this book we note it.
Certainly as a student, not knowing what DBMS you will be using in a job, it is best to
study the purest form of the language. Learning the extensions available in T-SQL,
PL/SQL, Access SQL and others will come later. For an introduction to some of these
variations see our Appendix D.
Structured Query Language and the science of databases is no different from other highly-evolved
schools of thought, in that there are many words which are specific to the discipline. In the case of SQL
some of these words are similar but actually refer to entirely different concepts. For example we use
several words and phrases that contain the term "data." There are some shades of meaning to these
terms that involve what layer of information or processes we are discussing.
Data: A set of information with some aspect in common
For example data for employees might include social security numbers: "123-45-6789 Abe Adams, 234-
56-7890 Beth Barrett, 345-67-8901 Chris Cao"
Metadata: Information about the structure and organization of data in a database
In database terms (which we will cover in Chapter 3), metadata typically contains descriptions of the
tables and their constraints, the fields and their sizes and rules, and the relationships between the tables.
Some people have phrased metadata as "data about the data". An example of metadata for an employee
database would be something like:
|Table #1 ||name= Employees ||field count = 3|
|Field #1 ||name= EmployeeID ||type=integer size= 9 digits (exactly)|
|Field #2 ||name= NameFirst ||type= string size <=15 characters|
|Field #3 ||name= NameLast ||type=string size <=30 characters|
This metadata describes that we have a set of data about employees that is organized into a table.
Within that table we hold three kinds of information about each employee, their ID number, and their
first and last names. The metadata goes on to describe how each type of information is maintained. By
the end of Chapter 7, you will have a lot of experience with these terms, but for now note that the
metadata does not include the data, only a description of how the data is organized.
Database: A database is the data and the metadata
In other words a database is a set of related information as well as a description of how that information
is organized. A database for employees would be the metadata we just looked at, plus the information
described above under Data.
Note that we frequently display data as being in a table. The table is only a logical
structure, not a representation of exactly how the data sits on the disk. Different
DBMS will physically store the data in different configurations, usually associated
with the order that the records were added. As you will see later, the storage
methodology is of no concern to us.
Datastore: A source of data
(for our purposes, a source of data that has a way of responding to SQL statements)
Since about 1985, the dominant form of database has been of a type called relational. We will study this
term later in the book, but for now it is enough to understand that relational databases follow the rules
for data organization established by Dr. E.F. "Ted" Codd. However, in the past few years more and
more data has accumulated in non-relational forms, primarily hierarchical forms. Think, for example, of
the amount of information in word-processing documents stored on your company's desktop hard
drives. Clearly your company has data in those documents, but it is organized into drives, folders,
subfolders and files; a hierarchical system with no rules about the data contained at each level. In an
effort to emphasize data techniques that work with any source of data (relational or not), the term
Datastore can be used instead of Database. The term datastore also includes older databases (generally
called "flat file") that predate the relational rules of Dr. Codd. In summary, whereas the term database
today generally means a modern relational database, the term datastore is much broader, including
information contained in non-relational databases. Software companies have created many tools to
allow SQL to talk with all types of datastores as well as proper relational databases.
Data Server: A Data Server manages data and is typically one or more of many servers in an environment
Modern business computing relies on one or more central servers to hold data and handle centralized
processing. In the past a single machine might perform data storage, security, print services and all
other functions of the office. But as the number of computing jobs has increased (e-mail, Web Site
Support, etc.), IT departments have split the job across multiple servers, each optimized to perform a
very specific job. There may also be servers handling the business rules, web or network services and
other functions. Any of the other servers can pass a SQL statement to the Data Server, which will
execute the statement and then pass the results back to the requesting server. In simpler cases Data
Servers pass information directly to clients such as desktop PCs.
The other point to note here is that a Data Server differs from a File Server. The latter holds files
organized into folders and subfolders, with the intention of keeping employee's documents on one
central server. A Data Server generally abandons the folders in favor of a disk organization scheme
more efficient for relational data. In a small shop a single machine may function as both a file and data
server by using different organization schemes on different disk areas.
Note that the terms "server" and "client" are relative in multiple server
environments. When a web server sends a SQL statement to a data server the web
server is considered to be the "client." However the web server is then considered the
"server" in relation to the browser. In other words, we refer to the client as the
machine or software that sent the SQL statement to the DBMS and we refer to the
server as the machine (or group of machines) that receives the statement and
Database Management System (DBMS): Software which handles most aspects of data
management including physical storage, reading and writing data, security, replication, error
correction and other functions
Common DBMS include IBM DB2, Microsoft SQL Server, Oracle, Sybase and Informix. Although it is
not a true DBMS and isn't designed for large numbers of users, Microsoft Access can also accept SQL
statements. Each DBMS has a way of receiving a SQL statement and forwarding the statement to its
Database Engine. The DBMS will then have a way to take a result from the engine and send it back out
to the requester. Larger and more complex DBMS will be closely integrated with a specific Operating
System and some actually contain an OS optimized for data management.
Database Engine: The part of the DBMS that works with the data
An engine typically has code that can search, read, write, index and otherwise execute the actual
interaction with the information. A database engine performs the largest share of work in most
databases and thus it is the subject of an intense optimization effort by the DBMS creators. Some of the
most advanced DBMS have engines which can self-optimize after monitoring the actual conditions after
deployment. A DBMS will have additional features that are not part of the data engine, including query
analyzing, replication and back-up tools, user management, security tools and performance monitoring.
(click image to view full size)
Results: The term "Result" frequently confuses folks learning SQL because the term can be
used in three ways:
First, and most common, we mean a set of data that is returned after a front end sends a
SQL statement to a DBMS. You ask for "The first and last names of all the employees that
started after 1995" and you get back a block of characters (a recordset) containing those
But what if the SQL statement instructs the DBMS to modify data rather than just read it?
The second case is when the "result" refers to a change in the data. We do not necessarily
receive any characters back from the DBMS although we may get a status message noting
that the operation was successful.
Third, we may make changes to the data and have the DBMS report back to us on the
changes. Frequently this is a notice of how many records were changed or what errors
arose or a True/False that the SQL statement was executed successfully. In this case
we get a result back, but it is not actual data; it is a message or code indicating the
number of records changed, an error or success.
In this book we use the term all three ways since there is no common alternative vocabulary.
"SQL" and "SQL Server": SQL is the name of an open-standard language for communicating
with databases. Microsoft SQL Server and Sybase SQL Server are proprietary DBMS products
that can handle SQL statements. Microsoft SQL Server is popular, but it is only one of many
DBMS that can handle your data and your SQL statements.