Beginning SQL Programming: Pt. 1

Thursday Jun 28th 2001 by DatabaseJournal.com Staff
Share:

Part 1 of our series of excerpts from Beginning SQL Programming covers the history of SQL, standards, terminology and the current state of SQL.

Book overview

Structured Query Language is the industry standard for querying data held in relational databases. SQL can be used to create and alter the structure of databases, add new data, and access or modify existing data. Beginning SQL Programming begins by looking at what SQL is, then moves on to teach you the fundamentals of ANSI SQL, using the familiar Northwind database that comes with versions of Microsoft Access and SQL Server.

The book gives step-by-step instructions, alerts readers to common pitfalls, tricks and traps, and uses real-world examples, including two extensive case studies, to create a thorough tutorial. It will furnish readers with a firm grasp of SQL concepts that you can work with straight away, as well as provide solid foundations and challenging ideas with which you can later develop more advanced SQL techniques.

Included with this book is a 120-day evaluation enterprise edition of Microsoft's SQL Server 2000. An end user license agreement is contained in this licensed software.

What does this book cover?

  • Database structure from Tables to Normalisation
  • Basic SQL, including DDL, DCL, DML
  • The only tutorial that is comprehensive and up-to-date with coverage of SQL '99
  • Tutorial exercises tied into each element of database structure - as it appears

Who is this book for?

  • Novice database programmers and administrators
  • Highly suitable for Visual Basic novices
  • Highly suitable for web developers who have wandered into the world of dynamic page creation from databases

Wrox provides on-line discussion for the above title and surrounding technologies on P2P.

Title: Beginning SQL Programming
ISBN: 1861001800
US Price: $ 49.99
Canadian Price: C$ 74.95
UK Price: £ 38.99
Publication Date: March 2001
Pages: 723
Wrox Press Limited, US and UK.

Introduction to SQL

The computer industry is criss-crossed with languages and standards, most of which are unintelligible to each other. Here and there, true standards have emerged, and in these cases it is well worth the time of any programmer to learn them. Structured Query Language, or SQL as we commonly call it, has, over the last ten years, emerged as the standard language for programmers to talk with databases through a Database Management System (DBMS). Oracle, Microsoft SQL Server, Microsoft Access, IBM's DB2, Sybase, and virtually every other DBMS sold in the last five years use SQL. Knowledge of SQL is becoming necessary for almost every IT professional. And as the development of basic web sites becomes common among non-programmers, a grasp of SQL will help them to integrate data into their HTML pages.

This introductory chapter covers topics that students usually ask in the first hour of classes; essentially, just what do we mean by SQL? We start by explaining what SQL is and also what it is not, and we provide a brief overview of the history of the language. We'll also clarify some confusing terminology at this point. Next we look at the types of system set-up in which SQL may be used; discussing front ends, back ends and how they connect. We then demonstrate some common implementations. We spend some time explaining why SQL, as a declarative language, is so different from procedural languages like Visual Basic, C++ or COBOL. We analyze when to move to using SQL, and finally we examine the human roles within a large data center. So in this chapter we will:

  • Learn exactly what SQL is, and its history
  • Understand the term ANSI-SQL
  • Learn how SQL is implemented via a connection and how this is configured
  • Know when to use SQL as a solution
  • Understand the human roles in the data center

What is SQL & What Does It Do?

Structured Query Language (abbreviated SQL, pronounced to rhyme with equal) is a computer language for communication with databases. The communicating parties are typically a "front end" which sends a SQL Statement across a connection to a "back end" that holds the data. That statement contains instructions to create, read, change or delete data. The universal rules of the language have been established by ANSI (American National Standards Institute); a standards committee composed of database experts from industry, academia and software vendors. Therefore the SQL language is open, meaning it is not owned or controlled by any single company.

SQL is a non-proprietary (open) language whose rules have been set by a standards committee.

The strength of SQL is its universal acceptance by database vendors. By learning SQL you have a language that can be used in Visual Basic or C++ to talk to an Oracle database. You can use SQL in an ASP page to talk to Microsoft SQL Server. You can send a request for data from IBM's DB2 to a Sybase datastore. You can even use SQL within Access to describe the items you want to include in a form's list box. There has been a lot of talk and marketing about "write once, run anywhere" languages like Java. For database programmers, understanding SQL is the ticket to "learn once, profit anywhere."

SQL has many capabilities, but the most common needs in business are to:

  • Read existing data
  • Create new records holding data
  • Change existing data
  • Delete data

SQL contains key words or parts to perform these basic tasks. Learning the basics and embellishments of those commands will consume most of this book. But before we begin to look at the syntax and lists of common mistakes, we'll look at some examples of each these operations in the next few paragraphs.

Reading data is the most common task. An ANSI-SQL statement requesting a list of names of all members of your society that live in New York, can be sent from a Visual Basic application to an Oracle database. If the database is later changed to IBM's DB2, the SQL statement is still valid. The SQL language offers many permutations of the request, including the ability to return the names in various orders, only the first or last few names, a list of names without duplicates and various other requests where people require specific information from their database.

Records can be created in a datastore using SQL. A form page on a web site can gather information from a visitor and then put that data into a SQL statement. The SQL statement will instruct the datastore to insert a new record into a Microsoft SQL Server database. Since SQL is universally accepted, the same SQL statement could, for example, be used for clerks that create new records from, say, a Visual Basic application on their local network.

Data can also be changed using SQL. As in the examples above, a front end user interface such as a web page can accept changes to data and send them via a SQL statement to the datastore. But there does not have to be direct user interaction. A DB2 database running on an IBM mainframe could have a procedure to connect to another corporate mainframe running Sybase. The IBM can generate and send a SQL statement to modify the data in certain records in the Sybase database. Although the systems are from different vendors and have different ways of storing and using data, they both understand the SQL statement.

Deleting data can be performed using SQL statements. In fact SQL can accommodate very complex sets of conditions for which records to delete and which to leave intact. Portions of data within a record can be deleted.

What Does SQL Not Do?

First, SQL is not a program or a development environment such as Access or VB. SQL is a pure language. There is no front end built into SQL, that is, the language does not have user forms like an Access application or Visual Basic, and SQL has no intrinsic way to talk with web pages. SQL statements are mainly generated by a separate front-end product. Many SQL-enabled DBMS do have a tool that allows you to type a SQL statement and run it against the data. But these tools are only for design time, not deployment.

Second, SQL does not have a back end. There are no tools intrinsic to the language that can actually store data. SQL is only a standard means of communicating with software products that can hold data (a DBMS as we will see later). In other words, the data itself is contained within a DBMS such as Oracle or SQL Server. You will also need a front end, such as VB or C++. Then you can use SQL as the language for the front end to send instructions to the DBMS.

To take these first two points together, SQL is a language and not a software product. Consider a spoken language; the language itself does not contain a speaker or a listener. The language only contains the vocabulary, grammatical rules and idioms to be used by speakers and listeners. The people are not part of, or specified in any way, by the language. SQL is like a spoken language in another way. The listener and speaker can have any mother tongue, as long as they can translate that to SQL. So a computer can use any operating system and any database software, as long as the software can translate from its internal language to SQL.

SQL is a language, not a software product. Front-end software interfaces with the user. Back-end software holds data. SQL is the standard language for the two pieces of software to communicate with each other.

Third, SQL is not a procedural programming language. We will discuss the concepts of declarative versus procedural languages shortly. SQL is a set-based language, which communicates in statements that define an outcome. This is very different to procedural languages that instruct the computer how to proceed step by step to reach an objective.

Fourth, SQL does not have its own specific development environment. When you work with Access or Power Builder, for example, you have a highly evolved set of tools for:

  • laying out your user interface
  • troubleshooting
  • rapid entry of code
  • code reuse

But pure SQL does not include any of those tools (although most vendors include some tools in their products, they are not part of pure SQL). To go back to our spoken language analogy, SQL is the language. It is not a dictionary, grammar guide, printing press, loudspeaker, postal service, filing cabinet or any of the other tools we use to work with human languages. In the sense that "English" does not include these tools, neither does the SQL language.

Last, SQL is not network-aware. In the same sense that written English is not dependent on or aware of being used in telephones or e-mails, neither is SQL. Whenever a SQL statement is issued there must be a way for it to be conveyed, or connected, to its destination. Programmers create, maintain and fine- tune those connections with code in the front and back ends.

A Brief History of SQL

In this chapter we want to emphasize that SQL is both deep and wide. Deep in the sense that it is implemented at many levels of database communication, from a simple Access form list box right up to high-volume communications between mainframes. SQL is widely implemented in that almost every DBMS supports SQL statements for communication. The reason for this level of acceptance is partially explained by the amount of effort that went into the theory and development of the standards.

Early History

The father of relational databases, and thus SQL, is Dr. E.F. "Ted" Codd who worked for IBM. After Codd described a relational model for databases in 1970, IBM spent a lot of time and money researching how to implement his ideas. IBM came to market with a product named System/R in 1978.

But other companies had formed and created relational database products before IBM was ready to release System/R. The first to market was Relational Software's product named Oracle and the second was Relational Technology's Ingres. IBM then released improved products in 1982 named SQL/DS and DB2. Oracle (now from Oracle Inc.) and DB2 are still available today in nth generation forms while the Ingres technology was bought by Computer Associates.

Standards

As we said at the beginning, SQL is a standard, open language without corporate ownership. The commercial acceptance of SQL was precipitated by the formation of SQL Standards committees by the American National Standards Institute and the International Standards Organization in 1986 and 1987. Two years later they published a specification known as SQL-89. An improvement and expansion (to some 600 pages) to the standard gave the world SQL-92. We now have the third generation standard, SQL 99. The existence of standards is important for the general portability of SQL statements.

Who is ANSI? The American National Standards Institute is an administrator and coordinator of voluntary systems of standardization for the United States private sector. About 80 years ago a group of engineering societies and government agencies formed the institute to enhance the "quality of life by promoting and facilitating voluntary consensus standards and conformity." Today the Institute represents the interests of about 1,000 companies, organizations and government agencies. ANSI does not itself develop standards; rather it facilitates development by establishing consensus among qualified groups.

Current State

So the ANSI-SQL group has published three standards over the years:

  • SQL89 (SQL1)
  • SQL92 (SQL2)
  • SQL99 (SQL3)

The vast majority of the language has not changed through these updates. We can all profit from the fact that almost all of the code we wrote to SQL standards of 1989 is still perfectly usable. Or in other words, as a new student of SQL there is over ten years of SQL code out there that needs your expertise to maintain and expand.

Most DBMS are designed to meet the SQL92 standard. Virtually all of the material in this book was available in the earlier standards as well. Since many of the advanced features of SQL92 have yet to be implemented by DBMS vendors, there has been little pressure for a new version of the standard. Nevertheless a SQL99 standard was developed to address advanced issues in SQL. All of the core functions of SQL, such as adding, reading and modifying data, are the same. Therefore, the topics in this book are not affected by the new standard. As of early 2001, no vendor has implemented the SQL99 standard.

There are three areas where there is current development in SQL standards. First entails improving Internet access to data, particularly to meet the needs of the emerging XML standards. Second is integration with Java, either through Sun's Java Database Connectivity (JDBC) or through internal implementations. Last, the groups that establish SQL standards are considering how to integrate object- based programming models.

Flavors of SQL

The computer industry (like most industries) both benefits and suffers from standards. We said that SQL is an open standard, not owned by a company, and the standard comes from ANSI. Therefore the SQL standard from ANSI is considered the "pure" SQL and called ANSI-SQL.

Two problems emerge to sully this pureness. First is that every DBMS vendor wants to differentiate their DBMS products. So if you look at the feature set of each DBMS product you see that not only does the product support ANSI-SQL but it also offers extra features, enhancements or extensions that are available only from individual vendors. For example, most vendors offer a field type which auto- increments even though this is not described in the SQL standards. These additions to ANSI-SQL are generally proprietary and will not work if you try to use them on competitor's SQL products. At the level we discuss in this book there are only very minor differences between the vendors that we will note throughout the book.

Many of these features are powerful and robust, but since they vary from vendor to vendor, programmers should use them with caution. It is always safest to stick with pure SQL whenever possible; if you stray it should be with full knowledge that you are losing the portability of your statements (and perhaps even your data).

Such enhancements are not all bad because these extensions are very useful. For example, ANSI-SQL does not contain an automatic way to assign a serial number to each new record but most DBMS sold today have added this feature. Since serial numbering is so common programmers are happy to have the enhancement. However, the method of implementation is not uniform, so code written to get the serial number from data in one DBMS may not work when used with another vendor's DBMS.

In this book we are helping people that use all of the flavors of SQL. As a beginner most of what you learn is pure ANSI-SQL and thus is consistent across all vendor's products. When we talk about vendor-specific features in this book we note it. Certainly as a student, not knowing what DBMS you will be using in a job, it is best to study the purest form of the language. Learning the extensions available in T-SQL, PL/SQL, Access SQL and others will come later. For an introduction to some of these variations see our Appendix D.

Terminology

Structured Query Language and the science of databases is no different from other highly-evolved schools of thought, in that there are many words which are specific to the discipline. In the case of SQL some of these words are similar but actually refer to entirely different concepts. For example we use several words and phrases that contain the term "data." There are some shades of meaning to these terms that involve what layer of information or processes we are discussing.

  • Data: A set of information with some aspect in common For example data for employees might include social security numbers: "123-45-6789 Abe Adams, 234- 56-7890 Beth Barrett, 345-67-8901 Chris Cao"

  • Metadata: Information about the structure and organization of data in a database In database terms (which we will cover in Chapter 3), metadata typically contains descriptions of the tables and their constraints, the fields and their sizes and rules, and the relationships between the tables. Some people have phrased metadata as "data about the data". An example of metadata for an employee database would be something like:

    Table #1 name= Employees field count = 3
    Field #1 name= EmployeeID type=integer size= 9 digits (exactly)
    Field #2 name= NameFirst type= string size <=15 characters
    Field #3 name= NameLast type=string size <=30 characters

    This metadata describes that we have a set of data about employees that is organized into a table. Within that table we hold three kinds of information about each employee, their ID number, and their first and last names. The metadata goes on to describe how each type of information is maintained. By the end of Chapter 7, you will have a lot of experience with these terms, but for now note that the metadata does not include the data, only a description of how the data is organized.

  • Database: A database is the data and the metadata

    In other words a database is a set of related information as well as a description of how that information is organized. A database for employees would be the metadata we just looked at, plus the information described above under Data.

    Employee IDNameFirstNameLast
    12345AbeAdams
    56789BethBarrett
    01234ChrisCao

    Note that we frequently display data as being in a table. The table is only a logical structure, not a representation of exactly how the data sits on the disk. Different DBMS will physically store the data in different configurations, usually associated with the order that the records were added. As you will see later, the storage methodology is of no concern to us.

  • Datastore: A source of data

    (for our purposes, a source of data that has a way of responding to SQL statements)

    Since about 1985, the dominant form of database has been of a type called relational. We will study this term later in the book, but for now it is enough to understand that relational databases follow the rules for data organization established by Dr. E.F. "Ted" Codd. However, in the past few years more and more data has accumulated in non-relational forms, primarily hierarchical forms. Think, for example, of the amount of information in word-processing documents stored on your company's desktop hard drives. Clearly your company has data in those documents, but it is organized into drives, folders, subfolders and files; a hierarchical system with no rules about the data contained at each level. In an effort to emphasize data techniques that work with any source of data (relational or not), the term Datastore can be used instead of Database. The term datastore also includes older databases (generally called "flat file") that predate the relational rules of Dr. Codd. In summary, whereas the term database today generally means a modern relational database, the term datastore is much broader, including information contained in non-relational databases. Software companies have created many tools to allow SQL to talk with all types of datastores as well as proper relational databases.

  • Data Server: A Data Server manages data and is typically one or more of many servers in an environment

    Modern business computing relies on one or more central servers to hold data and handle centralized processing. In the past a single machine might perform data storage, security, print services and all other functions of the office. But as the number of computing jobs has increased (e-mail, Web Site Support, etc.), IT departments have split the job across multiple servers, each optimized to perform a very specific job. There may also be servers handling the business rules, web or network services and other functions. Any of the other servers can pass a SQL statement to the Data Server, which will execute the statement and then pass the results back to the requesting server. In simpler cases Data Servers pass information directly to clients such as desktop PCs.

    The other point to note here is that a Data Server differs from a File Server. The latter holds files organized into folders and subfolders, with the intention of keeping employee's documents on one central server. A Data Server generally abandons the folders in favor of a disk organization scheme more efficient for relational data. In a small shop a single machine may function as both a file and data server by using different organization schemes on different disk areas.

    Note that the terms "server" and "client" are relative in multiple server environments. When a web server sends a SQL statement to a data server the web server is considered to be the "client." However the web server is then considered the "server" in relation to the browser. In other words, we refer to the client as the machine or software that sent the SQL statement to the DBMS and we refer to the server as the machine (or group of machines) that receives the statement and processes it.

  • Database Management System (DBMS): Software which handles most aspects of data management including physical storage, reading and writing data, security, replication, error correction and other functions

    Common DBMS include IBM DB2, Microsoft SQL Server, Oracle, Sybase and  Informix. Although it is not a true DBMS and isn't designed for large numbers of users, Microsoft Access can also accept SQL statements. Each DBMS has a way of receiving a SQL statement and forwarding the statement to its Database Engine. The DBMS will then have a way to take a result from the engine and send it back out to the requester. Larger and more complex DBMS will be closely integrated with a specific Operating System and some actually contain an OS optimized for data management.

  • Database Engine: The part of the DBMS that works with the data

    An engine typically has code that can search, read, write, index and otherwise execute the actual interaction with the information. A database engine performs the largest share of work in most databases and thus it is the subject of an intense optimization effort by the DBMS creators. Some of the most advanced DBMS have engines which can self-optimize after monitoring the actual conditions after deployment. A DBMS will have additional features that are not part of the data engine, including query analyzing, replication and back-up tools, user management, security tools and performance monitoring.

    Database Terminology Diagram
    (click image to view full size)

  • Results: The term "Result" frequently confuses folks learning SQL because the term can be used in three ways:

    • First, and most common, we mean a set of data that is returned after a front end sends a SQL statement to a DBMS. You ask for "The first and last names of all the employees that started after 1995" and you get back a block of characters (a recordset) containing those names.

      But what if the SQL statement instructs the DBMS to modify data rather than just read it?

    • The second case is when the "result" refers to a change in the data. We do not necessarily receive any characters back from the DBMS although we may get a status message noting that the operation was successful.

    • Third, we may make changes to the data and have the DBMS report back to us on the changes. Frequently this is a notice of how many records were changed or what errors arose or a True/False that the SQL statement was executed successfully. In this case we get a result back, but it is not actual data; it is a message or code indicating the number of records changed, an error or success.

    In this book we use the term all three ways since there is no common alternative vocabulary.

  • "SQL" and "SQL Server": SQL is the name of an open-standard language for communicating with databases. Microsoft SQL Server and Sybase SQL Server are proprietary DBMS products that can handle SQL statements. Microsoft SQL Server is popular, but it is only one of many DBMS that can handle your data and your SQL statements.

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved