Many web developers are self-taught, learning HTML, then moving on to a programming language such as PHP. From there, they often learn to integrate this with a database. Too few though have a good theoretical knowledge of databases. Mention foreign keys, or referential integrity, and you're met with a blank stare. Small databases can be easily designed with little database theory knowledge. But large databases can easily get out of hand when badly designed, leading to poor performance, and resulting in the whole database needing to be rebuilt later. This article from Ian Gilfillan is a brief introduction to the topic of relational databases and will hopefully whet your appetite for further exploration.
The Relational Database Model
A database can be understood as a collection of related files.
How those files are related depends on the model used. Early
models included the hierarchical model (where files are related in
a parent/child manner, with each child file having at most one
parent file), and the network model (where files are related as
owners and members, similar to the hierarchical model except that each
member file can have more than one owner).
The relational database model was a huge step forward, as it
allowed files to be related by means of a common field. In order
to relate any two files, they simply need to have a common field,
which makes the model extremely flexible.
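Poet table (excerpt):
|code|first_name|surname|
|---|---|---|
|2|Stephen|Serote|
|3|Tatumkhulu|Watson|

Poem table (excerpt):
|title|poet|
|---|---|
|Thrones of Darkness|2|
|Once|3|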
These two tables relate through the code field in the poet table,
and the poet field in the poem table. We can see who wrote the
poem 'Once' by following the relationship, and see that it was
poet 3, or Tatumkhulu Watson.
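As a rough sketch of how this relationship might look in SQL, the two tables can be declared and then joined on their common field. The column types are assumptions, and only the rows mentioned in the text are loaded.

```sql
-- Illustrative schema; the column types and sizes are assumptions.
CREATE TABLE poet (
    code       INTEGER PRIMARY KEY,
    first_name VARCHAR(30),
    surname    VARCHAR(40)
);

CREATE TABLE poem (
    title VARCHAR(50),
    poet  INTEGER          -- holds the code of the poet who wrote the poem
);

INSERT INTO poet (code, first_name, surname) VALUES (2, 'Stephen', 'Serote');
INSERT INTO poet (code, first_name, surname) VALUES (3, 'Tatumkhulu', 'Watson');
INSERT INTO poem (title, poet) VALUES ('Thrones of Darkness', 2);
INSERT INTO poem (title, poet) VALUES ('Once', 3);

-- Follow the relationship from the poem to its poet.
SELECT poet.first_name, poet.surname
FROM poem
JOIN poet ON poet.code = poem.poet
WHERE poem.title = 'Once';
```

The join condition is the relationship itself: the poet field of the poem table is matched against the code field of the poet table.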
In 1970, when E.F. Codd developed the model, it was thought to be
hopelessly impractical, as the machines of the time could not
cope with the overhead necessary to maintain the model. Of course,
hardware since then has come on in huge strides, so that today
even the most basic of PCs can run sophisticated relational
database management systems. Together with this went the
development of SQL. SQL is relatively easy to learn, and allows
people to start querying a relational database quickly. This
simplicity is part of the reason that relational databases now
form the majority of databases to be found.
An understanding of relational databases requires an understanding
of some of the basic terms.
- Data are the values stored in the database. On its own, data
means very little. "43156" is an example.
- Information is data that is processed to have a meaning. For
example, "43156" is the population of the town of Littlewood.
- A database is a collection of tables.
- Each table contains records, which are the horizontal rows
in the table. These are also called tuples.
- Each record contains fields, which are the vertical columns
of the table. These are also called attributes. For example, a
product record might contain description, release date and
quantity fields.
- Fields can be of many different types. There are many standard
types, and each DBMS (database management system)
can also have its own specific types, but generally they fall
into at least three kinds: character, numeric and date. For
example, a product description would be a character field, a
product release date would be a date field, and a product
quantity in stock would be a numeric field.
- The domain refers to the possible values each field can
contain (it's sometimes called a field specification). For
example, a field entitled "marital_status" may be limited to the
values "Married" and "Unmarried".
- A field is said to contain a null value when it contains
nothing at all. Null fields can create complexities in calculations
and have consequences for data accuracy. For this reason, many
fields are specifically set not to contain NULL values.
- A key is a logical way to access a record in a table. For
example, in the product table, the product_id field could allow
us to uniquely identify a record. A key that uniquely identifies
a record is called a primary key.
- An index is a physical mechanism that improves the performance
of a database. Indexes are often confused with keys. However,
strictly speaking they are part of the physical structure, while
keys are part of the logical structure.
- A view is a virtual table made up of a subset of the actual
tables.
- A one-to-one (1:1) relationship occurs where, for each instance
of table A, only one instance of table B exists, and vice versa.
For example, each vehicle registration is associated with only one
engine number, and vice versa.
- A one-to-many (1:m) relationship is where, for each instance
of table A, many instances of table B exist, but for each
instance of table B, only one instance of table A exists. For
example, for each artist, there are many paintings. Since it is
a one-to-many relationship, and not many-to-many, in this case
each painting can only have been painted by one artist.
- A many-to-many (m:n) relationship occurs where, for each
instance of table A, there are many instances of table B, and for
each instance of table B, there are many instances of table A.
For example, a poetry anthology can have many authors, and each
author can appear in many poetry anthologies.
- A mandatory relationship exists where, for each instance of
table A, one or more instances of table B must exist. For example,
for a poetry anthology to exist, there must exist at least one
poem in the anthology. The reverse is not necessarily true though,
as for a poem to exist, there is no need for it to appear in a
poetry anthology.
- An optional relationship is where, for each instance of table
A, there may exist instances of table B. For example, a poet does
not necessarily have to appear in a poetry anthology. The reverse
isn't necessarily true though. For example, for the anthology to
be listed, it must have some poets.
- Data integrity describes the accuracy, validity and
consistency of data. An example of poor integrity would be where
a poet's name is stored differently in two different places.
- Database normalization is a technique that helps us to reduce
the occurrence of data anomalies and poor data integrity.
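Several of the terms above can be seen together in a single table definition. The following is only a sketch: the product table, its column names, types and sizes are all assumptions made for illustration, and the exact syntax and enforcement of constraints varies between DBMSs.

```sql
-- Hypothetical product table; names, types and sizes are assumptions.
CREATE TABLE product (
    product_id     INTEGER      NOT NULL PRIMARY KEY,  -- primary key: uniquely identifies a record
    description    VARCHAR(100) NOT NULL,              -- character field, NULL not allowed
    release_date   DATE,                               -- date field, NULL allowed (date not yet known)
    stock_quantity INTEGER      NOT NULL DEFAULT 0,    -- numeric field
    -- Domain: restrict the values the field may contain.
    CHECK (stock_quantity >= 0)
);

-- Index: a physical structure to speed up lookups by description.
-- It changes performance, not the logical design.
CREATE INDEX idx_product_description ON product (description);
```

The key and the CHECK constraint belong to the logical structure, while the index could be dropped or added without changing what the table means.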
Table Keys
A key is the tool to unlock access to database tables. By knowing
the key, we know how to locate specific records, and traverse the
relationships between tables.
A candidate key is any field, or combination of fields, that
uniquely identifies a record. The field/s of the candidate key
must contain unique values (if the values were duplicated, they
would no longer identify unique records), and cannot contain a
null value.
A primary key is the candidate key that has been chosen to
identify unique records in a particular table.
Examine the poet table, whose fields include code, first_name and surname.
At first it seems there are two candidate keys for this table.
Both code and the combination of first_name and
surname would suffice. It is generally better to choose the
candidate key with the fewest fields as the
primary key, so we would choose code in this case. Also, if we
thought about it some more, we'd soon realize that the second
candidate may not be unique at all: the
combination of first_name and surname could be duplicated. So to
consider this for a choice of primary key, we'd have to be sure
that none of our poets could ever have the same name. This is the
reason we assign code fields. Codes are assigned by our system,
so we can ensure there are never any duplicate codes. After the
primary key has been assigned, any remaining candidate keys are
labeled alternate keys.
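A minimal sketch of how this choice could be declared follows. The column types are assumptions, and the UNIQUE constraint is left commented out because, as discussed above, two poets could share a name.

```sql
-- Illustrative declaration of the chosen primary key; types are assumptions.
CREATE TABLE poet (
    code       INTEGER     NOT NULL,
    first_name VARCHAR(30) NOT NULL,
    surname    VARCHAR(40) NOT NULL,
    PRIMARY KEY (code)
    -- An alternate key could be declared as well, but only if we were
    -- sure that no two poets could ever share a name:
    -- , UNIQUE (first_name, surname)
);

INSERT INTO poet (code, first_name, surname) VALUES (2, 'Stephen', 'Serote');

-- A second record with code 2 would be rejected by the DBMS,
-- because primary key values must be unique:
-- INSERT INTO poet (code, first_name, surname) VALUES (2, 'Tatumkhulu', 'Watson');
```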
A relation between two tables is created by giving the two tables
a common field. The common field must be the primary key
of one of the tables (the table on the "one" side of
the one-to-many relationship). Consider a relation between a poet
table and a poetry anthology table. The relation is of little use
if instead of using the primary key from the poet table, code, to
create the relationship with the anthology table, we use another
field that is not unique, such as the poet's surname. We would
never know for certain which poet we're referring to in the
poetry anthology. The poet_code field is called the foreign
key in the anthology table, which means it refers to the primary
key (code) of the poet table.
Foreign keys allow us to ensure what is called referential
integrity. This means that if a foreign key contains a value,
the value must refer to an existing record in the related table.
For example, take a look at these two tables:
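Poet table (excerpt):
|code|first_name|surname|
|---|---|---|
|2|Stephen|Serote|
|3|Tatumkhulu|Watson|

Poem table (excerpt):
|title|poet|
|---|---|
|Thrones of Darkness|2|
|Once|3|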
Referential integrity exists here, as all the poets listed in the
poem table exist in the poet table. Stephen Serote now pulls out
of the anthology, and we delete him from the poet table. In a
situation where referential integrity is not enforced, he would
not appear in the poet table, but the code he used to have would
still appear in the poem table. So, when we look up the poet who
wrote Thrones of Darkness (which is poet code 2) we are sent to a non-existent record.
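Poet table (excerpt, after the delete):
|code|first_name|surname|
|---|---|---|
|3|Tatumkhulu|Watson|

Poem table (excerpt):
|title|poet|
|---|---|
|Thrones of Darkness|2|
|Once|3|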
The delete results in poor data integrity.
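The scenario above might be expressed in SQL roughly as follows. This is a sketch under assumptions: the column types are invented, and whether a foreign key constraint is actually enforced depends on the DBMS and its configuration (some systems need enforcement switched on, or a particular storage engine).

```sql
-- Illustrative schema; column types are assumptions.
CREATE TABLE poet (
    code       INTEGER PRIMARY KEY,
    first_name VARCHAR(30),
    surname    VARCHAR(40)
);

CREATE TABLE poem (
    title VARCHAR(50),
    poet  INTEGER,
    -- Every non-null poet value must match an existing poet.code.
    FOREIGN KEY (poet) REFERENCES poet (code)
);

INSERT INTO poet (code, first_name, surname) VALUES (2, 'Stephen', 'Serote');
INSERT INTO poet (code, first_name, surname) VALUES (3, 'Tatumkhulu', 'Watson');
INSERT INTO poem (title, poet) VALUES ('Thrones of Darkness', 2);
INSERT INTO poem (title, poet) VALUES ('Once', 3);

-- Where the constraint is enforced, this delete is rejected with an error,
-- because 'Thrones of Darkness' still refers to poet 2.
DELETE FROM poet WHERE code = 2;

-- Where it is not enforced, poems left pointing at a missing poet
-- can be found with an outer join:
SELECT poem.title, poem.poet
FROM poem
LEFT JOIN poet ON poet.code = poem.poet
WHERE poem.poet IS NOT NULL
  AND poet.code IS NULL;
```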
Foreign keys also allow what are called cascading deletes and
updates. This means that we can delete Stephen Serote from the
poet table, and all poems written by him, with one SQL statement.
The delete "cascades" through the relevant tables, removing all
related records.
Foreign keys can contain null values if the relationship is optional,
which indicates that no relationship exists. If the relationship is mandatory, the foreign key cannot
contain a null value.
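A cascading delete and a mandatory relationship could be declared along these lines. Again this is only a sketch: the types are assumptions, and the cascade only happens where the DBMS supports and enforces foreign key constraints.

```sql
-- Illustrative schema; column types are assumptions.
CREATE TABLE poet (
    code       INTEGER PRIMARY KEY,
    first_name VARCHAR(30),
    surname    VARCHAR(40)
);

CREATE TABLE poem (
    title VARCHAR(50),
    poet  INTEGER NOT NULL,          -- mandatory relationship: NULL is not allowed
    FOREIGN KEY (poet) REFERENCES poet (code)
        ON DELETE CASCADE            -- deleting a poet deletes that poet's poems
        ON UPDATE CASCADE            -- changing a poet's code updates the poems
);

INSERT INTO poet (code, first_name, surname) VALUES (2, 'Stephen', 'Serote');
INSERT INTO poet (code, first_name, surname) VALUES (3, 'Tatumkhulu', 'Watson');
INSERT INTO poem (title, poet) VALUES ('Thrones of Darkness', 2);
INSERT INTO poem (title, poet) VALUES ('Once', 3);

-- One statement removes the poet and, through the cascade, his poems.
DELETE FROM poet WHERE code = 2;

-- With enforcement active, only the poem by poet 3 is left.
SELECT title, poet FROM poem;
```

For an optional relationship, the NOT NULL on the poet column would simply be left out.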
Views
Views are virtual tables. They do not contain any data themselves -
rather they're a structure to allow us to access data, or a subset
of the data. A view can consist of a subset of one table, such
as in this example:
- Code
- First name
- Surname
These fields are only part of the complete list of fields from the poet table.
This view could be used to allow others to see the poet's code,
name and surname, but not allow them access to personal
information. Or, a view could be a combination of a number of
tables, such as in this example:
- First name
- Poem code
- Poet code
- Poem title
Views are often used for security purposes. Junior developers
may need access to certain portions of a table, but they do not
need access to all the data. What they don't need, even if it is
from the same table, is hidden and safe from manipulation or viewing.
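A view that hides personal information might be defined something like this. The view name poet_public and the address column are hypothetical, chosen only to stand in for the personal fields the poet table may hold.

```sql
-- Illustrative schema; the address column stands in for personal
-- information and is an assumption, as are the column types.
CREATE TABLE poet (
    code       INTEGER PRIMARY KEY,
    first_name VARCHAR(30),
    surname    VARCHAR(40),
    address    VARCHAR(100)
);

-- The view exposes only the non-personal fields. It stores no data itself.
CREATE VIEW poet_public AS
SELECT code, first_name, surname
FROM poet;

-- Users of the view see code, first_name and surname, but never address.
SELECT code, first_name, surname FROM poet_public;
```

How access is then restricted to the view rather than the underlying table depends on the permission system of the DBMS in use.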
Also, views allow SQL queries to be much simpler. For example,
without views, a developer may have to use the following query:
SELECT first_name, surname, poem FROM poet, poem
WHERE poem.poet_code = poet.code AND poem.title = 'Once';
With the view, a developer could do the same with:
SELECT first_name, surname, poem FROM subscriber_view;
Much simpler for a junior developer who hasn't yet learnt how
to write joins across multiple tables,
and less hassle for a senior developer too!
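The article does not show how subscriber_view is defined, so the following definition is an assumption: a view that wraps the join between the poet and poem tables, exposing the poem title rather than a separate poem column. Column types are also invented for the sketch.

```sql
-- Illustrative schema; column types are assumptions.
CREATE TABLE poet (
    code       INTEGER PRIMARY KEY,
    first_name VARCHAR(30),
    surname    VARCHAR(40)
);

CREATE TABLE poem (
    code      INTEGER PRIMARY KEY,
    poet_code INTEGER,
    title     VARCHAR(50),
    FOREIGN KEY (poet_code) REFERENCES poet (code)
);

-- Assumed definition: the join is written once, inside the view.
CREATE VIEW subscriber_view AS
SELECT poet.first_name, poet.surname, poem.title
FROM poet
JOIN poem ON poem.poet_code = poet.code;

-- Developers can now query the view without writing the join themselves.
SELECT first_name, surname, title
FROM subscriber_view
WHERE title = 'Once';
```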
This has been a brief introduction to relational databases.
Hopefully it's put some of the terms you've come across in
context, and whetted your appetite to explore further.