Normalization is almost universally applied to relational databases such as MySQL in order to optimize tables for general-purpose querying and to rid them of certain undesirable characteristics that could lead to a loss of data integrity. Doing so tends to promote better accuracy of queries, but it also leads to queries that take a little more work to develop, as the data may be spread amongst several tables. In todays article, well learn how to fetch data from multiple tables by using joins.
There are two accepted syntax styles for writing joins: ANSI-style joins and theta-style joins. ANSI syntax uses the JOIN and ON keywords, as in the following example:
SELECT field1, field2, FROM my_table t1 JOIN my_other_table t2 ON t1.primary_id_field = t2.foreign_key_id_field WHERE t1.lastname = 'Smith';
The JOIN keyword is used to separate the names of the tables being joined, and the ON clause contains the relation showing which column is being used as the join key.
In theta-style syntax, the table joins are simply added to the WHERE clause:
SELECT field1, field2, FROM my_table t1, My_other_table t2 WHERE t1.primary_id_field = t2.foreign_key_id_field AND t1.lastname = 'Smith';
MySQL also supports a nonstandard extension of the ANSI syntax that can be used as a sort of shorthand for when the join column has the same name in both joined tables:
SELECT field1, field2, FROM my_table t1 JOIN my_other_table t2 USING (t1.id_field) WHERE t1.lastname = 'Smith';
So Which Syntax is Best?
The ANSI syntax is generally preferable to theta style because its usually easier to read and understand, particularly when writing joins involving numerous tables. There are also some types of joins that cant be written using theta-style notation in MySQL.
In order to join tables together, there has to be some common data that allow those tables to be connected in some meaningful way. Although its possible to have more than one common column between two tables, most often, the join key will be the primary key of one table and a foreign key in the other.
To illustrate, well perform queries against the following sample tables:
*Note: In a normalized database, the Manufacturer ID would be stored in the Models table. Here, I included the full description to better illustrate which manufacturers are associated with which models.
The first table contains automobile manufacturers; the second, some models that are built by the first several auto manufacturers. The common data between the two tables is the manufacturer, which is linked by manufacturer ID.
Now well extract some data from the tables, using different join types in ANSI syntax.
An inner join is defined as a join in which unmatched rows from either table are not to be returned. In other words, the rows must match in both tables in order to be included in the result set.
SELECT t1.description AS 'Manufacturer', t2.description AS 'Model' FROM manufacturer t1 INNER JOIN model t2 ON t1.id = t2.manufacturer_id WHERE t1.description = 'ACURA';
The "INNER" keyword is not required, but it is considered good practice to include it.
Typing the query above in the MySQL Command Line Client produces the following:
mysql> SELECT t1.description AS 'Manufacturer', -> t2.description AS 'Model' -> FROM manufacturer t1 -> INNER JOIN model t2 -> ON t1.id = t2.manufacturer_id -> WHERE t1.description = 'ACURA'; +--------------+---------+ | Manufacturer | Model | +--------------+---------+ | ACURA | INTEGRA | | ACURA | CL | | ACURA | LEGEND | | ACURA | RL | | ACURA | NSX | | ACURA | TL | | ACURA | VIGOR | | ACURA | EL | | ACURA | NSX-T | +--------------+---------+ 9 rows in set (0.00 sec)
Outer joins will return records in one table that arent matched in another. Outer joins can be further divided into the two types of left and right. In a left outer join, all records from the first (left-hand) table in a join that meet any conditions set in the WHERE clause are returned, whether or not theres a match in the second (right-hand) table:
mysql> SELECT t1.description AS 'Manufacturer', -> t2.description AS 'Model' -> FROM manufacturer t1 -> LEFT JOIN model t2 -> ON t1.id = t2.manufacturer_id -> WHERE t1.description = 'New Car Co'; +--------------+----------------+ | Manufacturer | Model | +--------------+----------------+ | New Car Co | (NULL) | +--------------+----------------+ 19 rows in set (0.00 sec)
The New Car Co is returned even though there are no associated models in the model table.
Similar to the left outer join, a right outer join returns all records from the second (right-hand) table in a join that meet any conditions set in the WHERE clause, whether or not theres a match in the first (left-hand) table:
mysql> SELECT t1.description AS 'Manufacturer', -> t2.description AS 'Model' -> FROM manufacturer t1 -> RIGHT JOIN model t2 -> ON t1.id = t2.manufacturer_id -> WHERE t2.description = 'Custom'; +--------------+--------+ | Manufacturer | Model | +--------------+--------+ | (NULL) | Custom | +--------------+--------+ 1 row in set (0.00 sec)
The Custom model is returned even though there is no associated manufacturer.
The cross-join, also referred to as a Cartesian product, returns all the rows in all the tables listed in the query. Each row in the first table is paired with all the rows in the second table. This happens when there is no relationship defined between the two tables.
Note that, most of the time, we do not want a Cartesian join, and we end up with one because we failed to provide a filter on the join. Result sets can get large quickly because the amount of data in the select is the number of rows in Table A multiplied by the number of rows in Table B. If you have more than two tables this multiplies at an exponential rate.
If we actually want a Cartesian join, then we should use the ANSI cross join to tell others reading the script that we actually wanted a Cartesian join. So why would we want one? One reason might be to produce all the combinations of 1, 2 and 3, which could be used as part of a password or ID generation process:
mysql> SELECT CONCAT( CAST(t1.num AS CHAR), -> CAST(t2.num AS CHAR)) AS combinations -> FROM numbers t1, numbers t2; +--------------+ | combinations | +--------------+ | 11 | | 21 | | 31 | | 12 | | 22 | | 32 | | 13 | | 23 | | 33 | +--------------+ 9 rows in set (0.00 sec)
Now youve got every permutation of number combinations for two digits!
Knowing how to link tables is of great assistance in extracting data from normalized databases, but it may not always be enough. There will be times that no combination of joins will suffice to properly filter the data. In those cases, it may be necessary to use temporary tables. Well be looking at those in the next article.