A fundamental concept when working with relational databases
is that of Joins. Simply put, a Join allows you to combine records from two
separate tables. In Access, the most common Join is the Inner Join, which combines
records from two tables whenever there are matching values in a common field.
Another common Join is the Outer Join, which comes in two forms. A Left Outer
Join includes all of the records from the first (left) of two tables, even if there
are no matching values for records in the second (right) table, while a Right
Outer Join includes all of the records from the second (right) of two tables,
even if there are no matching values for records in the first (left) table.
Less well known, perhaps, is the Cartesian product, which produces every
possible combination of records between the two tables. In other words, if one
table contains five records and the other table contains four records, the Cartesian
product would contain twenty (5 x 4).
Creating a Cartesian Product
Queries are how Joins are used in Access, and most people use
the Query Builder to create their queries. While you're probably familiar with
building queries in Access, just allow me to briefly discuss how to do this, so
that we're all on the same page.
Figure 1 illustrates the details of two simple tables.
Figure 1: Details of two tables, each with two fields.
Table1 has five rows, Table2 has four rows.
If your tables have been created with Relationships (as they
should), when you drag related tables into the Query Builder, they will already
be joined by default. If no lines are present, you can add them by dragging the
field from one table and dropping on the related field in the second table.
Figure 2 shows how you would create a query on the two tables shown in Figure
1. In this case, I'm joining the two tables on Field2, and displaying all four
fields in the results.
The SQL that's generated for that case is shown in Listing 1,
while Figure 3 shows the results of running the query.
SELECT Table1.Field1,
Table1.Field2,
Table2.Field1,
Table2.Field2
FROM Table1 INNER JOIN Table2
ON Table1.Field2 = Table2.Field2;
Listing 1: The SQL for the query shown in Figure 2.
Note how the INNER JOIN operation relates the two tables.
Figure 3: The results of running the INNER JOIN query.
Since the query joins the two tables on Field2, it shows four rows: the rows in
each table with the same values for Field2.
To create a Cartesian product, you do not want any lines
connecting your tables: delete any that might be present.
Figure 4 illustrates how you can create a Cartesian product
through the Query Builder, while Listing 2 shows the SQL that's generated for
that case. Note that even if you do not explicitly refer to one (or more) of
the tables (i.e.: you do not include any fields from the table in the list of
fields to be displayed), the query will still result in a Cartesian product.
Figure
4: Creating a Cartesian product in the Query Builder. Note that there is no
line joining the two tables
SELECT Table1.Field1,
Table1.Field2,
Table2.Field1,
Table2.Field2
FROM Table1, Table2;
Listing 2: The SQL for the Cartesian product shown in Figure 4
As is shown in Figure 5, running this query will result in a
total of twenty rows.
Figure
5: The results of running the Cartesian product query. Each row in Table1 is
joined to each row in Table2, resulting in a total of twenty rows.
The sample database that accompanies
this article also shows examples of using Left Join and Right Join on the same
two tables, so that you can see the differences between the four queries.
Using Cartesian Products
While some of the examples I'm about to present may seem a
little contrived, it's my intent to show you that there are legitimate uses for
Cartesian products.
Example 1: Making a query read-only
By definition, Cartesian products are always read-only. That
means should you want to guarantee that the users can't just change the AllowAdditions/AllowDeletions
or AllowEdits properties of a form and change the data, you could set the RecordSource
of the report to a query that's a Cartesian product.
The simplest way to do this would be to have a table with a
single row in it. (What's in that row is irrelevant). You'd simply have to add
a reference to that single row table in the SQL of the query, and you've made
the query read-only.
In Example 3, I'll talk about the Anniversaries table that's
included in the sample database that accompanies this article. For now, suffice
it to say that it's a table that has three fields: a MonthNumber, a DayNumber
and a Description that's intended to be used to represent events that occur
each year on a given date.
Query qryExample1a_UpdatableQuery is a typical query such as you'd
use for the RecordSource of a form to allow you to update the table. Its SQL is
shown in Listing 3.
SELECT Anniversaries.MonthNumber,
Anniversaries.DayNumber,
Anniversaries.Description
FROM Anniversaries
ORDER BY MonthNumber, DayNumber;
Listing 3: The SQL for qryExample1a_UpdatableQuery, an updatable query.
Note, though, that if I have a simple table with a single row
in it (such as is shown in Figure 6), I can use it in qryExample1b_NonupdatableQuery
(as shown in Listing 4), thus making it read-only.

Figure
6: As its name implies, TableWithOneRow has a single row of data in it. Since
we do not use that data, what's stored in that row is irrelevant.
SELECT Anniversaries.MonthNumber,
Anniversaries.DayNumber,
Anniversaries.Description
FROM Anniversaries, TableWithOneRow
ORDER BY MonthNumber, DayNumber;
Listing 4: The SQL for qryExample1b_NonupdatableQuery, an
read-only query. Note that the only difference from Listing 3 is the inclusion
of ", TableWithOneRow" in the FROM clause.
As I've said, the content of TableWithOneRow is irrelevant. It's
not uncommon to have a requirement for a table with a single row to hold
important pieces of information (client name, path to a logo file, etc.), so
you may already have such a table in your application.
Example 2: Making multiple copies of a record
In Example 1, I used a table with a single row so that I wouldn't
duplicate the actual data of interest. Sometimes, though, you may want to
create multiple copies of the data in your tables.
A request I see in the newsgroups is along the lines of
"We're going from a dot matrix printer that was capable of handling three
part carbon paper to a laser printer, which cannot handle multipage forms. How
can I print three copies of each report?" The easiest way to do this is to
ensure that the RecordSource of the report has three copies of each record
(sorted, of course, so that all of copy one are together, followed by all of
copy two, followed by all of copy three).
For this specific requirement, therefore, you could introduce
a table with three rows in it, but in this case, the data contained in the
three rows would matter, since you need to be able to sort the three copies of
the data appropriately. What you might do is have a field with values of, say,
"Original", "Customer Copy" and "File Copy".
Include that field (and sort on it) in the RecordSource, and your requirements
are met.
In the sample database that accompanies this article, I've
shown a different example of making multiple copies of a record. Access, like
other Office products, makes it very easy to print to a wide variety of
purchased labels, as well as to other formats such as business cards, binder
spine templates and so on. Sometimes you may wish to print multiple copies of a
particular label or card.
From the preceding discussion, it should be obvious that
should you want twenty copies of a given label, you could simply include a table
with twenty rows of data in the query being used as a RecordSource for the
report. However, do you really want to have to include tables with multiple
rows to handle every eventuality? Well, it turns out that you don't really need
dummy tables with a huge number of rows: one table, with ten rows (the digits 0
through 9) can be used in a Cartesian product query to give you virtually every
possibility.

Figure
7: Table Digits simply contains the digits 0 through 9.
For instance, the sample database contains a query qryExample2a_NumericValuesFrom0To999
that makes three references to the Digits table in a Cartesian product query
illustrated in Listing 5.
SELECT H.WhatDigit*100+T.WhatDigit*10+U.WhatDigit AS NumericValue,
H.WhatDigit AS Hundreds,
T.WhatDigit AS Tens,
U.WhatDigit AS Units
FROM Digits AS H, Digits AS T, Digits AS U
ORDER BY 1;
Listing 5: The SQL for qryExample2a_NumericValuesFrom0To999,
which results in one thousand rows (the values 0 through 999). Note that it's
not necessary to include the individual digit fields as I did.
Now, rather than having the RecordSource of the label report
simply be the typical query against my UserAddress table, I can use qryExample2b_QueryForLabelReport,
shown in Listing 6.
PARAMETERS [How many labels do you want? (max 1000)] Long;
SELECT C.NumericValue,
U.UserName,
U.UserAddressLine1,
U.UserAddressLine2,
U.UserCity,
U.UserStateOrProv,
U.UserPostCode
FROM UserAddress AS U,
qryExample2a_NumericValuesFrom0To999 AS C
WHERE C.NumericValue<[How many labels do you want? (max 1000)]))
ORDER BY 1;
Listing 6: The SQL for qryExample2b_QueryForLabelReport.
Note that it prompts the user for the number of copies required.
Note that in the query, I'm explicitly declaring that the
prompt [How many labels do you want? (max 1000)] is a Long Integer. Including
Parameters Declarations is usually a good idea (and is actually required in
certain situations, such as Crosstab queries)
Example 3: Creating a calendar
I really agonized about including this particular example,
because it's going to be difficult to explain and the intent of this article is
the use of Cartesian products, not how to produce a calendar. However, this is
one of my most common uses of Cartesian products, so in the end I decided I had
no choice but to include it!
I often had the need to produce reports that look like a
calendar: one box for each day within a given period, with details of what's
happening that day inside the box. Producing the report itself isn't
particularly difficult: the easiest way is to have a report that represents the
details for a single day, and use that report seven times as a subreport (once
for each day of the week), and link the subreport to the parent report on the
specific date. The tricky part is to produce the appropriate RecordSet for the
report being used as the parent!
Before I get too carried away describing how to produce that
particular query, let's talk about the other information that's required for
the calendar report. Let's discuss a family planning calendar: the one that
hangs on the wall in the kitchen where you write all your upcoming events.
While I may be over-simplifying things, I find that the majority of events,
which need to be reported fall into one of two categories:
-
Those that are on the same day each year (such as birthdays and
anniversaries)
-
Those which vary from year to year
(Although it might not be obvious, I include statutory
holidays in the list of activities whose dates may vary from year to year.
Easter, for example, falls on a different day each year. Even holidays that
have fixed dates (like Christmas or New Year's Day) can have implications that
vary from year to year. For instance, when January 1st is a Saturday or Sunday,
I get the Monday after as the holiday off.)
Bottom line, I include two tables, with the remarkably
creative names of "Anniversaries" and "Holidays".
The Anniversaries table, which we saw in Example 1, consists of three fields: MonthNumber,
DayNumber and Description. Because the values of MonthNumber and DayNumber will
never exceed 31, I chose to make them Byte fields. Description is intended to
contain a brief description of the event suitable to appear on the calendar
("Grandma's birthday", "Mom & Dad's anniversary"), so I
set it as a 50-character Text field. Moreover, because it's possible for there
to be two or more events on the same day, I didn't set a Primary Key for the
table. The purists among you might want to add an Autonumber field that you can
use as a Primary Key.
The Holidays table only requires two fields: HolidayDate (a
Date/Time field, because it contains a complete date), and Description (again,
a 50 character Text field). Again, I saw no need for a Primary Key for this
table.
In addition to the events handled by the two tables above,
another category of events are those that occur on the same day for a finite
number of weeks during the year. These are the activities such as the class
every Tuesday night from 7:00 to9:00, the Wednesday night soccer game and the
Monday night Brownie meeting. These events are handled in a table named RecurringEvents,
which consists of four fields: DayOfWeek, StartDate, EndDate and Description.
The DayOfWeek field will indicate the day of the week on which the event
occurs: 1 for Sunday, 2 for Monday and so on up to 7 for Saturday. Because this
field is only going to hold values between 1 and 7, I made this a Byte field
also. The StartDate and EndDate Date/Time fields are the date of the first
occurrence of the event and the date of the last occurrence of the event. As
before, the Description field is a 50-character Text field.
Okay, now how do we generate a recordset that has one entry
for each day of the year? One approach, of course, is to create a table with
one row for each day. However, you can easily use a Cartesian product query to
produce this for you, using far smaller tables. Specifically, create three
tables: one named Days, one named Months, and one named Years. (Note the
plural: this is necessary since Day, Month and Year are all reserved words in
Access!)
The Days table consists of a single Byte field named DayNumber.
The table itself has 31 rows in it, containing the number 1 through 31. The
Months table consists of two fields: a Byte field named MonthNumber, and a Text
field named MonthNumber (although to be perfectly honest, there's no real need
for this text field), and contains 12 rows: one for each month. The Years table
consists of a single Byte field named YearNumber and contains as many rows as
you think is necessary, one for each year. (In the sample database that
accompanies this article, I included five rows, for the years 2008, 2009, 2010,
2011 and 2012). Figure 8 shows the details of these three tables.
Figure
8: Details of the Years, Months and Days tables.
Listing 7 shows how you'd use those three tables in a query
that returns one row for each day. Of course, since the Days table has 31 rows
in it, the Months table has 12 rows in it and the Years table has 5 rows in it,
the Cartesian product would return 1860 rows (31*12*5), whereas there are
actually only 1827 days between 1 January, 2008 and 31 December, 2012 (the
range represented by the three tables). Unfortunately, the DateSerial function
is a little too helpful: it will cheerfully accept numbers that aren't valid
combinations of month and day, which could end up causing duplicate rows. (For
example, DateSerial(2009, 11, 31) returns 1 December, 2009, the same as DateSerial(2009, 12, 1)) To prevent that, it's necessary to include a Where
clause to eliminate potential duplicates. Unlike the accommodating DateSerial
function, the IsDate function is more practical: if you pass it an invalid
date, it tells you so! To avoid any ambiguity between mm/dd/yyyy and dd/mm/yyyy
formats (is 02/10/2009 February 10 or October 2?), I use the yyyy-mm-dd format
in the call to the IsDate function.
SELECT DateSerial([YearNumber],[MonthNumber],[DayNumber]) AS WhatDate,
YearNumber,
MonthNumber,
DayNumber
FROM Days, Months, Years
WHERE IsDate([YearNumber] & "-" & [MonthNumber] & "-" & [DayNumber]
ORDER BY 1;
Listing 7: The SQL for qryExample3a_DaysOfTheYear. The
WHERE condition using the IsDate function eliminates invalid dates such as
February 30 that the DateSerial function would actually convert to duplicate
entries.
We're now ready to return to the recordset required for the
Calendar report! Figure 9 shows what this recordset needs to look like. Note
that not only do we need to show the seven days of the week on each row, but
when the month changes in the middle of the week, we need to use two separate
rows for the week: one to go on the previous month's calendar page and one to
go on the current month's calendar page.

Figure
9: The recordset required for the Calendar report.
Listing 8 shows the SQL for qryExample3b_Calendar, the query
that produces the recordset in Figure 9.
SELECT WhatDate AS SundayDate,
WhatDate + 1 AS MondayDate,
WhatDate + 2 AS TuesdayDate,
WhatDate + 3 AS WednesdayDate,
WhatDate + 4 AS ThursdayDate,
WhatDate + 5 AS FridayDate,
WhatDate + 6 AS SaturdayDate,
YearNumber AS SortYear,
MonthNumber AS SortMonth
FROM qryExample3a_DaysOfTheYear
WHERE Weekday([WhatDate])=1
UNION
SELECT WhatDate AS SundayDate,
WhatDate + 1 AS MondayDate,
WhatDate + 2 AS TuesdayDate,
WhatDate + 3 AS WednesdayDate,
WhatDate + 4 AS ThursdayDate,
WhatDate + 5 AS FridayDate,
WhatDate + 6 AS SaturdayDate,
YearNumber AS SortYear,
Month([WhatDate] +6 ) AS SortMonth
FROM qryExample3a_DaysOfTheYear
WHERE Weekday([WhatDate])=1
AND Month([WhatDate]) <> Month([WhatDate]+6)
AND Year([WhatDate] + 6) = YearNumber
UNION
SELECT Sunday(WhatDate) AS SundayDate,
Sunday(WhatDate) + 1 AS MondayDate,
Sunday(WhatDate) + 2 AS TuesdayDate,
Sunday(WhatDate) + 3 AS WednesdayDate,
Sunday(WhatDate) + 4 AS ThursdayDate,
Sunday(WhatDate) + 5 AS FridayDate,
Sunday(WhatDate) + 6 AS SaturdayDate,
Year(WhatDate) AS SortYear,
1 AS SortMonth
FROM qryExample3a_DaysOfTheYear
WHERE Month([WhatDate]) = 1
AND Day([WhatDate]) =1
Listing 8: The SQL to take the data from qryExample3a_DaysOfTheYear
and display each week as a single row.
Producing a query that returns the seven days of the week as
a single row isn't that difficult. By using a WHERE clause of Weekday([WhatDate])
= 1, you can return one row for each Sunday date. By adding the appropriate
number of days to that date, you can get the dates for Monday, Tuesday, and so
on. In order to be able to sort the data correctly, you also need to add a SortYear
and SortMonth, which will be the Year and Month of the SundayDate value. This
is what the first subquery in the Union query shown in Listing 8 does.
To handle the case where a week includes dates in two
different months, the second subquery in the Union query shown in Listing 8 calculates
SortMonth by determining the month for the Saturday of each week and only
returns those rows where Sunday and Saturday are in different months. The
query's third condition (Year([WhatDate] + 6) = YearNumber) eliminates the
second instance of the week if the year ends during the week. In other words, I
don't need a duplication of the week where Saturday is in the next year.
The third subquery in the Union query shown in Listing 8
produces the row corresponding to the first days of the year for those years
that don't start on Sunday. For example, in 2008, the first Sunday isn't until
January 6th, so the results of running the two subqueries already discussed
will not include the first five days of the year. I had to turn to some VBA
processing to handle this case.
Listing 9 shows a function that returns the Sunday before a
given date:
Function Sunday(WhatDate As Date) As Date
Sunday = WhatDate - WeekDay(WhatDate) + 1
End Function
Listing 9: For a given date, the Sunday function
returns the date of the previous Sunday.
This code takes advantage of the Weekday function that
returns 1 for Sunday, 2 for Monday and so on, until 7 for Saturday. Subtracting
the weekday number for a date from the date itself returns the date for the
preceding Saturday (e.g., because January 8, 2008 was a Tuesday, its weekday
number is 3, and subtracting that gives January 5, 2008 -- a Saturday). Adding 1
to that result gives you the date of the Sunday preceding the original date.
The third subquery uses the Sunday function to provide the
missing week (where necessary) by using the function to determine the Sunday
before January 1st of the given year, and then calculate all the other days of
that week.
Because the UNION operator automatically eliminates duplicate
rows, the query will eliminate rows where Sunday and Saturday are both in the
same month and where the week returned by the third subquery is also returned
by the first subquery.
It's almost anticlimactic discussing the last query required
for the Calendar report, the one that will be the RecordSource for the subreport
that shows the details for a given day!
SELECT DateSerial([YearNumber],[MonthNumber],[DayNumber]) AS WhatDate,
Anniversaries.Description
FROM Anniversaries, Years
UNION ALL
SELECT DISTINCT HolidayDate,
Description
FROM Holidays
UNION ALL
SELECT DISTINCT qryExample3a_DaysOfTheYear.WhatDate,
RecurringEvents.Description
FROM RecurringEvents INNER JOIN qryExample3a_DaysOfTheYear
ON Weekday(qryExample3a_DaysOfTheYear.WhatDate) = RecurringEvents.DayOfWeek
WHERE qryExample3a_DaysOfTheYear.WhatDate BETWEEN RecurringEvents.StartDate AND RecurringEvents.EndDate;
Listing 10: The SQL for query qryExample3c_ToDisplay,
which produces the details for a given day to be displayed in the subreport.
The first subquery in the Union query produces one entry for
each row in the Anniversaries table for each year in the Years table.
The second subquery adds one row for each row in the Holidays
table. Since my Holiday table actually contains holidays for a number of
different countries, I used SELECT DISTINCT to eliminate duplicates (i.e. dates
that are the same holiday in multiple countries).
Finally, the third subquery adds one row for each row in the RecurringEvents
table for each actual occurrence. It does this by joining the RecurringEvents
table to the results of qryExample3a_DaysOfTheYear using what I refer to as a
non-equijoin. In other words, it's an INNER JOIN, but the join condition
specified in the ON clause is not a simple equality statement, as it usually
is.
I know that was a long-winded explanation. Hopefully when you
look at the sample database, it'll all make sense to you!
Example 4: Creating a long running query
Believe it or not, I actually had a requirement recently to
create a query that was guaranteed to run for a long period of time!
In my current job, we're busy preparing to roll Windows 7
out. As you're all probably aware, one of the current "hot topics" is
Green Computing: reducing the power requirements, while not reducing the
computing abilities. Because of that, we're taking the Power Management
settings fairly seriously.
One concern that was raised to our team was what happens if
the user starts a long running query then goes to lunch. There was worry that
the machine would go into hibernation before the query completed, thereby
wasting all the work already done. (There was also concern that it could cause
database corruption)
To test this theory out, I constructed a Cartesian product
similar to the query qryExample2a_NumericValuesFrom0To999 we've already
discussed, only this time I allowed it to go to 9999.
SELECT U.WhatDigit+10*T.WhatDigit+100*H.WhatDigit+1000*Th.WhatDigit AS NumericValue
FROM Digits AS U, Digits AS T, Digits AS H, Digits AS Th;
Listing 11: The SQL for query qryExample4a_FirstPart,
which produces 10,000 rows.
I then created a second query, which joined the query above
to itself.
SELECT Count(*) AS TotalRows
FROM qryExample4a_FirstPart AS T1
INNER JOIN qryExample4a_FirstPart AS T2
ON T1.NumericValue = T2.NumericValue;
Listing 12: The SQL for query qryExample4b_SecondPart,
which joins qryExample4a_FirstPart to itself.
Since there are no indexes involved, joining a ten thousand
row table to itself is a fairly time-consuming exercise. (I find it takes about
11 to 12 minutes on my desktop). By setting the Hibernation time to a smaller
value (like 5 minutes), we were able to see the effects of the machine trying
to hibernate on the query.
(For those of you who care, the machine doesn't hibernate
while the query is running, but does hibernate five minutes after the query has
completed. However, we're running into some anomalies that appear to be related
to either the settings on our network switches or to the firewall product we're
using, so I can't really give you a definitive answer on what your Power
Management settings should be!)
Conclusion
I'll be the first to admit that some of the examples
presented above were a little frivolous. However, the fact of the matter is
that there can be legitimate uses for Cartesian products, and hopefully the
samples I provided will give you ideas for those times where they're
appropriate.
»
See All Articles by Columnist Doug Steele