Uses for Cartesian Products in MS Access

Friday Sep 18th 2009 by Doug Steele
Share:

Less well known than inner and outer joins is the Cartesian product, which produces every possible combination of records between the two tables. Doug Steele offers four examples to demonstrate some legitimate uses for Cartesian products.

A fundamental concept when working with relational databases is that of Joins. Simply put, a Join allows you to combine records from two separate tables. In Access, the most common Join is the Inner Join, which combines records from two tables whenever there are matching values in a common field. Another common Join is the Outer Join, which comes in two forms. A Left Outer Join includes all of the records from the first (left) of two tables, even if there are no matching values for records in the second (right) table, while a Right Outer Join includes all of the records from the second (right) of two tables, even if there are no matching values for records in the first (left) table. Less well known, perhaps, is the Cartesian product, which produces every possible combination of records between the two tables. In other words, if one table contains five records and the other table contains four records, the Cartesian product would contain twenty (5 x 4).

Creating a Cartesian Product

Queries are how Joins are used in Access, and most people use the Query Builder to create their queries. While you're probably familiar with building queries in Access, just allow me to briefly discuss how to do this, so that we're all on the same page.

Figure 1 illustrates the details of two simple tables.

Details of two tables
Figure 1: Details of two tables, each with two fields. Table1 has five rows, Table2 has four rows.

If your tables have been created with Relationships (as they should), when you drag related tables into the Query Builder, they will already be joined by default. If no lines are present, you can add them by dragging the field from one table and dropping on the related field in the second table. Figure 2 shows how you would create a query on the two tables shown in Figure 1. In this case, I'm joining the two tables on Field2, and displaying all four fields in the results.

Click for larger image
View in the Query Builder

Figure 2: View in the Query Builder, with the two tables joined on Field2

The SQL that's generated for that case is shown in Listing 1, while Figure 3 shows the results of running the query.

SELECT Table1.Field1,
       Table1.Field2,
       Table2.Field1,
       Table2.Field2
  FROM Table1 INNER JOIN Table2
      ON Table1.Field2 = Table2.Field2;
Listing 1: The SQL for the query shown in Figure 2. Note how the INNER JOIN operation relates the two tables.

The results of running the INNER JOIN query
Figure 3: The results of running the INNER JOIN query. Since the query joins the two tables on Field2, it shows four rows: the rows in each table with the same values for Field2.

To create a Cartesian product, you do not want any lines connecting your tables: delete any that might be present.

Figure 4 illustrates how you can create a Cartesian product through the Query Builder, while Listing 2 shows the SQL that's generated for that case. Note that even if you do not explicitly refer to one (or more) of the tables (i.e.: you do not include any fields from the table in the list of fields to be displayed), the query will still result in a Cartesian product.

Creating a Cartesian product in the Query Builder
Figure 4: Creating a Cartesian product in the Query Builder. Note that there is no line joining the two tables

SELECT Table1.Field1,
       Table1.Field2,
       Table2.Field1,
       Table2.Field2
  FROM Table1, Table2;
Listing 2: The SQL for the Cartesian product shown in Figure 4

As is shown in Figure 5, running this query will result in a total of twenty rows.

The results of running the Cartesian product query
Figure 5: The results of running the Cartesian product query. Each row in Table1 is joined to each row in Table2, resulting in a total of twenty rows.

The sample database that accompanies this article also shows examples of using Left Join and Right Join on the same two tables, so that you can see the differences between the four queries.

Using Cartesian Products

While some of the examples I'm about to present may seem a little contrived, it's my intent to show you that there are legitimate uses for Cartesian products.

Example 1: Making a query read-only

By definition, Cartesian products are always read-only. That means should you want to guarantee that the users can't just change the AllowAdditions/AllowDeletions or AllowEdits properties of a form and change the data, you could set the RecordSource of the report to a query that's a Cartesian product.

The simplest way to do this would be to have a table with a single row in it. (What's in that row is irrelevant). You'd simply have to add a reference to that single row table in the SQL of the query, and you've made the query read-only.

In Example 3, I'll talk about the Anniversaries table that's included in the sample database that accompanies this article. For now, suffice it to say that it's a table that has three fields: a MonthNumber, a DayNumber and a Description that's intended to be used to represent events that occur each year on a given date.

Query qryExample1a_UpdatableQuery is a typical query such as you'd use for the RecordSource of a form to allow you to update the table. Its SQL is shown in Listing 3.

  SELECT Anniversaries.MonthNumber,
         Anniversaries.DayNumber,
         Anniversaries.Description
    FROM Anniversaries
ORDER BY MonthNumber, DayNumber;
Listing 3: The SQL for qryExample1a_UpdatableQuery, an updatable query.

Note, though, that if I have a simple table with a single row in it (such as is shown in Figure 6), I can use it in qryExample1b_NonupdatableQuery (as shown in Listing 4), thus making it read-only.

TableWithOneRow has a single row of data in it
Figure 6: As its name implies, TableWithOneRow has a single row of data in it. Since we do not use that data, what's stored in that row is irrelevant.

  SELECT Anniversaries.MonthNumber,
         Anniversaries.DayNumber,
         Anniversaries.Description
    FROM Anniversaries, TableWithOneRow
ORDER BY MonthNumber, DayNumber;
Listing 4: The SQL for qryExample1b_NonupdatableQuery, an read-only query. Note that the only difference from Listing 3 is the inclusion of ", TableWithOneRow" in the FROM clause.

As I've said, the content of TableWithOneRow is irrelevant. It's not uncommon to have a requirement for a table with a single row to hold important pieces of information (client name, path to a logo file, etc.), so you may already have such a table in your application.

Example 2: Making multiple copies of a record

In Example 1, I used a table with a single row so that I wouldn't duplicate the actual data of interest. Sometimes, though, you may want to create multiple copies of the data in your tables.

A request I see in the newsgroups is along the lines of "We're going from a dot matrix printer that was capable of handling three part carbon paper to a laser printer, which cannot handle multipage forms. How can I print three copies of each report?" The easiest way to do this is to ensure that the RecordSource of the report has three copies of each record (sorted, of course, so that all of copy one are together, followed by all of copy two, followed by all of copy three).

For this specific requirement, therefore, you could introduce a table with three rows in it, but in this case, the data contained in the three rows would matter, since you need to be able to sort the three copies of the data appropriately. What you might do is have a field with values of, say, "Original", "Customer Copy" and "File Copy". Include that field (and sort on it) in the RecordSource, and your requirements are met.

In the sample database that accompanies this article, I've shown a different example of making multiple copies of a record. Access, like other Office products, makes it very easy to print to a wide variety of purchased labels, as well as to other formats such as business cards, binder spine templates and so on. Sometimes you may wish to print multiple copies of a particular label or card.

From the preceding discussion, it should be obvious that should you want twenty copies of a given label, you could simply include a table with twenty rows of data in the query being used as a RecordSource for the report. However, do you really want to have to include tables with multiple rows to handle every eventuality? Well, it turns out that you don't really need dummy tables with a huge number of rows: one table, with ten rows (the digits 0 through 9) can be used in a Cartesian product query to give you virtually every possibility.

Table Digits simply contains the digits 0 through 9
Figure 7: Table Digits simply contains the digits 0 through 9.

For instance, the sample database contains a query qryExample2a_NumericValuesFrom0To999 that makes three references to the Digits table in a Cartesian product query illustrated in Listing 5.

  SELECT H.WhatDigit*100+T.WhatDigit*10+U.WhatDigit AS NumericValue,
         H.WhatDigit AS Hundreds,
         T.WhatDigit AS Tens,
         U.WhatDigit AS Units
    FROM Digits AS H, Digits AS T, Digits AS U
ORDER BY 1;
Listing 5: The SQL for qryExample2a_NumericValuesFrom0To999, which results in one thousand rows (the values 0 through 999). Note that it's not necessary to include the individual digit fields as I did.

Now, rather than having the RecordSource of the label report simply be the typical query against my UserAddress table, I can use qryExample2b_QueryForLabelReport, shown in Listing 6.

PARAMETERS [How many labels do you want? (max 1000)] Long;
SELECT C.NumericValue,
U.UserName,
U.UserAddressLine1,
U.UserAddressLine2,
U.UserCity,
U.UserStateOrProv,
U.UserPostCode
FROM UserAddress AS U,
qryExample2a_NumericValuesFrom0To999 AS C
WHERE C.NumericValue<[How many labels do you want? (max 1000)]))
ORDER BY 1;
Listing 6: The SQL for qryExample2b_QueryForLabelReport. Note that it prompts the user for the number of copies required.

Note that in the query, I'm explicitly declaring that the prompt [How many labels do you want? (max 1000)] is a Long Integer. Including Parameters Declarations is usually a good idea (and is actually required in certain situations, such as Crosstab queries)

Example 3: Creating a calendar

I really agonized about including this particular example, because it's going to be difficult to explain and the intent of this article is the use of Cartesian products, not how to produce a calendar. However, this is one of my most common uses of Cartesian products, so in the end I decided I had no choice but to include it!

I often had the need to produce reports that look like a calendar: one box for each day within a given period, with details of what's happening that day inside the box. Producing the report itself isn't particularly difficult: the easiest way is to have a report that represents the details for a single day, and use that report seven times as a subreport (once for each day of the week), and link the subreport to the parent report on the specific date. The tricky part is to produce the appropriate RecordSet for the report being used as the parent!

Before I get too carried away describing how to produce that particular query, let's talk about the other information that's required for the calendar report. Let's discuss a family planning calendar: the one that hangs on the wall in the kitchen where you write all your upcoming events. While I may be over-simplifying things, I find that the majority of events, which need to be reported fall into one of two categories:

  • Those that are on the same day each year (such as birthdays and anniversaries)

  • Those which vary from year to year

(Although it might not be obvious, I include statutory holidays in the list of activities whose dates may vary from year to year. Easter, for example, falls on a different day each year. Even holidays that have fixed dates (like Christmas or New Year's Day) can have implications that vary from year to year. For instance, when January 1st is a Saturday or Sunday, I get the Monday after as the holiday off.)

Bottom line, I include two tables, with the remarkably creative names of "Anniversaries" and "Holidays".

The Anniversaries table, which we saw in Example 1, consists of three fields: MonthNumber, DayNumber and Description. Because the values of MonthNumber and DayNumber will never exceed 31, I chose to make them Byte fields. Description is intended to contain a brief description of the event suitable to appear on the calendar ("Grandma's birthday", "Mom & Dad's anniversary"), so I set it as a 50-character Text field. Moreover, because it's possible for there to be two or more events on the same day, I didn't set a Primary Key for the table. The purists among you might want to add an Autonumber field that you can use as a Primary Key.

The Holidays table only requires two fields: HolidayDate (a Date/Time field, because it contains a complete date), and Description (again, a 50 character Text field). Again, I saw no need for a Primary Key for this table.

In addition to the events handled by the two tables above, another category of events are those that occur on the same day for a finite number of weeks during the year. These are the activities such as the class every Tuesday night from 7:00 to9:00, the Wednesday night soccer game and the Monday night Brownie meeting. These events are handled in a table named RecurringEvents, which consists of four fields: DayOfWeek, StartDate, EndDate and Description. The DayOfWeek field will indicate the day of the week on which the event occurs: 1 for Sunday, 2 for Monday and so on up to 7 for Saturday. Because this field is only going to hold values between 1 and 7, I made this a Byte field also. The StartDate and EndDate Date/Time fields are the date of the first occurrence of the event and the date of the last occurrence of the event. As before, the Description field is a 50-character Text field.

Okay, now how do we generate a recordset that has one entry for each day of the year? One approach, of course, is to create a table with one row for each day. However, you can easily use a Cartesian product query to produce this for you, using far smaller tables. Specifically, create three tables: one named Days, one named Months, and one named Years. (Note the plural: this is necessary since Day, Month and Year are all reserved words in Access!)

The Days table consists of a single Byte field named DayNumber. The table itself has 31 rows in it, containing the number 1 through 31. The Months table consists of two fields: a Byte field named MonthNumber, and a Text field named MonthNumber (although to be perfectly honest, there's no real need for this text field), and contains 12 rows: one for each month. The Years table consists of a single Byte field named YearNumber and contains as many rows as you think is necessary, one for each year. (In the sample database that accompanies this article, I included five rows, for the years 2008, 2009, 2010, 2011 and 2012). Figure 8 shows the details of these three tables.

Details of the Years, Months and Days tables
Figure 8: Details of the Years, Months and Days tables.

Listing 7 shows how you'd use those three tables in a query that returns one row for each day. Of course, since the Days table has 31 rows in it, the Months table has 12 rows in it and the Years table has 5 rows in it, the Cartesian product would return 1860 rows (31*12*5), whereas there are actually only 1827 days between 1 January, 2008 and 31 December, 2012 (the range represented by the three tables). Unfortunately, the DateSerial function is a little too helpful: it will cheerfully accept numbers that aren't valid combinations of month and day, which could end up causing duplicate rows. (For example, DateSerial(2009, 11, 31) returns 1 December, 2009, the same as DateSerial(2009, 12, 1)) To prevent that, it's necessary to include a Where clause to eliminate potential duplicates. Unlike the accommodating DateSerial function, the IsDate function is more practical: if you pass it an invalid date, it tells you so! To avoid any ambiguity between mm/dd/yyyy and dd/mm/yyyy formats (is 02/10/2009 February 10 or October 2?), I use the yyyy-mm-dd format in the call to the IsDate function.

SELECT DateSerial([YearNumber],[MonthNumber],[DayNumber]) AS WhatDate, 
YearNumber,
MonthNumber, 
DayNumber
FROM Days, Months, Years
WHERE IsDate([YearNumber] & "-" & [MonthNumber] & "-" & [DayNumber]
ORDER BY 1;
Listing 7: The SQL for qryExample3a_DaysOfTheYear. The WHERE condition using the IsDate function eliminates invalid dates such as February 30 that the DateSerial function would actually convert to duplicate entries.

We're now ready to return to the recordset required for the Calendar report! Figure 9 shows what this recordset needs to look like. Note that not only do we need to show the seven days of the week on each row, but when the month changes in the middle of the week, we need to use two separate rows for the week: one to go on the previous month's calendar page and one to go on the current month's calendar page.

The recordset required for the Calendar report
Figure 9: The recordset required for the Calendar report.

Listing 8 shows the SQL for qryExample3b_Calendar, the query that produces the recordset in Figure 9.

SELECT WhatDate AS SundayDate,
       WhatDate + 1 AS MondayDate,
       WhatDate + 2 AS TuesdayDate,
       WhatDate + 3 AS WednesdayDate,
       WhatDate + 4 AS ThursdayDate,
       WhatDate + 5 AS FridayDate,
       WhatDate + 6 AS SaturdayDate,
       YearNumber AS SortYear,
       MonthNumber AS SortMonth
  FROM qryExample3a_DaysOfTheYear
 WHERE Weekday([WhatDate])=1
UNION
SELECT WhatDate AS SundayDate,
       WhatDate + 1 AS MondayDate,
       WhatDate + 2 AS TuesdayDate,
       WhatDate + 3 AS WednesdayDate,
       WhatDate + 4 AS ThursdayDate, 
       WhatDate + 5 AS FridayDate,
       WhatDate + 6 AS SaturdayDate,
       YearNumber AS SortYear,
       Month([WhatDate] +6 ) AS SortMonth
  FROM qryExample3a_DaysOfTheYear
 WHERE Weekday([WhatDate])=1
   AND Month([WhatDate]) <> Month([WhatDate]+6)
   AND Year([WhatDate] + 6) = YearNumber
UNION
SELECT Sunday(WhatDate) AS SundayDate,
       Sunday(WhatDate) + 1 AS MondayDate,
       Sunday(WhatDate) + 2 AS TuesdayDate,
       Sunday(WhatDate) + 3 AS WednesdayDate,
       Sunday(WhatDate) + 4 AS ThursdayDate,
       Sunday(WhatDate) + 5 AS FridayDate,
       Sunday(WhatDate) + 6 AS SaturdayDate,
       Year(WhatDate) AS SortYear,
       1 AS SortMonth
  FROM qryExample3a_DaysOfTheYear
 WHERE Month([WhatDate]) = 1
   AND Day([WhatDate]) =1
Listing 8: The SQL to take the data from qryExample3a_DaysOfTheYear and display each week as a single row.

Producing a query that returns the seven days of the week as a single row isn't that difficult. By using a WHERE clause of Weekday([WhatDate]) = 1, you can return one row for each Sunday date. By adding the appropriate number of days to that date, you can get the dates for Monday, Tuesday, and so on. In order to be able to sort the data correctly, you also need to add a SortYear and SortMonth, which will be the Year and Month of the SundayDate value. This is what the first subquery in the Union query shown in Listing 8 does.

To handle the case where a week includes dates in two different months, the second subquery in the Union query shown in Listing 8 calculates SortMonth by determining the month for the Saturday of each week and only returns those rows where Sunday and Saturday are in different months. The query's third condition (Year([WhatDate] + 6) = YearNumber) eliminates the second instance of the week if the year ends during the week. In other words, I don't need a duplication of the week where Saturday is in the next year.

The third subquery in the Union query shown in Listing 8 produces the row corresponding to the first days of the year for those years that don't start on Sunday. For example, in 2008, the first Sunday isn't until January 6th, so the results of running the two subqueries already discussed will not include the first five days of the year. I had to turn to some VBA processing to handle this case.

Listing 9 shows a function that returns the Sunday before a given date:

Function Sunday(WhatDate As Date) As Date
  Sunday = WhatDate - WeekDay(WhatDate) + 1
End Function
Listing 9: For a given date, the Sunday function returns the date of the previous Sunday.

This code takes advantage of the Weekday function that returns 1 for Sunday, 2 for Monday and so on, until 7 for Saturday. Subtracting the weekday number for a date from the date itself returns the date for the preceding Saturday (e.g., because January 8, 2008 was a Tuesday, its weekday number is 3, and subtracting that gives January 5, 2008 -- a Saturday). Adding 1 to that result gives you the date of the Sunday preceding the original date.

The third subquery uses the Sunday function to provide the missing week (where necessary) by using the function to determine the Sunday before January 1st of the given year, and then calculate all the other days of that week.

Because the UNION operator automatically eliminates duplicate rows, the query will eliminate rows where Sunday and Saturday are both in the same month and where the week returned by the third subquery is also returned by the first subquery.

It's almost anticlimactic discussing the last query required for the Calendar report, the one that will be the RecordSource for the subreport that shows the details for a given day!

SELECT DateSerial([YearNumber],[MonthNumber],[DayNumber]) AS WhatDate,
       Anniversaries.Description
  FROM Anniversaries, Years
UNION ALL
SELECT DISTINCT HolidayDate,
       Description
  FROM Holidays
UNION ALL
SELECT DISTINCT qryExample3a_DaysOfTheYear.WhatDate,
       RecurringEvents.Description
  FROM RecurringEvents INNER JOIN qryExample3a_DaysOfTheYear
    ON Weekday(qryExample3a_DaysOfTheYear.WhatDate) = RecurringEvents.DayOfWeek
 WHERE qryExample3a_DaysOfTheYear.WhatDate BETWEEN RecurringEvents.StartDate AND RecurringEvents.EndDate;
Listing 10: The SQL for query qryExample3c_ToDisplay, which produces the details for a given day to be displayed in the subreport.

The first subquery in the Union query produces one entry for each row in the Anniversaries table for each year in the Years table.

The second subquery adds one row for each row in the Holidays table. Since my Holiday table actually contains holidays for a number of different countries, I used SELECT DISTINCT to eliminate duplicates (i.e. dates that are the same holiday in multiple countries).

Finally, the third subquery adds one row for each row in the RecurringEvents table for each actual occurrence. It does this by joining the RecurringEvents table to the results of qryExample3a_DaysOfTheYear using what I refer to as a non-equijoin. In other words, it's an INNER JOIN, but the join condition specified in the ON clause is not a simple equality statement, as it usually is.

I know that was a long-winded explanation. Hopefully when you look at the sample database, it'll all make sense to you!

Example 4: Creating a long running query

Believe it or not, I actually had a requirement recently to create a query that was guaranteed to run for a long period of time!

In my current job, we're busy preparing to roll Windows 7 out. As you're all probably aware, one of the current "hot topics" is Green Computing: reducing the power requirements, while not reducing the computing abilities. Because of that, we're taking the Power Management settings fairly seriously.

One concern that was raised to our team was what happens if the user starts a long running query then goes to lunch. There was worry that the machine would go into hibernation before the query completed, thereby wasting all the work already done. (There was also concern that it could cause database corruption)

To test this theory out, I constructed a Cartesian product similar to the query qryExample2a_NumericValuesFrom0To999 we've already discussed, only this time I allowed it to go to 9999.

SELECT U.WhatDigit+10*T.WhatDigit+100*H.WhatDigit+1000*Th.WhatDigit AS NumericValue
FROM Digits AS U, Digits AS T, Digits AS H, Digits AS Th;
Listing 11: The SQL for query qryExample4a_FirstPart, which produces 10,000 rows.

I then created a second query, which joined the query above to itself.

SELECT Count(*) AS TotalRows
FROM qryExample4a_FirstPart AS T1
INNER JOIN qryExample4a_FirstPart AS T2
ON T1.NumericValue = T2.NumericValue;
Listing 12: The SQL for query qryExample4b_SecondPart, which joins qryExample4a_FirstPart to itself.

Since there are no indexes involved, joining a ten thousand row table to itself is a fairly time-consuming exercise. (I find it takes about 11 to 12 minutes on my desktop). By setting the Hibernation time to a smaller value (like 5 minutes), we were able to see the effects of the machine trying to hibernate on the query.

(For those of you who care, the machine doesn't hibernate while the query is running, but does hibernate five minutes after the query has completed. However, we're running into some anomalies that appear to be related to either the settings on our network switches or to the firewall product we're using, so I can't really give you a definitive answer on what your Power Management settings should be!)

Conclusion

I'll be the first to admit that some of the examples presented above were a little frivolous. However, the fact of the matter is that there can be legitimate uses for Cartesian products, and hopefully the samples I provided will give you ideas for those times where they're appropriate.

» See All Articles by Columnist Doug Steele

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved