All About the Crosstab Query

A cross tabulation query, or cross tab for short, is used to display
the calculation of a sum, average, count, or other type of total on two or
more fields. The results are then presented in a table format in which one field
is displayed down the left side of a matrix (row headings) and the other across
the top (column headings).

Funnily enough, most relational databases don’t provide an easy way to
create crosstabs, the exception being good ole Microsoft Access. It features
its own SQL language constructs for creating crosstabs such as TRANSFORM
and PIVOT. Today, I’m going to show you one of several ways to create
crosstabs using regular SQL. Although I will be working in MySQL, the
techniques discussed here could be applied to just about any database.

A Basic Crosstab

One of my many responsibilities of my job is to construct canned
reports for our clients. For each report, there is a stored procedure that
fetches the required data from the database. Every so often I get a request
for a report that necessitates generating a crosstab query. I recognize the
need for a crosstab when I hear requirements such as “I want to display this
field by that field”. Most often, it’s a grouping of a field by dates,
although other combinations come up from time to time. Here is what one of
those crosstab reports might look like:

Cases by Date and
Region Code

Month

01

02

03

04

05

April

13

33

76

2

47

May

17

55

209

1

143

June

8

63

221

1

127

July

13

104

240

6

123

August

18

121

274

9

111

September

25

160

239

2

88

October

9

88

295

2

127

November

2

86

292

2

120

December

1

128

232

6

155

The above crosstab displays the count of cases for each Region
by Month. Note that only months that contain records are displayed.

It is also common to display totals for both columns and rows. The
following report sample shows Case totals for each Region broken
down by Travel Mode. The latter is further broken down by the Nationality
of passengers. Sub and grand totals are tabulated for each field:

Totals for each Region broken down by travel mode

Crosstab Query Solutions

Now it’s time to delve into the primary reason that you’re reading this
article: how to write SQL that will display data in tabular format. There are
several ways to get the job done; it’s really a matter of deciding what it is
exactly that you need.

Solution #1: Using the GROUP By Clause

The GROUP BY clause groups a selected set of records into a set of
summary rows by the values of one or more columns or expressions. One row is
returned for each group, so aggregate functions in the SELECT list will provide
information about each group rather than individual rows:


SELECT MONTHNAME(CREATION_DATE) AS Month,
COUNT(*) AS CASES
FROM TA_CASES
WHERE MONTH(CREATION_DATE) = 4 AND Year(CREATION_DATE) = 1998
GROUP BY MONTH(CREATION_DATE);

If you leave out the GROUP BY, the above query would throw an error on
execution such as the following:

“You tried to execute a query that does not include the specified
expression as part of an aggregate function or grouping.”

That happens because you normally can’t mix aggregate functions with
scalar field values, that is unless you specify the scalar field in the GROUP
BY clause.

Here are the results for the Month of April, as defined by the query above:

Cases for April

Month

CASES

April

171

That gives us the basic information that we want for one month. The
next steps are to break down the cases by region and include the other months.
Let’s deal with the regions first.

We can get the counts for each region by adding the REGION_CODE
to the field list:


SELECT MONTHNAME(CREATION_DATE) AS Month,
COUNT(*) AS CASES,
REGION_CODE
FROM TA_CASES
WHERE MONTH(CREATION_DATE) = 4 AND Year(CREATION_DATE) = 1998
GROUP BY MONTH(CREATION_DATE),
REGION_CODE;

Don’t forget to add it to the GROUP BY clause as well, or you’ll get an
error similar to the following:

You tried to execute a
query that does not include the specified expression ‘REGION_CODE’ as part of
an aggregate function.

The extra field will now cause the counts to be applied to each REGION_CODE:

Cases for April by
Region Code

Month

CASES

REGION_CODE

April

13

01

April

33

02

April

76

03

April

2

04

April

47

05

Although our query now captures the counts for each region, they are
displayed in separate rows, rather than columns. That’s because GROUP BY
applies to row data. To show the counts in one row requires adding each REGION_CODE
to the field list. Therefore, where we previously had one Count(*) in
the field list, we will now require one for each of our five regions. The
trick to making this work, is to filter the counts by the same criteria as in
the main WHERE clause:


SELECT MONTHNAME(CREATION_DATE) AS ‘Month’,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND MONTH(CREATION_DATE)=4
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE MONTH(CREATION_DATE)=4 AND YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);

Now our query displays all of the counts in one row. We can even
include a TOTAL column by keeping the original Count(*) in the
field list:

Cases for April by
Region Code

Month

REGION 1

REGION 2

REGION 3

REGION 4

REGION 5

TOTAL

April

13

33

76

2

47

171

To show all months, we must remove the MONTH(CREATION_DATE) = 4
criteria from the main WHERE clause and replace the MONTH(CREATION_DATE) = 4
with the more generic MONTH(CREATION_DATE) = MONTH(CA.CREATION_DATE), where
CA is an alias for the TA_CASES table of the outer query:


SELECT MONTHNAME(CREATION_DATE) AS ‘Month’,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);

Here is the resulting output from the above query, which shows counts
for every month that contains data:

Cases by Region Code
and Month

Month

REGION 1

REGION 2

REGION 3

REGION 4

REGION 5

TOTAL

April

13

33

76

2

47

171

May

17

55

209

1

143

425

June

8

63

221

1

127

420

July

13

104

240

6

123

486

August

18

121

274

9

111

533

September

25

160

239

2

88

514

October

9

88

295

2

127

521

November

2

86

292

2

120

502

December

1

128

232

6

155

522

Our final task today will be to add a row that will show the totals for
each Region.

This is not an especially easy thing to accomplish because the GROUP BY
clause is summing the counts by month. You can’t include an aggregate function
to the GROUP BY so we have to come up with another way to add another row. One
way to add rows to a query recordset is to use the UNION statement. It appends
the second query results to the first; the only catch is that they must both possess
the same number of columns. Our second query will be almost identical to the
first, except that we will remove the GROUP BY, so that the counts are now
applied to the entire year:


SELECT MONTHNAME(CREATION_DATE) AS ‘Month’,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND MONTH(CREATION_DATE)=MONTH(CA.CREATION_DATE)
AND YEAR(CREATION_DATE)=1998) AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998
GROUP BY MONTH(CREATION_DATE);
UNION
SELECT ‘TOTAL’ AS Month,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’01’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 1′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’02’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 2′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’03’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 3′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’04’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 4′,
(SELECT count(*)
FROM TA_CASES
WHERE REGION_CODE = ’05’
AND YEAR(CREATION_DATE)=1998 AS ‘REGION 5’,
Count(*) AS TOTAL
FROM TA_CASES CA
WHERE YEAR(CREATION_DATE)=1998;

Here is the finished crosstab query, including the Region totals:

Cases by Region Code
and Month with Totals

Month

REGION 1

REGION 2

REGION 3

REGION 4

REGION 5

TOTAL

April

13

33

76

2

47

171

August

18

121

274

9

111

533

December

1

128

232

6

155

522

July

13

104

240

6

123

486

June

8

63

221

1

127

420

May

17

55

209

1

143

425

November

2

86

292

2

120

502

October

9

88

295

2

127

521

September

25

160

239

2

88

514

TOTAL

106

838

2078

31

1041

4094

The technique that we covered today is ideal for those times that you
know how many columns you’ll be needing. To accommodate an unknown number of columns,
you’ll need something a bit more flexible, such as a stored procedure. We’ll
be looking at that solution next time.

»


See All Articles by Columnist
Rob Gravelle

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles