Sometimes the criteria for determining which set of records will be affected by a SELECT, UPDATE, DELETE and/or INSERT statement cannot be obtained by hard coding the selection criteria. Occasionally there is a need to use the results of a SELECT statement to help determine which records are returned or are affected by a T-SQL statement. When a SELECT statement is used inside another statement, the inside SELECT statement is known as a subquery. Subqueries can help to dynamically control the records affected by an UPDATE, DELETE or INSERT statement, as well as to help determine the records that will be returned by a SELECT statement. This article will show different examples of how to use a subquery to help clarify the records affected or selected by a T-SQL statement.
What is a Subquery?
A subquery is a SELECT statement that is nested within another T-SQL statement. A subquery SELECT statement if executed independently of the T-SQL statement, in which it is nested, will return a result set. Meaning a subquery SELECT statement can standalone and is not depended on the statement in which it is nested. A subquery SELECT statement can return any number of values, and can be found in, the column list of a SELECT statement, a FROM, GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL statement. A Subquery can also be used as a parameter to a function call. Basically a subquery can be used anywhere an expression can be used.
Use of a Subquery in the Column List of a SELECT Statement
Suppose you would like to see the last OrderID and the OrderDate for the last order that was shipped to Paris. Along with that information, say you would also like to see the OrderDate for the last order shipped regardless of the ShipCity. In addition to this, you would also like to calculate the difference in days between the two different OrderDates. Here is my T-SQL SELECT statement to accomplish this:
select top 1 OrderId,convert(char(10), OrderDate,121) Last_Paris_Order, (select convert(char(10),max(OrderDate),121) from Northwind.dbo.Orders) Last_OrderDate, datediff(dd,OrderDate, (select Max(OrderDate)from Northwind.dbo.Orders)) Day_Diff from Northwind.dbo.Orders where ShipCity = 'Paris' order by OrderDate desc
The above code contains two subqueries. The first subquery gets the OrderDate for the last order shipped regardless of ShipCity, and the second subquery calculates the number of days between the two different OrderDates. Here I used the first subquery to return a column value in the final result set. The second subquery was used as a parameter in a function call. This subquery passed the "max(OrderDate)" date to the DATEDIFF function.
Use of a Subquery in the WHERE clause
A subquery can be used to control the records returned from a SELECT by controlling which records pass the conditions of a WHERE clause. In this case the results of the subquery would be used on one side of a WHERE clause condition. Here is an example:
select distinct country from Northwind.dbo.Customers where country not in (select distinct country from Northwind.dbo.Suppliers)
Here I have returned a list of countries where customers live, but there is no supplier located in that country. I suppose if you where trying to provide better delivery time to customers, then you might target these countries to look for additional suppliers.
Suppose a company would like to do some targeted marketing. This targeted marketing would contact customers in the country with the fewest number of orders. It is hoped that this targeted marketing will increase the overall sales in the targeted country. Here is an example that uses a subquery to return the customer contact information for the country with the fewest number of orders:
select Country,CompanyName, ContactName, ContactTitle, Phone from Northwind.dbo.Customers where country = (select top 1 country from Northwind.dbo.Customers C join Northwind.dbo.Orders O on C.CustomerId = O.CustomerID group by country order by count(*))
Here I have written a subquery that joins the Customer and Orders Tables to determine the total number of orders for each country. The subquery uses the "TOP 1" clause to return the country with the fewest number of orders. The country with the fewest number of orders is then used in the WHERE clause to determine which Customer Information will be displayed.
Use of a Subquery in the FROM clause
The FROM clause normally identifies the tables used in the T-SQL statement. You can think of each of the tables identified in the FROM clause as a set of records. Well, a subquery is just a set of records, and therefore can be used in the FROM clause just like a table. Here is an example where a subquery is used in the FROM clause of a SELECT statement:
select au_lname, au_fname, title from (select au_lname, au_fname, au_id from pubs.dbo.authors where state = 'CA') as a join pubs.dbo.titleauthor ta on a.au_id=ta.au_id join pubs.dbo.titles t on ta.title_id = t.title_id
Here I have used a subquery to select only the author record information, if the author's record has a state column equal to "CA." I have named the set returned from this subquery with a table alias of "a." I can then use this alias elsewhere in the T-SQL statement to refer to the columns from the subquery by prefixing them with an "a", as I did in the "ON" clause of the "JOIN" criteria. Sometimes using a subquery in the FROM clause reduces the size of the set that needs to be joined. Reducing the number of records that have to be joined enhances the performance of joining rows, and therefore speeds up the overall execution of a query.
Here is an example where I used a subquery in the FROM clause of an UPDATE statement:
set nocount on create table x(i int identity, a char(1)) insert into x values ('a') insert into x values ('b') insert into x values ('c') insert into x values ('d') select * from x update x set a = b.a from (select max(a) as a from x) b where i > 2 select * from x drop table x
Here I created a table named "x," that has four rows. Then I proceeded to update the rows where "i" was greater than 2 with the max value in column "a". I used a subquery in the FROM clause of the UPDATE statement to identity the max value of column "a."
Use of a Subquery in the HAVING clause
In the following example, I used a subquery to find the number of books a publisher has published where the publisher is not located in the state of California. To accomplish this I used a subquery in a HAVING clause. Here is my code:
select pub_name, count(*) bookcnt from pubs.dbo.titles t join pubs.dbo.publishers p on t.pub_id = p.pub_id group by pub_name having p.pub_name in (select pub_name from pubs.dbo.publishers where state <> 'CA')
Here my subquery returns the pub_name values for all publishers that have a state value not equal to "CA." The HAVING condition then checks to see if the pub_name is in the set returned by my subquery.
The subquery is a valuable concept to understand. Creative use of a subquery allows the desired results to be returned in a single T-SQL statement, and helps avoid the use of temporary tables and cursors to solve complicated selection criteria. In addition, depending on the query, a subquery might improve performance by reducing the number of records that SQL Server needs to process. In my next article, I will expand on the idea of subqueries by looking at correlated subqueries.