Analytic Enhancements in SQL Server 2012 – Part II

When I wrote the first installment of this series, I used Denali CTP3. Since then, SQL Server 2012 has been released. In this second installment, I am going to show you more new analytic functions and their usage in the RTM version of SQL Server 2012.  

LAG and LEAD

The LAG and LEAD functions are added in SQL Server 2012 to easily access a previous and subsequent row in the same result set. Compared to SQL Server 2005, no ROW_NUMBER function or other ordering methods need to be explicitly used. The data set doesn’t need to be self-joined in order to figure out the row proceeding or following a particular row.  Although, as in the OVER clause shown in my first article, the Row_Number() function is still used under the hook to order the rows.

For example, if we would like to calculate the daily return of stocks based on their adjusted close prices, it is very easy to do now. Remember our Equity database, Equity.bak, which has a table EquityPrice containing the historical stock prices from year 2010 to Aug 5th, 2011, of four stocks, AAPL, CSCO, HPQ and IBM.

USE Equity;
 
WITH PrevCurr
AS
(
       SELECT Ticker, TradeDate, AdjClose, LAG(AdjClose, 1, NULL) OVER (PARTITION BY Ticker ORDER BY TradeDate) as PrevAdjClose
       FROM EquityPrice
)
SELECT Ticker, TradeDate, (AdjClose - PrevAdjClose)/PrevAdjClose as DailyReturn
FROM PrevCurr
WHERE PrevAdjClose IS NOT NULL;

As you can see, we specify 1 as the offset in the LAG function to get the row right before the current row.  We also set NULL as the default value if no previous row is available.  We also exclude these rows in our result set by using the WHERE clause.

In SQL Server 2005, we have to write the query as follows.

USE Equity;
 
WITH PrevCurr
AS
(
       SELECT Ticker, TradeDate, AdjClose, ROW_NUMBER() OVER (PARTITION BY Ticker ORDER BY TradeDate) as rn
       FROM EquityPrice
)
SELECT p1.Ticker, p1.TradeDate, (p1.AdjClose - p2.AdjClose)/p2.AdjClose as DailyReturn
FROM PrevCurr p1
JOIN PrevCurr p2
ON p1.Ticker = p2.Ticker and p1.rn = p2.rn + 1;

If you run both queries in SQL Server Management Studio, you would see the cost of the first query is much smaller than the second query (9% vs. 91%).

Query Cost
Query Cost (Full size image)

 The first query with the LAG function only scans the EquityPrice table once, while the second query scans the table twice before doing the self join. The number of logical reads is only 13 by the first query versus 26 by the second query.

FIRST_VALUE and LAST_VALUE

The FIRST_VALUE and LAST_VALUE are added in SQL Server 2012 to easily access the first and last value in an ordered result set. Compared to SQL Server 2005, no ROW_NUMBER() function or TOP clause need to be used before self-joining the data set. For example, if we would like to look at the open, high, low and close prices (OHLC), then we need to get the open and close prices based on the order of the trade time, and the highest and lowest prices based on the order of the trade price.  Looking at the ticks of the future Emini S&P 500 (ES) on Mar 9th, 2012 in the FutureTick table. This table has three columns: the FutureSymbol column contains the symbols, (in our database, we only have ES), the TickTime column contains the time when a tick/transaction occurred, and the price column contains the traded price.

SELECT [FutureSymbol]
      ,[TickTime]
      ,[Price]
FROM [Equity].[dbo].[FutureTick]
 

For example, we would like to build the minute bars, i.e., the OHLC of each minute throughout the day. In SQL Server 2012, it is easy to implement.

WITH OHLC
AS
(
SELECT [FutureSymbol]
         ,TickTime
         ,DATEADD(MINUTE, DATEDIFF(MINUTE, 0, TickTime), 0) as TradeMinute
         ,[Price]
  FROM FutureTick
)
SELECT Distinct FutureSymbol
       , TradeMinute
       , FIRST_VALUE(Price) OVER (PARTITION BY FutureSymbol, TradeMinute ORDER BY TickTime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [MinuteOpen]
       , LAST_VALUE(Price) OVER (PARTITION BY FutureSymbol, TradeMinute ORDER BY TickTime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [MinuteClose]
       , MIN(Price) OVER (PARTITION BY FutureSymbol, TradeMinute) AS [MinuteLow]
       , MAX(Price) OVER (PARTITION BY FutureSymbol, TradeMinute) AS [MinuteHigh]
FROM OHLC
ORDER BY FutureSymbol, TradeMinute;
 

As you can see above, the ROWS clause limits each analytic partition to the rows in the same minute, and the rows are ordered by the TickTime. The FIRST_VALUE function gets the first price in a minute, while the LAST_VALUE function gets the last price. The MIN and MAX functions get the lowest and highest prices.

If you would like to see how the OHLC changes after a tick throughout a day, similar to what you see on a financial website during trading hours, it is also easy to implement in SQL Server 2012. You just change the partitions to the first row of the day to the current row.

WITH OHLC
AS
(
SELECT [FutureSymbol]
         ,TickTime
         ,CONVERT(Date, TickTime) as TradeDate
         ,[Price]
  FROM FutureTick
)
SELECT FutureSymbol
       , TickTime
       , Price
       , FIRST_VALUE(Price) OVER (PARTITION BY FutureSymbol, TradeDate ORDER BY TickTime ROWS UNBOUNDED PRECEDING) AS [Open]
       , LAST_VALUE(Price) OVER (PARTITION BY FutureSymbol, TradeDate ORDER BY TickTime ROWS UNBOUNDED PRECEDING) AS [Close]
       , MIN(Price) OVER (PARTITION BY FutureSymbol, TradeDate ORDER BY TickTime ROWS UNBOUNDED PRECEDING) AS [Low]
       , MAX(Price) OVER (PARTITION BY FutureSymbol, TradeDate ORDER BY TickTime ROWS UNBOUNDED PRECEDING) AS [High]
FROM OHLC
ORDER BY FutureSymbol, TickTime;

In SQL Server 2005, to build the minute bars, you have to write something much more complicated as shown below.  You can easily see the benefits the FIRST_VALUE and LAST_VALUE functions provide.

WITH PriceMinute
AS
(
SELECT [FutureSymbol]
         ,TickTime
         ,DATEADD(MINUTE, DATEDIFF(MINUTE, 0, TickTime), 0) as TradeMinute
         ,[Price]
  FROM FutureTick
)
,
PriceOpen AS
(
SELECT [FutureSymbol]
         ,TradeMinute
         ,ROW_NUMBER() OVER (PARTITION BY FutureSymbol, TradeMinute ORDER BY TickTime) as TimeRN
         ,[Price]
  FROM PriceMinute
),
PriceClose AS
(
SELECT [FutureSymbol]
         ,TradeMinute
         ,ROW_NUMBER() OVER (PARTITION BY FutureSymbol, TradeMinute ORDER BY TickTime DESC) as TimeRN
         ,[Price]
  FROM PriceMinute
),
PriceHighLow AS
(
SELECT [FutureSymbol]
         ,TradeMinute
         ,MIN(Price) as Low
         ,MAX(Price) as High
FROM PriceMinute
GROUP BY [FutureSymbol]
         ,TradeMinute
)
SELECT o.FutureSymbol
       , o.TradeMinute
       , o.Price AS [MinuteOpen]
       , c.Price AS [MinuteClose]
       , hl.Low AS [MinuteLow]
       , hl.High AS [MinuteHigh]
FROM PriceOpen o
JOIN PriceClose c
ON o.FutureSymbol = c.FutureSymbol and o.TradeMinute = c.TradeMinute
JOIN PriceHighLow hl
ON hl.FutureSymbol = o.FutureSymbol and hl.TradeMinute = o.TradeMinute
WHERE o.TimeRN = 1 and c.TimeRN = 1
ORDER BY FutureSymbol, TradeMinute;

Conclusion

The introduction of the new analytic functions, such as LAG, LEAD, FIRST_VALUE and LAST_VALUE, with the improved over clause, greatly facilitate the analysis on ordered data sets, such as time-series data.

See all articles by Yan Pan

Yan Pan
Yan Pan
Yan Pan (MCITP SQL Server 2008, MCITP SQL Server 2005, MCDBA SQL Server 2000, OCA Oracle 10g) is a Senior DBA for a leading global financial services firm, where her daily duties include administering hundreds of SQL Server and Oracle servers of every possible version, working with business units on software development, troubleshooting database issues, and tuning database performance. She has written a Wrox book titled “Microsoft SQL Server 2008 Administration with Windows PowerShell” with MAK who is also a columnist for DatabaseJournal.com. You can check out the book at many book stores, such as Barnes & Noble, Borders. Previously, Yan worked as a SQL Server DBA and a .NET developer at Data Based Ads, Inc., and developed two .NET Web applications. Before that, she worked at AT&T Research Labs. She designed OLAP cubes with SQL Server Analysis Services and developed PivotTable reports for regional managers. She has master’s degrees in Computer Science and Physics.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles