dcsimg
 

Introduction to the Partition By Window Function

Wednesday Oct 2nd 2019 by Hannes Du Preez
Introduction to the Partition By Window Function

Window functions can be life savers by making a complicated SQL calculation easy. A window function combines that logic and provides row by row or window by window feedback. Read on to learn more!

T-SQL window functions perform calculations over a set of rows (known as a "window") and return a single value for each row from the underlying query. A window (or windowing or windowed) function makes use of the values from the rows in a window to calculate the returned values.

A window is defined by using the OVER() clause. The OVER() T-SQL clause has the following functions:

  1. It defines window partitions by using the PARTITION BY clause.
  2. It orders rows within partitions by using the ORDER BY clause.

The OVER() clause can accept three different arguments:

  1. PARTITION BY - PARTITION BY resets its counter every time a given column changes values.
  2. ORDER BY - ORDER BY orders the rows (in the window only) the function evaluates.
  3. ROWS BETWEEN - ROWS BETWEEN enables you to further limit the rows in the window.

The main focus of this article is the PARTITION BY function, but I may touch some other clauses as well.

A Small Script

Let’s assume you are an avid car-sport fan, and you have kept track of the different drivers, different cars, speeds accomplished, and speeds with different cars accomplished at specific dates. A query such as the following would give you detailed results.

SELECT
       SpeedTestID,
       SpeedTestDate,
       CarID,
       CarSpeed,
 
       ROW_NUMBER() OVER (PARTITION BY SpeedTestDate, CarID ORDER BY SpeedTestID) AS SpeedTestsDoneToday, --Lists the number of the row, ordered by SpeedTestID
 
       SUM(CarSpeed) OVER () AS CarSpeedTotal, --Grand Total of Carspeed for entire result set
       SUM(CarSpeed) OVER (PARTITION BY SpeedTestDate) AS SpeedTotal --Total CarSpeed for row on SpeedTestdate,
       SUM(CarSpeed) OVER (PARTITION BY SpeedTestDate, CarID) AS SpeedTotalPerCar, --Total CarSpeed for row's SpeedTestDate AND Car
 
       AVG(CarSpeed) OVER (PARTITION BY SpeedTestDate) AS SpeedAvg, --Average CarSpeed for row on SpeedTestdate,
       AVG(CarSpeed) OVER (PARTITION BY SpeedTestDate, CarID) AS SpeedAvgPerCar, --Average CarSpeed for row's SpeedTestDate AND Car
 
FROM SpeedTests

Let’s break the code down.

The first few lines should be obvious; you should already know (especially if you’re reading this article) how to select data and specify the fields needed in the select query.

ROW_NUMBER() will display the current number of the row being calculated by the window statement. In this case the date of the Speed test, the car’s ID and it will be ordered by the SpeedTestID field.

The next few lines make use of aggregate functions to work out the grand total carspeed of the speed tests, total speed on a specific row, total car speed, for a specific date, total car speed for a specific car for a specific date, as well as their Averages.

Let’s take the example a few steps further!

Add the following into the query:

SUM(CarSpeed) OVER (ORDER BY SpeedTestDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS SpeedRunningTotal, -- Add all the CarSpeed values in rows up to and including the current row

SUM(CarSpeed) OVER (ORDER BY SpeedTestDate ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS SpeedSumLast4 --add all the CarSpeed values in rows between the current row and the 3 rows before it

 

By using ROWS BETWEEN you narrow the scope to be evaluated by the window function. The function will simply begin and end where ROWS BETWEEN specify.

Let’s get crazy. Add the next few lines to the script:

FIRST_VALUE(CarSpeed) OVER (ORDER BY SpeedTestDate) AS FirstSpeed, --FIRST_VALUE function will return the first CarSpeed in the result set

LAST_VALUE(CarSpeed) OVER (ORDER BY SpeedTestDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastSpeed, --LAST_VALUE function will return the last CarSpeed in the result set

Here, you made use of First_Value and Last_Value, which are quite aptly named. Get the first value and get the last value.

Even wackier… Add the next code:

LAG(CarSpeed, 1, 0) OVER (ORDER BY SpeedTestID) AS PrevSpeed, --LAG function will return the CarSpeed from 1 row behind it

LEAD(CarSpeed, 3) OVER (ORDER BY SpeedTestID) AS NextSpeed, --LEAD function will get the CarSpeed from 3 rows ahead

Lag gets the speed from one row before and Lead 3 rows after the current result set.

The reason I put these in is so that you can see the true power of window functions, as this is just the tip of the iceberg

Conclusion

Window functions can be life savers by making a complicated SQL calculation easy. Instead of writing massive SQL statements trying to figure out certain logic, a window function combines that logic and provides row by row or window by window feedback.

Home
Mobile Site | Full Site