in my work, I have written countless SQL queries to extract insights from data. It’s always a challenging task because it’s not only important to write efficient queries, but also simple enough to maintain over time.
With each new problem comes a new lesson, and recently, I’ve been diving into SQL window functions. These powerful tools are incredibly useful when you need to perform calculations across a set of rows without losing the granularity of individual records.
In this article, I’ll break down SQL window functions step by step. They might seem complex or unintuitive at first, but once you understand how they work, you’ll see how indispensable they can be. Are you ready? Let’s dive in and master them together!
Table of contents
- Why do we need Window Functions?
- Syntax of Window Function
- Four Simple Examples
Why do we need Window Functions?
To understand the power of Window Functions, let’s start with a simple example. Imagine we have a table containing six orders from an e-commerce website. Each row includes the order id, the date, the product, its brand and price.
Let’s suppose that we want to calculate the total price for each brand. Using the GROUP BY clause, we can write a query like this:
SELECT
brand,
SUM(price) as total_price
FROM Orders
GROUP BY brand
This returns a result where each row represents one brand, along with the total price of all orders under that brand.
|brand |total_price|
|-------|-----------|
|carpisa|30 |
|nike |175 |
|parfois|25 |
|zara |65 |
This aggregation removes the details of individual orders, since the output only includes one row for brand. What if we want to keep all the original rows and add the total price for each brand as an extra field?
By using SUM(price) OVER (PARTITION BY brand)
, we can calculate the total price for each brand without collapsing the rows:
SELECT
order_id,
date,
product,
brand,
price,
SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders
We have obtained a result like this:
|order_id|date |product|brand |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|6 |2025/05/01|bag |carpisa|30 |30 |
|1 |2024/02/01|shoes |nike |90 |175 |
|3 |2024/06/01|shoes |nike |85 |175 |
|5 |2025/04/01|bag |parfois|25 |25 |
|2 |2024/05/01|dress |zara |50 |65 |
|4 |2025/01/01|t-shirt|zara |15 |65 |
This query returns all six rows, preserving every individual order, and adds a new column showing the total price per brand. For example, the order with brand Carpisa shows a total of 30, since it’s the only Carpisa order, the two orders from Nike show 175 (90+85), and so on.
You may notice that the table is no longer ordered by order_id. That’s because the window function partitions by brand, and SQL doesn’t guarantee row order unless explicitly specified. To restore the original order, we need to simply add an ORDER BY
clause:
SELECT
order_id,
date,
product,
brand,
price,
SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders
ORDER BY order_id
Finally, we have the output containing all the required details:
|order_id|date |product|brand |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|1 |2024/02/01|shoes |nike |90 |175 |
|2 |2024/05/01|dress |zara |50 |65 |
|3 |2024/06/01|shoes |nike |85 |175 |
|4 |2025/01/01|t-shirt|zara |15 |65 |
|5 |2025/04/01|bag |parfois|25 |25 |
|6 |2025/05/01|bag |carpisa|30 |30 |
Now, we have added the same aggregation as GROUP BY
, while keeping all the individual order details.
Syntax of Window Functions
In general, the window function has a syntax that looks like this:
f(col2) OVER(
[PARTITION BY col1]
[ORDER BY col3]
)
Let’s break it down. f(col2)
is the operation you want to perform, such as sum, count and ranking. OVER
clause defines the “window” or the subset of rows over which the window function operates. PARTITION BY col1
divides the data into groups and ORDER BY col1
determines the order of rows within each partition.
Moreover, window functions fall into three main categories:
- aggregate function:
COUNT
,SUM
,AVG
,MIN
andMAX
- rank function:
ROW_NUMBER
,RANK
,DENSE_RANK
,CUME_DIST
,PERCENT_RANK
andNTILE
- value function:
LEAD
,LAG
,FIRST_VALUE
andLAST_VALUE
Four Simple Examples
Let’s show different examples to master window functions.
Example 1: Simple Window Function
To understand the concept of window functions, let’s start with a straightforward example. Suppose we want to calculate the total price of all the orders in the table. Using a GROUP BY
clause would give us a single value: 295
. However, that would collapse the rows and lose the individual order details. Instead, if we want to display the total price alongside each record, we can use a window function like this:
SELECT
order_id,
date,
product,
brand,
price,
SUM(price) OVER () as tot_price
FROM Orders
This is the output:
|order_id|date |product|brand |price|tot_price|
|--------|----------|-------|-------|-----|---------|
|1 |2024-02-01|shoes |nike |90 |295 |
|2 |2024-05-01|dress |zara |50 |295 |
|3 |2024-06-01|shoes |nike |85 |295 |
|4 |2025-01-01|t-shirt|zara |15 |295 |
|5 |2025-04-01|bag |parfois|25 |295 |
|6 |2025-05-01|bag |carpisa|30 |295 |
In this way, we obtained the sum of all prices over the entire dataset and repeated it for each row.
Example 2: Partition by clause
Let’s now calculate the average price per year while still keeping all the details. We can do this by using the PARTITION BY
clause inside a window function to group rows by year and compute the average within each group:
SELECT
order_id,
date,
product,
brand,
price,
round(AVG(price) OVER (PARTITION BY YEAR(date) as avg_price
FROM Orders
Here’s what the output looks like:
|order_id|date |product|brand |price|avg_price|
|--------|----------|-------|-------|-----|---------|
|1 |2024-02-01|shoes |nike |90 |75 |
|2 |2024-05-01|dress |zara |50 |75 |
|3 |2024-06-01|shoes |nike |85 |75 |
|4 |2025-01-01|t-shirt|zara |15 |23.33 |
|5 |2025-04-01|bag |parfois|25 |23.33 |
|6 |2025-05-01|bag |carpisa|30 |23.33 |
That’s great! We see the average price for each year alongside each row.
Example 3: Order by clause
One of the best ways to understand how ordering works within window functions is to apply a ranking function. Let’s say we want to rank all orders from highest to lowest price. Here’s how we can do it using the RANK()
function:
SELECT
order_id,
date,
product,
brand,
price,
RANK() OVER (ORDER BY price DESC) as Rank
FROM Orders
We obtain an output like this:
|order_id|date |product|brand |price|Rank|
|--------|----------|-------|-------|-----|----|
|1 |2024-02-01|shoes |nike |90 |1 |
|3 |2024-06-01|shoes |nike |85 |2 |
|2 |2024-05-01|dress |zara |50 |3 |
|6 |2025-05-01|bag |carpisa|30 |4 |
|5 |2025-04-01|bag |parfois|25 |5 |
|4 |2025-01-01|t-shirt|zara |15 |6 |
As shown, the order with the highest price gets rank 1, and the rest follow in descending order.
Example 4: Combine Partition by and Group by clauses
In the previous example, we ranked all orders from the highest to the lowest price across the entire dataset. But what if we want to restart the ranking for each year? We can do this by adding the PARTITION BY
clause in the window function. This allows for splitting the data into separate groups by year and sorting the orders from the highest to the lowest price.
SELECT
order_id,
date,
product,
brand,
price,
RANK() OVER (PARTITION BY YEAR(date) ORDER BY price DESC) as Rank
FROM Orders
The result should look like this:
|order_id|date |product|brand |price|Rank|
|--------|----------|-------|-------|-----|----|
|1 |2024-02-01|shoes |nike |90 |1 |
|3 |2024-06-01|shoes |nike |85 |2 |
|2 |2024-05-01|dress |zara |50 |3 |
|6 |2025-05-01|bag |carpisa|30 |1 |
|5 |2025-04-01|bag |parfois|25 |2 |
|4 |2025-01-01|t-shirt|zara |15 |3 |
Now, the ranking restarts for each year, as we decided.
Final thoughts:
I hope this guide helped you get a clear and practical introduction to SQL window functions. At first, they might feel a bit unintuitive, but once you compare them side by side with the GROUP BY
clause, the value they bring becomes much easier to understand.
From my own experience, window functions have been incredibly powerful for extracting insights without losing row-level detail, something that traditional aggregations hide. They are incredibly useful when extracting metrics like totals, rankings, year-over-year or month-over-month comparisons.
However, there are some limitations. Window functions can be computationally expensive, especially over large datasets or complex partitions. It’s important to evaluate whether the added flexibility justifies the performance tradeoff in your specific use case.
Thanks for reading! Have a nice day!
Useful resources:
Source link
#Mastering #SQL #Window #Functions #Data #Science