...

Mastering SQL Window Functions | Towards Data Science


in my work, I have written countless SQL queries to extract insights from data. It’s always a challenging task because it’s not only important to write efficient queries, but also simple enough to maintain over time.

With each new problem comes a new lesson, and recently, I’ve been diving into SQL window functions. These powerful tools are incredibly useful when you need to perform calculations across a set of rows without losing the granularity of individual records

In this article, I’ll break down SQL window functions step by step. They might seem complex or unintuitive at first, but once you understand how they work, you’ll see how indispensable they can be. Are you ready? Let’s dive in and master them together!


Table of contents

  • Why do we need Window Functions?
  • Syntax of Window Function 
  • Four Simple Examples

Why do we need Window Functions?

To understand the power of Window Functions, let’s start with a simple example. Imagine we have a table containing six orders from an e-commerce website. Each row includes the order id, the date, the product, its brand and price.

Illustration by Author. Example table to demonstrate the power of Window Functions.

Let’s suppose that we want to calculate the total price for each brand. Using the GROUP BY clause, we can write a query like this:

SELECT 
      brand, 
      SUM(price) as total_price 
FROM Orders 
GROUP BY brand

This returns a result where each row represents one brand, along with the total price of all orders under that brand. 

|brand  |total_price|
|-------|-----------|
|carpisa|30         |
|nike   |175        |
|parfois|25         |
|zara   |65         |

This aggregation removes the details of individual orders, since the output only includes one row for brand. What if we want to keep all the original rows and add the total price for each brand as an extra field? 

By using SUM(price) OVER (PARTITION BY brand), we can calculate the total price for each brand without collapsing the rows:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders

We have obtained a result like this:

|order_id|date      |product|brand  |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|6       |2025/05/01|bag    |carpisa|30   |30         |
|1       |2024/02/01|shoes  |nike   |90   |175        |
|3       |2024/06/01|shoes  |nike   |85   |175        |
|5       |2025/04/01|bag    |parfois|25   |25         |
|2       |2024/05/01|dress  |zara   |50   |65         |
|4       |2025/01/01|t-shirt|zara   |15   |65         |

This query returns all six rows, preserving every individual order, and adds a new column showing the total price per brand. For example, the order with brand Carpisa shows a total of 30, since it’s the only Carpisa order, the two orders from Nike show 175 (90+85), and so on. 

You may notice that the table is no longer ordered by order_id. That’s because the window function partitions by brand, and SQL doesn’t guarantee row order unless explicitly specified. To restore the original order, we need to simply add an ORDER BY clause:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders
ORDER BY order_id

Finally, we have the output containing all the required details:

|order_id|date      |product|brand  |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|1       |2024/02/01|shoes  |nike   |90   |175        |
|2       |2024/05/01|dress  |zara   |50   |65         |
|3       |2024/06/01|shoes  |nike   |85   |175        |
|4       |2025/01/01|t-shirt|zara   |15   |65         |
|5       |2025/04/01|bag    |parfois|25   |25         |
|6       |2025/05/01|bag    |carpisa|30   |30         |

Now, we have added the same aggregation as GROUP BY, while keeping all the individual order details.

Syntax of Window Functions

In general, the window function has a syntax that looks like this:

f(col2) OVER(
[PARTITION BY col1] 
[ORDER BY col3]
)

Let’s break it down. f(col2) is the operation you want to perform, such as sum, count and ranking. OVER clause defines the “window” or the subset of rows over which the window function operates. PARTITION BY col1 divides the data into groups and ORDER BY col1 determines the order of rows within each partition.

Moreover, window functions fall into three main categories:

  • aggregate function:COUNT, SUM, AVG, MINand MAX
  • rank function: ROW_NUMBER, RANK, DENSE_RANK, CUME_DIST, PERCENT_RANKandNTILE
  • value function: LEAD, LAG, FIRST_VALUE and LAST_VALUE

Four Simple Examples

Let’s show different examples to master window functions.

Example 1: Simple Window Function

To understand the concept of window functions, let’s start with a straightforward example. Suppose we want to calculate the total price of all the orders in the table. Using a GROUP BY clause would give us a single value: 295. However, that would collapse the rows and lose the individual order details. Instead, if we want to display the total price alongside each record, we can use a window function like this:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    SUM(price) OVER () as tot_price
FROM Orders

This is the output:

|order_id|date      |product|brand  |price|tot_price|
|--------|----------|-------|-------|-----|---------|
|1       |2024-02-01|shoes  |nike   |90   |295      |
|2       |2024-05-01|dress  |zara   |50   |295      |
|3       |2024-06-01|shoes  |nike   |85   |295      |
|4       |2025-01-01|t-shirt|zara   |15   |295      |
|5       |2025-04-01|bag    |parfois|25   |295      |
|6       |2025-05-01|bag    |carpisa|30   |295      |

In this way, we obtained the sum of all prices over the entire dataset and repeated it for each row.

Example 2: Partition by clause

Let’s now calculate the average price per year while still keeping all the details. We can do this by using the PARTITION BY clause inside a window function to group rows by year and compute the average within each group:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    round(AVG(price) OVER (PARTITION BY YEAR(date) as avg_price
FROM Orders

Here’s what the output looks like:

|order_id|date      |product|brand  |price|avg_price|
|--------|----------|-------|-------|-----|---------|
|1       |2024-02-01|shoes  |nike   |90   |75       |
|2       |2024-05-01|dress  |zara   |50   |75       |
|3       |2024-06-01|shoes  |nike   |85   |75       |
|4       |2025-01-01|t-shirt|zara   |15   |23.33    |
|5       |2025-04-01|bag    |parfois|25   |23.33    |
|6       |2025-05-01|bag    |carpisa|30   |23.33    |

That’s great! We see the average price for each year alongside each row.

Example 3: Order by clause

One of the best ways to understand how ordering works within window functions is to apply a ranking function. Let’s say we want to rank all orders from highest to lowest price. Here’s how we can do it using the RANK() function:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    RANK() OVER (ORDER BY price DESC) as Rank
FROM Orders

We obtain an output like this:

|order_id|date      |product|brand  |price|Rank|
|--------|----------|-------|-------|-----|----|
|1       |2024-02-01|shoes  |nike   |90   |1   |
|3       |2024-06-01|shoes  |nike   |85   |2   |
|2       |2024-05-01|dress  |zara   |50   |3   |
|6       |2025-05-01|bag    |carpisa|30   |4   |
|5       |2025-04-01|bag    |parfois|25   |5   |
|4       |2025-01-01|t-shirt|zara   |15   |6   |

As shown, the order with the highest price gets rank 1, and the rest follow in descending order.

Example 4: Combine Partition by and Group by clauses

In the previous example, we ranked all orders from the highest to the lowest price across the entire dataset. But what if we want to restart the ranking for each year? We can do this by adding the PARTITION BY clause in the window function. This allows for splitting the data into separate groups by year and sorting the orders from the highest to the lowest price.

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    RANK() OVER (PARTITION BY YEAR(date) ORDER BY price DESC) as Rank
FROM Orders

The result should look like this:

|order_id|date      |product|brand  |price|Rank|
|--------|----------|-------|-------|-----|----|
|1       |2024-02-01|shoes  |nike   |90   |1   |
|3       |2024-06-01|shoes  |nike   |85   |2   |
|2       |2024-05-01|dress  |zara   |50   |3   |
|6       |2025-05-01|bag    |carpisa|30   |1   |
|5       |2025-04-01|bag    |parfois|25   |2   |
|4       |2025-01-01|t-shirt|zara   |15   |3   |

Now, the ranking restarts for each year, as we decided. 

Final thoughts:

I hope this guide helped you get a clear and practical introduction to SQL window functions. At first, they might feel a bit unintuitive, but once you compare them side by side with the GROUP BY clause, the value they bring becomes much easier to understand.

From my own experience, window functions have been incredibly powerful for extracting insights without losing row-level detail, something that traditional aggregations hide. They are incredibly useful when extracting metrics like totals, rankings, year-over-year or month-over-month comparisons.

However, there are some limitations. Window functions can be computationally expensive, especially over large datasets or complex partitions. It’s important to evaluate whether the added flexibility justifies the performance tradeoff in your specific use case.

Thanks for reading! Have a nice day!


Useful resources:

Source link

#Mastering #SQL #Window #Functions #Data #Science