Window Functions
Perform calculations across rows with ROW_NUMBER, RANK, and DENSE_RANK.
Window Functions
Window functions are one of the most powerful features in SQL for performing calculations across sets of rows that are related to the current row. Unlike aggregate functions that collapse multiple rows into one, window functions preserve the individual rows while adding calculated values.
Think of window functions as looking through a "window" at a subset of your data while processing each row. You can rank rows, calculate running totals, find moving averages, compare values across rows, and much more - all without losing the detail of individual records.
Why Use Window Functions?
- Analytics without grouping: Calculate aggregates while keeping all rows visible
- Ranking and numbering: Assign ranks, row numbers, or dense ranks to rows
- Comparisons across rows: Compare current row with previous/next rows
- Running calculations: Calculate running totals, moving averages, cumulative sums
- Partitioning: Perform calculations within subgroups without GROUP BY
- Performance: More efficient than self-joins for many analytical queries
Window functions are essential for data analysis, reporting, and business intelligence. They're supported by all modern databases (MySQL 8.0+, PostgreSQL, SQL Server, Oracle, SQLite 3.25+).
Basic Window Function Syntax
Window functions use the OVER() clause to define the "window" of rows to operate on. The basic syntax is:
function_name() OVER (
[PARTITION BY column1, column2, ...]
[ORDER BY column3, column4, ...]
[ROWS or RANGE frame_specification]
)
Key components:
- Function: The window function (ROW_NUMBER, RANK, SUM, AVG, etc.)
- PARTITION BY: Divides data into groups (optional)
- ORDER BY: Defines the order of rows within each partition (required for many functions)
- Frame clause: Defines which rows are included in the window (advanced, optional)
ROW_NUMBER() - Sequential Numbering
ROW_NUMBER() assigns a unique sequential number to each row within a partition, starting from 1. It's one of the most commonly used window functions.
-- Number all employees, ordered by salary (highest first)
SELECT
employee_id,
name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS overall_rank
FROM employees
ORDER BY salary DESC;Now let's use PARTITION BY to number employees within each department:
-- Number employees within each department by salary
SELECT
employee_id,
name,
department,
salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees
ORDER BY department, dept_rank;Notice how PARTITION BY department restarts the numbering at 1 for each department. This is incredibly useful for finding the top N in each group.
Key points:
ROW_NUMBER()always assigns unique numbers, even for tiesPARTITION BYdivides data into groups and restarts numbering for each groupORDER BYdetermines which row gets which number- Without
PARTITION BY, the entire result set is one window
RANK() and DENSE_RANK() - Handling Ties
While ROW_NUMBER() always assigns unique numbers, RANK() and DENSE_RANK() handle tied values differently:
- RANK(): Assigns the same rank to ties, then skips numbers
- DENSE_RANK(): Assigns the same rank to ties, but doesn't skip numbers
- ROW_NUMBER(): Never assigns the same number, even for ties
Let's see them in action:
-- Compare ROW_NUMBER, RANK, and DENSE_RANK
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank_num,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num
FROM employees
ORDER BY salary DESC, name;Notice the difference for employees with $58,000 salary (Eve Martinez and Helen White):
- ROW_NUMBER: Assigns 7 and 8 (unique numbers, arbitrary tie-breaking)
- RANK: Both get 7, then next rank is 9 (skips 8)
- DENSE_RANK: Both get 7, then next rank is 8 (no gaps)
Here's a practical example - finding the top 3 paid employees in each department:
-- Find top 3 highest paid employees per department
WITH ranked_employees AS (
SELECT
employee_id,
name,
department,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank
FROM employees
)
SELECT
employee_id,
name,
department,
salary,
salary_rank
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department, salary_rank, name;Using DENSE_RANK() ensures that if there are ties for rank 2, we still get rank 3 results. With RANK(), tied rank 2 values would make the next rank 4, potentially missing our "top 3" goal.
Function | Handles Ties | Gaps in Sequence | Use Case |
|---|---|---|---|
| ROW_NUMBER() | Assigns unique numbers | No gaps | Need unique IDs, pagination |
| RANK() | Same rank for ties | Gaps after ties | Olympic-style ranking (1, 2, 2, 4) |
| DENSE_RANK() | Same rank for ties | No gaps | Find "top N distinct values" |
Aggregate Window Functions
You can use aggregate functions (SUM, AVG, COUNT, MIN, MAX) as window functions to calculate running totals, moving averages, and cumulative statistics while preserving individual rows.
-- Calculate running total and moving average of sales
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY order_date
) AS running_total,
AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day,
COUNT(*) OVER (ORDER BY order_date) AS days_count
FROM (
SELECT
DATE(order_date) AS order_date,
SUM(total_amount) AS daily_revenue
FROM orders
WHERE order_date >= '2026-01-01' AND order_date <= '2026-01-07'
GROUP BY DATE(order_date)
) daily_sales
ORDER BY order_date;This query demonstrates:
- Running total:
SUM() OVER (ORDER BY order_date)adds up all previous rows - Moving average:
AVG() OVER (... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)calculates 3-day average - Row count:
COUNT(*) OVER (ORDER BY order_date)counts rows up to current row
The ROWS BETWEEN clause defines the window frame - which specific rows to include:
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: Current row + 2 rows before itROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: All rows from start to current (default for ORDER BY)ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: Previous row + current + next row
Here's an example with partitioning to calculate department-level statistics:
-- Calculate salary statistics within each department
SELECT
employee_id,
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
MAX(salary) OVER (PARTITION BY department) AS dept_max_salary,
MIN(salary) OVER (PARTITION BY department) AS dept_min_salary,
COUNT(*) OVER (PARTITION BY department) AS dept_employee_count,
ROUND(salary / AVG(salary) OVER (PARTITION BY department) * 100, 1) AS pct_of_avg
FROM employees
ORDER BY department, salary DESC;Each row shows individual employee data alongside department-level aggregates. This would be impossible with GROUP BY without losing the individual row details.
LAG() and LEAD() - Access Adjacent Rows
LAG() and LEAD() allow you to access data from previous or following rows without self-joins. This is incredibly useful for calculating changes, growth rates, or comparing consecutive periods.
- LAG(column, offset, default): Gets value from previous row(s)
- LEAD(column, offset, default): Gets value from next row(s)
-- Calculate month-over-month sales growth
SELECT
month,
total_sales,
LAG(total_sales, 1) OVER (ORDER BY month) AS prev_month_sales,
total_sales - LAG(total_sales, 1) OVER (ORDER BY month) AS sales_change,
ROUND(
100.0 * (total_sales - LAG(total_sales, 1) OVER (ORDER BY month))
/ LAG(total_sales, 1) OVER (ORDER BY month),
1
) AS growth_pct
FROM (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(total_amount) AS total_sales
FROM orders
WHERE order_date >= '2025-07-01' AND order_date < '2026-02-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
) monthly_sales
ORDER BY month;Notice the first row has NULL for previous month (no previous row exists). You can provide a default value as the third parameter: LAG(total_sales, 1, 0).
Here's an example using LEAD() to compare with the next row:
-- Compare each employee's salary with the next higher salary
SELECT
name,
department,
salary,
LEAD(salary, 1) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS next_lower_salary,
salary - LEAD(salary, 1) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_gap
FROM employees
ORDER BY department, salary DESC;The offset parameter (second argument) lets you look further ahead or behind:
LAG(salary, 1): Previous rowLAG(salary, 2): Two rows backLEAD(salary, 1): Next rowLEAD(salary, 2): Two rows ahead
NTILE() - Distribution into Buckets
NTILE(n) divides rows into n roughly equal buckets and assigns a bucket number (1 to n) to each row. This is useful for quartile analysis, percentile groupings, or dividing data into equal segments.
-- Divide employees into salary quartiles
SELECT
employee_id,
name,
department,
salary,
NTILE(4) OVER (ORDER BY salary DESC) AS salary_quartile,
CASE
WHEN NTILE(4) OVER (ORDER BY salary DESC) = 1 THEN 'Top 25%'
WHEN NTILE(4) OVER (ORDER BY salary DESC) = 2 THEN '25-50%'
WHEN NTILE(4) OVER (ORDER BY salary DESC) = 3 THEN '50-75%'
ELSE 'Bottom 25%'
END AS salary_bracket
FROM employees
ORDER BY salary DESC;With 9 employees and 4 quartiles, NTILE distributes them as evenly as possible: 2, 2, 2, 3 (the remainder goes to the last buckets).
Here's a practical use case - dividing customers into value segments:
-- Segment customers by lifetime value into 3 tiers
WITH customer_value AS (
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) AS total_orders,
SUM(o.total_amount) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
)
SELECT
customer_id,
customer_name,
total_orders,
ROUND(lifetime_value, 2) AS lifetime_value,
NTILE(3) OVER (ORDER BY lifetime_value DESC) AS value_tier,
CASE
WHEN NTILE(3) OVER (ORDER BY lifetime_value DESC) = 1 THEN 'High Value'
WHEN NTILE(3) OVER (ORDER BY lifetime_value DESC) = 2 THEN 'Medium Value'
ELSE 'Low Value'
END AS customer_segment
FROM customer_value
WHERE lifetime_value > 0
ORDER BY lifetime_value DESC;NTILE() is perfect for segmentation, A/B testing group assignment, and percentile analysis.
Real-World Example: Sales Performance Dashboard
Let's combine multiple window functions to create a comprehensive sales performance analysis:
-- Complete sales rep performance analysis
WITH sales_data AS (
SELECT
e.employee_id,
e.name AS sales_rep,
e.hire_date,
COUNT(o.order_id) AS total_orders,
COALESCE(SUM(o.total_amount), 0) AS total_sales,
COALESCE(AVG(o.total_amount), 0) AS avg_order_value
FROM employees e
LEFT JOIN orders o ON e.employee_id = o.sales_rep_id
WHERE e.department = 'Sales'
GROUP BY e.employee_id, e.name, e.hire_date
)
SELECT
sales_rep,
total_orders,
ROUND(total_sales, 2) AS total_sales,
ROUND(avg_order_value, 2) AS avg_order_value,
-- Ranking functions
RANK() OVER (ORDER BY total_sales DESC) AS sales_rank,
DENSE_RANK() OVER (ORDER BY total_orders DESC) AS orders_rank,
-- Percentile segmentation
NTILE(3) OVER (ORDER BY total_sales DESC) AS performance_tier,
-- Comparison with team
ROUND(AVG(total_sales) OVER (), 2) AS team_avg_sales,
ROUND(total_sales - AVG(total_sales) OVER (), 2) AS vs_team_avg,
ROUND(100.0 * total_sales / SUM(total_sales) OVER (), 1) AS pct_of_total_sales,
-- Sequential comparison
LAG(total_sales, 1) OVER (ORDER BY total_sales DESC) AS next_higher_sales,
ROUND(
100.0 * total_sales / LAG(total_sales, 1) OVER (ORDER BY total_sales DESC),
1
) AS pct_of_next_higher
FROM sales_data
ORDER BY total_sales DESC;This single query provides:
- Rankings: Sales rank and orders rank to identify top performers
- Tiers: Performance tier (1 = top third) for segmentation
- Team comparisons: Average sales and difference from average
- Contribution: Percentage of total team sales
- Sequential comparison: How close each rep is to the next higher performer
All of this in one query, with each row showing individual and comparative metrics!
Common Mistakes
1. Filtering on window functions in WHERE clause
❌ Wrong:
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees
WHERE rank <= 3; -- Error! Can't use window function in WHERE
✅ Correct:
WITH ranked AS (
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees
)
SELECT * FROM ranked WHERE rank <= 3;
2. Forgetting ORDER BY in ranking functions
❌ Wrong:
SELECT name, salary,
RANK() OVER (PARTITION BY department) AS rank -- Missing ORDER BY!
FROM employees;
✅ Correct:
SELECT name, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
3. Using GROUP BY with window functions unnecessarily
Window functions work on individual rows, not aggregated groups. If you use GROUP BY, the window function operates on the grouped results.
❌ Confusing:
SELECT department, AVG(salary) AS avg_salary,
RANK() OVER (ORDER BY AVG(salary) DESC) AS rank
FROM employees
GROUP BY department; -- GROUP BY changes what rows window function sees
✅ Clearer (if you want individual rows):
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
4. Rewriting the same window definition multiple times
You can use the WINDOW clause to name a window definition and reuse it:
❌ Repetitive:
SELECT name,
RANK() OVER (PARTITION BY department ORDER BY salary DESC),
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)
FROM employees;
✅ Better:
SELECT name,
RANK() OVER w,
ROW_NUMBER() OVER w
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);
Key Takeaways
- Window functions use OVER() to define the window of rows to operate on
- PARTITION BY divides data into groups; calculations restart for each group
- ORDER BY within OVER() determines row order for the calculation
- ROW_NUMBER() assigns unique sequential numbers (1, 2, 3, 4...)
- RANK() handles ties with gaps (1, 2, 2, 4, 5...)
- DENSE_RANK() handles ties without gaps (1, 2, 2, 3, 4...)
- Aggregate functions (SUM, AVG, etc.) as window functions create running calculations
- LAG() and LEAD() access previous or next rows without self-joins
- NTILE(n) divides rows into n buckets for quartile/percentile analysis
- Window functions preserve individual rows unlike GROUP BY
- Filter window function results with CTEs, not WHERE clauses
- Window functions are essential for analytics, reporting, and business intelligence
Practice Exercise
Using the following tables:
orders (order_id, customer_id, order_date, total_amount, sales_rep_id)
customers (customer_id, customer_name, signup_date, customer_tier)
employees (employee_id, name, department)
Write queries using window functions to:
- Rank customers by total spending within each customer tier
- Calculate each customer's running total of spending ordered by date
- Find the top 3 sales reps by total revenue, showing how far behind each is from the #1 rep
- Calculate month-over-month growth percentage for total sales
- Divide all orders into 10 equal buckets by order value and calculate the average order value for each bucket
Bonus challenge: Create a single query that shows each customer's most recent order, previous order amount, and how many days between those two orders using LAG().