PARTITION BY with Window Functions
Group rows for window function calculations.
Introduction to PARTITION BY
You've learned about window functions with the OVER() clause. Now let's dive deep into PARTITION BY - the most powerful aspect of window functions that lets you divide your data into groups and perform calculations within each group without losing individual rows.
Think of PARTITION BY like GROUP BY's sophisticated cousin:
- GROUP BY: Collapses rows into groups, returns one row per group
- PARTITION BY: Divides rows into groups, keeps all rows, adds calculated values
Real-world scenario: "Show each employee with their salary and their department's average salary side-by-side."
With GROUP BY, you'd lose individual employee rows. With PARTITION BY, you keep every employee row and add the department average as a new column!
A clause in window functions that divides the result set into partitions (groups) and applies the window function separately to each partition, while preserving all individual rows.
Feature | GROUP BY | PARTITION BY |
|---|---|---|
| Purpose | Aggregate rows into groups | Calculate within groups |
| Result | One row per group | All rows preserved |
| Use Case | "Total sales by region" | "Each sale with regional average" |
| Combines | Must use aggregate functions | Works with any window function |
Understanding the Difference
Let's start with sample data to see the difference clearly:
id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 95000 |
| 2 | Bob | Engineering | 85000 |
| 3 | Charlie | Sales | 70000 |
| 4 | Diana | Sales | 65000 |
| 5 | Eve | Marketing | 75000 |
| 6 | Frank | Marketing | 80000 |
With GROUP BY (collapses rows):
-- Average salary per department (one row per department)
SELECT
department,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department;We get 3 rows (one per department), but we lost individual employee data!
With PARTITION BY (keeps all rows):
-- Each employee with their department's average salary
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees
ORDER BY department, salary DESC;We get all 6 employees plus each department's average salary! Each employee can see how their salary compares to their department's average.
This is the magic of PARTITION BY - you get group-level statistics on individual row-level data.
Key difference:
- GROUP BY: 6 rows → 3 rows (collapsed)
- PARTITION BY: 6 rows → 6 rows (preserved) with added calculation
PARTITION BY Syntax
Basic structure:
SELECT
column1,
column2,
FUNCTION() OVER (PARTITION BY partition_column) AS result_name
FROM table_name;With multiple partition columns:
SELECT
column1,
FUNCTION() OVER (PARTITION BY col1, col2 ORDER BY col3) AS result
FROM table_name;Components:
- PARTITION BY column: Divides data into groups based on column values
- ORDER BY: (Optional) Orders rows within each partition
- FUNCTION(): Any window function (AVG, SUM, ROW_NUMBER, RANK, etc.)
Aggregate Functions with PARTITION BY
All aggregate functions work with PARTITION BY: SUM, AVG, COUNT, MIN, MAX.
sale_id | salesperson | region | amount | sale_date |
|---|---|---|---|---|
| 1 | Alice | North | 5000 | 2024-01-15 |
| 2 | Bob | North | 7000 | 2024-01-20 |
| 3 | Charlie | South | 6000 | 2024-01-18 |
| 4 | Alice | North | 4500 | 2024-02-01 |
| 5 | Diana | South | 5500 | 2024-02-05 |
| 6 | Bob | North | 8000 | 2024-02-10 |
Example 1: SUM - Total sales by region
SELECT
sale_id,
salesperson,
region,
amount,
SUM(amount) OVER (PARTITION BY region) AS regional_total
FROM sales
ORDER BY region, sale_date;Each sale shows the total for its region. North: 5000+7000+4500+8000 = 24,500. South: 6000+5500 = 11,500.
Example 2: COUNT - Number of sales per region
SELECT
sale_id,
salesperson,
region,
amount,
COUNT(*) OVER (PARTITION BY region) AS sales_count
FROM sales
ORDER BY region, sale_date;North has 4 sales, South has 2.
Example 3: MIN and MAX - Range within partition
SELECT
salesperson,
region,
amount,
MIN(amount) OVER (PARTITION BY region) AS min_regional_sale,
MAX(amount) OVER (PARTITION BY region) AS max_regional_sale,
amount - MIN(amount) OVER (PARTITION BY region) AS above_minimum
FROM sales
ORDER BY region, amount DESC;Each sale shows its region's minimum and maximum, plus how much it exceeds the minimum.
Ranking Functions with PARTITION BY
Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) are incredibly useful with PARTITION BY.
ROW_NUMBER - Sequential numbering within partition:
-- Rank salespeople within each region by sales amount
SELECT
salesperson,
region,
amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS regional_rank
FROM sales
ORDER BY region, regional_rank;Ranking restarts at 1 for each region! Bob's 8000 is #1 in North, Charlie's 6000 is #1 in South.
RANK - With gap for ties:
product_id | category | product_name | price |
|---|---|---|---|
| 1 | Electronics | Laptop | 999.99 |
| 2 | Electronics | Monitor | 349.99 |
| 3 | Electronics | Keyboard | 79.99 |
| 4 | Furniture | Desk | 299.99 |
| 5 | Furniture | Chair | 199.99 |
| 6 | Furniture | Lamp | 199.99 |
-- Rank products by price within category
SELECT
category,
product_name,
price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products;Chair and Lamp are tied at rank 2 (both $199.99). Notice there's no rank 3 in Furniture - RANK leaves gaps after ties.
Finding Top N per Group
One of the most common use cases: "Find the top 3 products in each category" or "Top 2 salespeople per region".
-- Top 2 sales per region
SELECT *
FROM (
SELECT
sale_id,
salesperson,
region,
amount,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
FROM sales
) ranked
WHERE rank_in_region <= 2
ORDER BY region, rank_in_region;We get the top 2 sales from each region! This would be extremely complex with GROUP BY alone.
Top 2 most expensive products per category:
SELECT category, product_name, price
FROM (
SELECT
category,
product_name,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank
FROM products
) ranked
WHERE rank <= 2;Pattern for "Top N per group":
- Use ROW_NUMBER() OVER (PARTITION BY group_column ORDER BY sort_column)
- Wrap in subquery
- Filter WHERE rank <= N
Multiple Partitions in One Query
You can use multiple window functions with different partitions:
SELECT
salesperson,
region,
amount,
-- Regional statistics
AVG(amount) OVER (PARTITION BY region) AS regional_avg,
RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS regional_rank,
-- Salesperson statistics (across all regions)
SUM(amount) OVER (PARTITION BY salesperson) AS salesperson_total,
-- Overall statistics (no partition)
AVG(amount) OVER () AS overall_avg
FROM sales
ORDER BY region, amount DESC;This single query calculates:
- Regional average and rank
- Each salesperson's total (all their sales combined)
- Overall company average
Each window function can have its own PARTITION BY!
PARTITION BY with ORDER BY
Adding ORDER BY to the OVER clause changes behavior - it creates a "running" or "cumulative" calculation.
Without ORDER BY (calculates over entire partition):
-- Total sales per region (all rows in partition)
SELECT
sale_id,
region,
amount,
SUM(amount) OVER (PARTITION BY region) AS regional_total
FROM sales
ORDER BY region, sale_id;Every North row shows 24,500 (total of all North sales).
With ORDER BY (running total within partition):
-- Running total per region (cumulative)
SELECT
sale_id,
region,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
FROM sales
ORDER BY region, sale_date;Now it's a running total! Each row shows the sum of all sales in that region up to that date.
For North:
- Jan 15: 5000 (first sale)
- Jan 20: 12000 (5000 + 7000)
- Feb 01: 16500 (5000 + 7000 + 4500)
- Feb 10: 24500 (5000 + 7000 + 4500 + 8000)
Key difference:
- No ORDER BY: Function applies to entire partition
- With ORDER BY: Function applies to rows from start of partition up to current row (running calculation)
Practical Business Examples
Example 1: Employee comparison to department average
SELECT
name,
department,
salary,
ROUND(AVG(salary) OVER (PARTITION BY department), 2) AS dept_avg,
ROUND(salary - AVG(salary) OVER (PARTITION BY department), 2) AS diff_from_avg,
CASE
WHEN salary > AVG(salary) OVER (PARTITION BY department)
THEN 'Above Average'
ELSE 'Below Average'
END AS performance_level
FROM employees
ORDER BY department, salary DESC;Example 2: Market share analysis
SELECT
salesperson,
region,
amount,
SUM(amount) OVER (PARTITION BY region) AS regional_total,
ROUND(amount * 100.0 / SUM(amount) OVER (PARTITION BY region), 2) AS regional_share_pct,
SUM(amount) OVER () AS company_total,
ROUND(amount * 100.0 / SUM(amount) OVER (), 2) AS company_share_pct
FROM sales
ORDER BY region, amount DESC;Bob's 8000 sale represents 32.65% of North region sales and 22.22% of company sales.
Example 3: Moving average per product category
SELECT
category,
product_name,
price,
AVG(price) OVER (PARTITION BY category) AS category_avg_price,
price / AVG(price) OVER (PARTITION BY category) AS price_index
FROM products
ORDER BY category, price DESC;The Laptop costs 2.10x the Electronics category average. The Keyboard is only 0.17x (17% of) the average.
Common Mistakes to Avoid
Don't make these mistakes:
-
Forgetting ORDER BY for ranking functions: ❌
- Wrong:
ROW_NUMBER() OVER (PARTITION BY dept) - ROW_NUMBER needs ORDER BY to know which order to number rows
- Fix:
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)
- Wrong:
-
Using GROUP BY when you mean PARTITION BY: ❌
- GROUP BY collapses rows, PARTITION BY preserves them
- If you want individual rows with group stats, use PARTITION BY
-
Mixing aggregate without window function: ❌
- Wrong:
SELECT name, salary, AVG(salary) FROM employees - Either GROUP BY or use OVER clause
- Fix:
SELECT name, salary, AVG(salary) OVER () FROM employees
- Wrong:
-
Not understanding ORDER BY's effect: ⚠️
- ORDER BY in OVER changes behavior to running calculations
- Without ORDER BY: entire partition
- With ORDER BY: running/cumulative
-
Assuming partitions are independent: They are!
- Calculations in one partition don't affect others
- Ranks restart at 1 in each partition
-
Over-partitioning: ❌
- Partitioning by unique column (like ID) is pointless
- Each partition would have only 1 row
- Partition by meaningful groups (department, region, category)
-
Not aliasing window functions: Makes queries unreadable
- Always use AS to name your calculated columns
AVG(salary) OVER (...) AS dept_avg_salary
-
Forgetting PARTITION BY creates running total with ORDER BY: ⚠️
SUM(amount) OVER (ORDER BY date)is a running total- Add PARTITION BY if you want running totals per group
Performance Considerations
PARTITION BY performance tips:
1. Index the partition columns:
-- If you frequently partition by department and order by salary
CREATE INDEX idx_dept_salary ON employees(department, salary);2. Limit the result set before applying window functions:
-- Good: Filter first, then apply window function
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
WHERE hire_date >= '2023-01-01';
-- Less efficient: Window function on all rows, then filter
SELECT * FROM (
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
) subquery
WHERE hire_date >= '2023-01-01';3. Window functions vs self-joins:
Window functions are usually faster than self-joins for analytical queries:
-- Slow: Self-join to get department average
SELECT e1.name, e1.salary, dept_avg.avg_salary
FROM employees e1
JOIN (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept_avg ON e1.department = dept_avg.department;
-- Fast: Window function
SELECT name, salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;Best practice: Window functions are optimized for analytical queries. They're almost always faster and more readable than equivalent self-joins.
Key Takeaways
What you learned:
✅ PARTITION BY divides data into groups for window function calculations ✅ Unlike GROUP BY, PARTITION BY preserves all rows ✅ Window functions calculate within each partition separately ✅ Combine with aggregate functions (SUM, AVG, COUNT, MIN, MAX) ✅ Combine with ranking functions (ROW_NUMBER, RANK, DENSE_RANK) ✅ ORDER BY in OVER clause creates running/cumulative calculations ✅ Multiple window functions can use different partitions in same query ✅ Perfect for "top N per group" queries ✅ More efficient than self-joins for analytical queries ✅ Essential for business analytics and reporting
Common patterns:
- Compare individual rows to group averages
- Rank items within categories
- Calculate running totals per group
- Find top N per group
- Calculate market share percentages
- Identify above/below average performers
Coming up next: You'll learn about ACID properties - the fundamental principles that ensure database reliability and data integrity!
Practice Exercise: Try these queries:
- Show each employee with their department's total salary budget
- Rank products by price within each category
- Calculate running total of sales per region ordered by date
- Find the top 3 highest-paid employees in each department
- Show each sale with its percentage of regional total