PARTITION BY with Window Functions

Group rows for window function calculations.

22 min read
Intermediate

Introduction to PARTITION BY

You've learned about window functions with the OVER() clause. Now let's dive deep into PARTITION BY - the most powerful aspect of window functions that lets you divide your data into groups and perform calculations within each group without losing individual rows.

Think of PARTITION BY like GROUP BY's sophisticated cousin:

  • GROUP BY: Collapses rows into groups, returns one row per group
  • PARTITION BY: Divides rows into groups, keeps all rows, adds calculated values

Real-world scenario: "Show each employee with their salary and their department's average salary side-by-side."

With GROUP BY, you'd lose individual employee rows. With PARTITION BY, you keep every employee row and add the department average as a new column!

A clause in window functions that divides the result set into partitions (groups) and applies the window function separately to each partition, while preserving all individual rows.

GROUP BY vs PARTITION BY
Feature
GROUP BY
PARTITION BY
PurposeAggregate rows into groupsCalculate within groups
ResultOne row per groupAll rows preserved
Use Case"Total sales by region""Each sale with regional average"
CombinesMust use aggregate functionsWorks with any window function

Understanding the Difference

Let's start with sample data to see the difference clearly:

employees
id
name
department
salary
1AliceEngineering95000
2BobEngineering85000
3CharlieSales70000
4DianaSales65000
5EveMarketing75000
6FrankMarketing80000

With GROUP BY (collapses rows):

sql
-- Average salary per department (one row per department)
SELECT
  department,
  AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

We get 3 rows (one per department), but we lost individual employee data!

With PARTITION BY (keeps all rows):

sql
-- Each employee with their department's average salary
SELECT
  name,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees
ORDER BY department, salary DESC;

We get all 6 employees plus each department's average salary! Each employee can see how their salary compares to their department's average.

This is the magic of PARTITION BY - you get group-level statistics on individual row-level data.

Key difference:

  • GROUP BY: 6 rows → 3 rows (collapsed)
  • PARTITION BY: 6 rows → 6 rows (preserved) with added calculation

PARTITION BY Syntax

Basic structure:

sql
SELECT
  column1,
  column2,
  FUNCTION() OVER (PARTITION BY partition_column) AS result_name
FROM table_name;

With multiple partition columns:

sql
SELECT
  column1,
  FUNCTION() OVER (PARTITION BY col1, col2 ORDER BY col3) AS result
FROM table_name;

Components:

  • PARTITION BY column: Divides data into groups based on column values
  • ORDER BY: (Optional) Orders rows within each partition
  • FUNCTION(): Any window function (AVG, SUM, ROW_NUMBER, RANK, etc.)

Aggregate Functions with PARTITION BY

All aggregate functions work with PARTITION BY: SUM, AVG, COUNT, MIN, MAX.

sales
sale_id
salesperson
region
amount
sale_date
1AliceNorth50002024-01-15
2BobNorth70002024-01-20
3CharlieSouth60002024-01-18
4AliceNorth45002024-02-01
5DianaSouth55002024-02-05
6BobNorth80002024-02-10

Example 1: SUM - Total sales by region

sql
SELECT
  sale_id,
  salesperson,
  region,
  amount,
  SUM(amount) OVER (PARTITION BY region) AS regional_total
FROM sales
ORDER BY region, sale_date;

Each sale shows the total for its region. North: 5000+7000+4500+8000 = 24,500. South: 6000+5500 = 11,500.

Example 2: COUNT - Number of sales per region

sql
SELECT
  sale_id,
  salesperson,
  region,
  amount,
  COUNT(*) OVER (PARTITION BY region) AS sales_count
FROM sales
ORDER BY region, sale_date;

North has 4 sales, South has 2.

Example 3: MIN and MAX - Range within partition

sql
SELECT
  salesperson,
  region,
  amount,
  MIN(amount) OVER (PARTITION BY region) AS min_regional_sale,
  MAX(amount) OVER (PARTITION BY region) AS max_regional_sale,
  amount - MIN(amount) OVER (PARTITION BY region) AS above_minimum
FROM sales
ORDER BY region, amount DESC;

Each sale shows its region's minimum and maximum, plus how much it exceeds the minimum.

Ranking Functions with PARTITION BY

Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) are incredibly useful with PARTITION BY.

ROW_NUMBER - Sequential numbering within partition:

sql
-- Rank salespeople within each region by sales amount
SELECT
  salesperson,
  region,
  amount,
  ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS regional_rank
FROM sales
ORDER BY region, regional_rank;

Ranking restarts at 1 for each region! Bob's 8000 is #1 in North, Charlie's 6000 is #1 in South.

RANK - With gap for ties:

products
product_id
category
product_name
price
1ElectronicsLaptop999.99
2ElectronicsMonitor349.99
3ElectronicsKeyboard79.99
4FurnitureDesk299.99
5FurnitureChair199.99
6FurnitureLamp199.99
sql
-- Rank products by price within category
SELECT
  category,
  product_name,
  price,
  RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products;

Chair and Lamp are tied at rank 2 (both $199.99). Notice there's no rank 3 in Furniture - RANK leaves gaps after ties.

Finding Top N per Group

One of the most common use cases: "Find the top 3 products in each category" or "Top 2 salespeople per region".

sql
-- Top 2 sales per region
SELECT *
FROM (
  SELECT
    sale_id,
    salesperson,
    region,
    amount,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
  FROM sales
) ranked
WHERE rank_in_region <= 2
ORDER BY region, rank_in_region;

We get the top 2 sales from each region! This would be extremely complex with GROUP BY alone.

Top 2 most expensive products per category:

sql
SELECT category, product_name, price
FROM (
  SELECT
    category,
    product_name,
    price,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank
  FROM products
) ranked
WHERE rank <= 2;

Pattern for "Top N per group":

  1. Use ROW_NUMBER() OVER (PARTITION BY group_column ORDER BY sort_column)
  2. Wrap in subquery
  3. Filter WHERE rank <= N

Multiple Partitions in One Query

You can use multiple window functions with different partitions:

sql
SELECT
  salesperson,
  region,
  amount,
  -- Regional statistics
  AVG(amount) OVER (PARTITION BY region) AS regional_avg,
  RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS regional_rank,
  -- Salesperson statistics (across all regions)
  SUM(amount) OVER (PARTITION BY salesperson) AS salesperson_total,
  -- Overall statistics (no partition)
  AVG(amount) OVER () AS overall_avg
FROM sales
ORDER BY region, amount DESC;

This single query calculates:

  • Regional average and rank
  • Each salesperson's total (all their sales combined)
  • Overall company average

Each window function can have its own PARTITION BY!

PARTITION BY with ORDER BY

Adding ORDER BY to the OVER clause changes behavior - it creates a "running" or "cumulative" calculation.

Without ORDER BY (calculates over entire partition):

sql
-- Total sales per region (all rows in partition)
SELECT
  sale_id,
  region,
  amount,
  SUM(amount) OVER (PARTITION BY region) AS regional_total
FROM sales
ORDER BY region, sale_id;

Every North row shows 24,500 (total of all North sales).

With ORDER BY (running total within partition):

sql
-- Running total per region (cumulative)
SELECT
  sale_id,
  region,
  sale_date,
  amount,
  SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
FROM sales
ORDER BY region, sale_date;

Now it's a running total! Each row shows the sum of all sales in that region up to that date.

For North:

  • Jan 15: 5000 (first sale)
  • Jan 20: 12000 (5000 + 7000)
  • Feb 01: 16500 (5000 + 7000 + 4500)
  • Feb 10: 24500 (5000 + 7000 + 4500 + 8000)

Key difference:

  • No ORDER BY: Function applies to entire partition
  • With ORDER BY: Function applies to rows from start of partition up to current row (running calculation)

Practical Business Examples

Example 1: Employee comparison to department average

sql
SELECT
  name,
  department,
  salary,
  ROUND(AVG(salary) OVER (PARTITION BY department), 2) AS dept_avg,
  ROUND(salary - AVG(salary) OVER (PARTITION BY department), 2) AS diff_from_avg,
  CASE
    WHEN salary > AVG(salary) OVER (PARTITION BY department)
    THEN 'Above Average'
    ELSE 'Below Average'
  END AS performance_level
FROM employees
ORDER BY department, salary DESC;

Example 2: Market share analysis

sql
SELECT
  salesperson,
  region,
  amount,
  SUM(amount) OVER (PARTITION BY region) AS regional_total,
  ROUND(amount * 100.0 / SUM(amount) OVER (PARTITION BY region), 2) AS regional_share_pct,
  SUM(amount) OVER () AS company_total,
  ROUND(amount * 100.0 / SUM(amount) OVER (), 2) AS company_share_pct
FROM sales
ORDER BY region, amount DESC;

Bob's 8000 sale represents 32.65% of North region sales and 22.22% of company sales.

Example 3: Moving average per product category

sql
SELECT
  category,
  product_name,
  price,
  AVG(price) OVER (PARTITION BY category) AS category_avg_price,
  price / AVG(price) OVER (PARTITION BY category) AS price_index
FROM products
ORDER BY category, price DESC;

The Laptop costs 2.10x the Electronics category average. The Keyboard is only 0.17x (17% of) the average.

Common Mistakes to Avoid

Don't make these mistakes:

  1. Forgetting ORDER BY for ranking functions: ❌

    • Wrong: ROW_NUMBER() OVER (PARTITION BY dept)
    • ROW_NUMBER needs ORDER BY to know which order to number rows
    • Fix: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)
  2. Using GROUP BY when you mean PARTITION BY: ❌

    • GROUP BY collapses rows, PARTITION BY preserves them
    • If you want individual rows with group stats, use PARTITION BY
  3. Mixing aggregate without window function: ❌

    • Wrong: SELECT name, salary, AVG(salary) FROM employees
    • Either GROUP BY or use OVER clause
    • Fix: SELECT name, salary, AVG(salary) OVER () FROM employees
  4. Not understanding ORDER BY's effect: ⚠️

    • ORDER BY in OVER changes behavior to running calculations
    • Without ORDER BY: entire partition
    • With ORDER BY: running/cumulative
  5. Assuming partitions are independent: They are!

    • Calculations in one partition don't affect others
    • Ranks restart at 1 in each partition
  6. Over-partitioning: ❌

    • Partitioning by unique column (like ID) is pointless
    • Each partition would have only 1 row
    • Partition by meaningful groups (department, region, category)
  7. Not aliasing window functions: Makes queries unreadable

    • Always use AS to name your calculated columns
    • AVG(salary) OVER (...) AS dept_avg_salary
  8. Forgetting PARTITION BY creates running total with ORDER BY: ⚠️

    • SUM(amount) OVER (ORDER BY date) is a running total
    • Add PARTITION BY if you want running totals per group

Performance Considerations

PARTITION BY performance tips:

1. Index the partition columns:

sql
-- If you frequently partition by department and order by salary
CREATE INDEX idx_dept_salary ON employees(department, salary);

2. Limit the result set before applying window functions:

sql
-- Good: Filter first, then apply window function
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
WHERE hire_date >= '2023-01-01';

-- Less efficient: Window function on all rows, then filter
SELECT * FROM (
  SELECT name, department, salary,
         AVG(salary) OVER (PARTITION BY department) AS dept_avg
  FROM employees
) subquery
WHERE hire_date >= '2023-01-01';

3. Window functions vs self-joins:

Window functions are usually faster than self-joins for analytical queries:

sql
-- Slow: Self-join to get department average
SELECT e1.name, e1.salary, dept_avg.avg_salary
FROM employees e1
JOIN (
  SELECT department, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) dept_avg ON e1.department = dept_avg.department;

-- Fast: Window function
SELECT name, salary,
       AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;

Best practice: Window functions are optimized for analytical queries. They're almost always faster and more readable than equivalent self-joins.

Key Takeaways

What you learned:

PARTITION BY divides data into groups for window function calculations ✅ Unlike GROUP BY, PARTITION BY preserves all rows ✅ Window functions calculate within each partition separately ✅ Combine with aggregate functions (SUM, AVG, COUNT, MIN, MAX) ✅ Combine with ranking functions (ROW_NUMBER, RANK, DENSE_RANK) ✅ ORDER BY in OVER clause creates running/cumulative calculations ✅ Multiple window functions can use different partitions in same query ✅ Perfect for "top N per group" queries ✅ More efficient than self-joins for analytical queries ✅ Essential for business analytics and reporting

Common patterns:

  • Compare individual rows to group averages
  • Rank items within categories
  • Calculate running totals per group
  • Find top N per group
  • Calculate market share percentages
  • Identify above/below average performers

Coming up next: You'll learn about ACID properties - the fundamental principles that ensure database reliability and data integrity!

Practice Exercise: Try these queries:

  1. Show each employee with their department's total salary budget
  2. Rank products by price within each category
  3. Calculate running total of sales per region ordered by date
  4. Find the top 3 highest-paid employees in each department
  5. Show each sale with its percentage of regional total