GROUP BY

Group rows and perform aggregate calculations on each group.

16 min read
Beginner

Introduction to GROUP BY

In the last lesson, you learned to calculate aggregates across all rows. For example, "What's the average price of all products?"

But what if you want to answer questions like:

  • "What's the average price per category?"
  • "How many products do we have in each category?"
  • "What's the total salary per department?"

You need to group rows by category/department and then calculate aggregates for each group. That's what GROUP BY does!

Groups rows that have the same values in specified columns into summary rows. Often used with aggregate functions (COUNT, SUM, AVG, MIN, MAX) to perform calculations on each group.

Think of it like:

  • Sorting your laundry by color (white pile, dark pile, colors pile)
  • Then counting items in each pile separately

Basic Syntax:

SELECT column, aggregate_function(column)
FROM table_name
GROUP BY column;

Let's work with our products table:

products
id
product_name
category
price
stock
1LaptopElectronics999.9945
2MouseElectronics29.99120
3Desk ChairFurniture199.9930
4MonitorElectronics349.9960
5Desk LampFurniture39.9985
6KeyboardElectronics79.9995
7BookshelfFurniture159.9920

Grouping by a Single Column

Let's answer: "How many products are in each category?"

sql
-- Count products per category
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

What happened:

  1. SQL grouped all rows by category
  2. Electronics group: Laptop, Mouse, Monitor, Keyboard (4 products)
  3. Furniture group: Desk Chair, Desk Lamp, Bookshelf (3 products)
  4. COUNT(*) calculated for each group separately

Without GROUP BY, you'd get one total:

sql
-- This gives total across ALL products
SELECT COUNT(*) AS total_products
FROM products;

GROUP BY splits this into separate counts per category!

GROUP BY with Different Aggregates

You can use any aggregate function with GROUP BY.

Average price per category:

sql
SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;

Total stock per category:

sql
SELECT category, SUM(stock) AS total_stock
FROM products
GROUP BY category;

Electronics has 45 + 120 + 60 + 95 = 320 units in stock. Furniture has 30 + 85 + 20 = 135 units.

Price range per category:

sql
SELECT
  category,
  MIN(price) AS cheapest,
  MAX(price) AS most_expensive,
  MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;

Multiple Aggregates per Group

You can calculate several statistics for each group in one query:

sql
-- Complete category summary
SELECT
  category,
  COUNT(*) AS product_count,
  SUM(stock) AS total_stock,
  ROUND(AVG(price), 2) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price
FROM products
GROUP BY category;

Perfect for dashboards and reports!

GROUP BY with WHERE Clause

You can filter rows before grouping with WHERE. Remember the order:

  1. FROM - get data from table
  2. WHERE - filter individual rows
  3. GROUP BY - group the filtered rows
  4. Aggregate functions - calculate on each group
sql
-- Average price per category, only for products with stock > 50
SELECT
  category,
  COUNT(*) AS product_count,
  ROUND(AVG(price), 2) AS avg_price
FROM products
WHERE stock > 50
GROUP BY category;

What happened:

  1. WHERE filtered to only products with stock > 50 (Mouse, Monitor, Keyboard, Desk Lamp)
  2. These 4 products were grouped by category
  3. Aggregates calculated for each filtered group

Another example:

sql
-- Count expensive products (>$100) per category
SELECT
  category,
  COUNT(*) AS expensive_products
FROM products
WHERE price > 100
GROUP BY category;

Grouping by Multiple Columns

You can group by more than one column to create finer groupings.

employees
id
name
department
location
salary
1AliceEngineeringNew York95000
2BobEngineeringNew York85000
3CharlieEngineeringLondon90000
4DianaSalesNew York70000
5EveSalesLondon75000
6FrankSalesLondon72000
7GraceMarketingNew York80000
sql
-- Count employees by department AND location
SELECT
  department,
  location,
  COUNT(*) AS employee_count,
  ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department, location
ORDER BY department, location;

Each unique combination of (department, location) forms a separate group:

  • Engineering + London: 1 employee
  • Engineering + New York: 2 employees
  • Marketing + New York: 1 employee
  • Sales + London: 2 employees
  • Sales + New York: 1 employee

Another example:

sales
id
product_id
region
quarter
revenue
1101NorthQ15000
2101NorthQ26000
3101SouthQ14500
4102NorthQ13000
5102SouthQ23500
sql
-- Revenue by region and quarter
SELECT
  region,
  quarter,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, quarter
ORDER BY region, quarter;

GROUP BY with ORDER BY

You can sort groups using ORDER BY. This comes after GROUP BY.

Order of clauses:

SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ...
sql
-- Categories sorted by product count (descending)
SELECT
  category,
  COUNT(*) AS product_count,
  ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
ORDER BY product_count DESC;

Sort by aggregate result:

sql
-- Departments sorted by average salary (highest first)
SELECT
  department,
  COUNT(*) AS employees,
  ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Engineering has the highest average salary!

The Golden Rule of GROUP BY

Critical rule: Every column in SELECT must be either:

  1. Listed in GROUP BY, OR
  2. Inside an aggregate function (COUNT, SUM, AVG, MIN, MAX)

This is WRONG:

SELECT category, product_name, COUNT(*)
FROM products
GROUP BY category;

Error: product_name is not in GROUP BY and not in an aggregate function. Which product_name should SQL return for each category? There are multiple products per category!

This is CORRECT:

sql
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

category is in GROUP BY, COUNT(*) is an aggregate. ✓

Another correct example:

sql
-- Multiple non-aggregate columns must ALL be in GROUP BY
SELECT category, location, COUNT(*) AS count
FROM inventory
GROUP BY category, location;

Both category and location are in GROUP BY. ✓

Common Mistakes to Avoid

Don't make these mistakes:

  1. Forgetting column in GROUP BY: ❌

    • SELECT category, location, COUNT(*) FROM products GROUP BY category
    • If you SELECT it, you must GROUP BY it (unless it's an aggregate)
  2. Using WHERE to filter aggregates: ❌

    • Wrong: WHERE COUNT(*) > 5
    • WHERE filters rows BEFORE grouping
    • Use HAVING instead (next lesson!)
  3. Grouping by alias: ❌ (doesn't work in some databases)

    • SELECT category AS cat FROM products GROUP BY cat
    • Use the actual column name: GROUP BY category
  4. Confusing WHERE and HAVING: ❌

    • WHERE filters individual rows before grouping
    • HAVING filters groups after aggregation
  5. Not using aggregate with GROUP BY: ❌

    • SELECT category FROM products GROUP BY category
    • This works but is pointless - use DISTINCT instead!
    • SELECT DISTINCT category FROM products is clearer
  6. ORDER BY column not in SELECT: Works but confusing

    • SELECT category, COUNT(*) FROM products GROUP BY category ORDER BY AVG(price)
    • Readers can't see what you're sorting by
  7. Grouping without aggregates: ❌ Wasteful

    • SELECT category FROM products GROUP BY category
    • Just use: SELECT DISTINCT category FROM products

Practical Examples

Example 1: Sales report by product

order_items
order_id
product_id
quantity
price
1101250.00
1102130.00
2101150.00
3103320.00
4101250.00
sql
-- Total sales per product
SELECT
  product_id,
  COUNT(*) AS times_sold,
  SUM(quantity) AS total_quantity,
  SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id
ORDER BY total_revenue DESC;

Example 2: Department headcount and payroll

sql
SELECT
  department,
  COUNT(*) AS headcount,
  ROUND(AVG(salary), 2) AS avg_salary,
  MIN(salary) AS min_salary,
  MAX(salary) AS max_salary,
  SUM(salary) AS total_payroll
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;

Example 3: Monthly user registrations

users
id
username
signup_date
1alice2024-01-05
2bob2024-01-15
3charlie2024-02-03
4diana2024-02-10
5eve2024-02-20
sql
-- Registrations per month
SELECT
  DATE_FORMAT(signup_date, '%Y-%m') AS month,
  COUNT(*) AS new_users
FROM users
GROUP BY DATE_FORMAT(signup_date, '%Y-%m')
ORDER BY month;

GROUP BY vs DISTINCT

If you just want unique values without aggregates, use DISTINCT instead of GROUP BY:

sql
-- These are equivalent (but DISTINCT is clearer)
SELECT DISTINCT category FROM products;

SELECT category FROM products GROUP BY category;

Use DISTINCT when you just want unique values. Use GROUP BY when you want to calculate aggregates for each group.

Key Takeaways

What you learned:

GROUP BY groups rows with the same values together ✅ Use with aggregate functions to calculate stats per group ✅ Every column in SELECT must be in GROUP BY or in an aggregate ✅ WHERE filters rows BEFORE grouping, aggregates happen after ✅ You can group by multiple columns for finer groupings ✅ Use ORDER BY after GROUP BY to sort groups ✅ Execution order: FROM → WHERE → GROUP BY → aggregates → ORDER BY ✅ Use DISTINCT instead of GROUP BY if you don't need aggregates

Coming up next: We'll learn about HAVING to filter groups based on aggregate results (e.g., "show categories with more than 5 products")!

Practice Exercise: Try these queries:

  1. Count how many employees are in each department
  2. Find the average salary per department
  3. Calculate total stock per category, sorted by total stock DESC
  4. Count products per category, only for products with price > 50
  5. Find the minimum and maximum salary per department and location