GROUP BY
Group rows and perform aggregate calculations on each group.
Introduction to GROUP BY
In the last lesson, you learned to calculate aggregates across all rows. For example, "What's the average price of all products?"
But what if you want to answer questions like:
- "What's the average price per category?"
- "How many products do we have in each category?"
- "What's the total salary per department?"
You need to group rows by category/department and then calculate aggregates for each group. That's what GROUP BY does!
Groups rows that have the same values in specified columns into summary rows. Often used with aggregate functions (COUNT, SUM, AVG, MIN, MAX) to perform calculations on each group.
Think of it like:
- Sorting your laundry by color (white pile, dark pile, colors pile)
- Then counting items in each pile separately
Basic Syntax:
SELECT column, aggregate_function(column)
FROM table_name
GROUP BY column;
Let's work with our products table:
id | product_name | category | price | stock |
|---|---|---|---|---|
| 1 | Laptop | Electronics | 999.99 | 45 |
| 2 | Mouse | Electronics | 29.99 | 120 |
| 3 | Desk Chair | Furniture | 199.99 | 30 |
| 4 | Monitor | Electronics | 349.99 | 60 |
| 5 | Desk Lamp | Furniture | 39.99 | 85 |
| 6 | Keyboard | Electronics | 79.99 | 95 |
| 7 | Bookshelf | Furniture | 159.99 | 20 |
Grouping by a Single Column
Let's answer: "How many products are in each category?"
-- Count products per category
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;What happened:
- SQL grouped all rows by
category - Electronics group: Laptop, Mouse, Monitor, Keyboard (4 products)
- Furniture group: Desk Chair, Desk Lamp, Bookshelf (3 products)
- COUNT(*) calculated for each group separately
Without GROUP BY, you'd get one total:
-- This gives total across ALL products
SELECT COUNT(*) AS total_products
FROM products;GROUP BY splits this into separate counts per category!
GROUP BY with Different Aggregates
You can use any aggregate function with GROUP BY.
Average price per category:
SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;Total stock per category:
SELECT category, SUM(stock) AS total_stock
FROM products
GROUP BY category;Electronics has 45 + 120 + 60 + 95 = 320 units in stock. Furniture has 30 + 85 + 20 = 135 units.
Price range per category:
SELECT
category,
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;Multiple Aggregates per Group
You can calculate several statistics for each group in one query:
-- Complete category summary
SELECT
category,
COUNT(*) AS product_count,
SUM(stock) AS total_stock,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS min_price,
MAX(price) AS max_price
FROM products
GROUP BY category;Perfect for dashboards and reports!
GROUP BY with WHERE Clause
You can filter rows before grouping with WHERE. Remember the order:
- FROM - get data from table
- WHERE - filter individual rows
- GROUP BY - group the filtered rows
- Aggregate functions - calculate on each group
-- Average price per category, only for products with stock > 50
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
WHERE stock > 50
GROUP BY category;What happened:
- WHERE filtered to only products with stock > 50 (Mouse, Monitor, Keyboard, Desk Lamp)
- These 4 products were grouped by category
- Aggregates calculated for each filtered group
Another example:
-- Count expensive products (>$100) per category
SELECT
category,
COUNT(*) AS expensive_products
FROM products
WHERE price > 100
GROUP BY category;Grouping by Multiple Columns
You can group by more than one column to create finer groupings.
id | name | department | location | salary |
|---|---|---|---|---|
| 1 | Alice | Engineering | New York | 95000 |
| 2 | Bob | Engineering | New York | 85000 |
| 3 | Charlie | Engineering | London | 90000 |
| 4 | Diana | Sales | New York | 70000 |
| 5 | Eve | Sales | London | 75000 |
| 6 | Frank | Sales | London | 72000 |
| 7 | Grace | Marketing | New York | 80000 |
-- Count employees by department AND location
SELECT
department,
location,
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department, location
ORDER BY department, location;Each unique combination of (department, location) forms a separate group:
- Engineering + London: 1 employee
- Engineering + New York: 2 employees
- Marketing + New York: 1 employee
- Sales + London: 2 employees
- Sales + New York: 1 employee
Another example:
id | product_id | region | quarter | revenue |
|---|---|---|---|---|
| 1 | 101 | North | Q1 | 5000 |
| 2 | 101 | North | Q2 | 6000 |
| 3 | 101 | South | Q1 | 4500 |
| 4 | 102 | North | Q1 | 3000 |
| 5 | 102 | South | Q2 | 3500 |
-- Revenue by region and quarter
SELECT
region,
quarter,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, quarter
ORDER BY region, quarter;GROUP BY with ORDER BY
You can sort groups using ORDER BY. This comes after GROUP BY.
Order of clauses:
SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ...
-- Categories sorted by product count (descending)
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
ORDER BY product_count DESC;Sort by aggregate result:
-- Departments sorted by average salary (highest first)
SELECT
department,
COUNT(*) AS employees,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;Engineering has the highest average salary!
The Golden Rule of GROUP BY
Critical rule: Every column in SELECT must be either:
- Listed in GROUP BY, OR
- Inside an aggregate function (COUNT, SUM, AVG, MIN, MAX)
This is WRONG:
SELECT category, product_name, COUNT(*)
FROM products
GROUP BY category;
Error: product_name is not in GROUP BY and not in an aggregate function. Which product_name should SQL return for each category? There are multiple products per category!
This is CORRECT:
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;category is in GROUP BY, COUNT(*) is an aggregate. ✓
Another correct example:
-- Multiple non-aggregate columns must ALL be in GROUP BY
SELECT category, location, COUNT(*) AS count
FROM inventory
GROUP BY category, location;Both category and location are in GROUP BY. ✓
Common Mistakes to Avoid
Don't make these mistakes:
-
Forgetting column in GROUP BY: ❌
SELECT category, location, COUNT(*) FROM products GROUP BY category- If you SELECT it, you must GROUP BY it (unless it's an aggregate)
-
Using WHERE to filter aggregates: ❌
- Wrong:
WHERE COUNT(*) > 5 - WHERE filters rows BEFORE grouping
- Use HAVING instead (next lesson!)
- Wrong:
-
Grouping by alias: ❌ (doesn't work in some databases)
SELECT category AS cat FROM products GROUP BY cat- Use the actual column name:
GROUP BY category
-
Confusing WHERE and HAVING: ❌
- WHERE filters individual rows before grouping
- HAVING filters groups after aggregation
-
Not using aggregate with GROUP BY: ❌
SELECT category FROM products GROUP BY category- This works but is pointless - use DISTINCT instead!
SELECT DISTINCT category FROM productsis clearer
-
ORDER BY column not in SELECT: Works but confusing
SELECT category, COUNT(*) FROM products GROUP BY category ORDER BY AVG(price)- Readers can't see what you're sorting by
-
Grouping without aggregates: ❌ Wasteful
SELECT category FROM products GROUP BY category- Just use:
SELECT DISTINCT category FROM products
Practical Examples
Example 1: Sales report by product
order_id | product_id | quantity | price |
|---|---|---|---|
| 1 | 101 | 2 | 50.00 |
| 1 | 102 | 1 | 30.00 |
| 2 | 101 | 1 | 50.00 |
| 3 | 103 | 3 | 20.00 |
| 4 | 101 | 2 | 50.00 |
-- Total sales per product
SELECT
product_id,
COUNT(*) AS times_sold,
SUM(quantity) AS total_quantity,
SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id
ORDER BY total_revenue DESC;Example 2: Department headcount and payroll
SELECT
department,
COUNT(*) AS headcount,
ROUND(AVG(salary), 2) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
SUM(salary) AS total_payroll
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;Example 3: Monthly user registrations
id | username | signup_date |
|---|---|---|
| 1 | alice | 2024-01-05 |
| 2 | bob | 2024-01-15 |
| 3 | charlie | 2024-02-03 |
| 4 | diana | 2024-02-10 |
| 5 | eve | 2024-02-20 |
-- Registrations per month
SELECT
DATE_FORMAT(signup_date, '%Y-%m') AS month,
COUNT(*) AS new_users
FROM users
GROUP BY DATE_FORMAT(signup_date, '%Y-%m')
ORDER BY month;GROUP BY vs DISTINCT
If you just want unique values without aggregates, use DISTINCT instead of GROUP BY:
-- These are equivalent (but DISTINCT is clearer)
SELECT DISTINCT category FROM products;
SELECT category FROM products GROUP BY category;Use DISTINCT when you just want unique values. Use GROUP BY when you want to calculate aggregates for each group.
Key Takeaways
What you learned:
✅ GROUP BY groups rows with the same values together ✅ Use with aggregate functions to calculate stats per group ✅ Every column in SELECT must be in GROUP BY or in an aggregate ✅ WHERE filters rows BEFORE grouping, aggregates happen after ✅ You can group by multiple columns for finer groupings ✅ Use ORDER BY after GROUP BY to sort groups ✅ Execution order: FROM → WHERE → GROUP BY → aggregates → ORDER BY ✅ Use DISTINCT instead of GROUP BY if you don't need aggregates
Coming up next: We'll learn about HAVING to filter groups based on aggregate results (e.g., "show categories with more than 5 products")!
Practice Exercise: Try these queries:
- Count how many employees are in each department
- Find the average salary per department
- Calculate total stock per category, sorted by total stock DESC
- Count products per category, only for products with price > 50
- Find the minimum and maximum salary per department and location