Kekkei - Advancing Education For Everyone

Introduction to GROUP BY

In the last lesson, you learned to calculate aggregates across all rows. For example, "What's the average price of all products?"

But what if you want to answer questions like:

"What's the average price per category?"
"How many products do we have in each category?"
"What's the total salary per department?"

You need to group rows by category/department and then calculate aggregates for each group. That's what GROUP BY does!

Groups rows that have the same values in specified columns into summary rows. Often used with aggregate functions (COUNT, SUM, AVG, MIN, MAX) to perform calculations on each group.

Think of it like:

Sorting your laundry by color (white pile, dark pile, colors pile)
Then counting items in each pile separately

Basic Syntax:

SELECT column, aggregate_function(column)
FROM table_name
GROUP BY column;

Let's work with our products table:

products

id	product_name	category	price	stock
1	Laptop	Electronics	999.99	45
2	Mouse	Electronics	29.99	120
3	Desk Chair	Furniture	199.99	30
4	Monitor	Electronics	349.99	60
5	Desk Lamp	Furniture	39.99	85
6	Keyboard	Electronics	79.99	95
7	Bookshelf	Furniture	159.99	20

Grouping by a Single Column

Let's answer: "How many products are in each category?"

sql

-- Count products per category
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

What happened:

SQL grouped all rows by category
Electronics group: Laptop, Mouse, Monitor, Keyboard (4 products)
Furniture group: Desk Chair, Desk Lamp, Bookshelf (3 products)
COUNT(*) calculated for each group separately

Without GROUP BY, you'd get one total:

sql

-- This gives total across ALL products
SELECT COUNT(*) AS total_products
FROM products;

GROUP BY splits this into separate counts per category!

GROUP BY with Different Aggregates

You can use any aggregate function with GROUP BY.

Average price per category:

sql

SELECT category, ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category;

Total stock per category:

sql

SELECT category, SUM(stock) AS total_stock
FROM products
GROUP BY category;

Electronics has 45 + 120 + 60 + 95 = 320 units in stock. Furniture has 30 + 85 + 20 = 135 units.

Price range per category:

sql

SELECT
  category,
  MIN(price) AS cheapest,
  MAX(price) AS most_expensive,
  MAX(price) - MIN(price) AS price_range
FROM products
GROUP BY category;

Multiple Aggregates per Group

You can calculate several statistics for each group in one query:

sql

-- Complete category summary
SELECT
  category,
  COUNT(*) AS product_count,
  SUM(stock) AS total_stock,
  ROUND(AVG(price), 2) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price
FROM products
GROUP BY category;

Perfect for dashboards and reports!

GROUP BY with WHERE Clause

You can filter rows before grouping with WHERE. Remember the order:

FROM - get data from table
WHERE - filter individual rows
GROUP BY - group the filtered rows
Aggregate functions - calculate on each group

sql

-- Average price per category, only for products with stock > 50
SELECT
  category,
  COUNT(*) AS product_count,
  ROUND(AVG(price), 2) AS avg_price
FROM products
WHERE stock > 50
GROUP BY category;

What happened:

WHERE filtered to only products with stock > 50 (Mouse, Monitor, Keyboard, Desk Lamp)
These 4 products were grouped by category
Aggregates calculated for each filtered group

Another example:

sql

-- Count expensive products (>$100) per category
SELECT
  category,
  COUNT(*) AS expensive_products
FROM products
WHERE price > 100
GROUP BY category;

Grouping by Multiple Columns

You can group by more than one column to create finer groupings.

employees

id	name	department	location	salary
1	Alice	Engineering	New York	95000
2	Bob	Engineering	New York	85000
3	Charlie	Engineering	London	90000
4	Diana	Sales	New York	70000
5	Eve	Sales	London	75000
6	Frank	Sales	London	72000
7	Grace	Marketing	New York	80000

sql

-- Count employees by department AND location
SELECT
  department,
  location,
  COUNT(*) AS employee_count,
  ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department, location
ORDER BY department, location;

Each unique combination of (department, location) forms a separate group:

Engineering + London: 1 employee
Engineering + New York: 2 employees
Marketing + New York: 1 employee
Sales + London: 2 employees
Sales + New York: 1 employee

Another example:

sales

id	product_id	region	quarter	revenue
1	101	North	Q1	5000
2	101	North	Q2	6000
3	101	South	Q1	4500
4	102	North	Q1	3000
5	102	South	Q2	3500

sql

-- Revenue by region and quarter
SELECT
  region,
  quarter,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY region, quarter
ORDER BY region, quarter;

GROUP BY with ORDER BY

You can sort groups using ORDER BY. This comes after GROUP BY.

Order of clauses:

SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ...

sql

-- Categories sorted by product count (descending)
SELECT
  category,
  COUNT(*) AS product_count,
  ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
ORDER BY product_count DESC;

Sort by aggregate result:

sql

-- Departments sorted by average salary (highest first)
SELECT
  department,
  COUNT(*) AS employees,
  ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Engineering has the highest average salary!

The Golden Rule of GROUP BY

Critical rule: Every column in SELECT must be either:

Listed in GROUP BY, OR
Inside an aggregate function (COUNT, SUM, AVG, MIN, MAX)

This is WRONG:

SELECT category, product_name, COUNT(*)
FROM products
GROUP BY category;

Error: product_name is not in GROUP BY and not in an aggregate function. Which product_name should SQL return for each category? There are multiple products per category!

This is CORRECT:

sql

SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

category is in GROUP BY, COUNT(*) is an aggregate. ✓

Another correct example:

sql

-- Multiple non-aggregate columns must ALL be in GROUP BY
SELECT category, location, COUNT(*) AS count
FROM inventory
GROUP BY category, location;

Both category and location are in GROUP BY. ✓

Common Mistakes to Avoid

Don't make these mistakes:

Forgetting column in GROUP BY: ❌
- SELECT category, location, COUNT(*) FROM products GROUP BY category
- If you SELECT it, you must GROUP BY it (unless it's an aggregate)
Using WHERE to filter aggregates: ❌
- Wrong: WHERE COUNT(*) > 5
- WHERE filters rows BEFORE grouping
- Use HAVING instead (next lesson!)
Grouping by alias: ❌ (doesn't work in some databases)
- SELECT category AS cat FROM products GROUP BY cat
- Use the actual column name: GROUP BY category
Confusing WHERE and HAVING: ❌
- WHERE filters individual rows before grouping
- HAVING filters groups after aggregation
Not using aggregate with GROUP BY: ❌
- SELECT category FROM products GROUP BY category
- This works but is pointless - use DISTINCT instead!
- SELECT DISTINCT category FROM products is clearer
ORDER BY column not in SELECT: Works but confusing
- SELECT category, COUNT(*) FROM products GROUP BY category ORDER BY AVG(price)
- Readers can't see what you're sorting by
Grouping without aggregates: ❌ Wasteful
- SELECT category FROM products GROUP BY category
- Just use: SELECT DISTINCT category FROM products

Practical Examples

Example 1: Sales report by product

order_items

order_id	product_id	quantity	price
1	101	2	50.00
1	102	1	30.00
2	101	1	50.00
3	103	3	20.00
4	101	2	50.00

sql

-- Total sales per product
SELECT
  product_id,
  COUNT(*) AS times_sold,
  SUM(quantity) AS total_quantity,
  SUM(quantity * price) AS total_revenue
FROM order_items
GROUP BY product_id
ORDER BY total_revenue DESC;

Example 2: Department headcount and payroll

sql

SELECT
  department,
  COUNT(*) AS headcount,
  ROUND(AVG(salary), 2) AS avg_salary,
  MIN(salary) AS min_salary,
  MAX(salary) AS max_salary,
  SUM(salary) AS total_payroll
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;

Example 3: Monthly user registrations

users

id	username	signup_date
1	alice	2024-01-05
2	bob	2024-01-15
3	charlie	2024-02-03
4	diana	2024-02-10
5	eve	2024-02-20

sql

-- Registrations per month
SELECT
  DATE_FORMAT(signup_date, '%Y-%m') AS month,
  COUNT(*) AS new_users
FROM users
GROUP BY DATE_FORMAT(signup_date, '%Y-%m')
ORDER BY month;

GROUP BY vs DISTINCT

If you just want unique values without aggregates, use DISTINCT instead of GROUP BY:

sql

-- These are equivalent (but DISTINCT is clearer)
SELECT DISTINCT category FROM products;

SELECT category FROM products GROUP BY category;

Use DISTINCT when you just want unique values. Use GROUP BY when you want to calculate aggregates for each group.

Key Takeaways

What you learned:

✅ GROUP BY groups rows with the same values together ✅ Use with aggregate functions to calculate stats per group ✅ Every column in SELECT must be in GROUP BY or in an aggregate ✅ WHERE filters rows BEFORE grouping, aggregates happen after ✅ You can group by multiple columns for finer groupings ✅ Use ORDER BY after GROUP BY to sort groups ✅ Execution order: FROM → WHERE → GROUP BY → aggregates → ORDER BY ✅ Use DISTINCT instead of GROUP BY if you don't need aggregates

Coming up next: We'll learn about HAVING to filter groups based on aggregate results (e.g., "show categories with more than 5 products")!

Practice Exercise: Try these queries:

Count how many employees are in each department
Find the average salary per department
Calculate total stock per category, sorted by total stock DESC
Count products per category, only for products with price > 50
Find the minimum and maximum salary per department and location

Introduction to GROUP BYFocusStart Focus Mode

Grouping by a Single ColumnFocusStart Focus Mode

GROUP BY with Different AggregatesFocusStart Focus Mode

Multiple Aggregates per GroupFocusStart Focus Mode

GROUP BY with WHERE ClauseFocusStart Focus Mode

Grouping by Multiple ColumnsFocusStart Focus Mode

GROUP BY with ORDER BYFocusStart Focus Mode

The Golden Rule of GROUP BYFocusStart Focus Mode

Common Mistakes to AvoidFocusStart Focus Mode

Practical ExamplesFocusStart Focus Mode

GROUP BY vs DISTINCTFocusStart Focus Mode

Key TakeawaysFocusStart Focus Mode