Subqueries
Nest queries within queries for complex data retrieval.
Introduction to Subqueries
Sometimes you need to use the result of one query inside another query. For example:
- "Find employees who earn more than the average salary" (need to calculate average first)
- "Show products in the same category as the most expensive product"
- "List customers who have placed more orders than customer X"
You could run two separate queries, but subqueries let you combine them into one! A subquery is a query nested inside another query.
A SELECT statement embedded inside another SQL statement (SELECT, INSERT, UPDATE, DELETE, or WHERE clause). The subquery executes first, and its result is used by the outer query.
Basic structure:
SELECT column
FROM table
WHERE column = (SELECT column FROM another_table WHERE ...);
└─────────── subquery ──────────┘
Let's work with our familiar tables:
id | name | department | salary | hire_date |
|---|---|---|---|---|
| 1 | Alice | Engineering | 95000 | 2020-03-15 |
| 2 | Bob | Marketing | 75000 | 2021-07-22 |
| 3 | Charlie | Engineering | 85000 | 2019-11-03 |
| 4 | Diana | Sales | 70000 | 2022-01-10 |
| 5 | Eve | Marketing | 80000 | 2020-09-01 |
Scalar Subqueries - Return a Single Value
A scalar subquery returns exactly one value (one row, one column). Perfect for comparisons.
-- Find employees earning more than the average salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);How this works:
- Subquery executes first:
SELECT AVG(salary) FROM employees→ returns 81000 - Outer query uses that value:
WHERE salary > 81000 - Results: Alice (95000) and Charlie (85000)
Another example:
-- Find the most expensive product's price
SELECT product_name, price
FROM products
WHERE price = (SELECT MAX(price) FROM products);Comparing to a specific value:
-- Find employees in the same department as Alice
SELECT name, department
FROM employees
WHERE department = (SELECT department FROM employees WHERE name = 'Alice');Important: If the subquery returns more than one row, you'll get an error!
WHERE department = (SELECT department FROM employees)
ERROR: Subquery returns more than 1 row
Use IN instead of = for multi-row subqueries.
Subqueries with IN - Matching Multiple Values
Use IN when the subquery returns multiple values:
-- Find employees in departments that have more than 1 employee
SELECT name, department
FROM employees
WHERE department IN (
SELECT department
FROM employees
GROUP BY department
HAVING COUNT(*) > 1
);Step by step:
- Subquery finds departments with >1 employee:
['Engineering', 'Marketing'] - Outer query filters:
WHERE department IN ('Engineering', 'Marketing')
Another example:
id | customer_id | order_date | total |
|---|---|---|---|
| 101 | 1 | 2024-01-15 | 500 |
| 102 | 2 | 2024-01-20 | 350 |
| 103 | 1 | 2024-02-01 | 750 |
-- Find customers who have placed orders
SELECT id, name
FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);NOT IN for exclusion:
-- Find customers who have NOT placed orders
SELECT id, name
FROM customers
WHERE id NOT IN (SELECT DISTINCT customer_id FROM orders);NOT IN with NULL: Be careful! If the subquery returns any NULL values, NOT IN will return zero results.
Better alternative: Use LEFT JOIN with IS NULL (covered in JOIN lessons).
Subqueries with EXISTS - Testing for Existence
EXISTS checks if a subquery returns any rows (doesn't care about the values, just whether rows exist).
-- Find customers who have placed at least one order
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.id
);How EXISTS works:
- For each customer, the subquery checks if any orders exist for that customer
- If the subquery returns any rows, EXISTS is TRUE
- If the subquery returns zero rows, EXISTS is FALSE
Note: We used SELECT 1 because EXISTS only cares if rows exist, not what columns are returned.
NOT EXISTS for negative checks:
-- Find customers who have NOT placed any orders
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.id
);EXISTS vs IN:
- EXISTS is often faster for large datasets
- EXISTS stops as soon as it finds one match
- IN loads all results from subquery before checking
Subqueries in SELECT Clause
You can use subqueries in the SELECT clause to calculate values for each row:
-- Show each employee's salary vs department average
SELECT
name,
department,
salary,
(SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg_salary,
salary - (SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department) AS diff_from_avg
FROM employees e1;For each employee, the subquery calculates their department's average salary.
Another example:
-- Show customer with their order count
SELECT
c.name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.id) AS order_count
FROM customers c;Subqueries in FROM Clause (Derived Tables)
You can use a subquery as a "temporary table" in the FROM clause:
-- Find departments with average salary > 75000
SELECT dept_name, avg_salary
FROM (
SELECT
department AS dept_name,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) AS dept_averages
WHERE avg_salary > 75000;What happened:
- Subquery calculates average salary per department
- Result is treated as a temporary table called
dept_averages - Outer query filters to departments where avg > 75000
Note: You must give the derived table an alias (AS dept_averages)!
Another example - combining aggregates:
-- Show top 3 highest-paid employees per department
SELECT *
FROM (
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
) AS ranked_employees
WHERE dept_rank <= 3;Subqueries with ANY and ALL
ANY returns TRUE if the comparison is TRUE for any value returned by the subquery:
-- Find employees earning more than ANY Marketing employee
SELECT name, department, salary
FROM employees
WHERE salary > ANY (
SELECT salary
FROM employees
WHERE department = 'Marketing'
);Marketing salaries are [75000, 80000]. Alice, Charlie, and Eve all earn more than at least one Marketing employee (>75000).
ALL returns TRUE if the comparison is TRUE for all values:
-- Find employees earning more than ALL Marketing employees
SELECT name, department, salary
FROM employees
WHERE salary > ALL (
SELECT salary
FROM employees
WHERE department = 'Marketing'
);Only Alice and Charlie earn more than ALL Marketing employees (>80000).
Comparison:
> ANY(list)means "greater than the minimum"> ALL(list)means "greater than the maximum"< ANY(list)means "less than the maximum"< ALL(list)means "less than the minimum"
Common Mistakes to Avoid
Don't make these mistakes:
-
Scalar subquery returns multiple rows: ❌
WHERE department = (SELECT department FROM employees)- Use IN instead:
WHERE department IN (SELECT ...)
-
Forgetting alias for derived table: ❌
FROM (SELECT ...) WHERE ...- Must add:
FROM (SELECT ...) AS alias WHERE ...
-
NOT IN with NULL values: ❌ Returns no results
- Subquery returns [1, 2, NULL]
NOT IN (1, 2, NULL)always returns no rows!- Use NOT EXISTS or LEFT JOIN IS NULL instead
-
Inefficient subquery in SELECT: ❌
- Subquery runs once PER ROW (can be slow)
- Consider using JOIN or window functions instead
-
Subquery references wrong table: ❌
- Correlated subquery must reference outer table correctly
- Use clear aliases: e1 for outer, e2 for inner
-
Using ORDER BY in subquery unnecessarily: ❌
- ORDER BY in subquery is ignored unless using LIMIT
- Order the outer query instead
-
Comparing NULL with IN: ❌
WHERE id IN (1, 2, NULL)might not work as expected- NULL comparisons need IS NULL
-
Nested subqueries too deep: ❌ Hard to read
- Multiple nested levels become confusing
- Consider using CTEs (Common Table Expressions) or JOIN instead
Practical Examples
Example 1: Find second-highest salary
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);Example 2: Customers with above-average order totals
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > (SELECT AVG(total) FROM orders);Example 3: Products never ordered
SELECT product_id, product_name
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
);Example 4: Employees hired in the same year as the company's first employee
SELECT name, hire_date
FROM employees
WHERE YEAR(hire_date) = (
SELECT YEAR(MIN(hire_date))
FROM employees
);Example 5: Departments with all employees earning > $70k
SELECT DISTINCT department
FROM employees e1
WHERE NOT EXISTS (
SELECT 1
FROM employees e2
WHERE e2.department = e1.department
AND e2.salary <= 70000
);Subqueries vs JOINs
Many subqueries can be rewritten as JOINs. Which is better?
Aspect | Subquery | JOIN |
|---|---|---|
| Readability | Sometimes clearer for simple cases | Better for complex multi-table queries |
| Performance | Can be slower (runs for each row) | Usually faster (optimized by database) |
| Use case | Single value checks, EXISTS | Combining data from multiple tables |
| Best for | Filtering, scalar values | Retrieving columns from multiple tables |
Same result, different approach:
-- Subquery approach
SELECT name
FROM customers
WHERE id IN (SELECT customer_id FROM orders);
-- JOIN approach (often faster)
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;When to use subqueries:
- Checking against a single aggregated value (AVG, MAX, MIN)
- EXISTS/NOT EXISTS checks
- When you need a calculated value for each row
- When subquery logic is clearer than JOIN
When to use JOINs:
- Retrieving columns from multiple tables
- Better performance on large datasets
- More readable for complex multi-table queries
Key Takeaways
What you learned:
✅ Subquery is a query nested inside another query ✅ Scalar subqueries return one value (use with =, <, >, etc.) ✅ Use IN for subqueries returning multiple values ✅ Use EXISTS to check if subquery returns any rows ✅ Use NOT EXISTS for "doesn't exist" checks (better than NOT IN) ✅ Subqueries can appear in SELECT, FROM, WHERE, HAVING ✅ ANY and ALL for comparing to multiple values ✅ Derived tables (subquery in FROM) need an alias ✅ Often JOIN is faster than subquery (but both work)
Coming up next: We'll learn about Common Table Expressions (CTEs) - a cleaner, more powerful alternative to complex subqueries!
Practice Exercise: Try these queries:
- Find products more expensive than average
- Find departments where all employees earn > $75k
- Show customers with their total order count (using subquery in SELECT)
- Find employees earning less than their department average
- List products that have never been ordered (using NOT IN or NOT EXISTS)