Subqueries

Nest queries within queries for complex data retrieval.

18 min read
Beginner

Introduction to Subqueries

Sometimes you need to use the result of one query inside another query. For example:

  • "Find employees who earn more than the average salary" (need to calculate average first)
  • "Show products in the same category as the most expensive product"
  • "List customers who have placed more orders than customer X"

You could run two separate queries, but subqueries let you combine them into one! A subquery is a query nested inside another query.

A SELECT statement embedded inside another SQL statement (SELECT, INSERT, UPDATE, DELETE, or WHERE clause). The subquery executes first, and its result is used by the outer query.

Basic structure:

SELECT column
FROM table
WHERE column = (SELECT column FROM another_table WHERE ...);
                 └─────────── subquery ──────────┘

Let's work with our familiar tables:

employees
id
name
department
salary
hire_date
1AliceEngineering950002020-03-15
2BobMarketing750002021-07-22
3CharlieEngineering850002019-11-03
4DianaSales700002022-01-10
5EveMarketing800002020-09-01

Scalar Subqueries - Return a Single Value

A scalar subquery returns exactly one value (one row, one column). Perfect for comparisons.

sql
-- Find employees earning more than the average salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

How this works:

  1. Subquery executes first: SELECT AVG(salary) FROM employees → returns 81000
  2. Outer query uses that value: WHERE salary > 81000
  3. Results: Alice (95000) and Charlie (85000)

Another example:

sql
-- Find the most expensive product's price
SELECT product_name, price
FROM products
WHERE price = (SELECT MAX(price) FROM products);

Comparing to a specific value:

sql
-- Find employees in the same department as Alice
SELECT name, department
FROM employees
WHERE department = (SELECT department FROM employees WHERE name = 'Alice');

Important: If the subquery returns more than one row, you'll get an error!

WHERE department = (SELECT department FROM employees)

ERROR: Subquery returns more than 1 row

Use IN instead of = for multi-row subqueries.

Subqueries with IN - Matching Multiple Values

Use IN when the subquery returns multiple values:

sql
-- Find employees in departments that have more than 1 employee
SELECT name, department
FROM employees
WHERE department IN (
  SELECT department
  FROM employees
  GROUP BY department
  HAVING COUNT(*) > 1
);

Step by step:

  1. Subquery finds departments with >1 employee: ['Engineering', 'Marketing']
  2. Outer query filters: WHERE department IN ('Engineering', 'Marketing')

Another example:

orders
id
customer_id
order_date
total
10112024-01-15500
10222024-01-20350
10312024-02-01750
sql
-- Find customers who have placed orders
SELECT id, name
FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);

NOT IN for exclusion:

sql
-- Find customers who have NOT placed orders
SELECT id, name
FROM customers
WHERE id NOT IN (SELECT DISTINCT customer_id FROM orders);

NOT IN with NULL: Be careful! If the subquery returns any NULL values, NOT IN will return zero results.

Better alternative: Use LEFT JOIN with IS NULL (covered in JOIN lessons).

Subqueries with EXISTS - Testing for Existence

EXISTS checks if a subquery returns any rows (doesn't care about the values, just whether rows exist).

sql
-- Find customers who have placed at least one order
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.id
);

How EXISTS works:

  1. For each customer, the subquery checks if any orders exist for that customer
  2. If the subquery returns any rows, EXISTS is TRUE
  3. If the subquery returns zero rows, EXISTS is FALSE

Note: We used SELECT 1 because EXISTS only cares if rows exist, not what columns are returned.

NOT EXISTS for negative checks:

sql
-- Find customers who have NOT placed any orders
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.id
);

EXISTS vs IN:

  • EXISTS is often faster for large datasets
  • EXISTS stops as soon as it finds one match
  • IN loads all results from subquery before checking

Subqueries in SELECT Clause

You can use subqueries in the SELECT clause to calculate values for each row:

sql
-- Show each employee's salary vs department average
SELECT
  name,
  department,
  salary,
  (SELECT AVG(salary)
   FROM employees e2
   WHERE e2.department = e1.department) AS dept_avg_salary,
  salary - (SELECT AVG(salary)
            FROM employees e2
            WHERE e2.department = e1.department) AS diff_from_avg
FROM employees e1;

For each employee, the subquery calculates their department's average salary.

Another example:

sql
-- Show customer with their order count
SELECT
  c.name,
  (SELECT COUNT(*)
   FROM orders o
   WHERE o.customer_id = c.id) AS order_count
FROM customers c;

Subqueries in FROM Clause (Derived Tables)

You can use a subquery as a "temporary table" in the FROM clause:

sql
-- Find departments with average salary > 75000
SELECT dept_name, avg_salary
FROM (
  SELECT
    department AS dept_name,
    AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 75000;

What happened:

  1. Subquery calculates average salary per department
  2. Result is treated as a temporary table called dept_averages
  3. Outer query filters to departments where avg > 75000

Note: You must give the derived table an alias (AS dept_averages)!

Another example - combining aggregates:

sql
-- Show top 3 highest-paid employees per department
SELECT *
FROM (
  SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
  FROM employees
) AS ranked_employees
WHERE dept_rank <= 3;

Subqueries with ANY and ALL

ANY returns TRUE if the comparison is TRUE for any value returned by the subquery:

sql
-- Find employees earning more than ANY Marketing employee
SELECT name, department, salary
FROM employees
WHERE salary > ANY (
  SELECT salary
  FROM employees
  WHERE department = 'Marketing'
);

Marketing salaries are [75000, 80000]. Alice, Charlie, and Eve all earn more than at least one Marketing employee (>75000).

ALL returns TRUE if the comparison is TRUE for all values:

sql
-- Find employees earning more than ALL Marketing employees
SELECT name, department, salary
FROM employees
WHERE salary > ALL (
  SELECT salary
  FROM employees
  WHERE department = 'Marketing'
);

Only Alice and Charlie earn more than ALL Marketing employees (>80000).

Comparison:

  • > ANY(list) means "greater than the minimum"
  • > ALL(list) means "greater than the maximum"
  • < ANY(list) means "less than the maximum"
  • < ALL(list) means "less than the minimum"

Common Mistakes to Avoid

Don't make these mistakes:

  1. Scalar subquery returns multiple rows: ❌

    • WHERE department = (SELECT department FROM employees)
    • Use IN instead: WHERE department IN (SELECT ...)
  2. Forgetting alias for derived table: ❌

    • FROM (SELECT ...) WHERE ...
    • Must add: FROM (SELECT ...) AS alias WHERE ...
  3. NOT IN with NULL values: ❌ Returns no results

    • Subquery returns [1, 2, NULL]
    • NOT IN (1, 2, NULL) always returns no rows!
    • Use NOT EXISTS or LEFT JOIN IS NULL instead
  4. Inefficient subquery in SELECT: ❌

    • Subquery runs once PER ROW (can be slow)
    • Consider using JOIN or window functions instead
  5. Subquery references wrong table: ❌

    • Correlated subquery must reference outer table correctly
    • Use clear aliases: e1 for outer, e2 for inner
  6. Using ORDER BY in subquery unnecessarily: ❌

    • ORDER BY in subquery is ignored unless using LIMIT
    • Order the outer query instead
  7. Comparing NULL with IN: ❌

    • WHERE id IN (1, 2, NULL) might not work as expected
    • NULL comparisons need IS NULL
  8. Nested subqueries too deep: ❌ Hard to read

    • Multiple nested levels become confusing
    • Consider using CTEs (Common Table Expressions) or JOIN instead

Practical Examples

Example 1: Find second-highest salary

sql
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Example 2: Customers with above-average order totals

sql
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > (SELECT AVG(total) FROM orders);

Example 3: Products never ordered

sql
SELECT product_id, product_name
FROM products
WHERE product_id NOT IN (
  SELECT DISTINCT product_id
  FROM order_items
);

Example 4: Employees hired in the same year as the company's first employee

sql
SELECT name, hire_date
FROM employees
WHERE YEAR(hire_date) = (
  SELECT YEAR(MIN(hire_date))
  FROM employees
);

Example 5: Departments with all employees earning > $70k

sql
SELECT DISTINCT department
FROM employees e1
WHERE NOT EXISTS (
  SELECT 1
  FROM employees e2
  WHERE e2.department = e1.department
    AND e2.salary <= 70000
);

Subqueries vs JOINs

Many subqueries can be rewritten as JOINs. Which is better?

Subqueries vs JOINs
Aspect
Subquery
JOIN
ReadabilitySometimes clearer for simple casesBetter for complex multi-table queries
PerformanceCan be slower (runs for each row)Usually faster (optimized by database)
Use caseSingle value checks, EXISTSCombining data from multiple tables
Best forFiltering, scalar valuesRetrieving columns from multiple tables

Same result, different approach:

sql
-- Subquery approach
SELECT name
FROM customers
WHERE id IN (SELECT customer_id FROM orders);

-- JOIN approach (often faster)
SELECT DISTINCT c.name
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

When to use subqueries:

  • Checking against a single aggregated value (AVG, MAX, MIN)
  • EXISTS/NOT EXISTS checks
  • When you need a calculated value for each row
  • When subquery logic is clearer than JOIN

When to use JOINs:

  • Retrieving columns from multiple tables
  • Better performance on large datasets
  • More readable for complex multi-table queries

Key Takeaways

What you learned:

Subquery is a query nested inside another query ✅ Scalar subqueries return one value (use with =, <, >, etc.) ✅ Use IN for subqueries returning multiple values ✅ Use EXISTS to check if subquery returns any rows ✅ Use NOT EXISTS for "doesn't exist" checks (better than NOT IN) ✅ Subqueries can appear in SELECT, FROM, WHERE, HAVING ✅ ANY and ALL for comparing to multiple values ✅ Derived tables (subquery in FROM) need an alias ✅ Often JOIN is faster than subquery (but both work)

Coming up next: We'll learn about Common Table Expressions (CTEs) - a cleaner, more powerful alternative to complex subqueries!

Practice Exercise: Try these queries:

  1. Find products more expensive than average
  2. Find departments where all employees earn > $75k
  3. Show customers with their total order count (using subquery in SELECT)
  4. Find employees earning less than their department average
  5. List products that have never been ordered (using NOT IN or NOT EXISTS)