INTERSECT and EXCEPT

Find common or different rows between query results.

18 min read
Intermediate

Introduction to Set Operations

You've learned about UNION to combine results from multiple queries. SQL provides two more powerful set operations: INTERSECT and EXCEPT.

Think of them like Venn diagrams:

  • UNION - everything in A or B (we covered this)
  • INTERSECT - only what's in both A and B (the overlap)
  • EXCEPT - what's in A but not in B (the difference)

Real-world scenarios:

  • INTERSECT: "Find customers who bought both product A AND product B"
  • EXCEPT: "Find products in our catalog that have never been ordered"

These operations work on entire result sets, comparing complete rows.

SQL operations that combine or compare entire result sets: UNION (combine all), INTERSECT (common rows), and EXCEPT (difference). They treat query results as mathematical sets.

Set Operations Comparison
Operation
Returns
Removes Duplicates
Example Use
UNIONAll rows from both queriesYesCombine customer lists
UNION ALLAll rows from both queriesNoCombine keeping duplicates
INTERSECTOnly rows in both queriesYesFind common elements
EXCEPTRows in first, not in secondYesFind missing elements

Understanding INTERSECT

INTERSECT returns only rows that appear in both query results. Think of it as "AND" for entire result sets.

Let's work with these tables:

customers_2023
customer_id
name
101Alice
102Bob
103Charlie
104Diana
customers_2024
customer_id
name
102Bob
103Charlie
105Eve
106Frank

Find customers who were active in both 2023 AND 2024:

sql
SELECT customer_id, name FROM customers_2023
INTERSECT
SELECT customer_id, name FROM customers_2024;

Only Bob and Charlie appear in both tables. Alice and Diana only appear in 2023, Eve and Frank only in 2024 - they're excluded.

Key points:

  • Rows must match exactly (all columns)
  • Automatically removes duplicates
  • Both queries must return the same number of columns with compatible data types

INTERSECT compares complete rows. If you select customer_id, name, both values must match. If you select just customer_id, only IDs are compared.

INTERSECT Syntax and Rules

Basic syntax:

sql
SELECT column1, column2, ...
FROM table1
WHERE condition1
INTERSECT
SELECT column1, column2, ...
FROM table2
WHERE condition2;

Rules:

  1. Same number of columns in both queries
  2. Corresponding columns must have compatible types
  3. Column names can differ (result uses first query's names)
  4. Duplicates are automatically removed
  5. ORDER BY goes at the very end (after all set operations)
sql
-- Valid: Same structure, different names
SELECT product_id FROM inventory
INTERSECT
SELECT product_id FROM orders;

-- Valid: With ORDER BY at the end
SELECT category FROM products_2023
INTERSECT
SELECT category FROM products_2024
ORDER BY category;

Understanding EXCEPT

EXCEPT returns rows from the first query that don't appear in the second query. Think of it as "subtraction" or "A minus B".

Find customers who were active in 2023 but NOT in 2024:

sql
SELECT customer_id, name FROM customers_2023
EXCEPT
SELECT customer_id, name FROM customers_2024;

Alice and Diana appear in 2023 but not in 2024 - they churned!

Find customers who joined in 2024 (new customers):

sql
SELECT customer_id, name FROM customers_2024
EXCEPT
SELECT customer_id, name FROM customers_2023;

Eve and Frank are in 2024 but not 2023 - they're new customers!

Order matters with EXCEPT! A EXCEPT B is different from B EXCEPT A. The first query is the "base" - you're removing rows that match the second query.

Note: Some databases (MySQL) don't support EXCEPT. Use MINUS instead (Oracle, DB2) or rewrite with NOT IN or NOT EXISTS:

sql
-- EXCEPT version
SELECT customer_id, name FROM customers_2023
EXCEPT
SELECT customer_id, name FROM customers_2024;

-- Equivalent using NOT IN (works in all databases)
SELECT customer_id, name
FROM customers_2023
WHERE customer_id NOT IN (SELECT customer_id FROM customers_2024);

Practical Examples

Example 1: Products available in both warehouses

warehouse_a
product_id
product_name
1Laptop
2Mouse
3Keyboard
4Monitor
warehouse_b
product_id
product_name
2Mouse
3Keyboard
5Webcam
6Headphones
sql
-- Products in both warehouses (can fulfill from either)
SELECT product_id, product_name FROM warehouse_a
INTERSECT
SELECT product_id, product_name FROM warehouse_b;

Example 2: Products only in Warehouse A

sql
-- Products only in Warehouse A (need to restock B?)
SELECT product_id, product_name FROM warehouse_a
EXCEPT
SELECT product_id, product_name FROM warehouse_b;

Example 3: Finding customers who bought specific products

orders
order_id
customer_id
product_id
quantity
110112
210211
310123
410311
510222
610431
sql
-- Customers who bought BOTH product 1 AND product 2
SELECT customer_id FROM orders WHERE product_id = 1
INTERSECT
SELECT customer_id FROM orders WHERE product_id = 2;

Alice (101) and Bob (102) bought both products. Charlie (103) only bought product 1, Diana (104) only bought product 3.

Example 4: Customers who bought product 1 but NOT product 2

sql
SELECT customer_id FROM orders WHERE product_id = 1
EXCEPT
SELECT customer_id FROM orders WHERE product_id = 2;

Only Charlie (103) bought product 1 without buying product 2.

Combining Multiple Set Operations

You can chain multiple set operations together:

2022 customers
customer_id
name
101Alice
102Bob
2023 customers
customer_id
name
102Bob
103Charlie
104Diana
2024 customers
customer_id
name
103Charlie
104Diana
105Eve
sql
-- Customers active in both 2023 AND 2024, but NOT in 2022
SELECT customer_id, name FROM customers_2023
INTERSECT
SELECT customer_id, name FROM customers_2024
EXCEPT
SELECT customer_id, name FROM customers_2022;

Order of operations:

  1. INTERSECT: finds customers in both 2023 and 2024 (Bob, Charlie, Diana)
  2. EXCEPT: removes customers who were also in 2022 (removes Bob)
  3. Result: Charlie and Diana (joined in 2023, still active in 2024)

Use parentheses for clarity (if supported):

sql
-- Make order explicit with parentheses
(SELECT customer_id, name FROM customers_2023
 INTERSECT
 SELECT customer_id, name FROM customers_2024)
EXCEPT
SELECT customer_id, name FROM customers_2022;

Set Operations vs JOINs

Set operations and JOINs can sometimes achieve similar results, but they work differently:

Set Operations vs JOINs
Feature
Set Operations
JOINs
CombinesRows (vertically)Columns (horizontally)
DuplicatesRemoved automaticallyKept (unless DISTINCT)
RequirementsSame column structureMatching join keys
ResultRows from query A or BCombined columns from A and B

INTERSECT vs INNER JOIN:

sql
-- INTERSECT: Returns customer rows that exist in both
SELECT customer_id FROM customers_2023
INTERSECT
SELECT customer_id FROM customers_2024;

-- INNER JOIN: Returns combined rows with matching customer_id
SELECT c1.customer_id, c1.name AS name_2023, c2.name AS name_2024
FROM customers_2023 c1
INNER JOIN customers_2024 c2 ON c1.customer_id = c2.customer_id;

INTERSECT returns just the IDs, JOIN returns columns from both tables.

EXCEPT vs LEFT JOIN with NULL check:

sql
-- EXCEPT: Customers in 2023, not in 2024
SELECT customer_id FROM customers_2023
EXCEPT
SELECT customer_id FROM customers_2024;

-- Equivalent with LEFT JOIN:
SELECT c1.customer_id
FROM customers_2023 c1
LEFT JOIN customers_2024 c2 ON c1.customer_id = c2.customer_id
WHERE c2.customer_id IS NULL;

When to use what:

  • Set operations: Comparing complete row sets, finding overlap or differences
  • JOINs: Combining data from related tables, need columns from both tables

Database Compatibility

Not all databases support all set operations:

Database Support
Database
INTERSECT
EXCEPT
Alternative
PostgreSQL✅ Yes✅ YesN/A
SQL Server✅ Yes✅ YesN/A
Oracle✅ Yes❌ NoUse MINUS
MySQL❌ No (before 8.0)❌ NoUse JOINs/subqueries
SQLite✅ Yes✅ YesN/A

MySQL workarounds before version 8.0:

sql
-- INTERSECT workaround
SELECT DISTINCT customer_id FROM customers_2023
WHERE customer_id IN (SELECT customer_id FROM customers_2024);

-- Or using INNER JOIN
SELECT DISTINCT c1.customer_id
FROM customers_2023 c1
INNER JOIN customers_2024 c2 ON c1.customer_id = c2.customer_id;

-- EXCEPT workaround
SELECT customer_id FROM customers_2023
WHERE customer_id NOT IN (SELECT customer_id FROM customers_2024);

-- Or using LEFT JOIN
SELECT c1.customer_id
FROM customers_2023 c1
LEFT JOIN customers_2024 c2 ON c1.customer_id = c2.customer_id
WHERE c2.customer_id IS NULL;

Performance Considerations

Tips for efficient set operations:

1. Use indexes on compared columns:

sql
-- Index the columns used in INTERSECT/EXCEPT
CREATE INDEX idx_customers_2023_id ON customers_2023(customer_id);
CREATE INDEX idx_customers_2024_id ON customers_2024(customer_id);

2. Filter early with WHERE clauses:

sql
-- Good: Filter before INTERSECT (smaller datasets)
SELECT customer_id FROM customers_2023 WHERE country = 'USA'
INTERSECT
SELECT customer_id FROM customers_2024 WHERE country = 'USA';

-- Less efficient: Filter after
SELECT customer_id
FROM (
  SELECT customer_id FROM customers_2023
  INTERSECT
  SELECT customer_id FROM customers_2024
) AS common
WHERE country = 'USA';  -- This won't work without the country column!

3. Consider using JOINs for large datasets:

JOINs are often faster than set operations on large tables because:

  • Optimizers are better at optimizing JOINs
  • Indexes are more effectively used with JOINs
  • Set operations may require sorting the entire result set

4. Use UNION ALL instead of UNION when duplicates don't matter:

sql
-- UNION ALL: Faster (no duplicate removal)
SELECT customer_id FROM customers_2023
UNION ALL
SELECT customer_id FROM customers_2024;

-- UNION: Slower (removes duplicates)
SELECT customer_id FROM customers_2023
UNION
SELECT customer_id FROM customers_2024;

Common Mistakes to Avoid

Don't make these mistakes:

  1. Mismatched column counts: ❌

    • Wrong: SELECT id, name FROM table1 INTERSECT SELECT id FROM table2
    • Both queries must return the same number of columns
  2. Incompatible data types: ❌

    • Wrong: SELECT id::TEXT FROM table1 INTERSECT SELECT id::INT FROM table2
    • Column types must be compatible
  3. Forgetting order matters for EXCEPT: ❌

    • A EXCEPT BB EXCEPT A
    • Think carefully about which is the "base" query
  4. Using EXCEPT in MySQL without checking version: ❌

    • MySQL < 8.0 doesn't support EXCEPT
    • Use NOT IN or LEFT JOIN workarounds
  5. Not considering NULL values: ⚠️

    • Set operations compare NULLs as equal (NULL = NULL is true)
    • This differs from regular WHERE clauses
  6. Ordering before set operations: ❌

    • Wrong: (SELECT * FROM t1 ORDER BY id) INTERSECT SELECT * FROM t2
    • ORDER BY must come after all set operations
  7. Assuming set operations are always faster than JOINs: ❌

    • JOINs are often faster, especially with proper indexes
    • Test both approaches on your data
  8. Using INTERSECT when EXISTS would be clearer: Sometimes

    • For existence checks, EXISTS can be more readable
    • Choose based on clarity and performance

Advanced Examples

Example 1: Find products in all three warehouses

warehouse_c
product_id
product_name
2Mouse
4Monitor
7Speaker
sql
-- Products available in all three warehouses
SELECT product_id, product_name FROM warehouse_a
INTERSECT
SELECT product_id, product_name FROM warehouse_b
INTERSECT
SELECT product_id, product_name FROM warehouse_c;

Only Mouse (product 2) exists in all three warehouses.

Example 2: Customers who bought A and B, but not C

sql
-- Bought products 1 AND 2, but NOT product 3
(SELECT customer_id FROM orders WHERE product_id = 1
 INTERSECT
 SELECT customer_id FROM orders WHERE product_id = 2)
EXCEPT
SELECT customer_id FROM orders WHERE product_id = 3;

Bob (102) bought products 1 and 2 but not product 3.

Example 3: Finding gaps in data

sql
-- Products in catalog but never ordered
SELECT product_id FROM products
EXCEPT
SELECT DISTINCT product_id FROM orders;

Products 4, 5, and 6 exist in the catalog but have never been ordered - potential candidates for discontinuation!

Key Takeaways

What you learned:

INTERSECT returns rows that appear in both queries (overlap) ✅ EXCEPT returns rows in the first query but not in the second (difference) ✅ Set operations compare complete rows, not individual columns ✅ Both queries must have the same number of columns with compatible types ✅ Duplicates are automatically removed (like UNION, not UNION ALL) ✅ Order matters for EXCEPT (A EXCEPT B ≠ B EXCEPT A) ✅ Not all databases support INTERSECT/EXCEPT (MySQL < 8.0, Oracle uses MINUS) ✅ Can combine multiple set operations in one query ✅ ORDER BY goes at the very end, after all set operations ✅ JOINs may be faster than set operations for large datasets

Common use cases:

  • Finding customers active across multiple periods
  • Identifying products in multiple locations
  • Detecting churned customers (were active, now not)
  • Finding new customers (active now, weren't before)
  • Comparing datasets for overlap or differences
  • Quality checks (catalog vs orders)

Coming up next: You'll explore more advanced filtering with IN, BETWEEN, and LIKE operators for flexible pattern matching!

Practice Exercise: Try these queries:

  1. Find customers who placed orders in January AND February
  2. Find products in warehouse_a but not in warehouse_b
  3. Find employees in both Engineering and Sales departments (if they transferred)
  4. List customers who bought products 1, 2, and 3 (all three)
  5. Find email addresses that appear in both 'unsubscribed' and 'active' lists (data integrity issue!)