UNION Operations

Combine query results vertically with UNION and UNION ALL.

16 min read
Beginner

Introduction to UNION Operations

Sometimes you need to combine results from multiple queries into a single result set. For example:

  • Merge customer lists from multiple regions
  • Combine current and archived records
  • Create a unified view of data from different tables
  • Consolidate similar data for reporting

UNION operations let you stack query results vertically - combining rows from multiple SELECT statements into one result set!

A SQL operator that combines the results of two or more SELECT statements into a single result set. It "stacks" rows vertically, removing duplicate rows by default. UNION ALL includes all rows including duplicates.

Key concepts:

  • Combines results vertically (adding rows)
  • Different from JOINs which combine horizontally (adding columns)
  • All SELECT statements must have same number of columns
  • Column data types must be compatible
  • Column names come from the first SELECT

Think of UNION as "stacking" query results on top of each other.

Basic UNION Syntax

Syntax:

SELECT column1, column2
FROM table1
UNION
SELECT column1, column2
FROM table2;

Let's work with two employee tables from different offices:

us_employees
id
name
office
salary
1Alice SmithNew York95000
2Bob JohnsonNew York85000
3Carol WhiteChicago90000
uk_employees
id
name
office
salary
101David BrownLondon80000
102Emma WilsonManchester75000
103Frank MillerLondon95000
sql
-- Combine all employees from both offices
SELECT id, name, office, salary
FROM us_employees
UNION
SELECT id, name, office, salary
FROM uk_employees;

What happened:

  1. First SELECT gets rows from us_employees
  2. Second SELECT gets rows from uk_employees
  3. UNION combines them into one result set
  4. All 6 rows are returned (no duplicates in this case)

UNION vs UNION ALL

UNION removes duplicate rows, UNION ALL keeps all rows including duplicates.

Let's see the difference with overlapping data:

current_customers
customer_id
name
status
1Acme Corpactive
2TechStart Incactive
3Global Ltdactive
vip_customers
customer_id
name
status
2TechStart Incvip
3Global Ltdvip
4Elite Covip
sql
-- UNION (removes duplicates)
SELECT customer_id, name
FROM current_customers
UNION
SELECT customer_id, name
FROM vip_customers
ORDER BY customer_id;

Notice customers 2 and 3 appear only once - UNION removed duplicates!

Now with UNION ALL:

sql
-- UNION ALL (keeps duplicates)
SELECT customer_id, name
FROM current_customers
UNION ALL
SELECT customer_id, name
FROM vip_customers
ORDER BY customer_id;

With UNION ALL, customers 2 and 3 appear twice - all rows kept!

When to use each:

  • UNION: When you want unique rows only (slower, removes duplicates)
  • UNION ALL: When you want all rows including duplicates (faster, no deduplication)

Performance tip: Use UNION ALL when you know there are no duplicates or duplicates are acceptable. It's faster because it skips the deduplication step.

UNION Requirements - Same Structure

All SELECT statements in a UNION must have:

  1. Same number of columns
  2. Compatible data types (same position)
  3. Columns don't need same names (first SELECT determines names)
sql
-- This works - same structure
SELECT id, name, 'US' AS region
FROM us_employees
UNION
SELECT id, name, 'UK' AS region
FROM uk_employees;

This fails - different number of columns:

SELECT id, name FROM us_employees
UNION
SELECT id, name, office FROM uk_employees;  -- Error! 2 vs 3 columns

This fails - incompatible types:

SELECT id, name FROM us_employees
UNION
SELECT name, id FROM uk_employees;  -- Error! INT vs VARCHAR mismatch

Adding Identifying Columns

Often you want to know which query each row came from:

sql
-- Add source identifier
SELECT
    name,
    office,
    salary,
    'US Office' AS source
FROM us_employees
UNION ALL
SELECT
    name,
    office,
    salary,
    'UK Office' AS source
FROM uk_employees
ORDER BY salary DESC;

The source column helps identify where each row originated!

Combining More Than Two Queries

You can UNION multiple SELECT statements:

q1_sales
product
sales
Laptop50000
Mouse5000
q2_sales
product
sales
Laptop55000
Keyboard8000
q3_sales
product
sales
Laptop60000
Monitor30000
sql
-- Combine sales from all quarters
SELECT product, sales, 'Q1' AS quarter FROM q1_sales
UNION ALL
SELECT product, sales, 'Q2' AS quarter FROM q2_sales
UNION ALL
SELECT product, sales, 'Q3' AS quarter FROM q3_sales
ORDER BY product, quarter;

UNION with WHERE Clauses

Each SELECT can have its own WHERE clause:

sql
-- High earners from both offices
SELECT name, office, salary, 'US' AS region
FROM us_employees
WHERE salary >= 90000
UNION
SELECT name, office, salary, 'UK' AS region
FROM uk_employees
WHERE salary >= 90000
ORDER BY salary DESC;

UNION with Aggregations

Combine aggregated results from different tables:

sql
-- Summary statistics from both offices
SELECT
    'US Office' AS office,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary,
    MAX(salary) AS max_salary
FROM us_employees
UNION ALL
SELECT
    'UK Office' AS office,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary,
    MAX(salary) AS max_salary
FROM uk_employees;

UNION with JOINs

Each SELECT in a UNION can include JOINs:

orders_2023
order_id
customer_id
total
11500
22300
orders_2024
order_id
customer_id
total
1011600
1023450
customers
customer_id
name
1Alice Corp
2Bob Inc
3Carol Ltd
sql
-- All orders with customer names from both years
SELECT
    o.order_id,
    c.name AS customer,
    o.total,
    2023 AS year
FROM orders_2023 o
JOIN customers c ON o.customer_id = c.customer_id
UNION ALL
SELECT
    o.order_id,
    c.name AS customer,
    o.total,
    2024 AS year
FROM orders_2024 o
JOIN customers c ON o.customer_id = c.customer_id
ORDER BY year, order_id;

Practical Example: Customer Contact List

Combine multiple sources into a unified contact list:

newsletter_subscribers
email
name
alice@example.comAlice Smith
bob@example.comBob Johnson
customers
email
first_name
last_name
bob@example.comBobJohnson
carol@example.comCarolWhite
event_attendees
attendee_email
attendee_name
carol@example.comCarol White
david@example.comDavid Brown
sql
-- Unified contact list from all sources
SELECT
    email,
    name,
    'Newsletter' AS source
FROM newsletter_subscribers
UNION
SELECT
    email,
    CONCAT(first_name, ' ', last_name) AS name,
    'Customer' AS source
FROM customers
UNION
SELECT
    attendee_email AS email,
    attendee_name AS name,
    'Event' AS source
FROM event_attendees
ORDER BY email;

Notice duplicates were removed! Bob and Carol appear in multiple sources but show only once.

To see all occurrences:

sql
-- Show all occurrences with UNION ALL
SELECT email, name, 'Newsletter' AS source
FROM newsletter_subscribers
UNION ALL
SELECT email, CONCAT(first_name, ' ', last_name), 'Customer'
FROM customers
UNION ALL
SELECT attendee_email, attendee_name, 'Event'
FROM event_attendees
ORDER BY email, source;

UNION with NULL Values

Columns that don't exist in all queries can use NULL:

sql
-- Combine products with different structures
SELECT
    product_name,
    price,
    NULL AS discount_percent
FROM regular_products
UNION ALL
SELECT
    product_name,
    price,
    discount
FROM clearance_products
WHERE discount > 0;

This ensures all queries have the same number of columns!

Common Mistakes to Avoid

Don't make these mistakes:

  1. Different number of columns: ❌

    • SELECT a, b UNION SELECT a, b, c
    • All queries must have same column count
  2. Incompatible data types: ❌

    • SELECT id, name UNION SELECT name, id
    • Column types must match by position
  3. Using UNION when UNION ALL is better: ⚠️

    • UNION removes duplicates (slower)
    • If no duplicates exist or duplicates are OK, use UNION ALL
  4. Forgetting ORDER BY goes at the end: ❌

    • SELECT * FROM t1 ORDER BY id UNION SELECT * FROM t2
    • Correct: SELECT * FROM t1 UNION SELECT * FROM t2 ORDER BY id
  5. Not aliasing columns from first SELECT: ❌

    • Column names come from first SELECT only
    • Alias columns there if you want specific names
  6. Assuming column names must match: ❌

    • Names don't need to match, only count and types
    • Names from first SELECT are used
  7. Using UNION on large tables without filters: ⚠️

    • Filter with WHERE before combining
    • Reduces data volume and improves performance

Performance Considerations

UNION performance tips:

  1. Use UNION ALL when possible

    • Skip duplicate checking if not needed
    • Can be significantly faster
  2. Filter before combining

    • Add WHERE clauses to each SELECT
    • Reduces rows to process
  3. Index important columns

    • Especially columns in WHERE clauses
    • Helps each individual query run faster
  4. Consider alternatives for very large unions

    • If combining many tables, consider views
    • Or restructure schema if constantly using UNION

Key Takeaways

What you learned:

UNION combines results from multiple SELECT statements vertically ✅ UNION removes duplicate rows (default) ✅ UNION ALL keeps all rows including duplicates (faster) ✅ All SELECT statements must have:

  • Same number of columns
  • Compatible data types (by position) ✅ Column names come from first SELECT ✅ ORDER BY goes at the very end (after all UNIONs) ✅ Each SELECT can have its own WHERE, JOINs, GROUP BY ✅ Can add literal columns to identify source ✅ Use NULL to pad missing columns

Real-world applications:

  • Combining data from multiple regions/offices
  • Merging current and archived tables
  • Creating unified contact lists
  • Consolidating historical data by time period
  • Building comprehensive reports from multiple sources

Practice Exercise:

  1. Combine customers from two regions, add a region column
  2. Create a unified product list from multiple categories
  3. Combine sales data from Q1, Q2, Q3, Q4 with quarter labels
  4. Merge employee lists showing only those earning > 80000
  5. Create a master email list from customers, newsletter, and event attendees
  6. Compare UNION vs UNION ALL performance on duplicate data
  7. Combine monthly summaries (COUNT, SUM) from 12 monthly tables