UNION Operations
Combine query results vertically with UNION and UNION ALL.
Introduction to UNION Operations
Sometimes you need to combine results from multiple queries into a single result set. For example:
- Merge customer lists from multiple regions
- Combine current and archived records
- Create a unified view of data from different tables
- Consolidate similar data for reporting
UNION operations let you stack query results vertically - combining rows from multiple SELECT statements into one result set!
A SQL operator that combines the results of two or more SELECT statements into a single result set. It "stacks" rows vertically, removing duplicate rows by default. UNION ALL includes all rows including duplicates.
Key concepts:
- Combines results vertically (adding rows)
- Different from JOINs which combine horizontally (adding columns)
- All SELECT statements must have same number of columns
- Column data types must be compatible
- Column names come from the first SELECT
Think of UNION as "stacking" query results on top of each other.
Basic UNION Syntax
Syntax:
SELECT column1, column2
FROM table1
UNION
SELECT column1, column2
FROM table2;
Let's work with two employee tables from different offices:
id | name | office | salary |
|---|---|---|---|
| 1 | Alice Smith | New York | 95000 |
| 2 | Bob Johnson | New York | 85000 |
| 3 | Carol White | Chicago | 90000 |
id | name | office | salary |
|---|---|---|---|
| 101 | David Brown | London | 80000 |
| 102 | Emma Wilson | Manchester | 75000 |
| 103 | Frank Miller | London | 95000 |
-- Combine all employees from both offices
SELECT id, name, office, salary
FROM us_employees
UNION
SELECT id, name, office, salary
FROM uk_employees;What happened:
- First SELECT gets rows from us_employees
- Second SELECT gets rows from uk_employees
- UNION combines them into one result set
- All 6 rows are returned (no duplicates in this case)
UNION vs UNION ALL
UNION removes duplicate rows, UNION ALL keeps all rows including duplicates.
Let's see the difference with overlapping data:
customer_id | name | status |
|---|---|---|
| 1 | Acme Corp | active |
| 2 | TechStart Inc | active |
| 3 | Global Ltd | active |
customer_id | name | status |
|---|---|---|
| 2 | TechStart Inc | vip |
| 3 | Global Ltd | vip |
| 4 | Elite Co | vip |
-- UNION (removes duplicates)
SELECT customer_id, name
FROM current_customers
UNION
SELECT customer_id, name
FROM vip_customers
ORDER BY customer_id;Notice customers 2 and 3 appear only once - UNION removed duplicates!
Now with UNION ALL:
-- UNION ALL (keeps duplicates)
SELECT customer_id, name
FROM current_customers
UNION ALL
SELECT customer_id, name
FROM vip_customers
ORDER BY customer_id;With UNION ALL, customers 2 and 3 appear twice - all rows kept!
When to use each:
- UNION: When you want unique rows only (slower, removes duplicates)
- UNION ALL: When you want all rows including duplicates (faster, no deduplication)
Performance tip: Use UNION ALL when you know there are no duplicates or duplicates are acceptable. It's faster because it skips the deduplication step.
UNION Requirements - Same Structure
All SELECT statements in a UNION must have:
- Same number of columns
- Compatible data types (same position)
- Columns don't need same names (first SELECT determines names)
-- This works - same structure
SELECT id, name, 'US' AS region
FROM us_employees
UNION
SELECT id, name, 'UK' AS region
FROM uk_employees;This fails - different number of columns:
SELECT id, name FROM us_employees
UNION
SELECT id, name, office FROM uk_employees; -- Error! 2 vs 3 columns
This fails - incompatible types:
SELECT id, name FROM us_employees
UNION
SELECT name, id FROM uk_employees; -- Error! INT vs VARCHAR mismatch
Adding Identifying Columns
Often you want to know which query each row came from:
-- Add source identifier
SELECT
name,
office,
salary,
'US Office' AS source
FROM us_employees
UNION ALL
SELECT
name,
office,
salary,
'UK Office' AS source
FROM uk_employees
ORDER BY salary DESC;The source column helps identify where each row originated!
Combining More Than Two Queries
You can UNION multiple SELECT statements:
product | sales |
|---|---|
| Laptop | 50000 |
| Mouse | 5000 |
product | sales |
|---|---|
| Laptop | 55000 |
| Keyboard | 8000 |
product | sales |
|---|---|
| Laptop | 60000 |
| Monitor | 30000 |
-- Combine sales from all quarters
SELECT product, sales, 'Q1' AS quarter FROM q1_sales
UNION ALL
SELECT product, sales, 'Q2' AS quarter FROM q2_sales
UNION ALL
SELECT product, sales, 'Q3' AS quarter FROM q3_sales
ORDER BY product, quarter;UNION with WHERE Clauses
Each SELECT can have its own WHERE clause:
-- High earners from both offices
SELECT name, office, salary, 'US' AS region
FROM us_employees
WHERE salary >= 90000
UNION
SELECT name, office, salary, 'UK' AS region
FROM uk_employees
WHERE salary >= 90000
ORDER BY salary DESC;UNION with Aggregations
Combine aggregated results from different tables:
-- Summary statistics from both offices
SELECT
'US Office' AS office,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary
FROM us_employees
UNION ALL
SELECT
'UK Office' AS office,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary
FROM uk_employees;UNION with JOINs
Each SELECT in a UNION can include JOINs:
order_id | customer_id | total |
|---|---|---|
| 1 | 1 | 500 |
| 2 | 2 | 300 |
order_id | customer_id | total |
|---|---|---|
| 101 | 1 | 600 |
| 102 | 3 | 450 |
customer_id | name |
|---|---|
| 1 | Alice Corp |
| 2 | Bob Inc |
| 3 | Carol Ltd |
-- All orders with customer names from both years
SELECT
o.order_id,
c.name AS customer,
o.total,
2023 AS year
FROM orders_2023 o
JOIN customers c ON o.customer_id = c.customer_id
UNION ALL
SELECT
o.order_id,
c.name AS customer,
o.total,
2024 AS year
FROM orders_2024 o
JOIN customers c ON o.customer_id = c.customer_id
ORDER BY year, order_id;Practical Example: Customer Contact List
Combine multiple sources into a unified contact list:
email | name |
|---|---|
| alice@example.com | Alice Smith |
| bob@example.com | Bob Johnson |
email | first_name | last_name |
|---|---|---|
| bob@example.com | Bob | Johnson |
| carol@example.com | Carol | White |
attendee_email | attendee_name |
|---|---|
| carol@example.com | Carol White |
| david@example.com | David Brown |
-- Unified contact list from all sources
SELECT
email,
name,
'Newsletter' AS source
FROM newsletter_subscribers
UNION
SELECT
email,
CONCAT(first_name, ' ', last_name) AS name,
'Customer' AS source
FROM customers
UNION
SELECT
attendee_email AS email,
attendee_name AS name,
'Event' AS source
FROM event_attendees
ORDER BY email;Notice duplicates were removed! Bob and Carol appear in multiple sources but show only once.
To see all occurrences:
-- Show all occurrences with UNION ALL
SELECT email, name, 'Newsletter' AS source
FROM newsletter_subscribers
UNION ALL
SELECT email, CONCAT(first_name, ' ', last_name), 'Customer'
FROM customers
UNION ALL
SELECT attendee_email, attendee_name, 'Event'
FROM event_attendees
ORDER BY email, source;UNION with NULL Values
Columns that don't exist in all queries can use NULL:
-- Combine products with different structures
SELECT
product_name,
price,
NULL AS discount_percent
FROM regular_products
UNION ALL
SELECT
product_name,
price,
discount
FROM clearance_products
WHERE discount > 0;This ensures all queries have the same number of columns!
Common Mistakes to Avoid
Don't make these mistakes:
-
Different number of columns: ❌
SELECT a, b UNION SELECT a, b, c- All queries must have same column count
-
Incompatible data types: ❌
SELECT id, name UNION SELECT name, id- Column types must match by position
-
Using UNION when UNION ALL is better: ⚠️
- UNION removes duplicates (slower)
- If no duplicates exist or duplicates are OK, use UNION ALL
-
Forgetting ORDER BY goes at the end: ❌
SELECT * FROM t1 ORDER BY id UNION SELECT * FROM t2- Correct:
SELECT * FROM t1 UNION SELECT * FROM t2 ORDER BY id
-
Not aliasing columns from first SELECT: ❌
- Column names come from first SELECT only
- Alias columns there if you want specific names
-
Assuming column names must match: ❌
- Names don't need to match, only count and types
- Names from first SELECT are used
-
Using UNION on large tables without filters: ⚠️
- Filter with WHERE before combining
- Reduces data volume and improves performance
Performance Considerations
UNION performance tips:
-
Use UNION ALL when possible
- Skip duplicate checking if not needed
- Can be significantly faster
-
Filter before combining
- Add WHERE clauses to each SELECT
- Reduces rows to process
-
Index important columns
- Especially columns in WHERE clauses
- Helps each individual query run faster
-
Consider alternatives for very large unions
- If combining many tables, consider views
- Or restructure schema if constantly using UNION
Key Takeaways
What you learned:
✅ UNION combines results from multiple SELECT statements vertically ✅ UNION removes duplicate rows (default) ✅ UNION ALL keeps all rows including duplicates (faster) ✅ All SELECT statements must have:
- Same number of columns
- Compatible data types (by position) ✅ Column names come from first SELECT ✅ ORDER BY goes at the very end (after all UNIONs) ✅ Each SELECT can have its own WHERE, JOINs, GROUP BY ✅ Can add literal columns to identify source ✅ Use NULL to pad missing columns
Real-world applications:
- Combining data from multiple regions/offices
- Merging current and archived tables
- Creating unified contact lists
- Consolidating historical data by time period
- Building comprehensive reports from multiple sources
Practice Exercise:
- Combine customers from two regions, add a region column
- Create a unified product list from multiple categories
- Combine sales data from Q1, Q2, Q3, Q4 with quarter labels
- Merge employee lists showing only those earning > 80000
- Create a master email list from customers, newsletter, and event attendees
- Compare UNION vs UNION ALL performance on duplicate data
- Combine monthly summaries (COUNT, SUM) from 12 monthly tables