Analytical SQL Patterns
Common patterns for business analytics and reporting.
Analytical SQL Patterns
As you work with SQL for analytics and business intelligence, you'll discover that certain query patterns appear repeatedly. Understanding these common analytical patterns will help you write queries more efficiently and solve business problems faster.
In this lesson, we'll explore the most important SQL patterns used in analytics, from running totals and moving averages to ranking, percentiles, and period comparisons. These patterns form the foundation of most analytical work.
Why Learn Analytical Patterns?
Analytical patterns provide:
- Reusable solutions to common business questions
- Best practices for performance and readability
- Building blocks for complex analytical queries
- Consistency across analytical work in your organization
Running Totals (Cumulative Sums)
Running totals are one of the most common analytical patterns, used for calculating year-to-date revenue, cumulative customer counts, and more.
Example: Year-to-Date Revenue
Calculate cumulative revenue for each day:
WITH daily_revenue AS (
SELECT
DATE(order_date) AS sale_date,
SUM(total_amount) AS daily_revenue
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2025
GROUP BY DATE(order_date)
)
SELECT
sale_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY sale_date) AS running_total,
ROUND(AVG(daily_revenue) OVER (ORDER BY sale_date), 2) AS running_avg
FROM daily_revenue
ORDER BY sale_date
LIMIT 7;Moving Averages
Moving averages are essential for identifying trends and smoothing noisy data.
Example: 7-Day Moving Average
Calculate a 7-day moving average of daily sales:
WITH daily_sales AS (
SELECT
DATE(order_date) AS sale_date,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= '2025-03-01'
GROUP BY DATE(order_date)
)
SELECT
sale_date,
order_count,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7d,
ROUND(AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_30d
FROM daily_sales
ORDER BY sale_date DESC
LIMIT 5;Ranking and Percentiles
Ranking patterns help identify top performers, outliers, and distribution characteristics.
Example: Ranking Customers by Spending
Rank customers by total spending with ties handling:
WITH customer_spending AS (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '365 days'
GROUP BY customer_id
)
SELECT
customer_id,
order_count,
ROUND(total_spent, 2) AS total_spent,
ROW_NUMBER() OVER (ORDER BY total_spent DESC) AS row_num,
RANK() OVER (ORDER BY total_spent DESC) AS rank,
DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank,
NTILE(10) OVER (ORDER BY total_spent DESC) AS decile
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 8;Understanding the ranking functions:
- ROW_NUMBER(): Assigns unique sequential numbers, even for ties (1, 2, 3, 4...)
- RANK(): Same rank for ties, then skips numbers (1, 1, 3, 4...)
- DENSE_RANK(): Same rank for ties, no gaps (1, 1, 2, 3...)
- NTILE(N): Divides rows into N equal groups (useful for deciles, quartiles)
Example: Percentile Calculation
Find the 50th, 75th, and 90th percentile of order values:
SELECT
ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY total_amount), 2) AS median,
ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount), 2) AS p75,
ROUND(PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY total_amount), 2) AS p90,
ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_amount), 2) AS p95,
ROUND(AVG(total_amount), 2) AS average
FROM orders
WHERE order_date >= '2025-01-01';Period-over-Period Comparisons
Comparing metrics across time periods (month-over-month, year-over-year) is fundamental to understanding business performance.
Example: Month-over-Month Comparison
Compare each month's metrics to the previous month:
WITH monthly_metrics AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
order_count,
ROUND(revenue, 2) AS revenue,
unique_customers,
LAG(order_count) OVER (ORDER BY month) AS prev_month_orders,
order_count - LAG(order_count) OVER (ORDER BY month) AS order_change,
ROUND(100.0 * (order_count - LAG(order_count) OVER (ORDER BY month)) /
LAG(order_count) OVER (ORDER BY month), 2) AS order_growth_pct,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month)) /
LAG(revenue) OVER (ORDER BY month), 2) AS revenue_growth_pct
FROM monthly_metrics
ORDER BY month DESC
LIMIT 6;ABC Analysis (Pareto Analysis)
ABC analysis helps prioritize resources by identifying which products, customers, or categories contribute the most value.
Example: Product ABC Classification
Classify products based on their revenue contribution:
WITH product_revenue AS (
SELECT
product_id,
SUM(quantity * unit_price) AS total_revenue
FROM order_items
WHERE order_date >= CURRENT_DATE - INTERVAL '365 days'
GROUP BY product_id
),
revenue_with_cumulative AS (
SELECT
product_id,
total_revenue,
SUM(total_revenue) OVER () AS total_all_products,
SUM(total_revenue) OVER (ORDER BY total_revenue DESC) AS cumulative_revenue,
ROUND(100.0 * SUM(total_revenue) OVER (ORDER BY total_revenue DESC) /
SUM(total_revenue) OVER (), 2) AS cumulative_pct
FROM product_revenue
)
SELECT
product_id,
ROUND(total_revenue, 2) AS revenue,
cumulative_pct,
CASE
WHEN cumulative_pct <= 80 THEN 'A - High Value'
WHEN cumulative_pct <= 95 THEN 'B - Medium Value'
ELSE 'C - Low Value'
END AS abc_category
FROM revenue_with_cumulative
ORDER BY total_revenue DESC
LIMIT 12;This analysis shows that the top 8 products (A category) contribute 75.45% of total revenue, helping focus inventory and marketing efforts.
Sessionization Pattern
Sessionization groups user events into sessions based on time gaps between activities, essential for understanding user behavior and engagement.
Example: Creating User Sessions
Group user events into sessions with a 30-minute inactivity timeout:
WITH events_with_gaps AS (
SELECT
user_id,
event_time,
event_type,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event_time,
CASE
WHEN event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
> INTERVAL '30 minutes'
OR LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
THEN 1
ELSE 0
END AS is_new_session
FROM user_events
WHERE event_time >= '2025-03-01'
),
sessions AS (
SELECT
user_id,
event_time,
event_type,
SUM(is_new_session) OVER (
PARTITION BY user_id
ORDER BY event_time
) AS session_id
FROM events_with_gaps
)
SELECT
user_id,
session_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
COUNT(*) AS event_count,
EXTRACT(EPOCH FROM (MAX(event_time) - MIN(event_time)))/60 AS session_duration_minutes
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_id
LIMIT 8;Gap and Island Detection
Gap and island detection identifies consecutive sequences (islands) and breaks (gaps) in data, useful for finding streaks, detecting outages, or analyzing continuous behavior.
Example: Finding Consecutive Active Days
Identify streaks of consecutive days where users were active:
WITH daily_activity AS (
SELECT DISTINCT
user_id,
DATE(activity_date) AS activity_date
FROM user_activity
WHERE activity_date >= '2025-03-01'
),
numbered_dates AS (
SELECT
user_id,
activity_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) AS rn,
activity_date - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) * INTERVAL '1 day') AS island_id
FROM daily_activity
),
streaks AS (
SELECT
user_id,
island_id,
MIN(activity_date) AS streak_start,
MAX(activity_date) AS streak_end,
COUNT(*) AS streak_length
FROM numbered_dates
GROUP BY user_id, island_id
)
SELECT
user_id,
streak_start,
streak_end,
streak_length
FROM streaks
WHERE streak_length >= 3
ORDER BY streak_length DESC, user_id
LIMIT 8;The trick here is subtracting ROW_NUMBER() from the date. Consecutive dates produce the same result (the "island_id"), making them easy to group.
First and Last Value Patterns
Tracking first and last occurrences helps understand customer acquisition, retention, and behavior changes.
Example: First and Last Purchase Analysis
Analyze each customer's first and most recent purchase:
WITH customer_purchases AS (
SELECT
customer_id,
order_date,
total_amount,
FIRST_VALUE(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS first_purchase_date,
FIRST_VALUE(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS first_purchase_amount,
LAST_VALUE(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_purchase_date,
LAST_VALUE(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_purchase_amount,
COUNT(*) OVER (PARTITION BY customer_id) AS total_orders
FROM orders
)
SELECT DISTINCT
customer_id,
first_purchase_date,
ROUND(first_purchase_amount, 2) AS first_amount,
last_purchase_date,
ROUND(last_purchase_amount, 2) AS last_amount,
total_orders,
last_purchase_date - first_purchase_date AS customer_lifespan_days,
ROUND((last_purchase_amount - first_purchase_amount) / first_purchase_amount * 100, 2) AS amount_change_pct
FROM customer_purchases
WHERE total_orders >= 5
ORDER BY customer_lifespan_days DESC
LIMIT 6;Pivot Pattern (Rows to Columns)
Pivoting transforms row data into columns, useful for creating comparison tables and summary reports.
Example: Monthly Revenue by Product Category
Create a table showing monthly revenue for each category:
WITH monthly_category_revenue AS (
SELECT
DATE_TRUNC('month', o.order_date) AS month,
p.category,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2025-01-01'
GROUP BY DATE_TRUNC('month', o.order_date), p.category
)
SELECT
month,
ROUND(SUM(CASE WHEN category = 'Electronics' THEN revenue ELSE 0 END), 2) AS electronics,
ROUND(SUM(CASE WHEN category = 'Clothing' THEN revenue ELSE 0 END), 2) AS clothing,
ROUND(SUM(CASE WHEN category = 'Home' THEN revenue ELSE 0 END), 2) AS home,
ROUND(SUM(CASE WHEN category = 'Sports' THEN revenue ELSE 0 END), 2) AS sports,
ROUND(SUM(revenue), 2) AS total
FROM monthly_category_revenue
GROUP BY month
ORDER BY month;Common Analytical Mistakes
1. Incorrect Window Frame Defaults
Problem: Not understanding default window frames can produce unexpected results.
-- Wrong: RANGE BETWEEN uses value ranges, not row counts
SELECT
sale_date,
revenue,
AVG(revenue) OVER (ORDER BY sale_date) AS avg_revenue -- Not a true moving average!
FROM daily_sales;
-- Correct: Use ROWS BETWEEN for moving averages
SELECT
sale_date,
revenue,
AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_sales;2. Forgetting UNBOUNDED FOLLOWING for LAST_VALUE
Problem: LAST_VALUE without proper frame specification only considers rows up to the current row.
-- Wrong: Only gets the current row as "last"
SELECT
customer_id,
LAST_VALUE(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS last_order
FROM orders;
-- Correct: Specify full partition
SELECT
customer_id,
LAST_VALUE(order_date) OVER (
PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order
FROM orders;3. Not Handling Division by Zero
Problem: Growth percentage calculations can fail when the denominator is zero.
-- Wrong: Division by zero error
SELECT
month,
(current_revenue - prev_revenue) / prev_revenue * 100 AS growth_pct
FROM monthly_metrics;
-- Correct: Use NULLIF to avoid division by zero
SELECT
month,
ROUND(100.0 * (current_revenue - prev_revenue) /
NULLIF(prev_revenue, 0), 2) AS growth_pct
FROM monthly_metrics;Performance Tips for Analytical Patterns
1. Use CTEs for Complex Patterns
Break complex patterns into readable, maintainable steps:
-- Good: Clear, readable structure
WITH base_data AS (
-- Get raw data
SELECT ...
),
aggregated AS (
-- Aggregate base data
SELECT ...
),
with_analytics AS (
-- Add analytical functions
SELECT ...
)
SELECT * FROM with_analytics;2. Filter Early, Aggregate Late
Reduce data volume before applying window functions:
-- Good: Filter in CTE, then apply window functions
WITH recent_orders AS (
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
customer_id,
SUM(total_amount) OVER (PARTITION BY customer_id) AS customer_total
FROM recent_orders;3. Consider Materialized Views for Expensive Patterns
If you run the same analytical pattern frequently, create a materialized view:
CREATE MATERIALIZED VIEW daily_revenue_summary AS
SELECT
DATE(order_date) AS sale_date,
SUM(total_amount) AS daily_revenue,
COUNT(*) AS order_count,
COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
GROUP BY DATE(order_date);
-- Refresh periodically
REFRESH MATERIALIZED VIEW daily_revenue_summary;Key Takeaways
- Running totals use SUM() OVER (ORDER BY ...) to calculate cumulative values, essential for year-to-date metrics
- Moving averages smooth data using ROWS BETWEEN N PRECEDING AND CURRENT ROW for trend analysis
- Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) identify top performers and create distribution segments
- Period comparisons use LAG/LEAD to calculate month-over-month, year-over-year, and other time-based changes
- ABC analysis categorizes items by cumulative contribution, helping prioritize high-value items
- Sessionization groups events based on time gaps using LAG and running sums
- Gap and island detection finds consecutive sequences by subtracting ROW_NUMBER from dates or values
- FIRST_VALUE/LAST_VALUE track beginning and end states within partitions (remember UNBOUNDED FOLLOWING for LAST_VALUE!)
- Pivot patterns use CASE WHEN to transform row data into column comparisons
- Optimize patterns with CTEs, early filtering, and materialized views for frequently-run queries
These analytical patterns form the foundation of most SQL analytics work. Master them, and you'll be able to answer almost any business question with SQL.
Practice Exercises
-
7-Day Moving Average with Trend: Calculate a 7-day moving average of daily order counts and include a column showing whether the trend is "Increasing", "Decreasing", or "Stable" compared to the previous day's moving average.
-
Customer Segmentation with RFM: Combine the RFM pattern with NTILE to create customer segments. Calculate recency, frequency, and monetary scores (1-5), then create a combined RFM score and classify customers into segments.
-
Sequential Pattern Analysis: Find all customers who made purchases on 3 or more consecutive days. Show the customer ID, streak start date, streak end date, and total amount spent during the streak.
-
Year-over-Year Growth by Category: Create a pivot table showing year-over-year revenue growth percentage for each product category by month. Include columns for each category and a total column.
-
Session Quality Scoring: Using the sessionization pattern, calculate metrics for each session including duration, event count, and conversion (did the session end with a purchase?). Rank sessions by quality score (combination of duration and conversion).
-
Percentile-Based Outlier Detection: Identify orders that fall outside the 5th and 95th percentile by order amount. For each outlier, show how many standard deviations it is from the mean.
-
Cohort Retention with Running Retention: Build a cohort analysis that shows not just each month's retention rate, but also a running retention rate (customers still active in month N or any month after).