Analytical SQL Patterns

Common patterns for business analytics and reporting.

As you work with SQL for analytics and business intelligence, you'll discover that certain query patterns appear repeatedly. Understanding these common analytical patterns will help you write queries more efficiently and solve business problems faster.

In this lesson, we'll explore the most important SQL patterns used in analytics, from running totals and moving averages to ranking, percentiles, and period comparisons. These patterns form the foundation of most analytical work.

Why Learn Analytical Patterns?

Analytical patterns provide:

  • Reusable solutions to common business questions
  • Best practices for performance and readability
  • Building blocks for complex analytical queries
  • Consistency across analytical work in your organization

Running Totals (Cumulative Sums)

A running total adds each new value to the sum of all previous values. Also called a cumulative sum, it's essential for tracking cumulative metrics like year-to-date revenue.

Running totals are one of the most common analytical patterns, used for calculating year-to-date revenue, cumulative customer counts, and more.

Example: Year-to-Date Revenue

Calculate cumulative revenue for each day:

sql
WITH daily_revenue AS (
  SELECT
    DATE(order_date) AS sale_date,
    SUM(total_amount) AS daily_revenue
  FROM orders
  WHERE EXTRACT(YEAR FROM order_date) = 2025
  GROUP BY DATE(order_date)
)
SELECT
  sale_date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY sale_date) AS running_total,
  ROUND(AVG(daily_revenue) OVER (ORDER BY sale_date), 2) AS running_avg
FROM daily_revenue
ORDER BY sale_date
LIMIT 7;

The window frame for SUM() OVER (ORDER BY ...) defaults to "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", which creates a running total. You can explicitly specify this if needed for clarity.
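This default is easy to check on toy data. The sketch below runs the same pattern in SQLite via Python's sqlite3 module (the table and values are invented for illustration) and confirms that the implicit frame and the spelled-out frame produce identical running totals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (sale_date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [("2025-01-01", 100.0), ("2025-01-02", 50.0), ("2025-01-03", 25.0)],
)

# Implicit frame: defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
implicit = conn.execute(
    """SELECT sale_date, SUM(revenue) OVER (ORDER BY sale_date)
       FROM daily_revenue
       ORDER BY sale_date"""
).fetchall()

# Explicit frame: spells out the same default for readability
explicit = conn.execute(
    """SELECT sale_date,
              SUM(revenue) OVER (
                ORDER BY sale_date
                RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
              )
       FROM daily_revenue
       ORDER BY sale_date"""
).fetchall()

print(implicit)  # running totals 100.0, 150.0, 175.0
assert implicit == explicit
```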

Moving Averages

An average calculated over a sliding window of data points, used to smooth out short-term fluctuations and identify trends. Common examples include 7-day, 30-day, and 90-day moving averages.

Moving averages are essential for identifying trends and smoothing noisy data.

Example: 7-Day Moving Average

Calculate a 7-day moving average of daily sales:

sql
WITH daily_sales AS (
  SELECT
    DATE(order_date) AS sale_date,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
  FROM orders
  WHERE order_date >= '2025-03-01'
  GROUP BY DATE(order_date)
)
SELECT
  sale_date,
  order_count,
  revenue,
  ROUND(AVG(revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) AS moving_avg_7d,
  ROUND(AVG(revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ), 2) AS moving_avg_30d
FROM daily_sales
ORDER BY sale_date DESC
LIMIT 5;

Use ROWS BETWEEN for moving averages, not RANGE BETWEEN. ROWS gives you exactly N preceding rows regardless of ties, while RANGE can include all rows with the same ORDER BY value, which may give unexpected results.
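The difference shows up as soon as the ORDER BY column has duplicates. In this runnable sketch (SQLite via Python's sqlite3; the dates and amounts are invented), two rows share a sale_date, so they are "peers" under RANGE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, revenue REAL)")
# Two rows share the same ORDER BY value ("2025-03-01"), creating peers.
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                 [("2025-03-01", 10.0), ("2025-03-01", 20.0), ("2025-03-02", 30.0)])

# Default RANGE frame: peer rows with the same sale_date are summed
# together, so both 2025-03-01 rows see the full 30.0.
range_totals = [r[0] for r in conn.execute(
    """SELECT SUM(revenue) OVER (ORDER BY sale_date)
       FROM daily_sales ORDER BY sale_date""")]

# ROWS frame: each frame ends at that physical row, so the tied rows
# get different running totals (the order among ties is unspecified).
rows_totals = [r[0] for r in conn.execute(
    """SELECT SUM(revenue) OVER (
         ORDER BY sale_date
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       )
       FROM daily_sales ORDER BY sale_date""")]

print(range_totals)  # [30.0, 30.0, 60.0]
```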

Ranking and Percentiles

Ranking patterns help identify top performers, outliers, and distribution characteristics.

Example: Ranking Customers by Spending

Rank customers by total spending, showing how each function handles ties:

sql
WITH customer_spending AS (
  SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent
  FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '365 days'
  GROUP BY customer_id
)
SELECT
  customer_id,
  order_count,
  ROUND(total_spent, 2) AS total_spent,
  ROW_NUMBER() OVER (ORDER BY total_spent DESC) AS row_num,
  RANK() OVER (ORDER BY total_spent DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank,
  NTILE(10) OVER (ORDER BY total_spent DESC) AS decile
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 8;

Understanding the ranking functions:

  • ROW_NUMBER(): Assigns unique sequential numbers, even for ties (1, 2, 3, 4...)
  • RANK(): Same rank for ties, then skips numbers (1, 1, 3, 4...)
  • DENSE_RANK(): Same rank for ties, no gaps (1, 1, 2, 3...)
  • NTILE(N): Divides rows into N groups of as equal size as possible (useful for deciles, quartiles)
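A quick runnable check of the four functions (SQLite via Python's sqlite3; the spend table and the tie at 500 are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (customer TEXT, total REAL)")
# "a" and "b" tie at 500, so the ranking functions diverge.
conn.executemany("INSERT INTO spend VALUES (?, ?)",
                 [("a", 500.0), ("b", 500.0), ("c", 300.0), ("d", 200.0)])

rows = conn.execute(
    """SELECT customer,
              ROW_NUMBER() OVER w AS row_num,
              RANK()       OVER w AS rnk,
              DENSE_RANK() OVER w AS dense_rnk,
              NTILE(2)     OVER w AS half
       FROM spend
       WINDOW w AS (ORDER BY total DESC)
       ORDER BY total DESC, customer"""
).fetchall()

print([r[2] for r in rows])  # RANK:       [1, 1, 3, 4] -- skips after ties
print([r[3] for r in rows])  # DENSE_RANK: [1, 1, 2, 3] -- no gaps
```

ROW_NUMBER still assigns 1-4 (the order between the tied rows is unspecified), and NTILE(2) puts both tied rows in the first half.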

Example: Percentile Calculation

Find the 50th, 75th, 90th, and 95th percentiles of order values:

sql
SELECT
  -- cast to numeric: PostgreSQL's two-argument ROUND() requires numeric, not double precision
  ROUND((PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY total_amount))::numeric, 2) AS median,
  ROUND((PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount))::numeric, 2) AS p75,
  ROUND((PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY total_amount))::numeric, 2) AS p90,
  ROUND((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_amount))::numeric, 2) AS p95,
  ROUND(AVG(total_amount), 2) AS average
FROM orders
WHERE order_date >= '2025-01-01';

PERCENTILE_CONT provides continuous interpolation, while PERCENTILE_DISC returns an actual value from the dataset. For most analytics use cases, PERCENTILE_CONT is preferred.
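The distinction is easiest to see in plain Python. The helpers below are hypothetical names implementing the standard definitions: continuous interpolation at position (n - 1) * p, versus the first actual value whose cumulative share reaches p:

```python
def percentile_cont(values, p):
    """Continuous percentile: linear interpolation between ranks (like PERCENTILE_CONT)."""
    v = sorted(values)
    pos = (len(v) - 1) * p      # fractional row position at percentile p
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return float(v[lower])
    return v[lower] + frac * (v[lower + 1] - v[lower])

def percentile_disc(values, p):
    """Discrete percentile: first value whose cumulative share >= p (like PERCENTILE_DISC)."""
    v = sorted(values)
    for i, x in enumerate(v, start=1):
        if i / len(v) >= p:
            return float(x)

amounts = [10, 20, 30, 40]
print(percentile_cont(amounts, 0.5))  # 25.0 -- interpolated, not in the data
print(percentile_disc(amounts, 0.5))  # 20.0 -- an actual order amount
```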

Period-over-Period Comparisons

Comparing metrics across time periods (month-over-month, year-over-year) is fundamental to understanding business performance.

Example: Month-over-Month Comparison

Compare each month's metrics to the previous month:

sql
WITH monthly_metrics AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
  month,
  order_count,
  ROUND(revenue, 2) AS revenue,
  unique_customers,
  LAG(order_count) OVER (ORDER BY month) AS prev_month_orders,
  order_count - LAG(order_count) OVER (ORDER BY month) AS order_change,
  ROUND(100.0 * (order_count - LAG(order_count) OVER (ORDER BY month)) /
    LAG(order_count) OVER (ORDER BY month), 2) AS order_growth_pct,
  ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month)) /
    LAG(revenue) OVER (ORDER BY month), 2) AS revenue_growth_pct
FROM monthly_metrics
ORDER BY month DESC
LIMIT 6;

ABC Analysis (Pareto Analysis)

A categorization technique that divides items into three groups based on their importance: A items (top 20% that contribute ~80% of value), B items (next 30%), and C items (remaining 50%). Based on the Pareto Principle.

ABC analysis helps prioritize resources by identifying which products, customers, or categories contribute the most value.

Example: Product ABC Classification

Classify products based on their revenue contribution:

sql
WITH product_revenue AS (
  SELECT
    oi.product_id,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
  FROM order_items oi
  JOIN orders o ON oi.order_id = o.order_id  -- order_date lives on orders, not order_items
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '365 days'
  GROUP BY oi.product_id
),
revenue_with_cumulative AS (
  SELECT
    product_id,
    total_revenue,
    SUM(total_revenue) OVER () AS total_all_products,
    SUM(total_revenue) OVER (ORDER BY total_revenue DESC) AS cumulative_revenue,
    ROUND(100.0 * SUM(total_revenue) OVER (ORDER BY total_revenue DESC) /
      SUM(total_revenue) OVER (), 2) AS cumulative_pct
  FROM product_revenue
)
SELECT
  product_id,
  ROUND(total_revenue, 2) AS revenue,
  cumulative_pct,
  CASE
    WHEN cumulative_pct <= 80 THEN 'A - High Value'
    WHEN cumulative_pct <= 95 THEN 'B - Medium Value'
    ELSE 'C - Low Value'
  END AS abc_category
FROM revenue_with_cumulative
ORDER BY total_revenue DESC
LIMIT 12;

In the sample data, the top 8 products (the A category) contribute 75.45% of total revenue, helping focus inventory and marketing efforts.

Sessionization Pattern

Sessionization groups user events into sessions based on time gaps between activities, essential for understanding user behavior and engagement.

Example: Creating User Sessions

Group user events into sessions with a 30-minute inactivity timeout:

sql
WITH events_with_gaps AS (
  SELECT
    user_id,
    event_time,
    event_type,
    LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event_time,
    CASE
      WHEN event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
        > INTERVAL '30 minutes'
      OR LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
      THEN 1
      ELSE 0
    END AS is_new_session
  FROM user_events
  WHERE event_time >= '2025-03-01'
),
sessions AS (
  SELECT
    user_id,
    event_time,
    event_type,
    SUM(is_new_session) OVER (
      PARTITION BY user_id
      ORDER BY event_time
    ) AS session_id
  FROM events_with_gaps
)
SELECT
  user_id,
  session_id,
  MIN(event_time) AS session_start,
  MAX(event_time) AS session_end,
  COUNT(*) AS event_count,
  EXTRACT(EPOCH FROM (MAX(event_time) - MIN(event_time)))/60 AS session_duration_minutes
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_id
LIMIT 8;

The sessionization pattern uses LAG to detect time gaps, marks new sessions with a flag, then uses SUM() OVER to create a running session counter. This is a powerful technique for any time-based grouping.
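The same three steps (LAG to find gaps, a new-session flag, a running SUM for session ids) can be reproduced on toy data. This sketch uses SQLite via Python's sqlite3, with event times as plain integer minutes for simplicity; table and user names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (user_id TEXT, minute INTEGER)")
# Event times in minutes; a gap of more than 30 minutes starts a new session.
conn.executemany("INSERT INTO user_events VALUES (?, ?)",
                 [("u1", 0), ("u1", 10), ("u1", 60), ("u1", 75), ("u2", 5)])

rows = conn.execute(
    """WITH flagged AS (
         SELECT user_id, minute,
                CASE
                  WHEN minute - LAG(minute) OVER w > 30
                    OR LAG(minute) OVER w IS NULL
                  THEN 1 ELSE 0
                END AS is_new_session
         FROM user_events
         WINDOW w AS (PARTITION BY user_id ORDER BY minute)
       )
       SELECT user_id, minute,
              SUM(is_new_session) OVER (
                PARTITION BY user_id ORDER BY minute
              ) AS session_id
       FROM flagged
       ORDER BY user_id, minute"""
).fetchall()

# u1's events split into sessions 1 (minutes 0, 10) and 2 (minutes 60, 75).
print(rows)
```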

Gap and Island Detection

Gap and island detection identifies consecutive sequences (islands) and breaks (gaps) in data, useful for finding streaks, detecting outages, or analyzing continuous behavior.

Example: Finding Consecutive Active Days

Identify streaks of consecutive days where users were active:

sql
WITH daily_activity AS (
  SELECT DISTINCT
    user_id,
    DATE(activity_date) AS activity_date
  FROM user_activity
  WHERE activity_date >= '2025-03-01'
),
numbered_dates AS (
  SELECT
    user_id,
    activity_date,
    activity_date - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) * INTERVAL '1 day') AS island_id
  FROM daily_activity
),
streaks AS (
  SELECT
    user_id,
    island_id,
    MIN(activity_date) AS streak_start,
    MAX(activity_date) AS streak_end,
    COUNT(*) AS streak_length
  FROM numbered_dates
  GROUP BY user_id, island_id
)
SELECT
  user_id,
  streak_start,
  streak_end,
  streak_length
FROM streaks
WHERE streak_length >= 3
ORDER BY streak_length DESC, user_id
LIMIT 8;

The trick here is subtracting ROW_NUMBER() from the date. Consecutive dates produce the same result (the "island_id"), making them easy to group.
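The row-number subtraction is easy to verify on toy data. This sketch (SQLite via Python's sqlite3; the activity table uses plain integer day numbers for simplicity, and all names are invented) reproduces the streak query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user_id TEXT, day INTEGER)")
# Days 1-3 are consecutive, day 7 is isolated, days 9-10 are consecutive.
conn.executemany("INSERT INTO activity VALUES (?, ?)",
                 [("u1", 1), ("u1", 2), ("u1", 3), ("u1", 7), ("u1", 9), ("u1", 10)])

streaks = conn.execute(
    """WITH numbered AS (
         SELECT user_id, day,
                -- consecutive days minus their row number land on the same value
                day - ROW_NUMBER() OVER (
                  PARTITION BY user_id ORDER BY day
                ) AS island_id
         FROM activity
       )
       SELECT user_id, MIN(day) AS streak_start, MAX(day) AS streak_end,
              COUNT(*) AS streak_length
       FROM numbered
       GROUP BY user_id, island_id
       ORDER BY streak_start"""
).fetchall()

print(streaks)  # [('u1', 1, 3, 3), ('u1', 7, 7, 1), ('u1', 9, 10, 2)]
```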

First and Last Value Patterns

Tracking first and last occurrences helps understand customer acquisition, retention, and behavior changes.

Example: First and Last Purchase Analysis

Analyze each customer's first and most recent purchase:

sql
WITH customer_purchases AS (
  SELECT
    customer_id,
    order_date,
    total_amount,
    FIRST_VALUE(order_date) OVER (
      PARTITION BY customer_id
      ORDER BY order_date
    ) AS first_purchase_date,
    FIRST_VALUE(total_amount) OVER (
      PARTITION BY customer_id
      ORDER BY order_date
    ) AS first_purchase_amount,
    LAST_VALUE(order_date) OVER (
      PARTITION BY customer_id
      ORDER BY order_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_purchase_date,
    LAST_VALUE(total_amount) OVER (
      PARTITION BY customer_id
      ORDER BY order_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_purchase_amount,
    COUNT(*) OVER (PARTITION BY customer_id) AS total_orders
  FROM orders
)
SELECT DISTINCT
  customer_id,
  first_purchase_date,
  ROUND(first_purchase_amount, 2) AS first_amount,
  last_purchase_date,
  ROUND(last_purchase_amount, 2) AS last_amount,
  total_orders,
  last_purchase_date - first_purchase_date AS customer_lifespan_days,
  ROUND((last_purchase_amount - first_purchase_amount) / first_purchase_amount * 100, 2) AS amount_change_pct
FROM customer_purchases
WHERE total_orders >= 5
ORDER BY customer_lifespan_days DESC
LIMIT 6;

When using LAST_VALUE, you must specify "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" or it will only consider rows up to the current row, not the actual last row in the partition.

Pivot Pattern (Rows to Columns)

Pivoting transforms row data into columns, useful for creating comparison tables and summary reports.

Example: Monthly Revenue by Product Category

Create a table showing monthly revenue for each category:

sql
WITH monthly_category_revenue AS (
  SELECT
    DATE_TRUNC('month', o.order_date) AS month,
    p.category,
    SUM(oi.quantity * oi.unit_price) AS revenue
  FROM orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
  WHERE o.order_date >= '2025-01-01'
  GROUP BY DATE_TRUNC('month', o.order_date), p.category
)
SELECT
  month,
  ROUND(SUM(CASE WHEN category = 'Electronics' THEN revenue ELSE 0 END), 2) AS electronics,
  ROUND(SUM(CASE WHEN category = 'Clothing' THEN revenue ELSE 0 END), 2) AS clothing,
  ROUND(SUM(CASE WHEN category = 'Home' THEN revenue ELSE 0 END), 2) AS home,
  ROUND(SUM(CASE WHEN category = 'Sports' THEN revenue ELSE 0 END), 2) AS sports,
  ROUND(SUM(revenue), 2) AS total
FROM monthly_category_revenue
GROUP BY month
ORDER BY month;

Some databases offer dedicated pivoting features — PostgreSQL, for example, provides a crosstab() function via the tablefunc extension — but the CASE WHEN pattern shown here works across all databases and gives you more control over the output.
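Because the CASE WHEN pivot uses only standard aggregates, it runs anywhere. A minimal runnable check using SQLite via Python's sqlite3 (the sales table, categories, and values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, category TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("2025-01", "Electronics", 100.0),
                  ("2025-01", "Clothing", 40.0),
                  ("2025-02", "Electronics", 120.0)])

# One output row per month; each CASE WHEN picks out a single category.
rows = conn.execute(
    """SELECT month,
              SUM(CASE WHEN category = 'Electronics' THEN revenue ELSE 0 END) AS electronics,
              SUM(CASE WHEN category = 'Clothing' THEN revenue ELSE 0 END) AS clothing,
              SUM(revenue) AS total
       FROM sales
       GROUP BY month
       ORDER BY month"""
).fetchall()

print(rows)  # February has no Clothing sales, so that cell is 0
```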

Common Analytical Mistakes

1. Incorrect Window Frame Defaults

Problem: Not understanding default window frames can produce unexpected results.

sql
-- Wrong: with no frame clause, the default RANGE frame spans all preceding rows
SELECT
  sale_date,
  revenue,
  AVG(revenue) OVER (ORDER BY sale_date) AS avg_revenue  -- Not a true moving average!
FROM daily_sales;

-- Correct: Use ROWS BETWEEN for moving averages
SELECT
  sale_date,
  revenue,
  AVG(revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS moving_avg_7d
FROM daily_sales;

2. Forgetting UNBOUNDED FOLLOWING for LAST_VALUE

Problem: LAST_VALUE without proper frame specification only considers rows up to the current row.

sql
-- Wrong: Only gets the current row as "last"
SELECT
  customer_id,
  LAST_VALUE(order_date) OVER (
    PARTITION BY customer_id ORDER BY order_date
  ) AS last_order
FROM orders;

-- Correct: Specify full partition
SELECT
  customer_id,
  LAST_VALUE(order_date) OVER (
    PARTITION BY customer_id ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_order
FROM orders;
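The difference is easy to see on toy data. This sketch (SQLite via Python's sqlite3; the table and values are invented) computes LAST_VALUE with the default frame and with the full-partition frame side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_day INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("c1", 1), ("c1", 5), ("c1", 9)])

rows = conn.execute(
    """SELECT order_day,
              -- default frame ends at the current row, so "last" = current
              LAST_VALUE(order_day) OVER (
                PARTITION BY customer_id ORDER BY order_day
              ) AS default_frame,
              -- full frame sees the whole partition, so "last" = 9 everywhere
              LAST_VALUE(order_day) OVER (
                PARTITION BY customer_id ORDER BY order_day
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
              ) AS full_frame
       FROM orders
       ORDER BY order_day"""
).fetchall()

print(rows)  # [(1, 1, 9), (5, 5, 9), (9, 9, 9)]
```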

3. Not Handling Division by Zero

Problem: Growth percentage calculations can fail when the denominator is zero.

sql
-- Wrong: Division by zero error
SELECT
  month,
  (current_revenue - prev_revenue) / prev_revenue * 100 AS growth_pct
FROM monthly_metrics;

-- Correct: Use NULLIF to avoid division by zero
SELECT
  month,
  ROUND(100.0 * (current_revenue - prev_revenue) /
    NULLIF(prev_revenue, 0), 2) AS growth_pct
FROM monthly_metrics;
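A quick runnable check of the NULLIF guard (SQLite via Python's sqlite3; the numbers are invented). Note that SQLite itself returns NULL on division by zero, whereas PostgreSQL raises an error — NULLIF makes the behavior portable and the intent explicit either way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Zero denominator: NULLIF(0, 0) is NULL, so the whole expression is NULL
# instead of an error (or a misleading number).
zero_denom = conn.execute(
    "SELECT 100.0 * (150 - 0) / NULLIF(0, 0)").fetchone()[0]

# Non-zero denominator: NULLIF(100, 0) passes 100 through unchanged.
ok_denom = conn.execute(
    "SELECT 100.0 * (150 - 100) / NULLIF(100, 0)").fetchone()[0]

print(zero_denom)  # None (SQL NULL)
print(ok_denom)    # 50.0
```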

Performance Tips for Analytical Patterns

1. Use CTEs for Complex Patterns

Break complex patterns into readable, maintainable steps:

sql
-- Good: Clear, readable structure
WITH base_data AS (
  -- Get raw data
  SELECT ...
),
aggregated AS (
  -- Aggregate base data
  SELECT ...
),
with_analytics AS (
  -- Add analytical functions
  SELECT ...
)
SELECT * FROM with_analytics;

2. Filter Early, Aggregate Late

Reduce data volume before applying window functions:

sql
-- Good: Filter in CTE, then apply window functions
WITH recent_orders AS (
  SELECT customer_id, order_date, total_amount
  FROM orders
  WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
  customer_id,
  SUM(total_amount) OVER (PARTITION BY customer_id) AS customer_total
FROM recent_orders;

3. Consider Materialized Views for Expensive Patterns

If you run the same analytical pattern frequently, create a materialized view:

sql
CREATE MATERIALIZED VIEW daily_revenue_summary AS
SELECT
  DATE(order_date) AS sale_date,
  SUM(total_amount) AS daily_revenue,
  COUNT(*) AS order_count,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
GROUP BY DATE(order_date);

-- Refresh periodically
REFRESH MATERIALIZED VIEW daily_revenue_summary;

Key Takeaways

  • Running totals use SUM() OVER (ORDER BY ...) to calculate cumulative values, essential for year-to-date metrics
  • Moving averages smooth data using ROWS BETWEEN N PRECEDING AND CURRENT ROW for trend analysis
  • Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) identify top performers and create distribution segments
  • Period comparisons use LAG/LEAD to calculate month-over-month, year-over-year, and other time-based changes
  • ABC analysis categorizes items by cumulative contribution, helping prioritize high-value items
  • Sessionization groups events based on time gaps using LAG and running sums
  • Gap and island detection finds consecutive sequences by subtracting ROW_NUMBER from dates or values
  • FIRST_VALUE/LAST_VALUE track beginning and end states within partitions (remember UNBOUNDED FOLLOWING for LAST_VALUE!)
  • Pivot patterns use CASE WHEN to transform row data into column comparisons
  • Optimize patterns with CTEs, early filtering, and materialized views for frequently-run queries

These analytical patterns form the foundation of most SQL analytics work. Master them, and you'll be able to answer almost any business question with SQL.

Practice Exercises

  1. 7-Day Moving Average with Trend: Calculate a 7-day moving average of daily order counts and include a column showing whether the trend is "Increasing", "Decreasing", or "Stable" compared to the previous day's moving average.

  2. Customer Segmentation with RFM: Combine the RFM pattern with NTILE to create customer segments. Calculate recency, frequency, and monetary scores (1-5), then create a combined RFM score and classify customers into segments.

  3. Sequential Pattern Analysis: Find all customers who made purchases on 3 or more consecutive days. Show the customer ID, streak start date, streak end date, and total amount spent during the streak.

  4. Year-over-Year Growth by Category: Create a pivot table showing year-over-year revenue growth percentage for each product category by month. Include columns for each category and a total column.

  5. Session Quality Scoring: Using the sessionization pattern, calculate metrics for each session including duration, event count, and conversion (did the session end with a purchase?). Rank sessions by quality score (combination of duration and conversion).

  6. Percentile-Based Outlier Detection: Identify orders that fall outside the 5th and 95th percentile by order amount. For each outlier, show how many standard deviations it is from the mean.

  7. Cohort Retention with Running Retention: Build a cohort analysis that shows not just each month's retention rate, but also a running retention rate (customers still active in month N or any month after).