Understanding OHLCV Data Structure

Master the OHLCV format, timeframe aggregation, and how to extract information from candlestick data

20 min read
Beginner

Introduction

OHLCV (Open, High, Low, Close, Volume) is the standard format for representing price data across timeframes. Understanding OHLCV structure is essential for implementing technical indicators and strategies.

This lesson covers:

  • What each OHLCV component represents
  • How to interpret candlestick data
  • Timeframe aggregation and resampling
  • Information density in OHLCV bars
  • Working with OHLCV data in pandas

OHLCV Components Explained

Each bar (candlestick) in a chart contains five data points:

OHLCV Data Structure

Field | Description | Trading Significance
Open | First traded price in the period | Opening auction price discovery
High | Highest traded price in the period | Maximum buying pressure reached
Low | Lowest traded price in the period | Maximum selling pressure reached
Close | Last traded price in the period | Final consensus value; most important for analysis
Volume | Total shares/contracts traded | Participation level; confirms price moves

python
import yfinance as yf
import pandas as pd

# Download OHLCV data for Apple (about six months, so the 20-day
# rolling windows used later in this lesson have enough rows)
aapl = yf.download('AAPL', start='2024-01-01', end='2024-07-01', progress=False)

# Display first 5 days
print("AAPL OHLCV Data (First 5 days of 2024):")
print(aapl.head())
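
The field definitions above imply a few consistency rules that every bar should satisfy: the high is at least as large as the open, low, and close; the low is no larger than the open, high, and close; and volume is non-negative. A minimal sketch of such a check follows. It uses a small hand-made DataFrame, and the helper name `find_invalid_bars` and all values are illustrative assumptions, not part of pandas or yfinance.

```python
import pandas as pd


def find_invalid_bars(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that violate basic OHLCV consistency rules."""
    high_ok = df['High'] >= df[['Open', 'Close', 'Low']].max(axis=1)
    low_ok = df['Low'] <= df[['Open', 'Close', 'High']].min(axis=1)
    volume_ok = df['Volume'] >= 0
    return df[~(high_ok & low_ok & volume_ok)]


# Hypothetical bars; the third one has a High below its Open (invalid)
bars = pd.DataFrame(
    {
        'Open': [100.0, 101.0, 102.0],
        'High': [101.5, 102.0, 101.0],
        'Low': [99.5, 100.5, 100.0],
        'Close': [101.0, 101.5, 100.5],
        'Volume': [1_000, 1_200, 900],
    },
    index=pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
)

invalid = find_invalid_bars(bars)
print(invalid)
```

Running a check like this right after downloading data is one way to catch bad ticks or vendor errors before they leak into indicators.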

Interpreting a Single Bar

Example: 2024-01-02

  • Open: $187.15 - Trading began at this price
  • High: $187.74 - Buyers pushed price to this maximum
  • Low: $184.35 - Sellers pushed price to this minimum
  • Close: $185.64 - Day ended here (net bearish, closed below open)
  • Volume: 82.3M shares - Heavy participation (above average)

Range = High - Low = $187.74 - $184.35 = $3.39 (about 1.8% of the close)

This shows significant intraday volatility - buyers and sellers fought for control, with sellers winning (close < open).
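
The same reading can be reproduced with a few lines of arithmetic. This is a small sketch using the example values quoted above; the variable names are illustrative.

```python
# Values copied from the 2024-01-02 example in this lesson
bar = {'Open': 187.15, 'High': 187.74, 'Low': 184.35, 'Close': 185.64}

bar_range = bar['High'] - bar['Low']        # distance between the extremes
range_pct = bar_range / bar['Close'] * 100  # range relative to the close
body = bar['Close'] - bar['Open']           # negative means close < open

if body > 0:
    direction = 'bullish'
elif body < 0:
    direction = 'bearish'
else:
    direction = 'neutral'

print(f"Range: ${bar_range:.2f} ({range_pct:.2f}% of close)")
print(f"Body: ${body:.2f} -> {direction}")
```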

The Close Price: Most Important for Analysis

While all OHLCV components matter, Close is the most important for technical analysis:

Why Close Matters Most

  1. Official settlement price: Determines end-of-day portfolio valuations
  2. Market consensus: Represents final agreement on fair value for that period
  3. Hardest to distort: The open can gap on overnight news, and the high and low may each be a single print, whereas moving the close requires sustained pressure into the end of the session
  4. Most liquid: Highest volume typically occurs near market close (closing auction)

Most indicators use Close prices: moving averages, RSI, and MACD are all typically calculated from closing prices.

python
# Calculate simple moving average using Close
aapl['SMA_20'] = aapl['Close'].rolling(window=20).mean()

# Compare using different price components
aapl['SMA_20_High'] = aapl['High'].rolling(window=20).mean()
aapl['SMA_20_Low'] = aapl['Low'].rolling(window=20).mean()

print("Moving averages using different OHLC components:")
print(aapl[['Close', 'SMA_20', 'SMA_20_High', 'SMA_20_Low']].tail())

Standard practice is to use Close for indicators unless you have a specific reason to use another price (e.g., using High/Low for volatility measures like ATR).

High-Low Range: Measuring Volatility

The High-Low range encapsulates intraday volatility and is crucial for volatility-based indicators.

Average True Range (ATR) Foundation

ATR (covered in Module 8) builds on the high-low range: it averages the "true range", which extends each bar's high-low range to include any gap from the previous close.

python
# Calculate daily range and range %
aapl['Range'] = aapl['High'] - aapl['Low']
aapl['Range_Pct'] = (aapl['Range'] / aapl['Close']) * 100

print("Daily Price Ranges:")
print(aapl[['High', 'Low', 'Range', 'Range_Pct']].head(10))

# Average range
print(f"\nAverage daily range: ${aapl['Range'].mean():.2f}")
print(f"Average range %: {aapl['Range_Pct'].mean():.2f}%")

In this sample, a typical AAPL day has a range of about 1.58% (high to low). Days with a range above 3% indicate unusually high volatility, often around news, earnings, or market-wide events.
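
To see why a gap-aware measure matters, consider a day that opens well above the previous close. The sketch below uses two hypothetical bars (all values invented for illustration) and compares the plain high-low range with the true range. The first bar has no previous close, so here it simply falls back to the high-low range.

```python
import pandas as pd

# Hypothetical two-day example: day 2 gaps up above day 1's close
bars = pd.DataFrame(
    {'High': [101.0, 105.0], 'Low': [99.0, 103.0], 'Close': [100.0, 104.0]},
    index=pd.to_datetime(['2024-01-02', '2024-01-03']),
)

prev_close = bars['Close'].shift(1)

# The three candidate distances that define true range
candidates = pd.concat(
    [
        bars['High'] - bars['Low'],         # plain intraday range
        (bars['High'] - prev_close).abs(),  # captures a gap up
        (bars['Low'] - prev_close).abs(),   # captures a gap down
    ],
    axis=1,
)

bars['Range'] = bars['High'] - bars['Low']
bars['TR'] = candidates.max(axis=1)  # row-wise max; NaN candidates are skipped

print(bars[['Range', 'TR']])
```

On the second day the high-low range is only 2.0, but the true range is 5.0 because the bar's high sits 5.0 above the prior close.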

Volume: The Confirmation Indicator

Volume measures participation. High volume confirms price moves; low volume suggests weak conviction.

Volume Analysis Principles

  1. Rising price + Rising volume = Strong uptrend (buyers stepping in aggressively)
  2. Rising price + Falling volume = Weak uptrend (may be exhaustion)
  3. Falling price + Rising volume = Strong downtrend (sellers aggressive)
  4. Falling price + Falling volume = Weak downtrend (selling pressure may be drying up; capitulation typically shows a volume spike instead)

python
# Calculate volume moving average to identify high/low volume days
aapl['Volume_MA'] = aapl['Volume'].rolling(window=20).mean()
aapl['Volume_Ratio'] = aapl['Volume'] / aapl['Volume_MA']

# Calculate daily returns
aapl['Return'] = aapl['Close'].pct_change() * 100

# Identify high volume days
high_volume_days = aapl[aapl['Volume_Ratio'] > 1.5].copy()

print("High Volume Days (>1.5x average):")
print(high_volume_days[['Close', 'Return', 'Volume', 'Volume_Ratio']].head(10))

Pro tip: Use volume as a confirmation filter. If your strategy signals a buy on a breakout, confirm with volume >1.5x average. Low-volume breakouts often fail (false breakouts).
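
A volume-confirmed breakout filter might look like the sketch below. It runs on hypothetical data with a short 5-bar lookback so the example stays small, and it defines a breakout as a close above the highest close of the previous bars. Both the breakout definition and the lookback are assumptions for illustration, not rules from this lesson. Unlike the Volume_MA above, the average here excludes the current bar, so a volume spike is not diluted by its own value.

```python
import pandas as pd

# Hypothetical data: a flat stretch, then two new highs with different volume
df = pd.DataFrame(
    {
        'Close': [100, 101, 100, 101, 100, 103, 102, 105],
        'Volume': [1000, 1100, 900, 1000, 1000, 2000, 1000, 1100],
    },
    index=pd.date_range('2024-01-02', periods=8, freq='B'),
)

lookback = 5  # illustrative window; the lesson's code uses 20 bars

# Breakout: close above the highest close of the previous `lookback` bars
prior_high = df['Close'].rolling(lookback).max().shift(1)
df['Breakout'] = df['Close'] > prior_high

# Volume relative to the average of the previous `lookback` bars
prior_avg_volume = df['Volume'].rolling(lookback).mean().shift(1)
df['Volume_Ratio'] = df['Volume'] / prior_avg_volume

# Keep only breakouts confirmed by volume above 1.5x average
df['Confirmed'] = df['Breakout'] & (df['Volume_Ratio'] > 1.5)

print(df[['Close', 'Volume_Ratio', 'Breakout', 'Confirmed']])
```

In this toy series both new highs count as breakouts, but only the first arrives on roughly twice the average volume, so only that one passes the filter.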

Timeframe Aggregation and Resampling

OHLCV data exists at multiple timeframes: 1-minute, 5-minute, 1-hour, daily, weekly, monthly. Understanding how to aggregate is crucial.

How Aggregation Works

When converting from lower timeframe to higher:

  • Open: First open of the period
  • High: Maximum high across all bars
  • Low: Minimum low across all bars
  • Close: Last close of the period
  • Volume: Sum of all volumes

python
# Download daily data
daily = yf.download('AAPL', start='2023-01-01', end='2024-01-01', progress=False)

# Resample to weekly data
weekly = daily.resample('W').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum'
})

print("Daily data (last 5 days):")
print(daily[['Open', 'High', 'Low', 'Close', 'Volume']].tail())

print("\nWeekly data (last 5 weeks):")
print(weekly.tail())

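Notice how weekly bars aggregate the data. With resample('W'), each bar is labeled with the Sunday that ends its week:

  • Week ending 2023-12-31: Open from the first session of the week (193.61; Tuesday, because markets were closed on Monday, December 25), High across the entire week (194.66), Low across the week (191.09), Close from the last session, Friday (192.53)
  • Volume: Sum of all daily volumes = 153.7M shares for the week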
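
The same five rules apply to intraday data. The sketch below aggregates six hypothetical 5-minute bars (values invented for illustration) into two 15-minute bars, so each rule can be checked by eye.

```python
import pandas as pd

# Six hypothetical 5-minute bars starting at 09:30
idx = pd.date_range('2024-01-02 09:30', periods=6, freq='5min')
five_min = pd.DataFrame(
    {
        'Open': [100.0, 100.5, 101.0, 100.8, 100.2, 100.6],
        'High': [100.7, 101.2, 101.1, 100.9, 100.7, 101.0],
        'Low': [99.8, 100.4, 100.6, 100.1, 100.0, 100.5],
        'Close': [100.5, 101.0, 100.8, 100.2, 100.6, 100.9],
        'Volume': [500, 300, 200, 400, 100, 250],
    },
    index=idx,
)

# One aggregation rule per column, as listed in this section
ohlcv_rules = {
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum',
}

fifteen_min = five_min.resample('15min').agg(ohlcv_rules)
print(fifteen_min)
```

On real intraday data spanning several days, resample also creates empty bins for overnight and weekend periods. Dropping rows where Open is NaN is one way to remove them.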

Information Density in Different Timeframes

Higher timeframes smooth out noise but sacrifice timeliness. Lower timeframes offer more data points but contain more noise.

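Timeframe Tradeoffs

Timeframe | Bars/Year | Signal/Noise | Use Case
1-minute | ~98,000 | Very low | HFT, scalping (difficult)
5-minute | ~19,700 | Low | Day trading
1-hour | ~1,760 | Medium | Swing trading entry timing
Daily | ~252 | Medium-High | Swing trading, position trading
Weekly | ~52 | High | Position trading, long-term trends
Monthly | ~12 | Very high | Long-term regime analysis

Intraday counts assume a 6.5-hour US equity session (390 minutes, so 7 hourly bars with the last one partial) and about 252 trading days per year.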
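
Intraday bar counts can be estimated from the session length. This is a rough sketch assuming a regular US equity session of 390 minutes and 252 trading days per year. Both constants are assumptions, and actual counts vary with the market, holidays, and extended-hours data.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # approximate US equity calendar (assumption)
SESSION_MINUTES = 390        # 09:30-16:00 regular session (assumption)


def bars_per_year(bar_minutes: int) -> int:
    """Estimate yearly bar count; a partial final bar still counts as a bar."""
    bars_per_day = math.ceil(SESSION_MINUTES / bar_minutes)
    return bars_per_day * TRADING_DAYS_PER_YEAR


for label, minutes in [('1-minute', 1), ('5-minute', 5), ('1-hour', 60)]:
    print(f"{label}: ~{bars_per_year(minutes):,} bars/year")
```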

Multi-Timeframe Analysis

Many traders combine multiple timeframes:

  1. Higher timeframe (weekly/daily): Identify overall trend and key levels
  2. Lower timeframe (4H/1H): Time precise entries and exits

Example: Weekly chart shows strong uptrend → use daily chart to buy pullbacks to support.

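Golden rule: Your trading timeframe should be 1-2 steps below your analysis timeframe. If you analyze weekly charts, time entries on the daily chart; if you analyze daily charts, use 4H or 1H. Anchoring entries to the higher-timeframe trend helps filter out whipsaws caused by lower-timeframe noise.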
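
One way to combine two timeframes in pandas is to compute the trend on weekly bars and carry it forward onto the daily index. The sketch below uses hypothetical prices and two deliberately simple definitions that are assumptions for illustration: an uptrend means the weekly close rose, and a pullback means the daily close fell. It also assumes decisions are made after the close, since a Friday bar sees its own week's close.

```python
import pandas as pd

# Hypothetical daily closes: steady rise with a dip every fourth day
idx = pd.date_range('2024-01-01', periods=30, freq='B')
closes = [100.0 + i - (3.0 if i % 4 == 3 else 0.0) for i in range(30)]
daily = pd.DataFrame({'Close': closes}, index=idx)

# Higher timeframe: weekly close, with weeks ending (and labeled) on Friday
weekly_close = daily['Close'].resample('W-FRI').last()
weekly_change = weekly_close.diff()  # NaN for the first week

# Carry each completed week's change forward onto the daily index.
# Monday-Thursday see the previous Friday's value; NaN compares as False.
daily['Weekly_Uptrend'] = (
    weekly_change.reindex(daily.index, method='ffill') > 0
)

# Lower timeframe: a down day counts as a pullback
daily['Pullback'] = daily['Close'].pct_change() < 0

# Entry candidates: pullbacks that occur inside a weekly uptrend
daily['Entry'] = daily['Weekly_Uptrend'] & daily['Pullback']

print(daily[daily['Entry']])
```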

Practical OHLCV Operations in Pandas

Let's see common operations you'll use throughout this course:

python
import numpy as np
import pandas as pd
import yfinance as yf

# Download one year of daily OHLCV data
data = yf.download('AAPL', start='2023-01-01', end='2024-01-01', progress=False)

# 1. Calculate typical price (average of H, L, C)
data['Typical_Price'] = (data['High'] + data['Low'] + data['Close']) / 3

# 2. Calculate true range (accounts for gaps)
data['TR'] = np.maximum(
    data['High'] - data['Low'],
    np.maximum(
        abs(data['High'] - data['Close'].shift(1)),
        abs(data['Low'] - data['Close'].shift(1))
    )
)

# 3. Body size (open-close) and wick sizes
data['Body'] = abs(data['Close'] - data['Open'])
data['Upper_Wick'] = data['High'] - np.maximum(data['Open'], data['Close'])
data['Lower_Wick'] = np.minimum(data['Open'], data['Close']) - data['Low']

# 4. Bullish/Bearish classification
data['Bullish'] = (data['Close'] > data['Open']).astype(int)

print("OHLCV Derived Features:")
print(data[['Open', 'High', 'Low', 'Close', 'Body', 'Upper_Wick', 'Lower_Wick', 'Bullish']].tail())

These derived features form the basis for many indicators and candlestick pattern recognition algorithms.
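
As an illustration of how body and wick sizes feed pattern recognition, the sketch below flags two common candlestick shapes on hypothetical bars. The thresholds are illustrative assumptions rather than standard definitions: a doji has a body of at most 10% of the range, and a hammer has a lower wick of at least twice the body with an upper wick no larger than the body.

```python
import pandas as pd

# Hypothetical bars chosen to show each shape
bars = pd.DataFrame(
    {
        'Open': [100.0, 100.0, 100.0],
        'High': [101.0, 100.6, 103.0],
        'Low': [99.0, 97.0, 99.8],
        'Close': [100.1, 100.5, 102.8],
    },
    index=['doji-like', 'hammer-like', 'strong bullish'],
)

# Same derived features as in the lesson code above
bars['Range'] = bars['High'] - bars['Low']
bars['Body'] = (bars['Close'] - bars['Open']).abs()
bars['Upper_Wick'] = bars['High'] - bars[['Open', 'Close']].max(axis=1)
bars['Lower_Wick'] = bars[['Open', 'Close']].min(axis=1) - bars['Low']

# Doji: open and close nearly equal relative to the bar's range
bars['Doji'] = bars['Body'] <= 0.1 * bars['Range']

# Hammer: long lower wick, small upper wick, and not already a doji
bars['Hammer'] = (
    (bars['Lower_Wick'] >= 2 * bars['Body'])
    & (bars['Upper_Wick'] <= bars['Body'])
    & ~bars['Doji']
)

print(bars[['Body', 'Upper_Wick', 'Lower_Wick', 'Doji', 'Hammer']])
```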

Summary

Key Takeaways

  1. OHLCV format is standard: Open, High, Low, Close, Volume for each time period
  2. Close is most important for indicators and analysis - represents final consensus
  3. High-Low range measures intraday volatility
  4. Volume confirms price moves - high volume validates trends
  5. Timeframe aggregation combines lower bars into higher bars using first open, max high, min low, last close, sum volume
  6. Signal-to-noise ratio increases with higher timeframes at the cost of timeliness

Next Steps

Now that you understand OHLCV structure, the next lesson covers fetching market data with yfinance: how to download real data from Yahoo Finance and prepare it for analysis in Python.