Understanding OHLCV Data Structure
Master the OHLCV format, timeframe aggregation, and how to extract information from candlestick data
Introduction
OHLCV (Open, High, Low, Close, Volume) is the standard format for representing price data across timeframes. Understanding OHLCV structure is essential for implementing technical indicators and strategies.
This lesson covers:
- What each OHLCV component represents
- How to interpret candlestick data
- Timeframe aggregation and resampling
- Information density in OHLCV bars
- Working with OHLCV data in pandas
OHLCV Components Explained
Each bar (candlestick) in a chart contains five data points:
| Field | Description | Trading Significance |
|---|---|---|
| Open | First traded price in the period | Opening auction price discovery |
| High | Highest traded price in the period | Maximum buying pressure reached |
| Low | Lowest traded price in the period | Maximum selling pressure reached |
| Close | Last traded price in the period | Final consensus value; most important for analysis |
| Volume | Total shares/contracts traded | Participation level; confirms price moves |
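To make the table concrete, here is a minimal sketch of how a single bar could be built from the raw trades in one period. The trade prices and sizes are made-up values for illustration, not real market data.

```python
# Hypothetical trade prints for one period, in time order: (price, size)
trades = [
    (100.0, 200),
    (100.5, 150),
    (99.8, 300),
    (101.2, 100),
    (100.9, 250),
]

prices = [price for price, _ in trades]
sizes = [size for _, size in trades]

bar = {
    "Open": prices[0],     # first traded price in the period
    "High": max(prices),   # highest traded price
    "Low": min(prices),    # lowest traded price
    "Close": prices[-1],   # last traded price in the period
    "Volume": sum(sizes),  # total quantity traded
}
print(bar)
```

Five trades collapse into five numbers. The order of the trades between the first and the last is lost, which is why a bar alone cannot tell you whether the high or the low was reached first.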
import yfinance as yf
import pandas as pd
# Download OHLCV data for Apple
aapl = yf.download('AAPL', start='2024-01-01', end='2024-01-10', progress=False)
# Display first 5 days
print("AAPL OHLCV Data (First 5 days of 2024):")
print(aapl.head())
Interpreting a Single Bar
Example: 2024-01-02
- Open: $187.15 - Trading began at this price
- High: $187.74 - Buyers pushed price to this maximum
- Low: $184.35 - Sellers pushed price to this minimum
- Close: $185.64 - Day ended here (net bearish, closed below open)
- Volume: 82.3M shares - Heavy participation (above average)
Range = High - Low = 187.74 - 184.35 = $3.39 (about 1.8% of the price)
This shows significant intraday volatility - buyers and sellers fought for control, with sellers winning (close < open).
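The same reading can be reproduced as plain arithmetic. The sketch below uses the four prices from the example; the "close location" measure is an extra illustration that is not part of the lesson, and dividing the range by the close is just one possible choice of denominator.

```python
# Prices from the 2024-01-02 example
open_, high, low, close = 187.15, 187.74, 184.35, 185.64

bar_range = high - low                   # distance between the extremes
range_pct = bar_range / close * 100      # range relative to the close
body = close - open_                     # negative when the bar closes below its open
direction = "bullish" if close > open_ else "bearish" if close < open_ else "flat"

# Illustrative extra: where the close sits inside the range (0 = at low, 1 = at high)
close_location = (close - low) / bar_range

print(f"Range: ${bar_range:.2f} ({range_pct:.2f}% of close)")
print(f"Body: {body:+.2f} -> {direction}")
print(f"Close location within range: {close_location:.2f}")
```

A close in the lower part of the range is consistent with the reading above that sellers controlled the session.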
The Close Price: Most Important for Analysis
While all OHLCV components matter, Close is the most important for technical analysis:
Why Close Matters Most
- Official settlement price: Determines end-of-day portfolio valuations
- Market consensus: Represents final agreement on fair value for that period
- Least manipulable: Open can be gapped by overnight news; high/low are single prints; close requires sustained pressure
- Most liquid: Highest volume typically occurs near market close (closing auction)
Most indicators use Close prices: Moving averages, RSI, MACD all typically calculate using close prices.
# A 20-day window needs at least 20 bars, so load a full year instead of
# the few days downloaded earlier
aapl = yf.download('AAPL', start='2023-01-01', end='2024-01-01', progress=False)
# Calculate simple moving average using Close
aapl['SMA_20'] = aapl['Close'].rolling(window=20).mean()
# Compare using different price components
aapl['SMA_20_High'] = aapl['High'].rolling(window=20).mean()
aapl['SMA_20_Low'] = aapl['Low'].rolling(window=20).mean()
print("Moving averages using different OHLC components:")
print(aapl[['Close', 'SMA_20', 'SMA_20_High', 'SMA_20_Low']].tail())
Standard practice is to use Close for indicators unless you have a specific reason to use another price (e.g., using High/Low for volatility measures like ATR).
High-Low Range: Measuring Volatility
The High-Low range encapsulates intraday volatility and is crucial for volatility-based indicators.
Average True Range (ATR) Foundation
ATR (covered in Module 8) uses the high-low range to measure "true" volatility including gaps.
# Calculate daily range and range %
aapl['Range'] = aapl['High'] - aapl['Low']
aapl['Range_Pct'] = (aapl['Range'] / aapl['Close']) * 100
print("Daily Price Ranges:")
print(aapl[['High', 'Low', 'Range', 'Range_Pct']].head(10))
# Average range
print(f"\nAverage daily range: ${aapl['Range'].mean():.2f}")
print(f"Average range %: {aapl['Range_Pct'].mean():.2f}%")
A typical AAPL day has a 1.58% range (high to low). Days with >3% range indicate high volatility (news, earnings, market events).
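The plain high-low range ignores overnight gaps, which is the reason ATR is built on the "true" range instead. The sketch below compares the two on made-up bars in which the second day gaps up. It is one way to write the calculation, not necessarily the version used later in the course.

```python
import pandas as pd

# Made-up bars: day 2 opens and trades far above day 1's close (a gap up)
bars = pd.DataFrame(
    {
        "High": [101.0, 106.0, 107.0],
        "Low": [99.0, 104.5, 105.0],
        "Close": [100.0, 105.5, 106.0],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

prev_close = bars["Close"].shift(1)
simple_range = bars["High"] - bars["Low"]

# True range: the largest of the three distances. The row-wise max skips NaN,
# so the first bar (no previous close) falls back to its simple range.
true_range = pd.concat(
    [
        simple_range,
        (bars["High"] - prev_close).abs(),
        (bars["Low"] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1)

print(pd.DataFrame({"Range": simple_range, "True_Range": true_range}))
```

On the gap day the simple range is only 1.5, while the true range is 6.0 because it measures from the previous close.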
Volume: The Confirmation Indicator
Volume measures participation. High volume confirms price moves; low volume suggests weak conviction.
Volume Analysis Principles
- Rising price + Rising volume = Strong uptrend (buyers stepping in aggressively)
- Rising price + Falling volume = Weak uptrend (may be exhaustion)
- Falling price + Rising volume = Strong downtrend (sellers aggressive)
- Falling price + Falling volume = Weak downtrend (selling pressure may be drying up)
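The four combinations above can be sketched as a simple labeling step. This version compares each bar only with the previous one, which is a deliberately crude stand-in for "rising" and "falling" (in practice these are judged over several bars). The data is synthetic and the column name `Regime` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic closes and volumes
bars = pd.DataFrame(
    {
        "Close": [100.0, 101.0, 102.0, 101.0, 100.0],
        "Volume": [1000, 1200, 900, 1500, 1100],
    }
)

price_up = bars["Close"].diff() > 0    # unchanged closes count as "not up" here
volume_up = bars["Volume"].diff() > 0

conditions = [
    price_up & volume_up,
    price_up & ~volume_up,
    ~price_up & volume_up,
    ~price_up & ~volume_up,
]
labels = ["strong uptrend", "weak uptrend", "strong downtrend", "weak downtrend"]

bars["Regime"] = np.select(conditions, labels, default="n/a")
bars.loc[bars.index[0], "Regime"] = "n/a"  # the first bar has nothing to compare against
print(bars)
```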
# Calculate volume moving average to identify high/low volume days
aapl['Volume_MA'] = aapl['Volume'].rolling(window=20).mean()
aapl['Volume_Ratio'] = aapl['Volume'] / aapl['Volume_MA']
# Calculate daily returns
aapl['Return'] = aapl['Close'].pct_change() * 100
# Identify high volume days
high_volume_days = aapl[aapl['Volume_Ratio'] > 1.5].copy()
print("High Volume Days (>1.5x average):")
print(high_volume_days[['Close', 'Return', 'Volume', 'Volume_Ratio']].head(10))
Pro tip: Use volume as a confirmation filter. If your strategy signals a buy on a breakout, confirm with volume >1.5x average. Low-volume breakouts often fail (false breakouts).
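A minimal sketch of that confirmation filter follows. The breakout definition (a close above the highest close of the previous 5 bars) and the 5-bar lookback are assumptions chosen to keep the example small. Only the 1.5x volume threshold comes from the tip above. The averages exclude the current bar so that the breakout bar does not inflate its own baseline.

```python
import pandas as pd

# Synthetic data: two breakouts, only the second on heavy volume
bars = pd.DataFrame(
    {
        "Close": [100.0, 101.0, 100.0, 102.0, 101.0, 103.0, 102.0, 106.0],
        "Volume": [1000.0, 1100.0, 900.0, 1000.0, 1050.0, 950.0, 1000.0, 2000.0],
    }
)

lookback = 5  # illustrative window length

# shift(1) keeps the current bar out of its own reference values
prior_high = bars["Close"].rolling(lookback).max().shift(1)
avg_volume = bars["Volume"].rolling(lookback).mean().shift(1)

bars["Breakout"] = bars["Close"] > prior_high
bars["Confirmed"] = bars["Breakout"] & (bars["Volume"] > 1.5 * avg_volume)

print(bars)
```

In this data the breakout at row 5 happens on below-average volume and is filtered out, while the one at row 7 trades at roughly twice the prior average and passes.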
Timeframe Aggregation and Resampling
OHLCV data exists at multiple timeframes: 1-minute, 5-minute, 1-hour, daily, weekly, monthly. Understanding how to aggregate is crucial.
How Aggregation Works
When converting from lower timeframe to higher:
- Open: First open of the period
- High: Maximum high across all bars
- Low: Minimum low across all bars
- Close: Last close of the period
- Volume: Sum of all volumes
# Download daily data
daily = yf.download('AAPL', start='2023-01-01', end='2024-01-01', progress=False)
# Resample to weekly data
weekly = daily.resample('W').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum'
})
print("Daily data (last 5 days):")
print(daily[['Open', 'High', 'Low', 'Close', 'Volume']].tail())
print("\nWeekly data (last 5 weeks):")
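print(weekly.tail())
Notice how weekly bars aggregate the data:
- Week labeled 2023-12-31 (pandas labels each 'W' bar with the Sunday that ends the week): Open from the first trading day of the week (193.61; Tuesday here, because Monday, December 25 was a market holiday), High across entire week (194.66), Low across week (191.09), Close on Friday (192.53)
- Volume: Sum of all daily volumes = 153.7M shares for the week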
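To check the aggregation rules without downloading anything, the sketch below resamples ten synthetic daily bars (two plain business weeks, holidays ignored) and compares the first weekly bar with a manual calculation over its five days.

```python
import pandas as pd

# Synthetic daily bars: Mon 2024-01-08 through Fri 2024-01-19
dates = pd.bdate_range("2024-01-08", periods=10)
daily = pd.DataFrame(
    {
        "Open": [100.0 + i for i in range(10)],
        "High": [102.0 + i for i in range(10)],
        "Low": [99.0 + i for i in range(10)],
        "Close": [101.0 + i for i in range(10)],
        "Volume": [1000 * (i + 1) for i in range(10)],
    },
    index=dates,
)

weekly = daily.resample("W").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)

# Manual aggregation of the first week for comparison
first_week = daily.loc["2024-01-08":"2024-01-12"]
manual = {
    "Open": first_week["Open"].iloc[0],
    "High": first_week["High"].max(),
    "Low": first_week["Low"].min(),
    "Close": first_week["Close"].iloc[-1],
    "Volume": first_week["Volume"].sum(),
}

print(weekly)
print(manual)
```

The weekly index shows 2024-01-14 and 2024-01-21, both Sundays, which confirms that `'W'` labels each bar by the end of its week rather than by its first trading day.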
Information Density in Different Timeframes
Higher timeframes smooth out noise but sacrifice timeliness. Lower timeframes offer more data points but contain more noise.
| Timeframe | Bars/Year | Signal/Noise | Use Case |
|---|---|---|---|
| 1-minute | ~98,000 | Very low | HFT, scalping (difficult) |
| 5-minute | ~19,700 | Low | Day trading |
| 1-hour | ~1,700 | Medium | Swing trading entry timing |
| Daily | ~252 | Medium-High | Swing trading, position trading |
| Weekly | ~52 | High | Position trading, long-term trends |
| Monthly | ~12 | Very high | Long-term regime analysis |
Intraday bar counts assume about 252 trading days per year and a 6.5-hour regular US equity session (390 minutes per day).
Multi-Timeframe Analysis
Professional traders use multiple timeframes:
- Higher timeframe (weekly/daily): Identify overall trend and key levels
- Lower timeframe (4H/1H): Time precise entries and exits
Example: Weekly chart shows strong uptrend → use daily chart to buy pullbacks to support.
Golden rule: Your trading timeframe should be 1-2 steps below your analysis timeframe. If analyzing daily charts, trade on 4H or 1H for entries. Taking entries only in the direction of the higher-timeframe trend helps filter out whipsaws caused by lower-timeframe noise.
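A minimal sketch of the weekly-trend, daily-entry idea on synthetic data follows. The trend rule (weekly close above its 4-week average) and the pullback rule (a down day) are simplified assumptions for illustration, not rules taken from this lesson. Because `'W'` bars are labeled by the Sunday that ends them, forward-filling them onto daily dates only ever uses weeks that have already closed.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes: steady rise with a dip every Wednesday
dates = pd.bdate_range("2023-01-02", periods=120)
i = np.arange(120)
close = pd.Series(100 + 0.5 * i - 2.0 * (i % 5 == 2), index=dates, name="Close")

# Higher timeframe: weekly close above its 4-week average (assumed trend rule)
weekly_close = close.resample("W").last()
weekly_sma = weekly_close.rolling(4).mean()
weekly_uptrend = (weekly_close > weekly_sma).astype(float)

# Map the weekly signal onto daily bars. Each daily date picks up the most
# recent Sunday label on or before it, i.e. the last fully completed week.
trend_on_daily = (
    weekly_uptrend.reindex(close.index, method="ffill").fillna(0.0).astype(bool)
)

# Lower timeframe: a down day stands in for a pullback (assumed entry rule)
pullback = close.diff() < 0
entries = pullback & trend_on_daily

print(f"Pullback days: {int(pullback.sum())}, entries taken: {int(entries.sum())}")
print("First entry:", entries[entries].index[0].date())
```

The first four Wednesday dips are skipped because the weekly average needs four completed weeks before the trend filter can turn on.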
Practical OHLCV Operations in Pandas
Let's see common operations you'll use throughout this course:
import pandas as pd
import numpy as np
import yfinance as yf
# Download one year of daily OHLCV data
data = yf.download('AAPL', start='2023-01-01', end='2024-01-01', progress=False)
# 1. Calculate typical price (average of H, L, C)
data['Typical_Price'] = (data['High'] + data['Low'] + data['Close']) / 3
# 2. Calculate true range (accounts for gaps)
data['TR'] = np.maximum(
    data['High'] - data['Low'],
    np.maximum(
        abs(data['High'] - data['Close'].shift(1)),
        abs(data['Low'] - data['Close'].shift(1))
    )
)
# 3. Body size (open-close) and wick sizes
data['Body'] = abs(data['Close'] - data['Open'])
data['Upper_Wick'] = data['High'] - np.maximum(data['Open'], data['Close'])
data['Lower_Wick'] = np.minimum(data['Open'], data['Close']) - data['Low']
# 4. Bullish/Bearish classification
data['Bullish'] = (data['Close'] > data['Open']).astype(int)
print("OHLCV Derived Features:")
print(data[['Open', 'High', 'Low', 'Close', 'Body', 'Upper_Wick', 'Lower_Wick', 'Bullish']].tail())
These derived features form the basis for many indicators and candlestick pattern recognition algorithms.
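As a small illustration of pattern recognition built on body and wick sizes, the sketch below flags doji and hammer-shaped bars in synthetic data. The thresholds (body at most 10% of the range for a doji; lower wick at least twice the body and upper wick at most half the body for a hammer) are illustrative assumptions, and the check looks only at bar shape, not at the preceding trend that a hammer normally requires.

```python
import pandas as pd

# Synthetic bars: a doji, a hammer-shaped bar, and a strong bullish bar
bars = pd.DataFrame(
    {
        "Open": [100.0, 100.0, 100.0],
        "High": [101.0, 101.0, 104.0],
        "Low": [99.0, 96.0, 99.5],
        "Close": [100.05, 100.8, 103.5],
    }
)

body = (bars["Close"] - bars["Open"]).abs()
full_range = bars["High"] - bars["Low"]
upper_wick = bars["High"] - bars[["Open", "Close"]].max(axis=1)
lower_wick = bars[["Open", "Close"]].min(axis=1) - bars["Low"]

# Illustrative thresholds, not standard definitions
bars["Doji"] = body <= 0.1 * full_range
bars["Hammer"] = (
    (lower_wick >= 2 * body) & (upper_wick <= 0.5 * body) & ~bars["Doji"]
)

print(bars)
```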
Summary
Key Takeaways
- OHLCV format is standard: Open, High, Low, Close, Volume for each time period
- Close is most important for indicators and analysis - represents final consensus
- High-Low range measures intraday volatility
- Volume confirms price moves - high volume validates trends
- Timeframe aggregation combines lower bars into higher bars using first open, max high, min low, last close, sum volume
- Signal-to-noise ratio increases with higher timeframes at the cost of timeliness
Next Steps
Now that you understand OHLCV structure, the next lesson covers fetching market data with yfinance: how to download real data from Yahoo Finance and prepare it for analysis in Python.