Data Visualization

Master histograms, bar charts, box plots, and scatter plots. Learn to read distribution shapes and spot misleading charts.

Why Visualize Data?

Numbers alone can deceive. Consider Anscombe's Quartet: four datasets that have nearly identical summary statistics (the same means and variances of x and y, the same correlation, and the same regression line) but look completely different when plotted.

This is the power of visualization: it reveals patterns, outliers, clusters, and relationships that summary statistics hide. A single chart can tell you more than a page of numbers.

Rule of thumb: Always plot your data before computing anything. The shape of the data should guide which statistics you use.

Histograms: The Shape of Your Data

A histogram divides continuous data into bins (ranges) and uses bars to show how many data points fall into each bin. The height of each bar represents the frequency, or count, of values in that bin.

What histograms reveal:

  • Shape of the distribution (symmetric, skewed, bimodal)
  • Center — where most values cluster
  • Spread — how wide the distribution is
  • Outliers — isolated bars far from the rest

Key decisions when making histograms:

  • Bin width matters enormously. Too few bins: you lose detail. Too many bins: you see noise instead of pattern. There's no single right answer — experiment!
  • Bars should touch each other (no gaps) because the data is continuous.
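The binning step can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration. The function names and the sample scores are hypothetical, chosen only to show how the number of bins changes the picture.

```python
def histogram_counts(data, num_bins):
    """Split the range of data into num_bins equal-width bins and count the values in each."""
    lo, hi = min(data), max(data)
    if lo == hi:
        # Every value is identical, so a single bin holds them all.
        return [(lo, hi, len(data))]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Bins are [left, right); the maximum sits on the last edge, so clamp it into the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

def print_histogram(data, num_bins):
    """Draw a sideways text histogram: one row per bin, one '#' per data point."""
    for left, right, count in histogram_counts(data, num_bins):
        print(f"{left:6.1f} to {right:6.1f} | {'#' * count}")

# Hypothetical exam scores, used only for illustration.
scores = [52, 58, 61, 64, 66, 67, 69, 70, 71, 72, 73, 74, 75, 75, 77, 78, 80, 82, 85, 91]

print_histogram(scores, 4)   # few bins: a coarse outline of the shape
print()
print_histogram(scores, 15)  # many bins: more detail, but also more noise
```

Library routines such as NumPy's `numpy.histogram` also use equal-width bins by default, with the last bin including its right edge.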

Reading Distribution Shapes

Symmetric (bell-shaped): Heights of adults, IQ scores → Mean ≈ Median, data clusters evenly around the center

Right-skewed (long tail to the right): Income, house prices, hospital stays → Mean > Median, most values are low with a few very high ones

Left-skewed (long tail to the left): Age at retirement, easy exam scores → Mean < Median, most values are high with a few very low ones

Bimodal (two peaks): Arrival times at a restaurant (lunch + dinner rush) → Suggests two distinct groups in your data

Uniform (flat): Rolling a fair die, random number generator → All values roughly equally likely
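The mean-versus-median pattern can be checked numerically. Below is a minimal sketch using Python's `statistics` module. The income and exam lists are made-up illustrations, and the `tolerance` threshold is a hypothetical choice. Comparing the mean with the median is only a rough indicator of skew, not a formal test.

```python
import statistics

def describe_skew(data, tolerance=0.5):
    """Rough skew check: compare the mean with the median.

    tolerance is a hypothetical threshold (in data units) below which the
    difference is treated as negligible.
    """
    mean, median = statistics.mean(data), statistics.median(data)
    if mean - median > tolerance:
        label = "right-skewed (mean > median)"
    elif median - mean > tolerance:
        label = "left-skewed (mean < median)"
    else:
        label = "roughly symmetric (mean ≈ median)"
    return mean, median, label

# Hypothetical incomes in thousands: most are modest, one is very high.
incomes = [30, 32, 35, 38, 40, 42, 45, 50, 250]
# Hypothetical scores on an easy exam: most are high, one is very low.
easy_exam = [20, 70, 75, 78, 80, 82, 85, 88, 90]

for name, data in [("incomes", incomes), ("easy exam", easy_exam)]:
    mean, median, label = describe_skew(data)
    print(f"{name}: mean={mean:.1f}, median={median}, {label}")
```

The single very high income pulls the mean well above the median. The single very low exam score pulls the mean below it.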

Bar Charts: Comparing Categories

A bar chart displays categorical data using rectangular bars. The height (or length) of each bar represents the count or value for that category. Unlike in a histogram, the bars don't touch, because the categories are distinct.

Histogram vs Bar Chart:

  • Histogram: Continuous data, bars touch, order matters (x-axis is numerical)
  • Bar chart: Categorical data, bars don't touch, order is often arbitrary

Variations:

  • Grouped bar chart: Compare multiple groups side by side
  • Stacked bar chart: Show composition within categories
  • Horizontal bar chart: Better when category labels are long
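Behind every bar chart of counts is a simple tally per category. Here is a minimal sketch with `collections.Counter` that draws a horizontal bar chart in text. The function names and survey responses are hypothetical.

```python
from collections import Counter

def category_counts(values):
    """Tally each category and return (category, count) pairs, most common first."""
    return Counter(values).most_common()

def print_bar_chart(values):
    """Horizontal text bar chart: one row per category, one '#' per response."""
    counts = category_counts(values)
    width = max(len(str(cat)) for cat, _ in counts)  # pad labels so the bars line up
    for cat, count in counts:
        print(f"{str(cat):<{width}} | {'#' * count} {count}")

# Hypothetical survey answers to "How do you commute?"
responses = ["car", "bus", "car", "bike", "walk", "car", "bus", "car", "walk", "bus", "car"]
print_bar_chart(responses)
```

Because the order of nominal categories is arbitrary, sorting the bars by count (as `most_common()` does) usually makes comparisons easier than alphabetical order.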

Box Plots: The Five-Number Summary

A box plot is a compact visualization of five key statistics:

  1. Minimum (excluding outliers)
  2. Q1 (25th percentile) — bottom of the box
  3. Median (Q2) — line inside the box
  4. Q3 (75th percentile) — top of the box
  5. Maximum (excluding outliers)

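Individual dots beyond the whiskers represent outliers. Under the most common convention, each whisker reaches the most extreme value within 1.5 × IQR of the box (where IQR = Q3 − Q1), and any point farther out is plotted as a separate dot.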
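Here is a minimal Python sketch of how the box-plot numbers can be computed, assuming the 1.5 × IQR outlier rule. Software packages use slightly different quartile methods. This sketch uses `statistics.quantiles` with `method="inclusive"`, so other tools may report marginally different Q1 and Q3 values. The function name and the data are hypothetical.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Five-number summary plus outliers, using the k * IQR fence rule (k=1.5 is customary)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "min": min(inside),   # lower whisker end: smallest non-outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(inside),   # upper whisker end: largest non-outlier
        "outliers": outliers,
    }

# Hypothetical data with one suspiciously large value.
data = [3, 5, 6, 7, 8, 8, 9, 10, 11, 40]
print(box_plot_summary(data))
```

Here Q1 = 6.25 and Q3 = 9.75, so the fences sit at 1.0 and 15.0. The value 40 is reported as an outlier, and the upper whisker stops at 11.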

What box plots reveal:

  • Center: Where the median line sits
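  • Spread: The length of the box along the axis, which equals the IQR (Q3 − Q1)
  • Skewness: If the median line isn't centered in the box, the data is likely skewed, with the tail on the side where the box is longer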
  • Outliers: Individual points plotted beyond the whiskers

Box plots shine when comparing groups. Placing box plots side by side instantly shows differences in center, spread, and skewness across categories.

Reading a Box Plot

Imagine a box plot of exam scores:

  • The box stretches from 65 (Q1) to 85 (Q3)
  • The median line is at 78
  • Whiskers extend from 45 to 98
  • Two dots appear at 15 and 20

Interpretation:

  • The middle 50% of students scored between 65 and 85
  • The typical score (median) was 78
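  • The median is closer to Q3 (7 points away) than to Q1 (13 points away), suggesting left skew; the longer lower whisker and the two low outliers point the same way
  • Apart from the outliers, all scores fall between 45 and 98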
  • Two students scored unusually low (15, 20) — potential outliers worth investigating
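The numbers in this example are consistent with the usual 1.5 × IQR rule, which explains why 15 and 20 appear as dots while 45 and 98 end the whiskers. Here is a quick Python check of that arithmetic. It assumes the 1.5 × IQR rule; the software that drew the plot might use a different one.

```python
q1, q3, median = 65, 85, 78
iqr = q3 - q1                  # 20
lower_fence = q1 - 1.5 * iqr   # 65 - 30 = 35
upper_fence = q3 + 1.5 * iqr   # 85 + 30 = 115

def is_outlier(score):
    """True if the score lies outside the 1.5 * IQR fences."""
    return score < lower_fence or score > upper_fence

for score in [15, 20, 45, 98]:
    print(score, "outlier" if is_outlier(score) else "not an outlier")

# Distance from the median to each quartile: 13 below vs 7 above, so the lower half is more spread out.
print("median - Q1 =", median - q1, "| Q3 - median =", q3 - median)
```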

Scatter Plots: Relationships Between Variables

A scatter plot shows two numerical variables against each other, drawing each data point as a dot. One variable goes on the x-axis and the other on the y-axis.

What scatter plots reveal:

  • Direction: Positive trend (up-right), negative trend (down-right), or no trend
  • Strength: How tightly points cluster around a pattern
  • Form: Linear, curved, or no clear pattern
  • Outliers: Points that don't fit the overall pattern

Scatter plots are your go-to tool for exploring relationships before computing correlations or fitting regression models.

Misleading Charts: How Visualization Lies

Charts can manipulate perception just as effectively as they can inform. Here are the most common tricks:

1. Truncated y-axis: Starting the y-axis at a value other than zero can make small differences look enormous.

A bar chart showing approval ratings of 51% vs 49% looks nearly identical with a full 0-100% axis. But zoom into 48-52%? Suddenly one bar towers over the other.
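The distortion is easy to quantify: a bar's drawn length is its value minus the axis baseline. Here is a short Python sketch of that arithmetic. The function name is hypothetical.

```python
def drawn_height_ratio(a, b, baseline):
    """How many times taller bar a looks than bar b when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)

print(drawn_height_ratio(51, 49, baseline=0))   # about 1.04: the bars look nearly equal
print(drawn_height_ratio(51, 49, baseline=48))  # 3.0: one bar looks three times taller
```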

2. Manipulated aspect ratio: Stretching a chart vertically (or compressing it horizontally) exaggerates trends. Compressing it vertically (or stretching it horizontally) flattens them. The same data can look like a crisis or a plateau.
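The aspect-ratio effect can be made concrete by computing the on-screen angle of the same line in two differently shaped plot areas. This is a minimal sketch. The function name, pixel sizes, and data values are all hypothetical, and it assumes the line spans the full width of the plot.

```python
import math

def apparent_angle(rise, y_range, height_px, width_px):
    """On-screen angle (degrees) of a line that rises `rise` data units across the full plot width.

    y_range is the data span of the y-axis; height_px and width_px are the plot area's size.
    """
    rise_px = rise / y_range * height_px  # convert the data rise into pixels
    return math.degrees(math.atan2(rise_px, width_px))

# The same data: a metric rising 10 units over the period, on a 0-100 y-axis.
wide = apparent_angle(10, 100, height_px=300, width_px=900)
tall = apparent_angle(10, 100, height_px=900, width_px=300)

print(f"wide chart: {wide:.1f} degrees")  # a gentle slope
print(f"tall chart: {tall:.1f} degrees")  # a much steeper slope
```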

3. Cherry-picked time ranges: Consider a stock that is down 5% today but up 200% over 5 years. Show only today? Panic. Show 5 years? Celebration. The chosen window controls the narrative.
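The two headlines come from the same price series; only the starting point changes. Here is a minimal Python sketch of that arithmetic. The prices are hypothetical, chosen to match the example.

```python
def pct_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100

# Hypothetical closing prices for one stock.
price_5_years_ago = 100.00
price_yesterday = 315.79
price_today = 300.00

print(f"Last day:     {pct_change(price_yesterday, price_today):+.1f}%")    # about -5%
print(f"Last 5 years: {pct_change(price_5_years_ago, price_today):+.1f}%")  # +200%
```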

4. 3D charts: Three-dimensional pie charts and bar charts distort proportions through perspective. A slice at the front looks bigger than an equal slice at the back. Never use 3D charts.

5. Dual axes: Plotting two different variables with different scales on the same chart can imply a relationship that doesn't exist (or hide one that does).
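Dual axes can imply a link because each axis is scaled independently to fit its own series, which is roughly a min-max rescaling. After that rescaling, any two steadily rising series trace the same line, whatever their units or causes. Here is a minimal sketch; the function name and both series are hypothetical.

```python
def min_max_rescale(series):
    """Map a series onto [0, 1], roughly what an independently auto-scaled axis does."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

# Two hypothetical, unrelated yearly series with very different units.
ice_cream_sales = [200, 240, 280, 320]        # thousands of cones
library_visits = [51000, 51400, 51800, 52200]

print(min_max_rescale(ice_cream_sales))
print(min_max_rescale(library_visits))
# Both come out (approximately) as [0, 1/3, 2/3, 1]: on dual axes the two lines coincide.
```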

Whenever you see a chart, ask: What does the y-axis start at? What time range is shown? Are the scales comparable? Is anything being hidden? These questions protect you from visual manipulation.

Choosing the Right Visualization

Chart Selection Guide

Your Goal                            | Data Type                  | Best Chart
------------------------------------ | -------------------------- | ----------------------------------
Distribution of one variable         | Continuous                 | Histogram
Compare categories                   | Categorical                | Bar chart
Five-number summary                  | Continuous                 | Box plot
Compare groups' distributions        | Continuous + Categorical   | Side-by-side box plots
Relationship between two variables   | Two continuous             | Scatter plot
Trend over time                      | Time series                | Line chart
Part-to-whole composition            | Categorical (few groups)   | Pie chart (or better: stacked bar)

Pro tip: The best visualization is the one that answers your question most clearly. Fancy charts that confuse your audience are worse than simple charts that communicate effectively.
