Kekkei - Advancing Financial Science For Everyone

Why Data Types Matter

Before you can analyze data, you need to know what kind of data you're looking at. This isn't a boring classification exercise — it fundamentally determines:

Which summary statistics make sense (you can't take the "average" of colors)
Which visualizations work (you don't make a histogram of names)
Which statistical tests are valid (different tests for different data types)

Get the data type wrong, and your entire analysis could be meaningless.

The Big Split: Categorical vs Numerical

All data falls into one of two broad families:

Categorical (qualitative) data represents categories or groups. It describes qualities or characteristics. You can count how many observations fall into each category, but you can't do meaningful arithmetic with the values.

Examples: blood type (A, B, AB, O), country of birth, favorite color, yes/no responses.

Numerical (quantitative) data represents measurable quantities. You can do arithmetic with these values: add them, average them, measure distances between them.

Examples: height (5'9"), temperature (72°F), income ($50,000), number of siblings (3).

Quick test: Can you meaningfully add two values together? If yes → numerical. If no → categorical. You can add two heights (meaningfully), but you can't add "blue" + "green."
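As a small illustration of the split, here is a sketch in Python using only the standard library. The sample values and variable names are made up for this example. Counting works for the categorical column, while averaging only makes sense for the numerical one.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample data, invented for illustration.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]   # categorical
heights_cm = [170.2, 165.0, 180.5, 158.3]            # numerical

# Categorical: counting how many fall into each category is meaningful.
blood_type_counts = Counter(blood_types)
most_common_type, most_common_count = blood_type_counts.most_common(1)[0]

# Numerical: arithmetic such as averaging is meaningful.
average_height = mean(heights_cm)

print(blood_type_counts)
print(most_common_type, most_common_count)
print(round(average_height, 2))
```

There is no sensible counterpart to `average_height` for `blood_types`; the most you can report is the count per label or the most frequent label.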

Categorical Data: Nominal vs Ordinal

Not all categories are created equal. Some have a natural order, others don't.

Nominal data consists of categories with no natural ordering. The categories are just labels, so rearranging them doesn't lose any information.

Examples: blood type (A, B, AB, O), eye color, country, programming language.

Ordinal data consists of categories with a meaningful order, but the distances between categories aren't necessarily equal.

Examples: education level (high school < bachelor's < master's < PhD), satisfaction rating (1-5 stars), T-shirt sizes (S < M < L < XL).
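In code, the order of an ordinal variable usually has to be stated explicitly, because the labels themselves sort alphabetically. A minimal sketch, where `EDUCATION_ORDER` and `rank` are names invented for this example:

```python
# The ordering is supplied by us; the labels alone do not carry it.
EDUCATION_ORDER = ["high school", "bachelor's", "master's", "PhD"]
rank = {level: position for position, level in enumerate(EDUCATION_ORDER)}

# Hypothetical survey responses.
levels = ["master's", "high school", "PhD", "bachelor's"]

# Sorting by rank respects the ordinal scale.
sorted_levels = sorted(levels, key=rank.__getitem__)
highest_level = max(levels, key=rank.__getitem__)

# Plain sorting compares the strings character by character instead.
alphabetical = sorted(levels)

print(sorted_levels)
print(highest_level)
print(alphabetical)
```

The ranks 0 to 3 are only positions. They say which level comes first, not how far apart the levels are.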

The Ordinal Trap

Consider a restaurant rated ⭐⭐⭐⭐ (4 stars) versus one rated ⭐⭐ (2 stars).

We know 4 stars is better than 2 stars. But is the step from 2 to 3 stars the same size as the step from 4 to 5 stars? Not necessarily! The jump from "bad" to "decent" might feel bigger than the jump from "great" to "excellent."

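This is why you should be careful about computing averages of ordinal data. A mean treats every one-star step as the same size, which the scale itself does not guarantee, so an "average rating of 3.7 stars" is common but technically debatable. The median and the mode rely only on order and counts, so they are safer summaries.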
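As a rough sketch of summaries that use only the order of the ratings, the snippet below works on a made-up list of star ratings. The data and variable names are illustrative assumptions.

```python
from statistics import median, mode

# Hypothetical star ratings (1-5) for one restaurant.
ratings = [5, 4, 4, 2, 5, 1, 4]

# These summaries rely on ranking and counting, not on equal spacing.
median_rating = median(ratings)   # middle value after sorting
modal_rating = mode(ratings)      # most frequent rating

# Share of reviews at 4 stars or above: another order-only summary.
share_4_plus = sum(r >= 4 for r in ratings) / len(ratings)

print(median_rating, modal_rating, round(share_4_plus, 2))
```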

Numerical Data: Discrete vs Continuous

Numerical data splits further based on what values are possible:

Discrete data can only take specific, countable values (usually integers). There are gaps between possible values.

Examples: number of children (0, 1, 2, 3...), number of cars owned, goals scored in a match, number of emails received.

Continuous data can take any value within a range, including decimals. There are no gaps between possible values.

Examples: height (5.847 feet), weight (68.3 kg), temperature (98.6°F), time to complete a race (9.58 seconds).

Quick test: Can the value be 2.5? If the answer is "yes, that makes sense" → continuous. If "no, it must be a whole number" → discrete. You can't have 2.5 children, but you can weigh 2.5 kg.
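The quick test can be approximated in code, with an important caveat: a program only sees the recorded values, not what values are possible. The helper below, `looks_discrete`, is a hypothetical name and a heuristic only. Weights rounded to whole kilograms would pass it even though weight is continuous.

```python
def looks_discrete(values):
    """Heuristic: True if every observed value is a whole number.

    This inspects recorded values only. It cannot prove that the
    underlying quantity is discrete.
    """
    return all(float(v).is_integer() for v in values)


children_per_family = [0, 1, 2, 3, 2]     # counts
weights_kg = [68.3, 70.0, 2.5]            # measurements

print(looks_discrete(children_per_family))
print(looks_discrete(weights_kg))
```

Treat the result as a hint to check against what the column actually measures.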

The Complete Classification

Data Type Classification

| Type | Subtype | Order? | Arithmetic? | Examples |
|---|---|---|---|---|
| Categorical | Nominal | ❌ No | ❌ No | Blood type, color, country |
| Categorical | Ordinal | ✅ Yes | ⚠️ Limited | Star ratings, education level |
| Numerical | Discrete | ✅ Yes | ✅ Yes | Count of items, dice rolls |
| Numerical | Continuous | ✅ Yes | ✅ Yes | Height, weight, temperature |
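One way to put the classification to work is a lookup from data type to the summaries that respect it. The sketch below is an assumption about how you might encode this, not a standard. `DataType`, `VALID_SUMMARIES`, and `is_valid_summary` are names invented here, and the exact sets are a judgment call.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


# Illustrative mapping: which summaries make sense for which type.
VALID_SUMMARIES = {
    DataType.NOMINAL: {"count", "mode"},
    DataType.ORDINAL: {"count", "mode", "median"},
    DataType.DISCRETE: {"count", "mode", "median", "mean"},
    DataType.CONTINUOUS: {"count", "median", "mean"},
}


def is_valid_summary(summary, data_type):
    """Return True if the summary respects the given data type."""
    return summary in VALID_SUMMARIES[data_type]


print(is_valid_summary("mean", DataType.NOMINAL))
print(is_valid_summary("median", DataType.ORDINAL))
```

The mean is left out for ordinal data here to match the "limited" arithmetic in the table. Some analysts allow it in practice.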

Data Structures: How Data Is Organized

Beyond individual data types, data comes in different structural forms:

Cross-sectional data is collected from multiple subjects at a single point in time. Like a snapshot.

Example: A survey of 500 people's incomes conducted in January 2024. You see many subjects, one time point.

Time series data is collected from one subject across multiple points in time. Like a movie.

Example: Apple's stock price recorded daily for a year. You see one subject, many time points.

Panel data (also called longitudinal data) combines both: multiple subjects measured at multiple time points. For example, tracking 100 patients' blood pressure monthly for two years.
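To see how the structures relate, here is a minimal sketch that stores a tiny panel as a dictionary keyed by (subject, period). The patient names, readings, and helper functions are hypothetical. Fixing the period gives a snapshot of many subjects, and fixing the subject gives one subject over time.

```python
# Hypothetical panel: (patient, month) -> systolic blood pressure.
panel = {
    ("patient_a", "2024-01"): 128,
    ("patient_a", "2024-02"): 125,
    ("patient_b", "2024-01"): 141,
    ("patient_b", "2024-02"): 138,
}


def cross_section(panel, period):
    """Many subjects, one time point."""
    return {subject: value
            for (subject, t), value in panel.items() if t == period}


def time_series(panel, subject):
    """One subject, many time points, in chronological order."""
    return [(t, value)
            for (s, t), value in sorted(panel.items()) if s == subject]


print(cross_section(panel, "2024-01"))
print(time_series(panel, "patient_a"))
```

The chronological sort works here because the month strings are written year first.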

Common Pitfalls

1. Numbers that aren't numerical data

Phone numbers, zip codes, and jersey numbers are written with digits, but they're categorical (nominal) labels. Averaging zip codes is meaningless, and jersey #99 isn't "more" than jersey #10 in any useful sense.
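A practical consequence is that labels made of digits are usually safer stored as text. The sketch below uses a few example zip codes to show one concrete failure of numeric storage (a lost leading zero) and the kind of summary that does make sense.

```python
from collections import Counter

# Example zip codes, kept as strings because they are labels.
zip_codes = ["02139", "10001", "10001", "94103"]

# Converting to an integer silently drops the leading zero.
as_number = int("02139")
round_trip = str(as_number)

# Counting per label is the meaningful summary.
zip_counts = Counter(zip_codes)

print(as_number, round_trip)
print(zip_counts.most_common(1))
```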

2. Treating ordinal as continuous

Computing the "average" of a 1-5 Likert scale is extremely common but controversial. The mean of "Strongly Agree" and "Strongly Disagree" isn't "Neutral" in any meaningful sense.
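A short sketch makes the problem concrete. Assume responses are coded 1 ("Strongly Disagree") to 5 ("Strongly Agree"), a common convention but an assumption here. The two made-up groups below have the same mean and very different opinions.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses on a 1-5 Likert scale.
polarized = [1, 1, 5, 5]   # half strongly disagree, half strongly agree
neutral = [3, 3, 3, 3]     # everyone is neutral

mean_polarized = mean(polarized)
mean_neutral = mean(neutral)

# The full distribution keeps the information the mean hides.
distribution_polarized = Counter(polarized)
distribution_neutral = Counter(neutral)

print(mean_polarized, mean_neutral)
print(distribution_polarized)
print(distribution_neutral)
```

Reporting the counts per response alongside any average avoids this trap.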

3. Binning continuous data carelessly

Converting continuous data into categories (e.g., "young," "middle-aged," "old") throws away information. Binning is sometimes necessary, but it generally costs you statistical power.
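Here is a minimal sketch of what binning discards. The cutoffs of 35 and 60 are arbitrary choices for illustration, and `age_group` is a hypothetical helper.

```python
def age_group(age):
    """Bin an age in years using arbitrary, illustrative cutoffs."""
    if age < 35:
        return "young"
    if age < 60:
        return "middle-aged"
    return "old"


ages = [36, 59, 61]
groups = [age_group(a) for a in ages]

# 36 and 59 are 23 years apart but become indistinguishable,
# while 59 and 61 are 2 years apart but land in different bins.
print(list(zip(ages, groups)))
```

After binning, the original ages cannot be recovered, so any analysis that depends on the size of the differences is no longer possible.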

Remember: Just because something is stored as a number in a database doesn't mean it's numerical data. Always think about what the values represent, not how they're encoded.

Identifying Data Types in Practice

Dataset: Student Records

Consider a university student database with these columns:

| Column | Example Value | Data Type |
|---|---|---|
| Student ID | 20240001 | Categorical (Nominal), it's just a label |
| Name | "Priya Sharma" | Categorical (Nominal) |
| GPA | 3.72 | Numerical (Continuous) |
| Year | Freshman, Sophomore... | Categorical (Ordinal) |
| Credits Completed | 45 | Numerical (Discrete) |
| Major | "Computer Science" | Categorical (Nominal) |
| Satisfaction Score | 4 out of 5 | Categorical (Ordinal) |
| Height (cm) | 167.3 | Numerical (Continuous) |
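To tie the classification to practice, the sketch below summarizes a few of these columns, each with a statistic that suits its type. The three records are invented, and `YEAR_ORDER` assumes the usual four-year sequence (the table above only names the first two levels).

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records using some of the columns above.
students = [
    {"major": "Computer Science", "year": "Freshman", "credits": 15, "gpa": 3.72},
    {"major": "Biology", "year": "Junior", "credits": 75, "gpa": 3.10},
    {"major": "Computer Science", "year": "Sophomore", "credits": 45, "gpa": 3.40},
]

# Assumed ordering for the ordinal "year" column.
YEAR_ORDER = ["Freshman", "Sophomore", "Junior", "Senior"]

# Nominal: report the most frequent label.
top_major = Counter(s["major"] for s in students).most_common(1)[0][0]

# Ordinal: take the median of the ranks, then map back to a label.
year_ranks = [YEAR_ORDER.index(s["year"]) for s in students]
median_year = YEAR_ORDER[int(median(year_ranks))]

# Discrete and continuous: means are meaningful.
mean_credits = mean(s["credits"] for s in students)
mean_gpa = mean(s["gpa"] for s in students)

print(top_major, median_year, mean_credits, round(mean_gpa, 2))
```

With an even number of students the median rank can fall between two levels, so a real implementation would need a rule for that case.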

Test your knowledge

A zip code (e.g., 10001) is what type of data?
