Kekkei - Advancing Financial Science For Everyone

Putting It All Together

You've learned the theory. Now it's time to apply it.

This capstone lesson presents real-world statistical challenges where you'll integrate everything you've learned:

Descriptive statistics and visualization
Probability and distributions
Hypothesis testing and confidence intervals
Correlation, regression, and causation
Critical thinking and fallacy detection

Choose one or more projects below. Work through them systematically, documenting your reasoning.

What I cannot create, I do not understand.
— Richard Feynman

Project 1: Analyze a Real Dataset

Goal: Explore a dataset, summarize findings, and make evidence-based conclusions.

Data sources:

Kaggle datasets (kaggle.com/datasets)
UCI Machine Learning Repository
data.gov (government data)
FiveThirtyEight data (github.com/fivethirtyeight/data)
Your own data (work, hobby, research)

Required steps:

1. Data Exploration

How many observations? Variables?
What types of data? (categorical, numerical, time series)
Missing values? Outliers?
Create histograms, box plots, scatter plots

2. Descriptive Statistics

Central tendency: mean, median, mode
Spread: standard deviation, IQR
Check distribution shape: normal? skewed?
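
As a starting point, the summary statistics above can be computed with Python's standard library. This is a minimal sketch on made-up numbers (the `data` list is hypothetical), not a full exploration workflow; the 1.5 × IQR outlier rule is one common convention among several.

```python
import statistics

# Hypothetical sample: daily signups over two weeks (made-up numbers).
data = [12, 15, 11, 14, 13, 15, 40, 12, 16, 15, 13, 14, 15, 11]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
stdev = statistics.stdev(data)                 # sample standard deviation (n - 1)
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1

# Tukey's rule: flag points more than 1.5 * IQR beyond the quartiles.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"stdev={stdev:.2f} IQR={iqr:.2f} outliers={outliers}")
```

With these numbers the mean sits above the median because of the single value of 40, which the 1.5 × IQR rule also flags. A gap like that between mean and median is a quick hint of right skew.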

3. Research Question

Formulate a specific, testable question
Example: "Do SAT scores differ by geographic region?"

4. Statistical Analysis

Choose appropriate test (t-test, correlation, regression, chi-square)
Check assumptions (normality, independence, sample size)
Calculate test statistic and p-value
Construct confidence intervals

5. Interpretation

What do the results mean in plain English?
Statistical vs practical significance?
Limitations and confounders?
Alternative explanations?

6. Visualization

Create publication-quality graphs
Tell a story with data
Avoid misleading visualizations

7. Write-up

Introduction (question and why it matters)
Methods (data source, sample size, tests used)
Results (numbers, tables, figures)
Discussion (interpretation, limitations, future directions)

Bonus: Share your analysis as a blog post, report, or presentation. Explaining to others solidifies understanding.

Project 2: Design and Analyze an A/B Test

Goal: Design an experiment, collect data (or simulate), and analyze results.

Scenario: You run a website and want to increase signups.

Steps:

1. Hypothesis

"Changing the signup button from blue to green will increase signup rate."

2. Design

Control: Blue button
Treatment: Green button
Metric: Signup rate (%)
Randomization: 50/50 split of visitors

3. Sample Size Calculation

Current signup rate: 5%
Minimum detectable effect: +1 percentage point (to 6%)
Significance level: α = 0.05
Power: 0.80
Calculate the required sample size per group using a formula or an online calculator
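
The sample size step can be sketched with one common normal-approximation formula for comparing two proportions: n per group = (z_{1-α/2} + z_{power})² × [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)². The function name below is illustrative, and other calculators use slightly different variants of the formula, so expect small differences.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate visitors needed per group for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_power = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)        # sum of per-group variances
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Inputs from the scenario: 5% baseline, 6% target, alpha 0.05, power 0.80.
n_per_group = sample_size_two_proportions(0.05, 0.06)
print(n_per_group)
```

For the scenario's inputs this comes out to a little over 8,000 visitors per group. Small effects on low base rates need far more traffic than intuition suggests.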

4. Data Collection (can simulate)

Generate simulated data with realistic effects
Include some noise and variability

5. Analysis

Calculate signup rates for both groups
Two-proportion Z-test
Confidence interval for difference
Effect size and practical significance
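
The simulation and analysis steps can be sketched together as follows. The true rates (5% and 6%), the 8,000 visitors per arm, and the random seed are assumptions chosen for illustration; the test uses the pooled standard error and the confidence interval uses the unpooled one, which is a common textbook convention.

```python
import math
import random
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2, conf=0.95):
    """Two-sided z-test for p2 - p1, plus a confidence interval for the difference."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci


# Simulate one experiment (assumed true rates: 5% control, 6% treatment).
rng = random.Random(42)
n = 8000
control = sum(rng.random() < 0.05 for _ in range(n))
treatment = sum(rng.random() < 0.06 for _ in range(n))

z, p_value, ci = two_proportion_ztest(control, n, treatment, n)
print(f"control={control / n:.3%} treatment={treatment / n:.3%}")
print(f"z={z:.2f} p={p_value:.4f} CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

Rerun with different seeds to see how much a single experiment's estimate can move even when the true effect is fixed.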

6. Conclusion

Is the result statistically significant?
Is the improvement worth implementing?
Cost-benefit analysis

7. Report

Present your findings as if to real stakeholders
Recommendation: Roll out green button? More testing needed?

Extensions:

Multivariate test (test multiple changes simultaneously)
Sequential testing (analyze as data arrives, with a stopping rule that accounts for repeated looks)
Account for multiple testing if running many tests
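
For the multiple testing extension, one simple and conservative option is the Bonferroni correction, which compares each p-value to α divided by the number of tests. The sketch below uses hypothetical p-values and assumes independent tests for the family-wise error calculation; less conservative methods exist.

```python
def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent null tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value against the Bonferroni-adjusted threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values from four simultaneous button-color tests.
p_values = [0.001, 0.020, 0.040, 0.300]
flags = bonferroni_significant(p_values)

print(f"Uncorrected family-wise error rate: {familywise_error_rate(len(p_values)):.3f}")
print(flags)
```

Three of these p-values fall below 0.05, but only the smallest survives the adjusted threshold of 0.0125.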

Project 3: Debunk a Misleading Claim

Goal: Find a statistical claim in the wild and critically evaluate it.

Sources:

News articles
Social media
Advertisements
Political claims
Health/wellness products

Required analysis:

1. Identify the Claim

Example: "Drinking green tea burns 500 extra calories per day!"

2. Find the Source

Original study? Or just marketing?
Sample size? Study design?
Peer-reviewed? Replicated?

3. Critical Questions

Is this correlation or causation?
What's the baseline / control group?
Relative vs absolute effect?
Cherry picking? Publication bias?
Funding source / conflicts of interest?
Sample representative?

4. Alternative Explanations

Confounding variables?
Reverse causation?
Measurement error?
Regression to the mean?

5. Calculate True Effect

If they report relative risk, find absolute risk
If they say "statistically significant," find effect size
Compare claimed effect to plausible reality
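
Converting a relative claim into absolute terms is simple arithmetic. The sketch below uses a hypothetical trial (20 events per 10,000 untreated people versus 10 per 10,000 treated); the function name and the numbers are illustrative only.

```python
def risk_summary(events_control, n_control, events_treated, n_treated):
    """Express the same result as relative risk, absolute reduction, and NNT."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    absolute_reduction = risk_control - risk_treated
    return {
        "relative_risk": risk_treated / risk_control,
        "absolute_risk_reduction": absolute_reduction,
        # Number needed to treat to prevent one event (infinite if no benefit).
        "number_needed_to_treat": (
            1 / absolute_reduction if absolute_reduction > 0 else float("inf")
        ),
    }


# Hypothetical trial: "cuts your risk in half!"
summary = risk_summary(20, 10_000, 10, 10_000)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

The headline "cuts risk in half" and the absolute statement "reduces risk by 0.1 percentage points, so about 1,000 people must be treated to prevent one event" describe the same data.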

6. Write-up

Original claim (with source)
Your analysis (with evidence)
Conclusion: True? Exaggerated? False?
Corrected interpretation

Examples to consider:

Diet / supplement claims
Political polls (methodology and interpretation)
Vaccine/health scares
Financial advice ("this strategy beats the market")
Product effectiveness claims

Project 4: Build a Statistical Report

Goal: Analyze a business problem using statistics and present actionable insights.

Sample scenarios:

Scenario A: Customer Retention

Problem: Customer churn increasing
Data: Customer demographics, usage patterns, churn status
Analysis: What factors predict churn? (logistic regression)
Recommendation: Target interventions for high-risk customers

Scenario B: Pricing Optimization

Problem: What price maximizes revenue?
Data: Historical sales at different prices
Analysis: Price elasticity, demand curves, confidence intervals
Recommendation: Optimal price range

Scenario C: Quality Control

Problem: Defect rate seems high
Data: Defect counts over time
Analysis: Control charts, hypothesis tests vs target rate
Recommendation: Process changes needed?

Required components:

1. Executive Summary

Problem statement
Key findings (2-3 bullet points)
Recommendation (actionable)

2. Data & Methods

Data sources and sample size
Variables analyzed
Statistical methods used
Assumptions and limitations

3. Results

Tables and visualizations
Statistical tests with interpretation
Confidence intervals
Sensitivity analysis

4. Discussion

Practical significance
Limitations and caveats
Risks and uncertainty
Implementation considerations

5. Recommendations

Clear, actionable next steps
Expected impact (with uncertainty ranges)
Monitoring plan

Writing goal: Write for stakeholders without statistical training. Avoid jargon. Focus on business impact.

Project 5: Statistical Simulation

Goal: Use simulation to understand a statistical concept or solve a problem.

Option A: Central Limit Theorem

Start with a non-normal distribution (exponential, uniform, etc.)
Sample from it repeatedly, calculate means
Plot distribution of means
Show how the distribution of means approaches a normal distribution as the sample size n increases
Demonstrate E[X̄] = μ and SE = σ/√n
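
A minimal sketch of Option A, assuming an exponential distribution with rate 1 (so μ = 1 and σ = 1) and 2,000 repetitions per sample size. Plotting is left out; feed each list of means to a histogram in your tool of choice.

```python
import math
import random
import statistics

rng = random.Random(0)
MU, SIGMA = 1.0, 1.0  # mean and sd of an exponential with rate 1


def sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return the mean of each."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


results = {}
for n in (2, 10, 50):
    means = sample_means(n)
    observed_se = statistics.stdev(means)     # spread of the sample means
    theoretical_se = SIGMA / math.sqrt(n)     # sigma / sqrt(n)
    results[n] = (statistics.fmean(means), observed_se, theoretical_se)
    print(f"n={n:>2} mean of means={results[n][0]:.3f} "
          f"SE observed={observed_se:.3f} theory={theoretical_se:.3f}")
```

The mean of the sample means should stay close to μ for every n, while the observed standard error should track σ/√n and shrink as n grows.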

Option B: Bootstrap Confidence Intervals

Take a sample of data
Bootstrap resample 10,000 times
Calculate statistic (mean, median, correlation) for each
Construct 95% CI using percentile method
Compare to theoretical CI
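
A minimal sketch of Option B using the percentile method. The sample is simulated from a right-skewed distribution purely for illustration, and the "theoretical" interval here is the normal approximation (mean ± 1.96 × s/√n); a t-based interval would be slightly wider.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(1)

# Hypothetical sample: 40 draws from a right-skewed (exponential) distribution.
sample = [rng.expovariate(0.2) for _ in range(40)]


def bootstrap_ci(data, stat=statistics.fmean, reps=10_000, conf=0.95):
    """Percentile bootstrap CI: resample with replacement, take quantiles."""
    n = len(data)
    stats = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lo_idx = int((1 - conf) / 2 * reps)
    hi_idx = int((1 + conf) / 2 * reps) - 1
    return stats[lo_idx], stats[hi_idx]


boot_ci = bootstrap_ci(sample)

# Normal-approximation interval for comparison.
mean = statistics.fmean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
z = NormalDist().inv_cdf(0.975)
normal_ci = (mean - z * se, mean + z * se)

print(f"bootstrap CI: ({boot_ci[0]:.2f}, {boot_ci[1]:.2f})")
print(f"normal CI:    ({normal_ci[0]:.2f}, {normal_ci[1]:.2f})")
```

Passing `stat=statistics.median` gives an interval for the median, a case where no simple textbook formula is available and the bootstrap earns its keep.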

Option C: Power Analysis

Simulate studies with varying sample sizes
Generate data under H₁ (effect exists)
Test H₀ (no effect)
Calculate proportion of times p < 0.05 (power)
Show how power increases with n and effect size
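
A minimal sketch of Option C. To stay within the standard library it simplifies the setup to a one-sample, two-sided z-test with known σ = 1; the effect size of 0.5, the sample sizes, and the 2,000 repetitions are all assumptions for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(2)
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided critical value at alpha = 0.05


def simulated_power(n, effect, reps=2000):
    """Share of simulated studies that reject H0 when the true mean is `effect`."""
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(effect, 1.0) for _ in range(n)]   # data under H1
        z = fmean(xs) * math.sqrt(n)                      # test H0: mean = 0
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / reps


power_by_n = {n: simulated_power(n, effect=0.5) for n in (10, 30, 100)}
for n, power in power_by_n.items():
    print(f"n={n:>3} power={power:.2f}")
```

Repeating the loop with a smaller `effect` shows the other half of the picture: the same sample sizes detect a weaker effect much less often.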

Option D: P-Hacking Demonstration

Simulate 20 studies testing a null effect
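Show that, on average, about 1 of the 20 will have p < 0.05 by chance (20 × 0.05 = 1), and that the chance of at least one false positive is 1 − 0.95²⁰ ≈ 64% if the studies are independent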
Demonstrate dangers of selective reporting
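
A minimal sketch of Option D. Each simulated "lab" runs 20 studies of a true null effect and counts how many cross p < 0.05. The z-test with known σ = 1, the 30 observations per study, and the 500 labs are assumptions made to keep the example short.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(3)
STD_NORMAL = NormalDist()


def null_study_p_value(n=30):
    """Two-sided p-value from one study where the true effect is zero."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(xs) * math.sqrt(n)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def run_lab(studies=20):
    """Number of 'significant' results among `studies` null studies."""
    return sum(null_study_p_value() < 0.05 for _ in range(studies))


labs = [run_lab() for _ in range(500)]
avg_hits = fmean(labs)
share_with_hit = sum(hits >= 1 for hits in labs) / len(labs)

print(f"average false positives per lab: {avg_hits:.2f}")
print(f"labs with at least one 'finding': {share_with_hit:.0%}")
```

A lab that reports only its significant study would have something to publish most of the time, even though every effect here is zero by construction.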

Option E: Type I vs Type II Errors

Simulate data under H₀ and H₁
Vary significance level (α) and sample size
Calculate Type I error rate (false positives)
Calculate Type II error rate (false negatives)
Show the tradeoff: at a fixed sample size, lowering α reduces Type I errors but increases Type II errors
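
Before simulating Option E, it can help to see the expected pattern. The sketch below is an analytic shortcut rather than a simulation: it assumes a one-sample, two-sided z-test with known σ = 1, where the Type I error rate equals α by construction and the Type II error rate follows from the normal distribution. The effect size of 0.5 and n = 30 are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(alpha, n, effect):
    """Probability of failing to reject H0 when the true mean is `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n)   # how far H1 moves the test statistic
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    return 1 - power


tradeoff = {
    alpha: type_ii_error(alpha, n=30, effect=0.5)
    for alpha in (0.10, 0.05, 0.01)
}
for alpha, beta in tradeoff.items():
    print(f"alpha={alpha:.2f} Type II error rate={beta:.3f}")
```

Tightening α from 0.10 to 0.01 raises the Type II error rate at this sample size; increasing `n` is the way to lower both at once.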

Tools: Python (NumPy, SciPy, Matplotlib), R, or even Excel with random number generation.

Deliverable: Annotated code + visualizations + explanation of what you learned.

Evaluation Rubric

What Makes a Great Capstone?

Dimension	Excellent	Poor
Statistical Rigor	Appropriate methods, checks assumptions, acknowledges limitations	Wrong test, ignores violations, overconfident
Critical Thinking	Questions assumptions, considers alternatives, separates correlation from causation	Takes data at face value, jumps to conclusions
Communication	Clear explanations, effective visualizations, tells a story	Jargon-heavy, confusing graphs, no narrative
Practical Relevance	Actionable insights, considers real-world constraints	Purely academic, impractical recommendations
Honesty	Reports uncertainty, acknowledges what can't be concluded	Hides limitations, overstates certainty

Next Steps After This Course

You've built a solid foundation in statistics. Where to go from here?

Deepen your knowledge:

Bayesian statistics: Probability as degree of belief
Machine learning: Prediction and pattern recognition
Causal inference: Going beyond correlation
Time series forecasting: ARIMA, exponential smoothing
Survey design and sampling: How to collect data properly
Experimental design: Factorial designs, blocking, interactions

Practice continuously:

Analyze datasets regularly (Kaggle, personal projects)
Read research papers critically
Follow data science blogs and journals
Participate in competitions (Kaggle, DrivenData)

Learn tools:

Python: pandas, NumPy, SciPy, statsmodels, scikit-learn
R: tidyverse, ggplot2, statistical tests
SQL: Data extraction and manipulation
Visualization: Tableau, matplotlib, seaborn, ggplot2

Teach others:

Explain concepts to reinforce understanding
Write blog posts or tutorials
Help peers with statistics questions

Stay skeptical:

Question statistical claims
Look for fallacies in the wild
Demand evidence and good methodology
Update beliefs based on evidence

The journey continues. Statistics is a skill that deepens with practice and never stops being useful.

All models are wrong, but some are useful.
— George Box

Congratulations on completing the statistics course!

You now possess a rare and valuable skill: the ability to think clearly about uncertainty, make evidence-based decisions, and spot statistical nonsense.

Use these skills wisely. The world needs more statistically literate people.

