Statistical Methods & Thinking

13 Episodes


Episode 13 | Survival Analysis: Making Sense of Time-to-Event Data

2026-02-03 41:36

In this episode, we introduce the core ideas behind analyzing time-to-event data: situations where the outcome isn’t just “what happened,” but when it happened. A key challenge is that some participants haven’t experienced the event yet by the end of follow-up (or they drop out), so the data are only partially observed.

We build the intuition for describing how risk changes over time, then walk through three practical tools: how to estimate a survival curve from one group, how to compare two groups fairly over the whole follow-up, and how to study the role of multiple predictors while keeping the time dimension front and center.
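
The first tool, estimating a survival curve from one group while respecting partially observed (censored) follow-up, is commonly done with the Kaplan-Meier product-limit estimator. Below is a minimal pure-Python sketch, assuming each subject is recorded as a follow-up time plus an event flag (1 = event observed, 0 = censored); the function name and the data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a list of (t, S(t)) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject sharing this time; subjects censored at t
        # still count as at risk for events at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical data: 5 subjects, two of them censored (event = 0).
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

The curve steps down only at observed event times; the subject censored at time 8 never lowers it, but simply leaves the risk set.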

Episode 12 | Clustering and Classification: Finding Structure in Data

2026-02-03 38:39

In this episode, we step into multivariate thinking and ask a practical question: when do data points naturally form “groups,” and how can we use those groups to make decisions?

We walk through how grouping methods decide what’s “close” or “similar,” then compare two main approaches: building clusters step by step versus forming clusters all at once. You’ll also hear how tree-like visual summaries help us see structure in messy data, and how the same multivariate ideas can be flipped into classification, where the goal is to assign a new case to the most likely group.
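
“Forming clusters all at once” usually refers to partitioning methods such as k-means. As an illustration (our choice of method; the episode may use a different one), here is a minimal pure-Python k-means (Lloyd’s algorithm) that uses Euclidean distance as the notion of “close,” followed by the flip to classification: a new case is assigned to the group with the nearest centroid. The starting centroids and the data are hypothetical and supplied by hand so the run is deterministic.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, centroids, max_iter=100):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new == centroids:  # nothing moved, so the assignments are stable
            break
        centroids = new
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical 2-D data with two obvious groups.
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = kmeans(pts, centroids=[(1, 1), (9, 9)])
print(labels)                    # [0, 0, 0, 1, 1, 1]
# Classification: a new case goes to the nearest learned centroid.
print(nearest((7.5, 8), centers))  # 1
```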

Episode 11 | Finding Structure in Multivariate Data

2026-02-02 44:40

This episode is about what to do when your data has many variables at once. We start with the basic idea of how variables “move together” (correlation and covariance), and why that matters for understanding patterns in real datasets.

Then we introduce dimension reduction: ways to compress lots of information into a few summary features, so you can see the main structure without getting lost in details. We explain how these methods find the directions where the data varies most, and how a simple “rotation” can make the results easier to interpret.

We wrap up with practical rules of thumb for deciding how many components to keep, and a quick preview of how these ideas connect to grouping similar observations and classifying new cases.
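
In principal component analysis, the “directions where the data varies most” are the eigenvectors of the covariance matrix, and each eigenvalue is the variance along its direction. For two variables the eigen-decomposition has a closed form, so the pure-Python sketch below (two variables only, hypothetical function name and data) computes both component variances, the leading direction, and the proportion of variance it explains, the quantity that several rules of thumb for choosing the number of components look at. In practice variables are often standardized first (i.e. PCA on the correlation matrix); this sketch uses the raw covariance matrix.

```python
import math


def pca_2d(xs, ys):
    """Principal components of two variables via the 2x2 sample covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of [[sxx, sxy], [sxy, syy]], written so the sqrt argument is never negative.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread

    # Leading eigenvector: (lam1 - syy, sxy) solves (S - lam1*I) v = 0 when sxy != 0.
    if sxy != 0:
        v = (lam1 - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    direction = (v[0] / norm, v[1] / norm)
    return (lam1, lam2), direction, lam1 / (lam1 + lam2)


# Hypothetical, strongly correlated data: the first component should dominate.
print(pca_2d([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```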

Episode 10 | From Chi-Square to GLMs: Beyond Linear Regression

2026-02-02 36:55

This episode is about working with categorical outcomes—questions where results fall into categories rather than a numeric scale. We learn how to check whether two variables are related, how to model the chance of a “yes/no” outcome using multiple predictors, and how to compare different modeling choices. We finish with simple ways to judge how well a model fits and whether a simpler or more detailed model is the better choice.
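
Modeling the chance of a “yes/no” outcome is typically done with logistic regression, a GLM with a logit link. The sketch below fits a one-predictor logistic model by Newton-Raphson maximum likelihood in pure Python (R users would typically call `glm(..., family = binomial)` instead); the function name and data are hypothetical. A useful sanity check: with a single binary predictor, the fitted slope equals the log odds ratio from the corresponding 2×2 table.

```python
import math


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1*x by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # Fisher information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: beta_new = beta + I^{-1} * score, with the 2x2 inverse written out.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: 8 subjects per group; 2/8 "yes" when x = 0, 6/8 "yes" when x = 1.
x = [0] * 8 + [1] * 8
y = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]
b0, b1 = fit_logistic(x, y)
print(b1, math.log((6 / 2) / (2 / 6)))  # fitted slope vs. log odds ratio
```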

Episode 9 | Categorical Data in Practice: Measures of Association and Simpson’s Paradox

2026-02-02 41:46

In this episode, we start with Fisher’s “Lady Tasting Tea,” a classic reminder that good questions need good experimental design. Then we shift from continuous outcomes to categorical data: how a simple 2×2 table turns test results into sensitivity/specificity, and study results into association measures like relative risk and odds ratios.

Next, we unpack Simpson’s paradox: how the headline conclusion can flip once you stratify by a key factor. We wrap up with practical inference tools, including Fisher’s exact test and the chi-square test, plus a quick nod to ROC/AUC for evaluating classifiers.
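
The 2×2 measures are simple ratios of cell counts. The pure-Python sketch below (hypothetical function names and counts) computes sensitivity and specificity for a diagnostic table and relative risk and odds ratio for an exposure-outcome table, then reproduces Simpson’s paradox with made-up counts in which treatment A has the higher success rate within each stratum but the lower rate once the strata are pooled.

```python
def diagnostic(tp, fp, fn, tn):
    """Rows: test +/-; columns: disease present/absent."""
    return {"sensitivity": tp / (tp + fn),   # P(test + | disease)
            "specificity": tn / (tn + fp)}   # P(test - | no disease)


def association(a, b, c, d):
    """Rows: exposed/unexposed; columns: outcome yes/no."""
    return {"relative_risk": (a / (a + b)) / (c / (c + d)),
            "odds_ratio": (a * d) / (b * c)}


print(diagnostic(tp=90, fp=20, fn=10, tn=180))  # sensitivity 0.9, specificity 0.9
print(association(a=20, b=80, c=10, d=90))       # relative risk 2.0, odds ratio 2.25

# Simpson's paradox with hypothetical (successes, patients) counts per arm.
strata = {
    "stratum 1": {"A": (9, 10), "B": (80, 100)},
    "stratum 2": {"A": (30, 90), "B": (2, 10)},
}
for name, arms in strata.items():
    print(name, {arm: s / n for arm, (s, n) in arms.items()})  # A beats B in each stratum

pooled = {
    arm: sum(strata[k][arm][0] for k in strata) / sum(strata[k][arm][1] for k in strata)
    for arm in ("A", "B")
}
print("pooled", pooled)  # ...but B beats A after pooling
```

The flip happens because A was given mostly to patients in the harder stratum 2, so the stratum acts as a confounder in the pooled comparison.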

Episode 8 | Two-Way ANOVA and Beyond

2026-02-01 36:31

This episode moves from one-way ANOVA to two-factor randomized experiments, focusing on how to test main effects and, more importantly, interactions—when the effect of one factor depends on the level of the other. Using examples like printer sales and a fish reproduction index, we show how ANOVA partitions variation and supports hypothesis testing. We also give a quick tour of extensions including random-effects and mixed-effects models, plus ANCOVA for adjusting with covariates. In the second half, we shift to study design in biomedicine—contrasting prospective vs. retrospective data collection—and close with a short introduction to categorical data analysis and basic concepts in clinical trials.
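
For a balanced two-factor design (the same number of replicates n in every cell), the partition of variation has a closed form. The pure-Python sketch below, with a hypothetical function name and made-up data, computes the sums of squares for factor A, factor B, the A×B interaction, and error. A large interaction term relative to error signals that the effect of one factor depends on the level of the other; in the fixed-effects model each F statistic is a mean square divided by the error mean square.

```python
from statistics import fmean


def two_way_ss(cells):
    """cells[(i, j)] = replicates at level i of A and level j of B (balanced design)."""
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    a, b = len(a_levels), len(b_levels)

    grand = fmean([y for ys in cells.values() for y in ys])
    cell = {k: fmean(ys) for k, ys in cells.items()}
    row = {i: fmean([cell[(i, j)] for j in b_levels]) for i in a_levels}
    col = {j: fmean([cell[(i, j)] for i in a_levels]) for j in b_levels}

    ss = {
        "A": b * n * sum((row[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((col[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum((cell[(i, j)] - row[i] - col[j] + grand) ** 2
                      for i in a_levels for j in b_levels),
        "E": sum((y - cell[k]) ** 2 for k, ys in cells.items() for y in ys),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    return ss, df


# Hypothetical data: the effect of A is large at level "x" of B but small at level "y".
cells = {
    ("low", "x"): [10, 12], ("low", "y"): [14, 16],
    ("high", "x"): [20, 22], ("high", "y"): [15, 17],
}
ss, df = two_way_ss(cells)
ms_e = ss["E"] / df["E"]
print(ss, "F for interaction:", (ss["AB"] / df["AB"]) / ms_e)
```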

Episode 7 | Design of Experiments

2026-02-01 31:28

This episode introduces the core logic of experimental design and ANOVA: what we mean by causality, factors, and confounders—and why randomization, replication, and blocking are the practical tools that make comparisons fair. We build the one-way ANOVA model, run the hypothesis test in R, and discuss multiple comparisons and how to control Type I error. We also connect ANOVA to regression, highlight R.A. Fisher’s role in modern statistics, and close with randomized block designs to improve precision by accounting for nuisance variation.
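
The one-way ANOVA F statistic compares between-group to within-group variability. The episode runs the test in R; as a language-neutral sketch, the pure-Python version below computes the same statistic for hypothetical groups (it stops short of the p-value, which needs the F distribution). It also shows the Bonferroni adjustment, one common way to control the family-wise Type I error across pairwise comparisons; that choice of method is ours, and the episode may discuss others.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = fmean([y for g in groups for y in g])
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical measurements
F, df_b, df_w = one_way_anova(groups)

# Bonferroni: with m pairwise comparisons, test each one at alpha / m.
k = len(groups)
m = k * (k - 1) // 2
print(F, df_b, df_w, 0.05 / m)
```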

Episode 6 | Model Selection Strategies

2026-02-01 37:46

Episode 6 is about making multiple regression work in real life: how to choose predictors without overfitting, when to transform variables to fix non-constant variance or nonlinearity, and what to do when predictors are strongly correlated. We’ll walk through tools like Mallows’ Cp, partial F-tests, and stepwise selection, then wrap up with ridge and lasso as practical fixes for multicollinearity, with quick examples along the way.
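
Ridge regression stabilizes coefficients under multicollinearity by adding a penalty λ to the diagonal of XᵀX before solving. The pure-Python sketch below (two centered predictors, hypothetical function name and nearly collinear data) solves (XᵀX + λI)β = Xᵀy with a closed-form 2×2 inverse; λ = 0 reproduces ordinary least squares. Predictors are usually standardized before ridge so the penalty treats them equally, which this sketch skips. Lasso has no closed form in general and is not shown.

```python
from statistics import fmean


def ridge_2(x1, x2, y, lam):
    """Ridge fit of y ~ x1 + x2; centering keeps the intercept out of the penalty."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Entries of X'X + lam*I and X'y on the centered data.
    s11 = sum(a * a for a in c1) + lam
    s22 = sum(b * b for b in c2) + lam
    s12 = sum(a * b for a, b in zip(c1, c2))
    t1 = sum(a * v for a, v in zip(c1, cy))
    t2 = sum(b * v for b, v in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical, nearly collinear predictors: OLS (lam = 0) can give unstable slopes.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.05, 4.0, 4.95]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_2(x1, x2, y, lam))
```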

Episode 5 | Deeper in Multiple Linear Regression

2026-01-31 31:54

Episode 5 connects the “big picture” of multiple linear regression: the matrix form of the model, how least squares and maximum likelihood lead to the same coefficient estimates under standard assumptions (independent, normally distributed errors with constant variance), and what the ANOVA table is really decomposing. We compare R² vs. adjusted R², review t-tests for individual predictors and the F-test for the overall significance of the model, and finish with practical model selection (AIC and partial F-tests) plus examples of diagnosing outliers and interpreting results.
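
Adjusted R² penalizes R² for the number of predictors p relative to the sample size n: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The overall F statistic can likewise be written in terms of R², as (R²/p) / ((1 − R²)/(n − p − 1)). The small pure-Python helpers below (hypothetical names and numbers) show how adding a weak predictor can nudge R² up while adjusted R² goes down.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (plus intercept) fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """F statistic for H0: all p slopes are zero, on (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Hypothetical: a 4th, weak predictor raises R^2 from 0.60 to 0.61 with n = 30.
for r2, p in [(0.60, 3), (0.61, 4)]:
    print(p, r2, round(adjusted_r2(r2, n=30, p=p), 4), round(overall_f(r2, n=30, p=p), 2))
```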

Episode 4 | Multiple Linear Regression

2026-01-31 34:07

Episode 4 introduces multiple linear regression: how to model an outcome using several predictors at once, and how to interpret each effect while holding the others constant.

We cover dummy variables for categorical predictors and interaction terms (e.g., how experience and gender together can change salary patterns). We also compare regression with the two-sample t-test, showing that the two are closely related but regression is more flexible. We end with a practical note: p-values aren’t the whole story, conclusions should rely on context and assumptions, and nonparametric options are available when the data don’t fit normality well.
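
The link between regression and the two-sample comparison can be checked directly: regressing y on a 0/1 dummy variable gives a slope equal to the difference in group means, and its t statistic equals the pooled-variance (equal-variance) two-sample t statistic. A pure-Python sketch with hypothetical salary figures and function names:

```python
import math
from statistics import fmean, variance


def pooled_t(g0, g1):
    """Equal-variance two-sample t statistic for mean(g1) - mean(g0)."""
    n0, n1 = len(g0), len(g1)
    sp2 = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    return (fmean(g1) - fmean(g0)) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


def dummy_regression(g0, g1):
    """Fit y = b0 + b1*d with d = 0 for g0 and d = 1 for g1; return (b1, t statistic of b1)."""
    d = [0] * len(g0) + [1] * len(g1)
    y = list(g0) + list(g1)
    n = len(y)
    md, my = fmean(d), fmean(y)
    sdd = sum((v - md) ** 2 for v in d)
    b1 = sum((u - md) * (v - my) for u, v in zip(d, y)) / sdd
    b0 = my - b1 * md
    rss = sum((v - b0 - b1 * u) ** 2 for u, v in zip(d, y))
    se_b1 = math.sqrt(rss / (n - 2) / sdd)
    return b1, b1 / se_b1


g0 = [50, 54, 49, 57, 52]  # hypothetical salaries, group coded 0
g1 = [58, 61, 55, 63, 60]  # hypothetical salaries, group coded 1
print(dummy_regression(g0, g1), fmean(g1) - fmean(g0), pooled_t(g0, g1))
```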

Episode 3 | Association, Inference, and Causal Thinking in Simple Linear Regression

2026-01-31 36:24

This episode builds on simple linear regression by focusing on statistical inference: how we move from a fitted line to meaningful conclusions. We review the intuition behind least squares and explain why switching the roles of the two variables can lead to different fitted lines.

We then discuss how to interpret regression results in practice, including point estimates, uncertainty, and the difference between statements about an average outcome versus predictions for an individual. Using simple examples and R-based illustrations, the episode highlights how confidence intervals and prediction intervals answer different questions.

Finally, we return to a key warning in applied data analysis: association is not causation. Through classic real-world examples (such as ice cream sales and shark attacks), we explain how hidden variables can create misleading relationships, and why a strong regression fit does not automatically justify a causal claim.
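
Two points from this episode can be checked numerically. First, the least-squares slope of y on x is r·s_y/s_x, while the slope of x on y is r·s_x/s_y; their product is r², so the two fitted lines coincide only when |r| = 1. Second, at a new value x₀ the standard error for the estimated mean response is s·√(1/n + (x₀ − x̄)²/Sxx), while predicting a single new observation adds 1 under the square root, which is why prediction intervals are always wider than confidence intervals. The pure-Python sketch below uses hypothetical data and function names and stops at standard errors, since the t quantiles need a statistics library (in R, `predict()` on an `lm` fit with `interval = "confidence"` versus `interval = "prediction"`).

```python
import math
from statistics import fmean


def ls_slope(x, y):
    """Least-squares slope for regressing y on x."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def pearson_r(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mean_and_prediction_se(x, y, x0):
    """Standard errors at x0 for the mean response (CI) and for a new observation (PI)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    b1 = ls_slope(x, y)
    b0 = my - b1 * mx
    s = math.sqrt(sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2))
    sxx = sum((a - mx) ** 2 for a in x)
    se_mean = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    return se_mean, se_pred


x = [1, 2, 3, 4, 5, 6]                # hypothetical predictor values
y = [2.0, 2.9, 4.2, 4.8, 6.1, 6.9]    # hypothetical responses
b_yx, b_xy = ls_slope(x, y), ls_slope(y, x)
print(b_yx, 1 / b_xy)                   # slopes of the two lines in the (x, y) plane differ
print(b_yx * b_xy, pearson_r(x, y) ** 2)  # ...but their product equals r^2
print(mean_and_prediction_se(x, y, x0=3.5))
```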

Episode 2 | Simple Linear Regression

2026-01-31 36:24

This episode introduces simple linear regression as a tool for understanding trends and making predictions from data. We begin with the historical insight of Francis Galton, whose study of the relationship between parents’ and children’s heights helped popularize the idea of describing association with a straight line.

Building on this intuition, we explain the meaning of the slope and intercept in practical terms, and how they connect to interpretation and prediction. We then emphasize why it is important to check whether a linear model is appropriate before trusting it, using scatter plots and residual plots to detect nonlinearity, unequal variability, and unusual observations.

Finally, we discuss how to judge whether the model is actually useful: how much of the variation in the data it explains, what to look for in common regression output, and how these ideas translate into practice through simple examples.
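
The least-squares slope and intercept, the residuals used in residual plots, and the R² that measures how much of the variation the line explains all follow from a few sums. A minimal pure-Python sketch on hypothetical parent/child height pairs (illustrative numbers, not Galton’s data; the function name is ours):

```python
from statistics import fmean


def simple_linear_regression(x, y):
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx               # change in mean y per one-unit increase in x
    intercept = my - slope * mx     # the fitted line passes through (mean x, mean y)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sst = sum((b - my) ** 2 for b in y)
    r2 = 1 - sum(e ** 2 for e in residuals) / sst  # share of variation explained
    return intercept, slope, r2, residuals


parent = [64, 66, 67, 68, 70, 72]            # hypothetical mid-parent heights (inches)
child = [65.5, 66.0, 67.5, 67.0, 69.0, 70.0]  # hypothetical child heights (inches)
b0, b1, r2, resid = simple_linear_regression(parent, child)
print(round(b0, 2), round(b1, 3), round(r2, 3))
```

Plotting `resid` against the fitted values is the residual-plot check described above: a curve or a funnel shape suggests nonlinearity or unequal variability.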

Episode 1 | Seeing Association in Data: Scatter Plots and Correlation

2026-01-31 30:00

This episode is based on the course syllabus and the first lecture of Applied Statistical Methods. The primary goal is to introduce students to applied data analysis using statistical tools.

The lecture focuses on the analysis of association in data, with particular emphasis on graphical methods for describing bivariate relationships. Scatter plots are introduced as a fundamental tool for visualizing data and guiding statistical interpretation.

Different measures of correlation, including Pearson, Kendall, and Spearman coefficients, are discussed and compared in terms of their assumptions, strengths, and robustness. Through these comparisons, the episode highlights how outliers and nonlinear relationships can affect statistical conclusions.

The episode also provides an overview of how statistical methods are applied in practice, drawing on examples from past student projects in fields such as medicine, social sciences, and business analytics.
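
To see how the three coefficients behave, the pure-Python sketch below computes Pearson’s r, Spearman’s ρ (Pearson’s r applied to ranks, with tied values given average ranks), and Kendall’s τ (the τ-a version, which assumes no ties) on hypothetical data; the function names are ours. For a monotone but nonlinear relationship such as y = x³, both rank-based measures equal 1 while Pearson’s r falls below 1. In the outlier example, a single extreme point even flips the sign of Pearson’s r, while the rank-based measures stay positive.

```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def average_ranks(v):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """(concordant - discordant) / number of pairs; assumes no ties."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    n = len(x)
    return s / (n * (n - 1) / 2)


x = list(range(1, 11))
cubic = [a ** 3 for a in x]      # monotone but nonlinear
outlier = x[:-1] + [-50]         # y = x except for one extreme point
for name, y in [("y = x^3", cubic), ("outlier", outlier)]:
    print(name, round(pearson(x, y), 3), round(spearman(x, y), 3), round(kendall_tau_a(x, y), 3))
```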
