Data Science #31 - Correlation and causation (1921), Sewall Wright

Update: 2025-07-26

Description

In the 31st episode of the podcast, Liron joins the team and we review a gem from 1921, in which Sewall Wright introduced path analysis: mapping hypothesized causal arrows into simple diagrams and showing that the correlation between two variables can be written as a sum, over the paths connecting them, of products of “path coefficients.”
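As a rough illustration of that path-tracing idea, here is a minimal sketch in Python. The three-variable diagram (x -> m -> y plus a direct arrow x -> y), the coefficient values, and the function names are all assumptions made for this example, not something taken from the episode or from Wright's paper. It simulates standardized data and compares the sample correlations with the sums of path products.

```python
import numpy as np

def implied_correlations(a, b, c):
    """Correlations implied by tracing paths in the assumed diagram
    x -> m (a), m -> y (b), x -> y (c), with all variables standardized."""
    return {
        "r_xm": a,          # single path x -> m
        "r_xy": c + a * b,  # direct path plus the path through m
        "r_my": b + a * c,  # direct path plus the path via the common cause x
    }

def simulate_correlations(a=0.5, b=0.4, c=0.3, n=200_000, seed=0):
    """Simulate the assumed diagram and return the sample correlation matrix
    of (x, m, y). Residual scales are chosen so each variable has unit variance."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    m = a * x + np.sqrt(1 - a**2) * rng.standard_normal(n)
    explained = b**2 + c**2 + 2 * a * b * c  # variance of y explained by m and x
    y = b * m + c * x + np.sqrt(1 - explained) * rng.standard_normal(n)
    return np.corrcoef([x, m, y])

if __name__ == "__main__":
    R = simulate_correlations()
    implied = implied_correlations(0.5, 0.4, 0.3)
    print("sample r_xm, r_xy, r_my:", R[0, 1], R[0, 2], R[1, 2])
    print("implied by path tracing:", implied)
```

With a large sample the two sets of numbers should agree closely. The agreement depends on the assumed diagram being the true data-generating structure, which is exactly the hypothesis path analysis asks the analyst to state.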

By treating each arrow as a standardised regression weight, he showed how to split the variance of an outcome into direct, indirect, and joint pieces, then solve for unknown paths from an ordinary correlation matrix, turning the slogan “correlation ≠ causation” into a workable calculus for observational data.

Wright’s algebra and diagrams became the blueprint for modern graphical causal models, structural‑equation modelling, and DAG‑based inference that power libraries such as DoWhy, Pyro and CausalNex.
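The variance split and the step of solving for paths from a correlation matrix can be sketched as follows. This is a minimal example under stated assumptions: two correlated causes pointing at one outcome, all variables standardized, and a made-up correlation matrix. The function name `solve_paths` and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def solve_paths(R_xx, r_xy):
    """Solve for standardized path coefficients p from correlations.

    R_xx : correlation matrix among the causes
    r_xy : correlations between each cause and the outcome
    Returns (p, direct, joint, residual), where
      direct   = p_i**2 for each cause,
      joint    = variance attributed to the causes being correlated,
      residual = variance left unexplained.
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    p = np.linalg.solve(R_xx, r_xy)       # path coefficients: R_xx @ p = r_xy
    direct = p**2                         # direct contribution of each cause
    joint = p @ R_xx @ p - direct.sum()   # cross terms 2 * p_i * p_j * r_ij
    residual = 1.0 - p @ r_xy             # unexplained share of the variance
    return p, direct, joint, residual

if __name__ == "__main__":
    # Assumed correlations, for illustration only.
    R_xx = [[1.0, 0.5],
            [0.5, 1.0]]
    r_xy = [0.6, 0.5]
    p, direct, joint, residual = solve_paths(R_xx, r_xy)
    print("path coefficients:", p)
    print("direct:", direct, "joint:", joint, "residual:", residual)
```

The direct, joint, and residual pieces add up to 1, the variance of the standardized outcome. This example has no mediating variable, so it shows only direct and joint terms; an indirect piece would appear once one cause is placed upstream of another in the diagram.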

The same logic underlies feature‑importance decompositions, counterfactual A/B testing, fairness audits, and explainable‑AI tooling, making a century‑old livestock‑breeding study a foundation stone of present‑day data‑science and AI practice.

In Channel

Data Science #34 - The deep learning original paper review, Hinton, Rumelhart & Williams (1985)

2025-11-23 · 46:37

Data Science #33 - The Backpropagation method, Paul Werbos (1980)

2025-11-03 · 57:45

Data Science #32 - A Markovian Decision Process, Richard Bellman (1957)

2025-09-19 · 46:05

Data Science #31 - Correlation and causation (1921), Sewall Wright

2025-07-26 · 48:11

Data Science #29 - The Chi-square automatic interaction detection (CHAID) algorithm (1979)

2025-05-23 · 41:03

Data Science #28 - The Bloom filter algorithm

2025-05-23 · 39:15

Data Science #27 - The History of Least Squares (1877)

2025-04-02 · 32:09

Data Science #26 - The First Gradient Descent algorithm by Cauchy (1847)

2025-03-23 · 33:14

Data Science #24 - The Expectation Maximization (EM) algorithm Paper review (1977)

2025-02-04 · 32:47

Data Science #23 - The Markov Chain Monte Carlo (MCMC) Paper review (1953)

2025-01-14 · 37:54

Data Science #22 - The theory of dynamic programming, Paper review (1954)

2025-01-07 · 47:46

Data Science #21 - Steps Toward Artificial Intelligence

2024-12-25 · 59:39

Data Science #20 - The Rao-Cramér bound (1945)

2024-12-09 · 59:42

Data Science #19 - The Kullback–Leibler divergence paper (1951)

2024-12-02 · 52:41

Data Science #18 - The k-nearest neighbors algorithm (1951)

2024-11-25 · 44:01

Data Science #17 - The Monte Carlo Algorithm (1949)

2024-11-18 · 38:11

Data Science #16 - The First Stochastic Descent Algorithm (1952)

2024-11-07 · 42:20

Data Science #15 - The First Decision Tree Algorithm (1963)

2024-10-28 · 36:35

Data Science #14 - The original k-means algorithm paper review (1957)

2024-10-10 · 46:57

Data Science #13 - Kolmogorov complexity paper review (1965) - Part 2

2024-10-01 · 29:25

Mike E