3.1 — Problem of Causal Inference
ECON 480 • Econometrics • Fall 2022
Dr. Ryan Safner
Associate Professor of Economics
safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com
\[\color{orange}{Y}=\color{teal}{\beta}(\color{purple}{X})\]
where \(\color{orange}{Y}\) is numeric:
Sample \(\color{#6A5ACD}{\xrightarrow{\text{statistical inference}}}\) Population \(\color{#e64173}{\xrightarrow{\text{causal identification}}}\) Unobserved Parameters
We are going to reflect on one of the biggest problems in epistemology, the philosophy of knowledge
We see that X and Y are associated (or quantitatively, correlated), but how do we know if X causes Y?
Random assignment to groups ensures that the only difference between members of the treatment and control groups is whether or not they receive treatment
Selection bias: (pre-existing) differences between members of treatment and control groups other than treatment, that affect the outcome
✨ \(\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{#6A5ACD}{Y_i^{0}}\) ✨
\[\color{#314f4f}{\delta_i} = \color{red}{?} - \color{#6A5ACD}{Y_i^{0}}\]
This is a nice way to think about the ideal proof of causality, but this is impossible to observe!
Individual counterfactuals do not exist (“the path not taken”)
You only ever observe one of these potential outcomes per individual!
\[\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{red}{?}\]
So what can we do?
\[\color{#314f4f}{ATE} = \color{#e64173}{\mathbb{E}[Y_i^{1}]} - \color{#6A5ACD}{\mathbb{E}[Y_i^{0}]}\]
Have large groups, and take averages instead!
Average Treatment Effect (ATE): difference in the average (expected value) of outcome \(Y\) between treated individuals and untreated individuals
\[\color{#314f4f}{\delta} = \color{#e64173}{(\bar{Y}|T=1)}-\color{#6A5ACD}{(\bar{Y}|T=0)}\]
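The averaging idea can be sketched in a few lines of Python. This is only an illustration under a strong assumption: a made-up population of four individuals for whom both potential outcomes are known, which is never true in real data. The names `y1`, `y0`, and all numbers are hypothetical.

```python
from statistics import mean

# Hypothetical potential outcomes for four individuals (illustrative numbers only).
# In real data we never observe both lists for the same person.
y1 = [4, 5, 3, 6]  # outcome if treated, Y_i^1
y0 = [3, 5, 2, 4]  # outcome if untreated, Y_i^0

# Individual treatment effects: delta_i = Y_i^1 - Y_i^0
deltas = [a - b for a, b in zip(y1, y0)]

# The ATE computed two ways: the average of the individual effects,
# or the difference between the two averages
ate_from_deltas = mean(deltas)
ate_from_means = mean(y1) - mean(y0)

print(deltas)                           # [1, 0, 1, 2]
print(ate_from_deltas, ate_from_means)  # 1 1.0
```

Both routes give the same number because the average of differences equals the difference of averages.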
Again:
\[\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{red}{?}\]
\[\color{#314f4f}{\delta_i} = \color{red}{?} - \color{#6A5ACD}{Y_i^{0}} \]
✨ \(\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{#6A5ACD}{Y_i^{0}}\) ✨
Angrist, Joshua & Jörn-Steffen Pischke, 2015, Mastering 'Metrics
\(Y\): outcome variable (health index score, 1-5)
\(Y_i\): health score of an individual \(i\)
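Individual \(i\) has a choice, leading to one of two potential outcomes: \(\color{#6A5ACD}{Y^0_i}\) (health score without insurance) or \(\color{#e64173}{Y^1_i}\) (health score with insurance)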
\(\color{#314f4f}{\delta_i}=\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i}\): causal effect for individual \(i\) of purchasing health insurance
John | Maria |
---|---|
\(Y_J^0=3\) | \(Y_M^0=5\) |
\(Y_J^1=4\) | \(Y_M^1=5\) |
✨ \(\color{#314f4f}{\delta_J=1}\) | \(\color{#314f4f}{\delta_M=0}\) ✨ |
\(\color{#e64173}{Y_J=(Y_J^1)=4}\) | \(\color{#6A5ACD}{Y_M=(Y_M^0)=5}\) |
John will choose to buy health insurance
Maria will choose to not buy health insurance
Health insurance improves John’s score by 1, has no effect on Maria’s score (individual causal effects \(\color{#314f4f}{\delta_i}\))
Note, all we can observe in the data are their health outcomes after they have chosen (not) to buy health insurance: \[\begin{align*} \color{#e64173}{Y_J}&\color{#e64173}{=4}\\ \color{#6A5ACD}{Y_M}&\color{#6A5ACD}{=5}\\ \end{align*}\]
Observed difference between John and Maria: \[\color{#e64173}{Y_J}-\color{#6A5ACD}{Y_M}=-1\]
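As a small sketch of how the observed data arise from the potential outcomes, the snippet below encodes the table above and picks out the one outcome each person actually realizes. The dictionary layout and the helper `observed_outcome` are illustrative choices, not part of the original example.

```python
# Potential outcomes from the table above (health index, 1-5)
potential = {
    "John":  {"y0": 3, "y1": 4},
    "Maria": {"y0": 5, "y1": 5},
}

# Choices from the example: John buys insurance (T=1), Maria does not (T=0)
insured = {"John": 1, "Maria": 0}

def observed_outcome(name):
    """Return the single potential outcome that is actually realized."""
    t = insured[name]
    return t * potential[name]["y1"] + (1 - t) * potential[name]["y0"]

y_john = observed_outcome("John")
y_maria = observed_outcome("Maria")
observed_diff = y_john - y_maria

print(y_john, y_maria, observed_diff)  # 4 5 -1
```

The comparison suggests insurance lowers health by 1 point, even though neither individual causal effect is negative.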
John | Maria |
---|---|
\(\color{#e64173}{Y_J=4}\) | \(\color{#6A5ACD}{Y_M=5}\) |
This is all the data we actually observe
\[Y_J-Y_M=\underbrace{\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_M}}_{=-1}\]
\[\begin{align*} Y_J-Y_M=\underbrace{\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_J}}_{\color{#314f4f}{=1}}+\underbrace{\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M}}_{\color{orange}{=-2}} \end{align*}\]
\[\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M} \neq 0\]
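The decomposition can be checked with the numbers from the John and Maria example. This is a minimal arithmetic sketch; the variable names are illustrative.

```python
# Potential outcomes from the John and Maria example
y1_john, y0_john = 4, 3
y0_maria = 5

# What we observe: John insured (Y^1_J) versus Maria uninsured (Y^0_M)
observed_diff = y1_john - y0_maria        # -1

# Add and subtract John's unobserved counterfactual Y^0_J
causal_effect_john = y1_john - y0_john    # 1: effect of insurance on John
selection_bias = y0_john - y0_maria       # -2: John is less healthy to begin with

print(observed_diff, causal_effect_john, selection_bias)  # -1 1 -2
```

The selection term is nonzero because John and Maria would differ in health even if neither were insured.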
The choice to get treatment is endogenous
A choice made by optimizing agents
John and Maria have different preferences, endowments, & constraints that cause them to make different decisions
Individual | Insured | Not Insured | Diff |
---|---|---|---|
John | 4.0 | 3.0 | 1.0 |
Maria | 5.0 | 5.0 | 0.0 |
Average | 4.5 | 4.0 | 0.5 |
\[\color{#314f4f}{\delta_i}=\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i}\]
\[\color{#314f4f}{ATE}=\frac{1}{n}\sum^n_{i=1}(\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i})\]
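A short sketch of the ATE formula applied to the table above, assuming (as the table does) that we could see both potential outcomes for each person. The dictionary names are illustrative.

```python
from statistics import mean

# Both potential outcomes for each person, as in the ideal (unobservable) table
insured = {"John": 4.0, "Maria": 5.0}      # Y_i^1
not_insured = {"John": 3.0, "Maria": 5.0}  # Y_i^0

# Individual effects: delta_i = Y_i^1 - Y_i^0
diffs = {name: insured[name] - not_insured[name] for name in insured}

# ATE = (1/n) * sum of individual effects
ate = mean(list(diffs.values()))

print(diffs)  # {'John': 1.0, 'Maria': 0.0}
print(ate)    # 0.5
```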
Actual (observed) Data
Individual | Insured | Not Insured | Diff |
---|---|---|---|
John | 4.0 | ? | ? |
Maria | ? | 5.0 | ? |
Average | ? | ? | ? |
\[\begin{align*} diff. &= \color{#e64173}{Avg(Y_i^{1}|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ &= \color{#e64173}{Avg(}\color{#314f4f}{\delta_i}+\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ &= \color{#e64173}{Avg(}\color{#314f4f}{\delta_i}\color{#e64173}{|T=1)}+\underbrace{\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}}_{\color{#FFA500}{\text{selection bias}}}\\ &= \color{#314f4f}{ATE} + \color{#FFA500}{\text{selection bias}} \end{align*}\]
The last step assumes the average effect among the treated equals the ATE (e.g. the same \(\delta_i\) for everyone)
\[diff. = \color{#314f4f}{ATE} + \color{#FFA500}{\text{Selection Bias}}\]
Selection bias: difference in average \(Y^0_i\) between groups pre-treatment
\(Y^0_i\) includes everything about person \(i\) relevant to health except treatment (insurance) status
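To see selection bias at scale, here is a simulation sketch under loudly stated assumptions: a made-up population where insurance raises everyone's health by exactly 1 point, and where less healthy people are more likely to buy insurance. The distributions, the seed, and the selection rule are all hypothetical choices for illustration.

```python
import random
from statistics import mean

random.seed(480)  # arbitrary seed so the sketch is reproducible

n = 10_000
true_effect = 1.0  # assumed constant treatment effect (delta_i = 1 for everyone)

# Untreated potential outcome Y_i^0: baseline health (made-up distribution)
y0 = [random.gauss(4.0, 1.0) for _ in range(n)]

# Self-selection (an assumption): less healthy people tend to buy insurance
treated = [1 if y + random.gauss(0, 1) < 4.0 else 0 for y in y0]

# Observed outcome: Y_i^1 for the treated, Y_i^0 for the untreated
y_obs = [y + true_effect if t == 1 else y for y, t in zip(y0, treated)]

def group_mean(values, group):
    """Average of `values` among individuals with treatment status `group`."""
    return mean(v for v, t in zip(values, treated) if t == group)

obs_diff = group_mean(y_obs, 1) - group_mean(y_obs, 0)
selection_bias = group_mean(y0, 1) - group_mean(y0, 0)

print(f"observed difference: {obs_diff:.3f}")
print(f"true effect:         {true_effect:.3f}")
print(f"selection bias:      {selection_bias:.3f}")
```

In this setup the selection bias is negative and larger in magnitude than the true effect, so the naive comparison gets the sign wrong, just as with John and Maria.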
Treatment group and control group differ on average, for reasons other than getting treatment or not!
Control group is not a good counterfactual for treatment group without treatment
\[\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\]
\[Y_i = \beta_0+\beta_1 T_i + u_i\]
Where \(T_i = \begin{cases} \color{#6A5ACD}{0} & \color{#6A5ACD}{\text{ if person is not treated}}\\\color{#e64173}{1} & \color{#e64173}{\text{ if person is treated}}\\ \end{cases}\)
The problem is \(cor(T,u) \neq 0\)!
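A sketch of why \(cor(T,u) \neq 0\) matters, using simulated data with hypothetical true parameters \(\beta_0 = 4\) and \(\beta_1 = 1\). The selection rule, seed, and helper `ols_slope` are assumptions made for illustration; the slope is computed by hand from the usual bivariate OLS formula.

```python
import random
from statistics import mean

random.seed(3)  # arbitrary seed for reproducibility

n = 5_000
beta0, beta1 = 4.0, 1.0  # hypothetical true parameters

# u_i: unobserved factors. People with low u are more likely to be treated,
# so cor(T, u) != 0 by construction (an assumption of this sketch).
u = [random.gauss(0, 1) for _ in range(n)]
t = [1 if ui + random.gauss(0, 1) < 0 else 0 for ui in u]
y = [beta0 + beta1 * ti + ui for ti, ui in zip(t, u)]

def ols_slope(x, y):
    """Bivariate OLS slope: sum of cross-deviations over sum of squared deviations."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

b1_hat = ols_slope(t, y)

# With a 0/1 regressor, the OLS slope is the difference in group means
diff_means = (mean(yi for yi, ti in zip(y, t) if ti == 1)
              - mean(yi for yi, ti in zip(y, t) if ti == 0))

print(f"estimated slope:     {b1_hat:.3f}")
print(f"difference in means: {diff_means:.3f}")
print(f"true beta1:          {beta1:.3f}")
```

The estimated slope matches the raw difference in means and lands well below the true \(\beta_1\), because the treated group has systematically lower \(u_i\).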
If treatment is randomly assigned for a large sample, it eliminates selection bias!
Treatment and control groups differ on average by nothing except treatment status
Creates the ceteris paribus conditions of economics: groups are identical on average (holding constant age, sex, height, etc.)
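A simulation sketch of what random assignment buys us, under the same kind of made-up assumptions as before: baseline health drawn from a hypothetical distribution and a constant true effect of 1. Here treatment is assigned by a coin flip rather than chosen.

```python
import random
from statistics import mean

random.seed(2022)  # arbitrary seed for reproducibility

n = 10_000
true_effect = 1.0  # assumed constant treatment effect

# Untreated potential outcome Y_i^0 (made-up distribution)
y0 = [random.gauss(4.0, 1.0) for _ in range(n)]

# Random assignment: a fair coin flip, unrelated to Y_i^0
treated = [random.randint(0, 1) for _ in range(n)]

# Observed outcome: Y_i^1 for the treated, Y_i^0 for the untreated
y_obs = [y + true_effect if t == 1 else y for y, t in zip(y0, treated)]

def group_mean(values, group):
    """Average of `values` among individuals with treatment status `group`."""
    return mean(v for v, t in zip(values, treated) if t == group)

selection_bias = group_mean(y0, 1) - group_mean(y0, 0)
obs_diff = group_mean(y_obs, 1) - group_mean(y_obs, 0)

print(f"selection bias:      {selection_bias:.3f}")
print(f"observed difference: {obs_diff:.3f}")
print(f"true effect:         {true_effect:.3f}")
```

With a large sample the two groups have nearly the same average \(Y^0_i\), so the observed difference in means lands close to the true effect.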
[Figure: Treatment Group vs. Control Group]
\[Y_i = \beta_0+\beta_1 T_i + u_i\]
RCTs are considered the “gold standard” for causal claims
But society is not our laboratory (probably a good thing!)
We can rarely conduct experiments to get data
Instead, we often rely on observational data
Treatment in observational data is not randomly assigned!
Must take extra care in forming an identification strategy
To make good claims about causation in society, we must get clever!
Economists often resort to searching for natural experiments
“Natural” events beyond our control occur that separate otherwise similar entities into a “treatment” group and a “control” group that we can compare
e.g. natural disasters, U.S. State laws, military draft
John Snow utilized the first famous natural experiment to establish the foundations of epidemiology and the germ theory of disease
Water pumps with sources downstream of a sewage dump in the Thames river spread cholera while water pumps with sources upstream did not
1813-1858
Professors Esther Duflo and Abhijit Banerjee, co-directors of MIT's @JPAL, receive congratulations on the big news this morning. They share in the #NobelPrize in economic sciences “for their experimental approach to alleviating global poverty.”
— Massachusetts Institute of Technology (MIT) (@MIT) October 14, 2019
Photo: Bryce Vickmark
Angus Deaton
Economics Nobel 2015
“The RCT is a useful tool, but I think that is a mistake to put method ahead of substance. I have written papers using RCTs…[but] no RCT can ever legitimately claim to have established causality. My theme is that RCTs have no special status, they have no exemption from the problems of inference that econometricians have always wrestled with, and there is nothing that they, and only they, can accomplish.”
Deaton, Angus, 2019, “Randomization in the Tropics Revisited: A Theme and Eleven Variations”, Working Paper
Lant Pritchett
“People keep saying that the recent Nobelists ‘studied global poverty.’ This is exactly wrong. They made a commitment to a method, not a subject, and their commitment to method prevented them from studying global poverty.”
“At a conference at Brookings in 2008 Paul Romer [2018 Nobelist] said: ‘You guys are like going to a doctor who says you have an allergy and you have cancer. With the skin rash we can divide you skin into areas and test variety of substances and identify with precision and some certainty the cause. Cancer we have some ideas how to treat it but there are a variety of approaches and since we cannot be sure and precise about which is best for you, we will ignore the cancer and not treat it.’”
Angus Deaton
Economics Nobel 2015
“Lant Pritchett is so fun to listen to, sometimes you could forget that he is completely full of shit.”
[Source](https://medium.com/@ismailalimanik/lant-pritchett-the-debate-about-rcts-in-development-is-over-ec7a28a82c17)
Programs randomly assign treatment to different individuals and measure causal effect of treatment
RAND Health Insurance Study: randomly give people health insurance
Oregon Medicaid Expansion: randomly give people Medicaid
HUD’s Moving to Opportunity: randomly give people moving vouchers
Tennessee STAR: randomly assign students to large vs. small classes
Even if a study is internally valid (used statistics correctly, etc.) we must still worry about external validity:
Is the finding generalizable to the whole population?
If we find something in India, does that extend to Bolivia? France?
Subjects of studies & surveys are often
IN MICE https://t.co/mLuKBRhsAb
— justsaysinmice (@justsaysinmice) September 15, 2020