3.1 — Problem of Causal Inference

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

## Contents

First Pass at Causation: RCTs

The Potential Outcomes Model

Natural Experiments

Attack of/on the Randomistas

## Two Types of Uses For Regression

$\color{orange}{Y}=\color{teal}{\beta}(\color{purple}{X})$

where $\color{orange}{Y}$ is numeric:

1. Causal inference: estimate $\color{teal}{\hat{\beta}}$ to determine how changes in $\color{purple}{X}$ cause changes in $\color{orange}{Y}$
• Care more about accurately estimating and understanding $\color{teal}{\hat{\beta}}$
• Remove as much bias in $\color{teal}{\hat{\beta}}$ as possible
• Don’t care much about goodness of fit! (You’ll never get it in the complex real world)
2. Prediction: predict $\color{orange}{\hat{Y}}$ using an estimated $\color{teal}{\hat{\beta}}$
• Care more about getting $\color{orange}{\hat{Y}}$ as accurate as possible, $\color{teal}{\hat{\beta}}$ is an unknown “black-box”
• Tweak models to maximize $R^2$, minimize $\hat{\sigma}_u$ (at all costs)

## Recall: Two Big Problems with Data

• We use econometrics to identify causal relationships & make inferences about them:
1. Problem for identification: endogeneity
• $X$ is exogenous if its variation is unrelated to other factors $(u)$ that affect $Y$
• $X$ is endogenous if its variation is related to other factors $(u)$ that affect $Y$
2. Problem for inference: randomness
• Data is random due to natural sampling variation
• Taking one sample of a population will yield slightly different information than another sample of the same population

## The Two Problems: Identification and Inference

Sample $\color{#6A5ACD}{\xrightarrow{\text{statistical inference}}}$ Population $\color{#e64173}{\xrightarrow{\text{causal identification}}}$ Unobserved Parameters

• We saw how to statistically infer values of population parameters using our sample
• Purely empirical, math & statistics 🤓
• We now confront the problem of identifying causal relationships within population
• Endogeneity problem
• Even if we had perfect data on the whole population, “Does X truly cause Y?”, and can we measure that effect?
• More philosophy & theory than math & statistics! 🧐
• Truly you should do this first, before you get data to make inferences!

## What Does Causation Mean?

• We are going to reflect on one of the biggest problems in epistemology, the philosophy of knowledge

• We see that X and Y are associated (or quantitatively, correlated), but how do we know if X causes Y?

# First Pass at Causation: RCTs

## Random Control Trials (RCTs) I

• The ideal way to demonstrate causation is through a randomized control trial (RCT) or “random experiment”
• Randomly assign experimental units (e.g. people, firms, etc.) into groups
• Treatment group(s) get a treatment
• Control group gets no treatment
• Compare average results of treatment vs. control groups after treatment to observe the average treatment effect (ATE)
• We will understand “causality” (for now) to mean the ATE from an ideal RCT

## Random Control Trials (RCTs) II

Classic (simplified) procedure of a randomized control trial (RCT) from medicine

## Random Control Trials (RCTs) III

• Random assignment to groups ensures that the only differences between members of the treatment(s) and control groups is receiving treatment or not

• Selection bias: (pre-existing) differences between members of treatment and control groups other than treatment, that affect the outcome

# The Potential Outcomes Model

## The Fundamental Problem of Causal Inference

• Suppose we have some outcome variable $Y$
• Individuals $(i)$ face a choice between two outcomes (such as being treated or not treated):
• $\color{#6A5ACD}{Y_i^{0}}$: outcome when individual $i$ is not treated
• $\color{#e64173}{Y_i^{1}}$: outcome when individual $i$ is treated

$\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{#6A5ACD}{Y_i^{0}}$

• $\color{#314f4f}{\delta_i}$ is the causal effect of treatment on individual $i$

## The Fundamental Problem of Causal Inference

$\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{red}{?}$

• This is a nice way to think about the ideal proof of causality, but this is impossible to observe!

• Individual counterfactuals do not exist (“the path not taken”)

• You will always only ever get one of these per individual!

• e.g. what would your life have been like if you did not go to Hood College?? 🧐
• So what can we do?

## The Fundamental Problem of Causal Inference

$\color{#314f4f}{ATE} = \color{#e64173}{\mathbb{E}[Y_i^{1}]} - \color{#6A5ACD}{\mathbb{E}[Y_i^{0}]}$

• Have large groups, and take averages instead!

• Average Treatment Effect (ATE): difference in the average (expected value) of outcome $Y$ between treated individuals and untreated individuals

$\color{#314f4f}{\delta} = \color{#e64173}{(\bar{Y}|T=1)}-\color{#6A5ACD}{(\bar{Y}|T=0)}$

• $T_i$ is a binary variable, $= \begin{cases} \color{#6A5ACD}{0} & \color{#6A5ACD}{\text{ if person is not treated}}\\\color{#e64173}{1} & \color{#e64173}{\text{ if person is treated}}\\ \end{cases}$
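The averaging idea can be checked in a small simulation. This is only a sketch under assumed numbers: the population size, the distribution of untreated outcomes, and the distribution of individual effects are all made up for illustration. Because it is a simulation, both potential outcomes are known for everyone, which is never true in real data.

```python
import random
from statistics import mean

random.seed(480)  # arbitrary seed, only for reproducibility

# Hypothetical population: each person has BOTH potential outcomes,
# something we could only ever know in a simulation.
n = 1000
y0 = [random.gauss(3.0, 1.0) for _ in range(n)]      # outcome if untreated
delta = [random.gauss(0.5, 0.2) for _ in range(n)]   # individual causal effects
y1 = [a + d for a, d in zip(y0, delta)]              # outcome if treated

# The ATE is the average of the individual effects, which equals the
# difference between the averages of the two potential outcomes.
ate_from_effects = mean(delta)
ate_from_means = mean(y1) - mean(y0)

print(f"average of individual effects: {ate_from_effects:.3f}")
print(f"E[Y1] - E[Y0]:                 {ate_from_means:.3f}")
```

The two numbers agree up to floating-point rounding, which is why the ATE can be written either as an average of $\delta_i$ or as a difference of expected values.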

## The Fundamental Problem of Causal Inference

$\color{#314f4f}{ATE} = \color{#e64173}{\mathbb{E}[Y_i^{1}]} - \color{#6A5ACD}{\mathbb{E}[Y_i^{0}]}$

Again:

• Either we observe individual $i$ in the treatment group $\color{#e64173}{(T=1)}$, i.e.

$\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{red}{?}$

• Or we observe individual $i$ in the control group $\color{#6A5ACD}{(T=0)}$, i.e.

$\color{#314f4f}{\delta_i} = \color{red}{?} - \color{#6A5ACD}{Y_i^{0}}$

• Never both at the same time:

$\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{#6A5ACD}{Y_i^{0}}$

## Example: The Effect of Having Health Insurance I

Example

What is the effect of having health insurance on health outcomes?

• National Health Interview Survey (NHIS) asks “Would you say your health in general is excellent, very good, good, fair, or poor?”

• Outcome variable $(Y)$: Index of health (1-poor to 5-excellent) in a sample of married NHIS respondents in 2009 who may or may not have health insurance

• Treatment $(X)$: Having health insurance (vs. not)

## Example: The Effect of Having Health Insurance II

Angrist, Joshua & Jorn-Steffen Pischke, 2015, Mastering ’Metrics

## Example: The Effect of Having Health Insurance III

• $Y$: outcome variable (health index score, 1-5)

• $Y_i$: health score of an individual $i$

• Individual $i$ has a choice, leading to one of two outcomes:

• $\color{#6A5ACD}{Y^0_i}$: individual $i$ has not purchased health insurance (“Control”)
• $\color{#e64173}{Y^1_i}$: individual $i$ has purchased health insurance (“Treatment”)
• $\color{#314f4f}{\delta_i}=\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i}$: causal effect for individual $i$ of purchasing health insurance

## Example: A Hypothetical Comparison

| John | Maria |
|---|---|
| $Y_J^0=3$ | $Y_M^0=5$ |
| $Y_J^1=4$ | $Y_M^1=5$ |
| $\color{#314f4f}{\delta_J=1}$ | $\color{#314f4f}{\delta_M=0}$ |
| $\color{#e64173}{Y_J=(Y_J^1)=4}$ | $\color{#6A5ACD}{Y_M=(Y_M^0)=5}$ |

• John will choose to buy health insurance

• Maria will choose to not buy health insurance

• Health insurance improves John’s score by 1, has no effect on Maria’s score (individual causal effects $\color{#314f4f}{\delta_i}$)

• Note, all we can observe in the data are their health outcomes after they have chosen (not) to buy health insurance: \begin{align*} \color{#e64173}{Y_J}&\color{#e64173}{=4}\\ \color{#6A5ACD}{Y_M}&\color{#6A5ACD}{=5}\\ \end{align*}

• Observed difference between John and Maria: $\color{#e64173}{Y_J}-\color{#6A5ACD}{Y_M}=-1$

## Counterfactuals

| John | Maria |
|---|---|
| $\color{#e64173}{Y_J=4}$ | $\color{#6A5ACD}{Y_M=5}$ |

This is all the data we actually observe

• Observed difference between John and Maria:

$Y_J-Y_M=\underbrace{\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_M}}_{=-1}$

• Recall:
• John has bought health insurance $\color{#e64173}{Y^1_J}$
• Maria has not bought insurance $\color{#6A5ACD}{Y^0_M}$
• We don’t see the counterfactuals:
• John’s score without insurance
• Maria’s score with insurance

• Algebra trick: add and subtract $\color{#6A5ACD}{Y^0_J}$ to equation:

\begin{align*} Y_J-Y_M=\underbrace{\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_J}}_{\color{#314f4f}{=1}}+\underbrace{\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M}}_{\color{orange}{=-2}} \end{align*}

• $\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_J}=1$: Causal effect for John of buying insurance, $\color{#314f4f}{\delta_J}$
• $\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M}=-2$: Difference between John & Maria pre-treatment, “selection bias”
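The add-and-subtract decomposition can be verified directly with the numbers from the John and Maria example. The dictionary names below are illustrative only, and both potential outcomes are available here only because the example is hypothetical.

```python
# Potential outcomes from the hypothetical example (1-5 health index).
y0 = {"John": 3, "Maria": 5}  # health score without insurance
y1 = {"John": 4, "Maria": 5}  # health score with insurance

# John buys insurance and Maria does not, so we observe Y^1_J and Y^0_M.
observed_diff = y1["John"] - y0["Maria"]

# Add and subtract John's untreated outcome Y^0_J.
causal_effect_john = y1["John"] - y0["John"]  # delta_J
selection_bias = y0["John"] - y0["Maria"]     # pre-treatment gap

print(observed_diff, causal_effect_john, selection_bias)
# prints: -1 1 -2
assert observed_diff == causal_effect_john + selection_bias
```

The observed gap of -1 mixes a positive causal effect for John (+1) with a larger negative pre-treatment gap (-2), so the raw comparison even gets the sign of the effect wrong.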

## Selection Bias I

$\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M} \neq 0$

• Selection bias: (pre-existing) differences between members of treatment and control groups other than treatment, that affect the outcome
• i.e. John and Maria start out with very different health scores before either decides to buy insurance or not (“receive treatment” or not)

## Selection Bias II

$\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M}\neq 0$

• The choice to get treatment is endogenous

• A choice made by optimizing agents

• John and Maria have different preferences, endowments, & constraints that cause them to make different decisions

## Example: Our Ideal Data

Ideal (but impossible) Data

| Individual | Insured | Not Insured | Diff |
|---|---|---|---|
| John | 4.0 | 3.0 | 1.0 |
| Maria | 5.0 | 5.0 | 0.0 |
| Average | 4.5 | 4.0 | 0.5 |

• Individual treatment effect (for individual $i$):

$\color{#314f4f}{\delta_i}=\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i}$

• Average treatment effect:

$\color{#314f4f}{ATE}=\frac{1}{n}\sum^n_{i=1}(\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i})$

Actual (observed) Data

| Individual | Insured | Not Insured | Diff |
|---|---|---|---|
| John | 4.0 | ? | ? |
| Maria | ? | 5.0 | ? |
| Average | ? | ? | ? |

• We never get to see each person’s counterfactual state to compare and calculate ITEs or ATE
• Maria with insurance $\color{#e64173}{Y^1_M}$
• John without insurance $\color{#6A5ACD}{Y^0_J}$
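The contrast between the ideal and the observed tables can be sketched in code. The data layout and the helper `individual_effect` are hypothetical, chosen only to mirror the two tables; `None` stands in for the "?" cells.

```python
from statistics import mean

# Ideal (but impossible) data: both potential outcomes for each person.
ideal = {
    "John":  {"insured": 4.0, "not_insured": 3.0},
    "Maria": {"insured": 5.0, "not_insured": 5.0},
}

# Individual treatment effects and their average (the ATE).
ite = {name: row["insured"] - row["not_insured"] for name, row in ideal.items()}
ate = mean(list(ite.values()))
print(ite, ate)  # {'John': 1.0, 'Maria': 0.0} 0.5

# Actual data: each person's counterfactual cell is missing.
observed = {
    "John":  {"insured": 4.0, "not_insured": None},
    "Maria": {"insured": None, "not_insured": 5.0},
}

def individual_effect(row):
    """Return Y1 - Y0, or None when either potential outcome is unobserved."""
    if row["insured"] is None or row["not_insured"] is None:
        return None
    return row["insured"] - row["not_insured"]

observed_ite = {name: individual_effect(row) for name, row in observed.items()}
print(observed_ite)  # {'John': None, 'Maria': None}
```

With the observed data no individual effect can be computed, so the ATE of 0.5 from the ideal table cannot be recovered by simple subtraction within persons.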

## Can’t We Just Take the Difference of Group Means?

• Can’t we just take the difference in group means?

\begin{align*} diff.=\color{#e64173}{Avg(Y_i^{1}|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ \end{align*}

• Suppose a uniform treatment effect, $\color{#314f4f}{\delta_i}=\color{#314f4f}{\delta}$ for every individual $i$, so that $\color{#e64173}{Y_i^{1}}=\color{#314f4f}{\delta}+\color{#6A5ACD}{Y_i^{0}}$

\begin{align*} diff. &= \color{#e64173}{Avg(Y_i^{1}|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ &= \color{#e64173}{Avg(}\color{#314f4f}{\delta}+\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ &= \color{#314f4f}{\delta}+\underbrace{\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}}_{\color{#FFA500}{\text{selection bias}}}\\ &= \color{#314f4f}{ATE} + \color{#FFA500}{\text{selection bias}} \end{align*}
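The decomposition can be illustrated with a simulation in which people choose treatment themselves. Everything here is an assumption made for illustration: the uniform effect `DELTA`, the distribution of untreated health, and especially the selection rule (people with worse baseline health are more likely to opt in).

```python
import random
from statistics import mean

random.seed(480)  # arbitrary seed for reproducibility

DELTA = 1.0  # assumed uniform treatment effect
n = 10_000

# Untreated health Y0 varies across people.
y0 = [random.gauss(4.0, 1.0) for _ in range(n)]

# Assumed selection rule: people in worse baseline health are more
# likely to opt into treatment (a noisy threshold on Y0).
t = [1 if a + random.gauss(0.0, 1.0) < 4.0 else 0 for a in y0]

# Observed outcome: Y0 + DELTA if treated, Y0 otherwise.
y = [a + DELTA * ti for a, ti in zip(y0, t)]

# Naive comparison of observed group means.
treated = [yi for yi, ti in zip(y, t) if ti == 1]
control = [yi for yi, ti in zip(y, t) if ti == 0]
diff = mean(treated) - mean(control)

# Selection bias needs Y0 for the treated, which only a simulation reveals.
y0_treated = [a for a, ti in zip(y0, t) if ti == 1]
y0_control = [a for a, ti in zip(y0, t) if ti == 0]
selection_bias = mean(y0_treated) - mean(y0_control)

print(f"diff = {diff:.3f}, ATE = {DELTA:.3f}, selection bias = {selection_bias:.3f}")
```

In this setup the naive difference equals `DELTA + selection_bias` up to rounding, and the selection bias is negative, so the raw comparison understates the true effect.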

## Example: Thinking About the Data

• Basic comparisons tell us something about outcomes, but not ATE

$diff. = \color{#314f4f}{ATE} + \color{#FFA500}{\text{Selection Bias}}$

• Selection bias: difference in average $Y^0_i$ between groups pre-treatment

• $Y^0_i$ includes everything about person $i$ relevant to health except treatment (insurance) status

• Age, sex, height, weight, climate, smoker, exercise, diet, etc.
• Imagine a world where nobody gets insurance (treatment): who would have the highest health scores?

## Understanding Selection Bias

• Treatment group and control group differ on average, for reasons other than getting treatment or not!

• Control group is not a good counterfactual for treatment group without treatment

• Average untreated outcome for the treatment group differs from average untreated outcome for untreated group

$\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}$

• Recall we cannot observe $\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}$!

## Understanding Selection Bias: Regression

• Consider the problem in regression form:

$Y_i = \beta_0+\beta_1 T_i + u_i$

• Where $T_i = \begin{cases} \color{#6A5ACD}{0} & \color{#6A5ACD}{\text{ if person is not treated}}\\\color{#e64173}{1} & \color{#e64173}{\text{ if person is treated}}\\ \end{cases}$

• The problem is $cor(T,u) \neq 0$!

• $T_i$ (Treatment) is endogenous!
• Getting treatment is correlated with other factors that determine health!
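A minimal sketch of this endogeneity problem, assuming made-up parameter values and a made-up choice rule in which people with low $u_i$ tend to select into treatment. The helper `ols_slope` is hypothetical and computes the bivariate OLS slope as the sample covariance over the sample variance.

```python
import random
from statistics import mean

random.seed(480)  # arbitrary seed for reproducibility

BETA_0, BETA_1 = 3.0, 1.0  # assumed true intercept and treatment effect
n = 10_000

# u collects the other determinants of health.
u = [random.gauss(0.0, 1.0) for _ in range(n)]

# Assumed endogenous choice: people with low u tend to choose treatment,
# so T is correlated with u.
t = [1 if ui + random.gauss(0.0, 1.0) < 0 else 0 for ui in u]
y = [BETA_0 + BETA_1 * ti + ui for ti, ui in zip(t, u)]

def ols_slope(x, y):
    """Slope of a bivariate OLS regression: cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var

beta_1_hat = ols_slope(t, y)
print(f"true beta_1 = {BETA_1}, estimated beta_1 = {beta_1_hat:.3f}")
```

Because treatment is correlated with $u_i$, the estimated slope lands well below the assumed true effect of 1. With a binary regressor the OLS slope is the same number as the difference in group means, so this is the selection bias problem in regression form.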

## Random Assignment: The Silver Bullet

• If treatment is randomly assigned for a large sample, it eliminates selection bias!

• Treatment and control groups differ on average by nothing except treatment status

• Creates ceteris paribus conditions in economics: groups are identical on average (holding constant age, sex, height, etc.)

Figure: Treatment Group vs. Control Group

## Random Assignment: Regression

• Consider the problem in regression form:

$Y_i = \beta_0+\beta_1 T_i + u_i$

• If treatment $T_i$ is administered randomly, it breaks the correlation with $u_i$!
• Treatment becomes exogenous!
• $cor(T,u)=0$
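A small simulation sketch of random assignment, assuming made-up values for the intercept, the treatment effect, and the distribution of $u_i$. Treatment is a fair coin flip that ignores $u_i$ entirely. With a binary regressor, the OLS slope equals the difference in group means, so that difference is used as $\hat{\beta}_1$ here.

```python
import random
from statistics import mean

random.seed(480)  # arbitrary seed for reproducibility

BETA_0, BETA_1 = 3.0, 1.0  # assumed true intercept and treatment effect
n = 10_000

# u collects the other determinants of health.
u = [random.gauss(0.0, 1.0) for _ in range(n)]

# Random assignment: a fair coin flip that does not depend on u.
t = [random.randint(0, 1) for _ in range(n)]
y = [BETA_0 + BETA_1 * ti + ui for ti, ui in zip(t, u)]

# Difference in group means (equal to the OLS slope for a binary regressor).
treated = [yi for yi, ti in zip(y, t) if ti == 1]
control = [yi for yi, ti in zip(y, t) if ti == 0]
beta_1_hat = mean(treated) - mean(control)

# The groups should look alike in u on average.
u_treated = [ui for ui, ti in zip(u, t) if ti == 1]
u_control = [ui for ui, ti in zip(u, t) if ti == 0]
u_gap = mean(u_treated) - mean(u_control)

print(f"estimated beta_1 = {beta_1_hat:.3f}, gap in average u = {u_gap:.3f}")
```

The gap in average $u_i$ between groups is close to zero, and the estimate sits close to the assumed true effect; any remaining distance is ordinary sampling variation rather than selection bias.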

# Natural Experiments

## The Quest for Causal Effects I

• RCTs are considered the “gold standard” for causal claims

• But society is not our laboratory (probably a good thing!)

• We can rarely conduct experiments to get data

## The Quest for Causal Effects II

• Instead, we often rely on observational data

• This data is not random!

• Must take extra care in forming an identification strategy

• To make good claims about causation in society, we must get clever!

## Natural Experiments

• Economists often resort to searching for natural experiments

• “Natural” events beyond our control occur that separate otherwise similar entities into a “treatment” group and a “control” group that we can compare

• e.g. natural disasters, U.S. State laws, military draft

## The First Natural Experiment

John Snow (1813-1858)

• John Snow utilized the first famous natural experiment to establish the foundations of epidemiology and the germ theory of disease

• Water pumps with sources downstream of a sewage dump in the Thames river spread cholera while water pumps with sources upstream did not

## Famous Natural Experiments in Empirical Economics

• Oregon Health Insurance Experiment: Oregon used a lottery to grant Medicaid access to 10,000 people, showing that access to Medicaid increased use of health services, lowered debt, etc. relative to those not on Medicaid
• Angrist (1990) finds that lifetime earnings of (randomly) drafted Vietnam veterans are 15% lower than those of non-veterans
• Card & Krueger (1994) find that a minimum wage hike in fast-food restaurants on the NJ side of the border had no disemployment effects relative to restaurants on the PA side of the border during the same period
• Acemoglu, Johnson, and Robinson (2001) find that inclusive institutions lead to higher economic development than extractive institutions, determined by a colony’s disease environment in 1500
• We will look at some of these in greater detail throughout the course
• A great list, with explanations is here

# Attack of/on the Randomistas

## But Not Everyone Agrees I

Angus Deaton

Economics Nobel 2015

“The RCT is a useful tool, but I think that is a mistake to put method ahead of substance. I have written papers using RCTs…[but] no RCT can ever legitimately claim to have established causality. My theme is that RCTs have no special status, they have no exemption from the problems of inference that econometricians have always wrestled with, and there is nothing that they, and only they, can accomplish.”

Deaton, Angus, 2019, “Randomization in the Tropics Revisited: A Theme and Eleven Variations”, Working Paper

## But Not Everyone Agrees II

Lant Pritchett

“People keep saying that the recent Nobelists ‘studied global poverty.’ This is exactly wrong. They made a commitment to a method, not a subject, and their commitment to method prevented them from studying global poverty.”

“At a conference at Brookings in 2008 Paul Romer [2018 Nobelist] said: ‘You guys are like going to a doctor who says you have an allergy and you have cancer. With the skin rash we can divide you skin into areas and test variety of substances and identify with precision and some certainty the cause. Cancer we have some ideas how to treat it but there are a variety of approaches and since we cannot be sure and precise about which is best for you, we will ignore the cancer and not treat it.’”

Source

## But Not Everyone Agrees III

Angus Deaton

Economics Nobel 2015

“Lant Pritchett is so fun to listen to, sometimes you could forget that he is completely full of shit.”

## RCTs and “Evidence-Based Policy”

• Programs randomly assign treatment to different individuals and measure causal effect of treatment

• RAND Health Insurance Study: randomly give people health insurance

• Oregon Medicaid Expansion: randomly give people Medicaid

• HUD’s Moving to Opportunity: randomly give people moving vouchers

• Tennessee STAR: randomly assign students to large vs. small classes

## RCTs and External Validity I

• Even if a study is internally valid (used statistics correctly, etc.) we must still worry about external validity:

• Is the finding generalizable to the whole population?

• If we find something in India, does that extend to Bolivia? France?

• Subjects of studies & surveys are often

• Western
• Educated
• Industrialized
• Rich
• Democracies