3.1 — Problem of Causal Inference

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner

Associate Professor of Economics

safner@hood.edu

ryansafner/metricsF22

metricsF22.classes.ryansafner.com

\[\color{orange}{Y}=\color{teal}{\beta}(\color{purple}{X})\]

where \(\color{orange}{Y}\) is numeric:

- Causal inference: estimate \(\color{teal}{\hat{\beta}}\) to determine how changes in \(\color{purple}{X}\) **cause** changes in \(\color{orange}{Y}\)
  - Care more about accurately estimating and understanding \(\color{teal}{\hat{\beta}}\)
  - Remove as much **bias** in \(\color{teal}{\hat{\beta}}\) as possible
  - Don’t care much about **goodness of fit**! (You’ll never get it in the complex real world)

- Prediction: predict \(\color{orange}{\hat{Y}}\) using an estimated \(\color{teal}{\hat{\beta}}\)
  - Care more about getting \(\color{orange}{\hat{Y}}\) as accurate as possible, \(\color{teal}{\hat{\beta}}\) is an unknown “black-box”
  - Tweak models to maximize \(R^2\), minimize \(\hat{\sigma}_u\) (at all costs)

- We use econometrics to identify causal relationships & make inferences about them:
  - Problem for identification: endogeneity
    - \(X\) is **exogenous** if its variation is **unrelated** to other factors \((u)\) that affect \(Y\)
    - \(X\) is **endogenous** if its variation is **related** to other factors \((u)\) that affect \(Y\)
  - Problem for inference: randomness
    - Data is random due to **natural sampling variation**
    - Taking one sample of a population will yield slightly different information than another sample of the same population

Sample \(\color{#6A5ACD}{\xrightarrow{\text{statistical inference}}}\) Population \(\color{#e64173}{\xrightarrow{\text{causal identification}}}\) Unobserved Parameters

- We saw how to statistically infer values of population parameters using our sample
  - Purely empirical, math & statistics 🤓

- We now confront the problem of identifying causal relationships within population
  - Endogeneity problem
  - Even if we had perfect data on the whole population, “Does X truly cause Y?”, and can we measure that effect?
  - More philosophy & theory than math & statistics! 🧐

- Truly you should do this first, *before* you get data to make inferences!

We are going to reflect on one of the biggest problems in epistemology, the philosophy of knowledge

We see that X and Y are associated (or quantitatively, correlated), but how do we know if X *causes* Y?

- The *ideal* way to demonstrate causation is through a randomized control trial (RCT) or “random experiment”
  - *Randomly* assign experimental units (e.g. people, firms, etc.) into groups
  - Treatment group(s) get a treatment
  - Control group gets no treatment
  - Compare average results of treatment vs control groups after treatment to observe the average treatment effect (ATE)

- We will understand “causality” (for now) to mean the ATE from an ideal RCT

- Random assignment to groups ensures that the *only* difference between members of the treatment(s) and control groups is *receiving treatment or not*

- Selection bias: (pre-existing) differences between members of treatment and control groups *other* than treatment, that affect the outcome

- Suppose we have some outcome variable \(Y\)

- Individuals \((i)\) face a choice between two outcomes (such as being treated or not treated):
- \(\color{#6A5ACD}{Y_i^{0}}\): outcome when individual \(i\) is not treated
- \(\color{#e64173}{Y_i^{1}}\): outcome when individual \(i\) is treated

✨ \(\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{#6A5ACD}{Y_i^{0}}\) ✨

- \(\color{#314f4f}{\delta_i}\) is the causal effect of treatment on individual \(i\)

- This is a nice way to think about the ideal proof of causality, but this is impossible to observe!

\[\color{#314f4f}{\delta_i} = \color{red}{?} - \color{#6A5ACD}{Y_i^{0}}\]

Individual counterfactuals do not exist (“the path not taken”)

You will always only ever get one of these per individual!

\[\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{red}{?}\]

- e.g. what would your life have been like if you did not go to Hood College?? 🧐

So what can we do?

\[\color{#314f4f}{ATE} = \color{#e64173}{\mathbb{E}[Y_i^{1}]} - \color{#6A5ACD}{\mathbb{E}[Y_i^{0}]}\]

Have large groups, and take *averages* instead!

Average Treatment Effect (ATE): difference in the average (expected value) of outcome \(Y\) between treated individuals and untreated individuals

\[\color{#314f4f}{\delta} = \color{#e64173}{(\bar{Y}|T=1)}-\color{#6A5ACD}{(\bar{Y}|T=0)}\]

- \(T_i\) is a binary variable, \(= \begin{cases} \color{#6A5ACD}{0} & \color{#6A5ACD}{\text{ if person is not treated}}\\\color{#e64173}{1} & \color{#e64173}{\text{ if person is treated}}\\ \end{cases}\)

Again:

**Either** we observe individual \(i\) in the treatment group \(\color{#e64173}{(T=1)}\), i.e.

\[\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{red}{?}\]

**Or** we observe individual \(i\) in the control group \(\color{#6A5ACD}{(T=0)}\), i.e.

\[\color{#314f4f}{\delta_i} = \color{red}{?} - \color{#6A5ACD}{Y_i^{0}} \]

**Never both** at the same time:

✨ \(\color{#314f4f}{\delta_i} = \color{#e64173}{Y_i^{1}} - \color{#6A5ACD}{Y_i^{0}}\) ✨

Angrist, Joshua & Jörn-Steffen Pischke, 2015, *Mastering ’Metrics*

\(Y\): outcome variable (health index score, 1-5)

\(Y_i\): health score of an individual \(i\)

Individual \(i\) has a choice, leading to one of two outcomes:

- \(\color{#6A5ACD}{Y^0_i}\): individual \(i\) has *not* purchased health insurance (“Control”)
- \(\color{#e64173}{Y^1_i}\): individual \(i\) has purchased health insurance (“Treatment”)

\(\color{#314f4f}{\delta_i}=\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i}\): causal effect for individual \(i\) of purchasing health insurance

John | Maria |
---|---|
\(Y_J^0=3\) | \(Y_M^0=5\) |
\(Y_J^1=4\) | \(Y_M^1=5\) |
✨ \(\color{#314f4f}{\delta_J=1}\) | \(\color{#314f4f}{\delta_M=0}\) ✨ |
\(\color{#e64173}{Y_J=(Y_J^1)=4}\) | \(\color{#6A5ACD}{Y_M=(Y_M^0)=5}\) |

John will choose to buy health insurance

Maria will choose to not buy health insurance

Health insurance improves John’s score by 1, has no effect on Maria’s score (individual causal effects \(\color{#314f4f}{\delta_i}\))

Note, all we can observe in the data are their health outcomes *after* they have chosen (not) to buy health insurance:

\[\begin{align*} \color{#e64173}{Y_J}&\color{#e64173}{=4}\\ \color{#6A5ACD}{Y_M}&\color{#6A5ACD}{=5}\\ \end{align*}\]

*Observed* difference between John and Maria:

\[\color{#e64173}{Y_J}-\color{#6A5ACD}{Y_M}=-1\]
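The John and Maria example can be reproduced in a few lines. This is only a sketch: the dictionary layout and the helper names `individual_effect` and `observed_outcome` are made up for illustration, and the rule `T * Y1 + (1 - T) * Y0` is one common way of writing "we only see the outcome that matches the choice actually made".

```python
# Potential outcomes from the table (health index score, 1-5).
potential = {
    "John": {"y0": 3, "y1": 4},
    "Maria": {"y0": 5, "y1": 5},
}

# Treatment status from the text: John buys insurance, Maria does not.
treated = {"John": 1, "Maria": 0}


def individual_effect(p):
    """Causal effect for one person: Y^1 - Y^0 (needs both potential outcomes)."""
    return p["y1"] - p["y0"]


def observed_outcome(p, t):
    """Only the potential outcome matching the actual choice is revealed."""
    return t * p["y1"] + (1 - t) * p["y0"]


effects = {name: individual_effect(p) for name, p in potential.items()}
observed = {name: observed_outcome(p, treated[name]) for name, p in potential.items()}
observed_diff = observed["John"] - observed["Maria"]

print(effects)        # {'John': 1, 'Maria': 0}
print(observed)       # {'John': 4, 'Maria': 5}
print(observed_diff)  # -1
```

The observed difference is negative even though neither individual effect is negative, which is the puzzle the following slides take apart.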

John | Maria |
---|---|
\(\color{#e64173}{Y_J=4}\) | \(\color{#6A5ACD}{Y_M=5}\) |

This is all the data we *actually* observe

- Observed difference between John and Maria:

\[Y_J-Y_M=\underbrace{\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_M}}_{=-1}\]

- Recall:
  - John has bought health insurance \(\color{#e64173}{Y^1_J}\)
  - Maria has not bought insurance \(\color{#6A5ACD}{Y^0_M}\)

- We don’t see the counterfactuals:
  - John’s score *without* insurance
  - Maria’s score *with* insurance

- Algebra trick: add and subtract \(\color{#6A5ACD}{Y^0_J}\) to equation:

\[\begin{align*} Y_J-Y_M=\underbrace{\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_J}}_{\color{#314f4f}{=1}}+\underbrace{\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M}}_{\color{orange}{=-2}} \end{align*}\]

- \(\color{#e64173}{Y^1_J}-\color{#6A5ACD}{Y^0_J}=1\): Causal effect for John of buying insurance, \(\color{#314f4f}{\delta_J}\)
- \(\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M}=-2\): Difference between John & Maria pre-treatment, “selection bias”
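The add-and-subtract step can be checked with the numbers from the example. A minimal sketch (variable names are illustrative); it uses John's untreated score, which is never observed in real data:

```python
# Values from the John and Maria example (health index score, 1-5).
y1_john = 4   # John's score with insurance (observed)
y0_john = 3   # John's score without insurance (counterfactual, unobserved)
y0_maria = 5  # Maria's score without insurance (observed)

# What the data show: a treated John compared with an untreated Maria.
observed_diff = y1_john - y0_maria

# Add and subtract John's untreated score to split the comparison in two.
causal_effect_john = y1_john - y0_john  # delta_J
selection_bias = y0_john - y0_maria     # pre-treatment gap between the two

print(observed_diff)                        # -1
print(causal_effect_john, selection_bias)   # 1 -2
print(causal_effect_john + selection_bias)  # -1
```

The two pieces always sum to the observed difference, so a raw comparison equals the causal effect only when the selection-bias term is zero.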

\[\color{#6A5ACD}{Y^0_J}-\color{#6A5ACD}{Y^0_M} \neq 0\]

- Selection bias: (pre-existing) differences between members of treatment and control groups *other* than treatment, that affect the outcome
  - i.e. John and Maria *start out* with very *different* health scores before either decides to buy insurance or not (“receive treatment” or not)

The choice to get treatment is endogenous

A choice made by optimizing agents

John and Maria have different preferences, endowments, & constraints that cause them to make different decisions

Individual | Insured | Not Insured | Diff |
---|---|---|---|
John | 4.0 | 3.0 | 1.0 |
Maria | 5.0 | 5.0 | 0.0 |
Average | 4.5 | 4.0 | 0.5 |

- Individual treatment effect (for individual \(i\)):

\[\color{#314f4f}{\delta_i}=\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i}\]

- *Average* treatment effect:

\[\color{#314f4f}{ATE}=\frac{1}{n}\sum^n_{i=1}(\color{#e64173}{Y^1_i}-\color{#6A5ACD}{Y^0_i})\]
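With both potential outcomes in hand (which only happens in this idealized table), the ATE is simple arithmetic. A short sketch, with the `rows` layout chosen for illustration:

```python
from statistics import mean

# (name, insured score Y^1, not-insured score Y^0), from the table.
rows = [
    ("John", 4.0, 3.0),
    ("Maria", 5.0, 5.0),
]

# Individual treatment effects: delta_i = Y^1_i - Y^0_i
ites = [y1 - y0 for _, y1, y0 in rows]

# ATE as the average of the individual effects.
ate = mean(ites)

# The same number is the difference of the two column averages.
avg_insured = mean([y1 for _, y1, _ in rows])
avg_not_insured = mean([y0 for _, _, y0 in rows])

print(ites)                           # [1.0, 0.0]
print(ate)                            # 0.5
print(avg_insured - avg_not_insured)  # 0.5
```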

**Actual** (observed) Data

Individual | Insured | Not Insured | Diff |
---|---|---|---|
John | 4.0 | ? | ? |
Maria | ? | 5.0 | ? |
Average | ? | ? | ? |

- We never get to see each person’s counterfactual state to compare and calculate ITEs or ATE
  - Maria with insurance \(\color{#e64173}{Y^1_M}\)
  - John without insurance \(\color{#6A5ACD}{Y^0_J}\)

- Can’t we just take the difference in group means?

\[\begin{align*} diff.=\color{#e64173}{Avg(Y_i^{1}|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ \end{align*}\]

- Suppose a uniform treatment effect, \(\color{#314f4f}{\delta_i}\) (the same for every individual, so it equals the ATE)

\[\begin{align*} diff. &= \color{#e64173}{Avg(Y_i^{1}|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ &= \color{#e64173}{Avg(}\color{#314f4f}{\delta_i}+\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\\ &= \color{#314f4f}{\delta_i}+\underbrace{\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}}_{\color{#FFA500}{\text{selection bias}}}\\ &= \color{#314f4f}{ATE} + \color{#FFA500}{\text{selection bias}} \\ \end{align*}\]
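The decomposition can also be checked on simulated data, where (unlike real data) both potential outcomes are known. This is a sketch under invented assumptions: the score distribution, the effect size `delta = 1.0`, and the self-selection rule are all hypothetical and are not taken from the slides.

```python
import random

from statistics import mean

random.seed(480)  # arbitrary seed, only for reproducibility

n = 10_000
delta = 1.0  # hypothetical uniform treatment effect

# Untreated health score Y^0 for each person (hypothetical distribution).
y0 = [random.gauss(3.5, 1.0) for _ in range(n)]
# Uniform effect: Y^1 = Y^0 + delta for everyone.
y1 = [y + delta for y in y0]

# Assumed self-selection: people with lower untreated scores (plus noise)
# choose insurance, loosely mirroring John in the example.
t = [1 if y + random.gauss(0.0, 0.5) < 3.5 else 0 for y in y0]

# What the data reveal: Y^1 for the treated, Y^0 for the untreated.
avg_y1_treated = mean([a for a, ti in zip(y1, t) if ti == 1])
avg_y0_control = mean([a for a, ti in zip(y0, t) if ti == 0])
naive_diff = avg_y1_treated - avg_y0_control

# Known only because this is a simulation: Y^0 for the treated group.
avg_y0_treated = mean([a for a, ti in zip(y0, t) if ti == 1])
selection_bias = avg_y0_treated - avg_y0_control

print(f"naive difference in means: {naive_diff:.3f}")
print(f"ATE (delta):               {delta:.3f}")
print(f"selection bias:            {selection_bias:.3f}")
print(f"ATE + selection bias:      {delta + selection_bias:.3f}")
```

Under these assumptions the treated group starts out less healthy, so the selection-bias term is negative and the naive comparison understates the true effect.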

- Basic comparisons tell us *something* about outcomes, but not ATE

\[diff. = \color{#314f4f}{ATE} + \color{#FFA500}{\text{Selection Bias}}\]

- Selection bias: difference in average \(Y^0_i\) between groups pre-treatment

- \(Y^0_i\) includes *everything* about person \(i\) relevant to health *except* treatment (insurance) status
  - Age, sex, height, weight, climate, smoker, exercise, diet, etc.
  - Imagine a world where *nobody* gets insurance (treatment), who would have highest health scores?

Treatment group and control group differ on average, for reasons *other* than getting treatment or not!

Control group is not a good counterfactual for treatment group without treatment

- Average *untreated* outcome for the treatment group differs from average untreated outcome for *untreated* group

\[\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}-\color{#6A5ACD}{Avg(Y_i^{0}|T=0)}\]

- Recall we cannot observe \(\color{#e64173}{Avg(}\color{#6A5ACD}{Y_i^{0}}\color{#e64173}{|T=1)}\)!

- Consider the problem in regression form:

\[Y_i = \beta_0+\beta_1 T_i + u_i\]

Where \(T_i = \begin{cases} \color{#6A5ACD}{0} & \color{#6A5ACD}{\text{ if person is not treated}}\\\color{#e64173}{1} & \color{#e64173}{\text{ if person is treated}}\\ \end{cases}\)

The problem is \(cor(T,u) \neq 0\)!

- \(T_i\) (Treatment) is endogenous!
  - *Getting* treatment is correlated with other factors that determine health!
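A sketch of the same problem in regression terms, using simulated data. All numbers and the selection rule are hypothetical. Here \(u_i\) is taken to be each person's untreated score minus its mean, and the OLS slope is computed by hand as \(cov(T,Y)/var(T)\); with a binary \(T\) this equals the difference in group means.

```python
import random

from statistics import mean

random.seed(480)  # arbitrary seed, only for reproducibility

n = 5_000
delta = 1.0  # hypothetical true effect, i.e. the beta_1 we would like to recover

# Untreated score Y^0; its deviation from the mean plays the role of u_i.
y0 = [random.gauss(3.5, 1.0) for _ in range(n)]
# Assumed self-selection: less healthy people are more likely to get treated.
t = [1 if y + random.gauss(0.0, 0.5) < 3.5 else 0 for y in y0]
# Observed outcome: Y_i = Y^0_i + delta * T_i
y = [y0_i + delta * t_i for y0_i, t_i in zip(y0, t)]


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)


# OLS slope and intercept for Y on T.
beta1_hat = cov(t, y) / cov(t, t)
beta0_hat = mean(y) - beta1_hat * mean(t)

# The slope is exactly the difference in group means.
diff_in_means = (
    mean([yi for yi, ti in zip(y, t) if ti == 1])
    - mean([yi for yi, ti in zip(y, t) if ti == 0])
)

# Correlation between treatment and the error term.
mean_y0 = mean(y0)
u = [v - mean_y0 for v in y0]
cor_tu = cov(t, u) / (cov(t, t) ** 0.5 * cov(u, u) ** 0.5)

print(f"beta1_hat     = {beta1_hat:.3f} (true effect is {delta})")
print(f"diff in means = {diff_in_means:.3f}")
print(f"cor(T, u)     = {cor_tu:.3f}")
```

Because \(cor(T,u)\) is far from zero in this setup, the estimated slope lands well away from the true effect.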

If treatment is randomly assigned for a large sample, it eliminates selection bias!

Treatment and control groups differ **on average** by nothing *except* treatment status

Creates ceteris paribus conditions in economics: groups are identical **on average** (holding constant age, sex, height, etc.)

[Figure: Treatment Group and Control Group]

- Consider the problem in regression form:

\[Y_i = \beta_0+\beta_1 T_i + u_i\]

- If treatment \(T_i\) is administered *randomly*, it breaks the correlation with \(u_i\)!
  - Treatment becomes exogenous!
  - \(cor(T,u)=0\)
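To see what random assignment buys, here is a simulated sketch in which treatment is assigned by a coin flip. The distribution, sample size, and effect size are hypothetical, and \(u_i\) is again taken to be the untreated score minus its mean.

```python
import random

from statistics import mean

random.seed(480)  # arbitrary seed, only for reproducibility

n = 20_000
delta = 1.0  # hypothetical true effect

# Untreated score Y^0 for each person (hypothetical distribution).
y0 = [random.gauss(3.5, 1.0) for _ in range(n)]
# Random assignment: a fair coin flip, unrelated to anything about the person.
t = [1 if random.random() < 0.5 else 0 for _ in range(n)]
# Observed outcome: Y_i = Y^0_i + delta * T_i
y = [y0_i + delta * t_i for y0_i, t_i in zip(y0, t)]

# Difference in observed group means.
diff_in_means = (
    mean([yi for yi, ti in zip(y, t) if ti == 1])
    - mean([yi for yi, ti in zip(y, t) if ti == 0])
)

# Selection bias: gap in untreated scores between the two groups.
selection_bias = (
    mean([v for v, ti in zip(y0, t) if ti == 1])
    - mean([v for v, ti in zip(y0, t) if ti == 0])
)


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((z - mb) ** 2 for z in b)
    return sab / (saa * sbb) ** 0.5


mean_y0 = mean(y0)
u = [v - mean_y0 for v in y0]
cor_tu = corr(t, u)

print(f"difference in means = {diff_in_means:.3f} (true effect is {delta})")
print(f"selection bias      = {selection_bias:.3f}")
print(f"cor(T, u)           = {cor_tu:.3f}")
```

In a finite sample the selection-bias term and the correlation are only approximately zero; they shrink as the groups get larger, which is why the claim is about groups being identical on average.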

RCTs are considered the “gold standard” for causal claims

But society is not our laboratory (probably a good thing!)

We can rarely conduct experiments to get data

Instead, we often rely on observational data

This data is *not random*!

Must take extra care in forming an identification strategy

To make good claims about causation in society, we must get clever!

Economists often resort to searching for natural experiments

“Natural” events beyond our control occur that separate *otherwise similar* entities into a “treatment” group and a “control” group that we can compare

e.g. natural disasters, U.S. State laws, military draft

John Snow utilized the first famous natural experiment to establish the foundations of epidemiology and the germ theory of disease

Water pumps with sources *downstream* of a sewage dump in the Thames river spread cholera while water pumps with sources *upstream* did not

John Snow (1813-1858)

- **Oregon Health Insurance Experiment**: Oregon used a lottery to grant Medicaid access to 10,000 people, showing access to Medicaid increased use of health services, lowered debt, etc. relative to those not on Medicaid
- **Angrist (1990)** finds that lifetime earnings of (randomly) drafted Vietnam veterans are 15% lower than those of non-veterans
- **Card & Krueger (1994)** find that a minimum wage hike had no disemployment effects in fast-food restaurants on the NJ side of the border relative to restaurants on the PA side of the border during the same period
- **Acemoglu, Johnson, and Robinson (2001)** find that inclusive institutions lead to higher economic development than extractive institutions, determined by a colony’s disease environment in 1500
- We will look at some of these in greater detail throughout the course
- A great list, with explanations is here

Professors Esther Duflo and Abhijit Banerjee, co-directors of MIT's @JPAL, receive congratulations on the big news this morning. They share in the #NobelPrize in economic sciences “for their experimental approach to alleviating global poverty.”

— Massachusetts Institute of Technology (MIT) (@MIT) October 14, 2019

Angus Deaton

Economics Nobel 2015

“The RCT is a useful tool, but I think that is a mistake to put method ahead of substance. I have written papers using RCTs…[but] no RCT can ever legitimately claim to have established causality. My theme is that RCTs have no special status, they have no exemption from the problems of inference that econometricians have always wrestled with, and there is nothing that they, and only they, can accomplish.”

Deaton, Angus, 2019, “Randomization in the Tropics Revisited: A Theme and Eleven Variations”, Working Paper

Lant Pritchett

“People keep saying that the recent Nobelists ‘studied global poverty.’ This is exactly wrong. They made a commitment to a method, not a subject, and their commitment to method prevented them from studying global poverty.”

“At a conference at Brookings in 2008 Paul Romer [2018 Nobelist] said: ‘You guys are like going to a doctor who says you have an allergy and you have cancer. With the skin rash we can divide you skin into areas and test variety of substances and identify with precision and some certainty the cause. Cancer we have some ideas how to treat it but there are a variety of approaches and since we cannot be sure and precise about which is best for you, we will ignore the cancer and not treat it.’”

Angus Deaton

Economics Nobel 2015

“Lant Pritchett is so fun to listen to, sometimes you could forget that he is completely full of shit.”

[Source](https://medium.com/@ismailalimanik/lant-pritchett-the-debate-about-rcts-in-development-is-over-ec7a28a82c17)

Programs *randomly* assign treatment to different individuals and measure causal effect of treatment

- **RAND Health Insurance Study**: randomly give people health insurance
- **Oregon Medicaid Expansion**: randomly give people Medicaid
- **HUD’s Moving to Opportunity**: randomly give people moving vouchers
- **Tennessee STAR**: randomly assign students to large vs. small classes

Even if a study is internally valid (used statistics correctly, etc.) we must still worry about external validity:

Is the finding generalizable to the whole population?

If we find something in India, does that extend to Bolivia? France?

Subjects of studies & surveys are often

- Western
- Educated
- Industrialized
- Rich
- Democracies

IN MICE https://t.co/mLuKBRhsAb

— justsaysinmice (@justsaysinmice) September 15, 2020