3.3 — Omitted Variable Bias — Class Content

Meeting Dates

Wednesday, October 26, 2022

Overview

Today we return to our regression models, now knowing something about identifying causal effects. We know from DAGs that we often need to “adjust for” or “control for” variables in order to identify the causal effect we are interested in. Now we give a particular name and set of conditions for when we need to control for a variable: “omitted variable bias”, where some variable both causes $Y$ (and so is in $u$) and is correlated with $X$. To avoid this bias, we include that variable as an additional independent variable in our regression.
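The two conditions can be stated a little more formally. What follows is a sketch of the standard textbook result, not notation taken from the slides: $Z$ is a placeholder name for the omitted variable and $e$ for the remaining error. Suppose the true model is

$Y_{i} = β_{0} + β_{1} X_{i} + β_{2} Z_{i} + e_{i}$

with $e_{i}$ uncorrelated with $X_{i}$, but we regress $Y$ on $X$ alone, so that $β_{2} Z_{i} + e_{i}$ is absorbed into $u_{i}$. In large samples the estimated slope then converges to

$\text{plim} \, \hat{β}_{1} = β_{1} + β_{2} \frac{\text{cov}(X_{i}, Z_{i})}{\text{var}(X_{i})}$

The second term is the bias. It is nonzero only when both conditions hold at once: $β_{2} \neq 0$ ($Z$ affects $Y$) and $\text{cov}(X_{i}, Z_{i}) \neq 0$ ($Z$ is correlated with $X$).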

Thus, we now begin exploring multivariate regression with multiple regressors:

$Y_{i} = β_{0} + β_{1} X_{1 i} + β_{2} X_{2 i} + u_{i}$
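To see why the second regressor matters, here is a minimal simulation sketch in R. It does not use the class data: the sample size, the coefficients (1, 2, 3), and the 0.5 link between the two regressors are all made-up assumptions chosen only for illustration.

```r
# Simulated illustration of omitted variable bias.
# All numbers below are invented for this sketch, not taken from caschool.dta.
set.seed(20221026)
n <- 10000

x2 <- rnorm(n)                 # the variable we will (wrongly) omit
x1 <- 0.5 * x2 + rnorm(n)      # X1 is correlated with X2 by construction
u  <- rnorm(n)                 # error term, unrelated to X1 and X2
y  <- 1 + 2 * x1 + 3 * x2 + u  # assumed true model: beta_1 = 2, beta_2 = 3

short_reg <- lm(y ~ x1)        # omits X2, so X2 is pushed into the error term
long_reg  <- lm(y ~ x1 + x2)   # includes X2 as an additional regressor

coef(short_reg)["x1"]          # biased: pulled away from the true value of 2
coef(long_reg)["x1"]           # close to the true value of 2
```

Under these assumed parameters, cov(X1, X2) / var(X1) = 0.5 / 1.25 = 0.4, so the slope in the short regression should land near 2 + 3 × 0.4 = 3.2 rather than 2. Setting either the 3 or the 0.5 to zero in the simulation removes one of the two conditions, and the gap between the two regressions should disappear.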

Next class we will learn more about how the introduction of additional variables affects our model.

We continue the extended example about class sizes and test scores, which uses a (Stata) dataset from a textbook I used to assign, Stock and Watson (2007). Download the data and follow along with today’s example:¹

caschool.dta

I have also made an RStudio Cloud project documenting everything we have done with this data so far, which may help you when you start working with regressions:

Readings

Ch. 5.1 in Bailey, Real Econometrics

Slides

Below, you can find the slides in two formats. Clicking the image will open the HTML version of the slides in a new tab. The lower button will let you download a PDF version of the slides.

I suggest printing the slides beforehand and using them to take additional notes in class (not everything is in the slides)!

Download as PDF

Footnotes

¹ Note that this is a Stata .dta file. You will need to install (if you have not already) and load the haven package, then use read_dta() to read the Stata file into a data frame.