2.5 — Precision and Diagnostics — Class Content

Meeting Dates

Wednesday, September 28, 2022


Last class and this class we are looking at the sampling distibution of OLS estimators (particularly \(\hat{\beta_1})\). Last class we looked at what the center of the distribution was - the true \(\beta_1\) - so long as the assumptions about \(u\) hold:

  • When \(cor(X,u)=0\), \(X\) is exogenous and the OLS estimators are unbiased.
  • What \(cor(X,u)\neq 0\), \(X\) is endogenous and the OLS estimators are biased.

Today we continue looking at the sampling distibution by determining the variation in \(\hat{beta_1}\) (it’s variance or its standard error1). We look at the formula and see the three major determinants of variation in \(\hat{\beta_1}\):

  1. Goodness of fit of the regression \((SER\) or \(\hat{\sigma_u}\)
  2. Sample size \(n\)
  3. Variation in \(X\)

We also look at the diagnostics of a regression by looking at its residuals \((\hat{u_i})\) for anomalies. We focus on the problem of heteroskedasticity (where the variation in \(\hat{u_i])\) changes over the range of \(X\), which violates assumption 2 (errors are homoskedastic): how to detect it, test it, and fix it with some packages. We also look at outliers, which can bias the regression. Finally, we also look at how to present regression results.

We continue our extended example about class sizes and test scores, which comes from a (Stata) dataset from an old textbook that I used to use, Stock and Watson, 2007. Download and follow along with the data from today’s example:2

I have also made a RStudio Cloud project documenting all of the things we have been doing with this data that may help you when you start working with regressions (next class):


  • Finish Ch.3 in Bailey, Real Econometrics

R Practice

Today (and next class) you will be working on practice problems. Answers will be posted on that page later.


See the online appendix for today’s content:


Below, you can find the slides in two formats. Clicking the image will bring you to the html version of the slides in a new tab. The lower button will allow you to download a PDF version of the slides.

I suggest printing the slides beforehand and using them to take additional notes in class (not everything is in the slides)!


Download as PDF


  1. The square root of variance, as always!↩︎

  2. Note this is a .dta Stata file. You will need to (install and) load the package haven to read_dta() Stata files into a dataframe.↩︎