2.3 — Simple Linear Regression — Class Content

Meeting Dates

Wednesday, September 21, 2022

Upcoming Assignment

Problem Set 1 is due on Wednesday, September 21.

Problem Set 2 is due on Wednesday, September 28.

Overview

Today we start looking at associations between variables, which we first quantify with measures such as covariance and correlation. Then we turn to fitting a line to data via linear regression. We give an overview of the basic regression model, its parameters and how they are derived, and see how to work with regressions in R using lm() and the tidyverse package broom.
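For reference, the quantities named above can be written out as follows. This is a sketch in standard textbook notation, which is an assumption on my part and may differ from the notation used in the slides: $X$ is the explanatory variable, $Y$ the outcome, $n$ the number of observations, and $u_i$ the error term.

```latex
\begin{aligned}
s_{X,Y} &= \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)
  && \text{(sample covariance)} \\
r_{X,Y} &= \frac{s_{X,Y}}{s_X\, s_Y}
  && \text{(sample correlation)} \\
Y_i &= \beta_0 + \beta_1 X_i + u_i
  && \text{(simple linear regression model)} \\
\hat{\beta}_1 &= \frac{s_{X,Y}}{s_X^{2}}, \qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
  && \text{(OLS estimates)}
\end{aligned}
```

Note that the slope estimate and the correlation share the same numerator, so they always have the same sign. They differ only in how the covariance is scaled.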

We consider an extended example about class sizes and test scores, which uses a (Stata) dataset from a textbook I used to use (Stock and Watson, 2007). Download the data and follow along with today’s example:¹

caschool.dta

I have also made an RStudio Cloud project documenting everything we have been doing with this data, which may help you when you start working with regressions (next class):

Class Size Regression Analysis
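If you would rather work locally, a minimal sketch of the workflow might look like the following. It assumes that caschool.dta is saved in your working directory and that the test score and student-teacher ratio columns are named testscr and str. Those names are what I believe the Stock and Watson file uses, so check them with names() first. The helper fit_class_size() is a hypothetical name used only for this illustration.

```r
# Sketch only: the file path and the column names (testscr, str) are assumptions.
library(haven)  # read_dta() reads Stata .dta files
library(broom)  # tidy() and glance() turn model output into data frames

# Hypothetical helper: regress test scores on the student-teacher ratio
# and return the fitted model plus tidy summaries of it.
fit_class_size <- function(data) {
  model <- lm(testscr ~ str, data = data)
  list(
    model = model,
    coefficients = tidy(model),  # one row per coefficient
    fit = glance(model)          # one-row summary (R-squared, sigma, ...)
  )
}

path <- "caschool.dta"  # assumed location: the current working directory
if (file.exists(path)) {
  caschool <- read_dta(path)

  # Measures of association between the two variables
  print(cov(caschool$str, caschool$testscr))
  print(cor(caschool$str, caschool$testscr))

  # Regression output as tidy data frames
  results <- fit_class_size(caschool)
  print(results$coefficients)
  print(results$fit)
}
```

Returning data frames from tidy() and glance() makes it easy to pass the results on to other tidyverse functions, for example for tables or plots.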

Readings

Ch. 3.1 and Appendix A (Math and Probability Background) in Bailey

Now that we return to statistics, we will give a brief overview of basic statistics and distributions. Review all of Bailey’s appendices.

Chapter 2 is optional, but will give you a good overview of using data.

Appendix

See the online appendix for today’s content:

Slides

Below, you can find the slides in two formats. Clicking the image will take you to the HTML version of the slides in a new tab. The lower button will allow you to download a PDF version of the slides.

I suggest printing the slides beforehand and using them to take additional notes in class (not everything is in the slides)!

Download as PDF

Footnotes

¹ Note that this is a Stata .dta file. You will need to install (if necessary) and load the haven package to read Stata files into a data frame with read_dta().