2.3 — Simple Linear Regression

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

## Contents

Exploring Relationships

Quantifying Relationships

Linear Regression

Deriving OLS Estimators

Our Class Size Example in R

# Exploring Relationships

## Bivariate Data and Relationships I

• We looked at single variables for descriptive statistics
• Most uses of statistics in economics and business investigate relationships between variables

Examples

• # of police & crime rates
• healthcare spending & life expectancy
• government spending & GDP growth
• carbon dioxide emissions & temperatures

## Bivariate Data and Relationships II

• We will begin with bivariate data for relationships between $X$ and $Y$

• Immediate aim is to explore associations between variables, quantified with correlation and linear regression

• Later we want to develop more sophisticated tools to argue for causation

• Rows are individual observations (countries)
• Columns are variables on all individuals

econfreedom %>%
glimpse()
Rows: 112
Columns: 6
$...1 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…$ Country   <chr> "Albania", "Algeria", "Angola", "Argentina", "Australia", "A…
$ISO <chr> "ALB", "DZA", "AGO", "ARG", "AUS", "AUT", "BHR", "BGD", "BEL…$ ef        <dbl> 7.40, 5.15, 5.08, 4.81, 7.93, 7.56, 7.60, 6.35, 7.51, 6.22, …
$gdp <dbl> 4543.0880, 4784.1943, 4153.1463, 10501.6603, 54688.4459, 476…$ continent <chr> "Europe", "Africa", "Africa", "Americas", "Oceania", "Europe…

source("summaries.R")
econfreedom %>%
summary_table(ef, gdp)

## Bivariate Data: Scatterplots I

ggplot(data = econfreedom)+
aes(x = ef,
y = gdp)+
geom_point(aes(color = continent),
size = 2)+
labs(x = "Economic Freedom Index (2014)",
y = "GDP per Capita (2014 USD)",
color = "")+
scale_y_continuous(labels = scales::dollar)+
theme_pander(base_family = "Fira Sans Condensed",
base_size=20)+
theme(legend.position = "bottom")

## Bivariate Data: Scatterplots II

• Look for association between independent and dependent variables
1. Direction: is the trend positive or negative?

2. Form: is the trend linear, quadratic, something else, or no pattern?

3. Strength: is the association strong or weak?

4. Outliers: do any observations deviate from the trends above?

# Quantifying Relationships

## Covariance

• For any two variables, we can measure their sample covariance, $cov(X,Y)$ or $s_{X,Y}$ to quantify how they vary together1

$s_{X,Y}=E\big[(X-\bar{X})(Y-\bar{Y}) \big]$

• Intuition: if $x_i$ is above the mean of $X$, would we expect the associated $y_i$:
• to be above the mean of $Y$ also $(X$ and $Y$ covary positively)
• to be below the mean of $Y$ $(X$ and $Y$ covary negatively)
• Covariance is a common measure, but the units are meaningless, thus we rarely need to use it so don’t worry about learning the formula

## Covariance, in R

econfreedom %>%
summarize(covariance = cov(ef, gdp))
# A tibble: 1 × 1
covariance
<dbl>
1      8923.

8923 what, exactly?

## Correlation

• Better to standardize covariance into a more intuitive concept: correlation, $r_{X,Y}$ $\in [-1, 1]$

$r_{X,Y}=\frac{s_{X,Y}}{s_X s_Y}=\frac{cov(X,Y)}{sd(X)sd(Y)}$

• Simply weight covariance by the product of the standard deviations of $X$ and $Y$
• Alternatively, take the average1 of the product of standardized $(Z$-scores for) each $(x_i,y_i)$ pair:2

\begin{align*} r&=\frac{1}{n-1}\sum^n_{i=1}\bigg(\frac{x_i-\bar{X}}{s_X}\bigg)\bigg(\frac{y_i-\bar{Y}}{s_Y}\bigg)\\ r&=\frac{1}{n-1}\sum^n_{i=1}Z_XZ_Y\\ \end{align*}

## Correlation: Interpretation

• Correlation is standardized to

$-1 \leq r \leq 1$

• Negative values $\implies$ negative association

• Positive values $\implies$ positive association

• Correlation of 0 $\implies$ no association

• As $|r| \rightarrow 1 \implies$ the stronger the association

• Correlation of $|r|=1 \implies$ perfectly linear

## Guess the Correlation!

Guess the Correlation Game

## Correlation and Covariance in R

econfreedom %>%
summarize(covariance = cov(ef, gdp),
correlation = cor(ef, gdp))
# A tibble: 1 × 2
covariance correlation
<dbl>       <dbl>
1      8923.       0.587

## Correlation and Endogeneity

• Your Occasional Reminder: Correlation does not imply causation!

• I’ll show you the difference in a few weeks (when we can actually talk about causation)
• If $X$ and $Y$ are strongly correlated, $X$ can still be endogenous!

• See today’s appendix page for more on Covariance and Correlation

# Linear Regression

## Fitting a Line to Data

• If an association appears linear, we can estimate the equation of a line that would “fit” the data

$Y = a + bX$

• A linear equation describing a line has two parameters:
• $a$: vertical intercept
• $b$: slope

## Fitting a Line to Data

• If an association appears linear, we can estimate the equation of a line that would “fit” the data

$Y = a + bX$

• A linear equation describing a line has two parameters:
• $a$: vertical intercept
• $b$: slope
• How do we choose the equation that best fits the data?

## Fitting a Line to Data

• If an association appears linear, we can estimate the equation of a line that would “fit” the data

$Y = a + bX$

• A linear equation describing a line has two parameters:

• $a$: vertical intercept
• $b$: slope
• How do we choose the equation that best fits the data?

• This process is called linear regression

## Population Linear Regression Model

• Linear regression lets us estimate the slope of the population regression line between $X$ and $Y$ using sample data

• We can make statistical inferences about what the true population slope coefficient is

• eventually & hopefully: a causal inference
• $\text{slope}=\frac{\Delta Y}{\Delta X}$: for a 1-unit change in $X$, how many units will this cause $Y$ to change?

## Class Size Example

Example

What is the relationship between class size and educational performance?

## Class Size Example: Data Import

# Load the Data

# install.packages("haven") # install for first use

# Packages
library("haven") # load for importing .dta files

# Import and save as ca_school

ca_school <- read_dta("../files/data/caschool.dta")

Data are student-teacher-ratio and average test scores on Stanford 9 Achievement Test for 5th grade students for 420 K-6 and K-8 school districts in California in 1999, (Stock and Watson, 2015: p. 141)

## Class Size Example: Data

ca_school %>%
glimpse()
Rows: 420
Columns: 21
$observat <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…$ dist_cod <dbl> 75119, 61499, 61549, 61457, 61523, 62042, 68536, 63834, 62331…
$county <chr> "Alameda", "Butte", "Butte", "Butte", "Butte", "Fresno", "San…$ district <chr> "Sunol Glen Unified", "Manzanita Elementary", "Thermalito Uni…
$gr_span <chr> "KK-08", "KK-08", "KK-08", "KK-08", "KK-08", "KK-08", "KK-08"…$ enrl_tot <dbl> 195, 240, 1550, 243, 1335, 137, 195, 888, 379, 2247, 446, 987…
$teachers <dbl> 10.90, 11.15, 82.90, 14.00, 71.50, 6.40, 10.00, 42.50, 19.00,…$ calw_pct <dbl> 0.5102, 15.4167, 55.0323, 36.4754, 33.1086, 12.3188, 12.9032,…
$meal_pct <dbl> 2.0408, 47.9167, 76.3226, 77.0492, 78.4270, 86.9565, 94.6237,…$ computer <dbl> 67, 101, 169, 85, 171, 25, 28, 66, 35, 0, 86, 56, 25, 0, 31, …
$testscr <dbl> 690.80, 661.20, 643.60, 647.70, 640.85, 605.55, 606.75, 609.0…$ comp_stu <dbl> 0.34358975, 0.42083332, 0.10903226, 0.34979424, 0.12808989, 0…
$expn_stu <dbl> 6384.911, 5099.381, 5501.955, 7101.831, 5235.988, 5580.147, 5…$ str      <dbl> 17.88991, 21.52466, 18.69723, 17.35714, 18.67133, 21.40625, 1…
$avginc <dbl> 22.690001, 9.824000, 8.978000, 8.978000, 9.080333, 10.415000,…$ el_pct   <dbl> 0.000000, 4.583333, 30.000002, 0.000000, 13.857677, 12.408759…
$read_scr <dbl> 691.6, 660.5, 636.3, 651.9, 641.8, 605.7, 604.5, 605.5, 608.9…$ math_scr <dbl> 690.0, 661.9, 650.9, 643.5, 639.9, 605.4, 609.0, 612.5, 616.1…
$aowijef <dbl> 35.77982, 43.04933, 37.39445, 34.71429, 37.34266, 42.81250, 3…$ es_pct   <dbl> 1.000000, 3.583333, 29.000002, 1.000000, 12.857677, 11.408759…
\$ es_frac  <dbl> 0.01000000, 0.03583334, 0.29000002, 0.01000000, 0.12857677, 0…

## Class Size Example: Scatterplot

ggplot(data = ca_school)+
aes(x = str,
y = testscr)+
geom_point(color = "blue")+
labs(x = "Student to Teacher Ratio",
y = "Test Score")+
theme_bw(base_family = "Fira Sans Condensed",
base_size = 20)

## Class Size Example: Slope I

• If we change $(\Delta)$ the class size by an amount, what would we expect the change in test scores to be?

$\beta = \frac{\text{change in test score}}{\text{change in class size}} = \frac{\Delta \text{test score}}{\Delta \text{class size}}$

• If we knew $\beta$, we could say that changing class size by 1 student will change test scores by $\beta$

## Class Size Example: Slope II

• Rearranging:

$\Delta \text{test score} = \beta \times \Delta \text{class size}$

## Class Size Example: Slope III

• Rearranging:

$\Delta \text{test score} = \beta \times \Delta \text{class size}$

• Suppose $\beta=-0.6$. If we shrank class size by 2 students, our model predicts:

\begin{align*} \Delta \text{test score} &= -2 \times \beta\\ \Delta \text{test score} &= -2 \times -0.6\\ \Delta \text{test score}&= 1.2 \\ \end{align*}

Test scores would improve by 1.2 points, on average.

## Class Size Example: Slope and Average Effect

$\text{test score} = \beta_0 + \beta_{1} \times \text{class size}$

• The line relating class size and test scores has the above equation

• $\beta_0$ is the vertical-intercept, test score where class size is 0

• $\beta_{1}$ is the slope of the regression line

• This relationship only holds on average for all districts in the population, individual districts are also affected by other factors

## Class Size Example: Marginal Effect

• To get an equation that holds for each district, we need to include other factors

$\text{test score} = \beta_0 + \beta_1 \text{class size}+\text{other factors}$

• For now, we will ignore these until Unit III

• Thus, $\beta_0 + \beta_1 \text{class size}$ gives the average effect of class sizes on scores

• Later, we will want to estimate the marginal effect (causal effect) of each factor on an individual district’s test score, holding all other factors constant

## Econometric Models: Overview I

$Y = \beta_0 + \beta_1 X + u$

• $Y$ is the dependent variable of interest
• AKA “response variable,” “regressand,” “Left-hand side (LHS) variable”
• $X_1$ is an independent variable
• AKA “explanatory variable”, “regressor,” “Right-hand side (RHS) variable”, “covariate”
• Our data consists of a spreadsheet of observed values of $(X_{1i}, X_{2i}, Y_i)$

## Econometric Models: Overview II

$Y = \beta_0 + \beta_1 X + u$

• To model, we “regress Y on $X_1$
• $\beta_0$ and $\beta_1$ are parameters that describe the population relationships between the variables
• unknown! to be estimated
• $u$ is a random error term
• ’U’nobservable, we can’t measure it, and must model with assumptions about it

## The Population Regression Model

• How do we draw a line through the scatterplot? We do not know the “true” $\beta_0$ or $\beta_1$

• We do have data from a sample of class sizes and test scores

• So the real question is, how can we estimate $\beta_0$ and $\beta_1$?

# Deriving OLS Estimators

## Actual, Predicted, and Residual Values

• With a simple linear regression model, for each associated $X$ value, we have
1. The observed (or actual) values of $\color{#0047AB}{Y_i}$

## Actual, Predicted, and Residual Values

• With a simple linear regression model, for each associated $X$ value, we have
1. The observed (or actual) values of $\color{#0047AB}{Y_i}$
2. Predicted (or fitted) values, $\color{#047806}{\hat{Y}_i}$

## Actual, Predicted, and Residual Values

• With a simple linear regression model, for each associated $X$ value, we have
1. The observed (or actual) values of $\color{#0047AB}{Y_i}$
2. Predicted (or fitted) values, $\color{#047806}{\hat{Y}_i}$
3. The residual (or error), $\color{#D7250E}{\hat{u}_i}=\color{#0047AB}{Y_i}-\color{#047806}{\hat{Y}_i}$ … the difference between predicted and observed values

\begin{align*} \color{#0047AB}{Y_i} &= \color{#047806}{\hat{Y}_i} + \color{#D7250E}{\hat{u}_i} \\ \color{#0047AB}{\text{Observed}_i} &= \color{#047806}{\text{Model}_i} + \color{#D7250E}{\text{Error}_i} \\ \end{align*}

## Deriving OLS Estimators

• Take the residuals $\color{#D7250E}{\hat{u}_i}$ and square them (why)?

## Deriving OLS Estimators

• Take the residuals $\color{#D7250E}{\hat{u}_i}$ and square them (why)?

• The regression line minimizes the sum of the squared residuals (SSR)

$SSR = \sum^n_{i=1} \color{#D7250E}{\hat{u}_i}^2$

## O-rdinary L-east S-quares Estimators

• The Ordinary Least Squares (OLS) estimators of the unknown population parameters $\beta_0$ and $\beta_1$, solve the calculus problem:

$\min_{\beta_0, \beta_1} \sum^n_{i=1}[\underbrace{Y_i-(\underbrace{\beta_0+\beta_1 X_i}_{\hat{Y_i}})}_{\hat{u_i}}]^2$

• Intuitively, OLS estimators minimize the sum of the squared residuals (distance between the actual values $Y_i$ and the predicted values $\hat{Y_i}$) along the estimated regression line

## The OLS Regression Line

• The OLS regression line or sample regression line is the linear function constructed using the OLS estimators:

$\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i$

• $\hat{\beta_0}$ and $\hat{\beta_1}$ (“beta 0 hat” & “beta 1 hat”) are the OLS estimators of population parameters $\beta_0$ and $\beta_1$ using sample data
• The predicted value of Y given X, based on the regression, is $E(Y_i|X_i)=\hat{Y_i}$
• The residual or prediction error for the $i^{th}$ observation is the difference between observed $Y_i$ and its predicted value, $\hat{u_i}=Y_i-\hat{Y_i}$

## The OLS Regression Estimators

• The solution to the SSE minimization problem yields:1

$\hat{\beta}_0=\bar{Y}-\hat{\beta}_1\bar{X}$

$\hat{\beta}_1=\frac{\displaystyle\sum^n_{i=1}(X_i-\bar{X})(Y_i-\bar{Y})}{\displaystyle\sum^n_{i=1}(X_i-\bar{X})^2}=\frac{s_{XY}}{s^2_X}= \frac{cov(X,Y)}{var(X)}$

## (Some) Properties of OLS

1. The regression line goes through the “center of mass” point $(\bar{X},\bar{Y})$
• Again, $\hat{\beta}_0= \bar{Y}-\hat{\beta}_1 \bar{X}$
1. The slope, $\hat{\beta}_1$ has the same sign as the correlation coefficient $r_{X,Y}$, and is related

$\hat{\beta}_1=r\frac{s_Y}{s_X}$

1. The residuals sum and average to zero

\begin{align*} \sum^n_{i=1} \hat{u}_i &= 0\\ \mathbb{E}[\hat{u}] &= 0 \\ \end{align*}

1. The residuals and $X$ are uncorrelated

# Our Class Size Example in R

## Class Size Scatterplot (Again)

• There is some true (unknown) population relationship:

$\text{test score}_i=\beta_0+\beta_1 str_i$

• $\beta_1=\frac{\Delta \text{test score}}{\Delta \text{str}}= ??$

## Class Size Scatterplot with Regression Line

Code
scatter+
geom_smooth(method = "lm", color = "red")

## Linear Regression in R I

# run regression of testscr on str
school_reg <- lm(testscr ~ str,
data = ca_school)

Format for regression is lm(y ~ x, data = df)

• y is dependent variable (listed first!)

• ~ means “is modeled by” or “is explained by”

• x is the independent variable

• df is name of dataframe where data is stored

This is base R (there’s no good tidyverse way to do this yet…ish1)

## Linear Regression in R II

# look at reg object
school_reg 

Call:
lm(formula = testscr ~ str, data = ca_school)

Coefficients:
(Intercept)          str
698.93        -2.28  
• Stored as an lm object called school_reg, a type of list object

## Linear Regression in R II

# get full summary
school_reg %>% summary()

Call:
lm(formula = testscr ~ str, data = ca_school)

Residuals:
Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
str          -2.2798     0.4798  -4.751 2.78e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared:  0.05124,   Adjusted R-squared:  0.04897
F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06
• Looking at the summary, there’s a lot of information here!

• These objects are cumbersome, come from a much older, pre-tidyverse era of base R

• Luckily, we now have some more tidy ways of working with regression output!

## Tidy Regression with broom

• The broom package allows us to work with regression objects as tidier tibbles

• Several useful commands:

Command Does
tidy() Create tibble of regression coefficients & stats
glance() Create tibble of regression fit statistics
augment() Create tibble of data with regression-based variables

## Tidy Regression with broom: tidy()

• The tidy() function creates a tidy tibble of regression output
# load packages
library(broom)

# tidy regression output
school_reg %>%
tidy()
# A tibble: 2 × 5
term        estimate std.error statistic   p.value
<chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   699.       9.47      73.8  6.57e-242
2 str            -2.28     0.480     -4.75 2.78e-  6

## Tidy Regression with broom: tidy()

• The tidy() function creates a tidy tibble of regression output…with confidence intervals
# load packages
library(broom)

# tidy regression output
school_reg %>%
tidy(conf.int = TRUE)
# A tibble: 2 × 7
term        estimate std.error statistic   p.value conf.low conf.high
<chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)   699.       9.47      73.8  6.57e-242   680.      718.
2 str            -2.28     0.480     -4.75 2.78e-  6    -3.22     -1.34

## Tidy Regression with broom: glance()

• glance() shows us a lot of overall regression statistics and diagnostics
• We’ll interpret these in next class and beyond
# look at regression statistics and diagnostics
school_reg %>%
glance()
# A tibble: 1 × 12
r.squ…¹ adj.r…² sigma stati…³ p.value    df logLik   AIC   BIC devia…⁴ df.re…⁵
<dbl>   <dbl> <dbl>   <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>   <int>
1  0.0512  0.0490  18.6    22.6 2.78e-6     1 -1822. 3650. 3663. 144315.     418
# … with 1 more variable: nobs <int>, and abbreviated variable names
#   ¹​r.squared, ²​adj.r.squared, ³​statistic, ⁴​deviance, ⁵​df.residual

## Tidy Regression with broom: augment()

• augment() creates a new tibble with the data $(X,Y)$ and regression-based variables, including:
• .fitted are fitted (predicted) values from model, i.e. $\hat{Y}_i$
• .resid are residuals (errors) from model, i.e. $\hat{u}_i$
# add regression-based values to data
school_reg %>%
augment()
# A tibble: 420 × 8
testscr   str .fitted .resid    .hat .sigma  .cooksd .std.resid
<dbl> <dbl>   <dbl>  <dbl>   <dbl>  <dbl>    <dbl>      <dbl>
1    691.  17.9    658.   32.7 0.00442   18.5 0.00689       1.76
2    661.  21.5    650.   11.3 0.00475   18.6 0.000893      0.612
3    644.  18.7    656.  -12.7 0.00297   18.6 0.000700     -0.685
4    648.  17.4    659.  -11.7 0.00586   18.6 0.00117      -0.629
5    641.  18.7    656.  -15.5 0.00301   18.6 0.00105      -0.836
6    606.  21.4    650.  -44.6 0.00446   18.5 0.0130       -2.40
7    607.  19.5    654.  -47.7 0.00239   18.5 0.00794      -2.57
8    609   20.9    651.  -42.3 0.00343   18.5 0.00895      -2.28
9    612.  19.9    653.  -41.0 0.00244   18.5 0.00597      -2.21
10    613.  20.8    652.  -38.9 0.00329   18.5 0.00723      -2.09
# … with 410 more rows

## Class Size Regression Result

• Using OLS, we find:

$\widehat{\text{test score}_i}=689.93-2.28 \, str_i$

• $\hat{\beta_0} = 689.93$: test score for $str=0$
• $\hat{\beta_1} = -2.28$: for every 1 unit change in $str$, $\hat{\text{test_score}}$ changes by -2.28 points

$\text{test score}_i = 689.93 - 2.28 \, str_i + \hat{u}_i$

## Class Size Regression Residuals

.resid = testscr - .fitted

$\hat{u}_i = \text{test score}_i - \widehat{\text{test score}}_i$

$\hat{u}_i = \text{test score}_i - (689.93-2.28 \, str_i)$

## Class Size Regression: Fitted and Residual Values

aug_reg <- school_reg %>%
augment()

aug_reg %>%
dplyr::select(testscr, str, .fitted, .resid)
# A tibble: 420 × 4
testscr   str .fitted .resid
<dbl> <dbl>   <dbl>  <dbl>
1    691.  17.9    658.   32.7
2    661.  21.5    650.   11.3
3    644.  18.7    656.  -12.7
4    648.  17.4    659.  -11.7
5    641.  18.7    656.  -15.5
6    606.  21.4    650.  -44.6
7    607.  19.5    654.  -47.7
8    609   20.9    651.  -42.3
9    612.  19.9    653.  -41.0
10    613.  20.8    652.  -38.9
# … with 410 more rows

testscr = .fitted + .resid

## Class Size Regression: An Example Data Point I

• One district in our sample is Richmond Elementary
aug_reg %>%
slice(355) #

## Class Size Regression: An Example Data Point II

• .fitted value:

$\widehat{\text{Test Score}}_{\text{Richmond}}=698-2.28(22) \approx 648$

• .resid value:

$\hat{u}_{Richmond}=672-648 \approx 24$

## Making Predictions

• We can use the regression model to make a prediction for a particular $x_i$

Example

Suppose we have a school district with a student/teacher ratio of 18. What is the predicted average district test score?

\begin{align*} \widehat{\text{test score}_i} &= \hat{\beta_0}+\hat{\beta_1} \, \text{str}_i \\ &= 698.93 - 2.28 (18)\\ &= 657.89\\ \end{align*}

## Making Predictions In R

• We can do this in R with the predict()1 function, which requires (at least) two inputs:
1. An lm object (saved regression)
2. newdata with $X$ value(s) to predict $\hat{Y}$ for, as a data.frame (or tibble)
some_district <- tibble(str = 18) # make a dataframe of "new data"'

some_district # look at it just to see
# A tibble: 1 × 1
str
<dbl>
1    18

predict(school_reg, # regression lm object
newdata = some_district) # a dataframe of new data)
       1
657.8964 

## Making Predictions In R, Manually I

• Of course we could do it ourselves…
# save tidied regression

tidy_reg <- tidy(school_reg)

# look at it, again
tidy_reg
# A tibble: 2 × 5
term        estimate std.error statistic   p.value
<chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   699.       9.47      73.8  6.57e-242
2 str            -2.28     0.480     -4.75 2.78e-  6

## Making Predictions In R, Manually II

• Of course we could do it ourselves…
# extract and save beta_0
beta_0 <- tidy_reg %>%
filter(term == "(Intercept)") %>%
pull(estimate)

# check it
beta_0
 698.933

## Making Predictions In R, Manually II

• Of course we could do it ourselves…
# extract and save beta_1
beta_1 <- tidy_reg %>%
filter(term == "str") %>%
pull(estimate)
# check it
beta_1
 -2.279808

# predict for str = 18
beta_0 + beta_1 * 18
 657.8964 