3.4 — Multivariate OLS Estimators

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

Contents

The Multivariate OLS Estimators

The Expected Value of $\hat{\beta}_j$: Bias

Precision of $\hat{\beta_j}$

A Summary of Multivariate OLS Estimator Properties

(Updated) Measures of Fit

The Multivariate OLS Estimators

$Y_i=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+\cdots+\beta_kX_{ki}+u_i$

• The ordinary least squares (OLS) estimators of the unknown population parameters $\beta_0, \beta_1, \beta_2, \cdots, \beta_k$ solve:

$\min_{\hat{\beta_0}, \hat{\beta_1}, \hat{\beta_2}, \cdots, \hat{\beta_k}} \sum^n_{i=1}\left[\underbrace{Y_i-\underbrace{(\hat{\beta_0}+\hat{\beta_1}X_{1i}+\hat{\beta_2}X_{2i}+\cdots+ \hat{\beta_k}X_{ki})}_{\color{gray}{\hat{Y}_i}}}_{\color{gray}{\hat{u}_i}}\right]^2$

• Again, OLS estimators are chosen to minimize the sum of squared residuals (SSR)
• i.e. sum of squared “distances” between actual values of $Y_i$ and predicted values $\hat{Y_i}$

The Multivariate OLS Estimators: FYI

Math FYI

In linear algebra terms, a regression model with $n$ observations of $k$ independent variables:

$\mathbf{Y} = \mathbf{X \beta}+\mathbf{u}$

$\underbrace{\begin{pmatrix} y_1\\ y_2\\ \vdots \\ y_n\\ \end{pmatrix}}_{\mathbf{Y}_{(n \times 1)}} = \underbrace{\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,k}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,k}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n,1} & x_{n,2} & \cdots & x_{n,k}\\ \end{pmatrix}}_{\mathbf{X}_{(n \times k)}} \underbrace{\begin{pmatrix} \beta_1\\ \beta_2\\ \vdots \\ \beta_k \\ \end{pmatrix}}_{\mathbf{\beta}_{(k \times 1)}} + \underbrace{\begin{pmatrix} u_1\\ u_2\\ \vdots \\ u_n \\ \end{pmatrix}}_{\mathbf{u}_{(n \times 1)}}$

• The OLS estimator for $\beta$ is $\hat{\beta}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ 😱

• Appreciate that I am saving you from such sorrow 🤖
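
For the curious, the matrix formula can be checked against lm() in a few lines. This is only a sketch on simulated data: the variable names, sample size, and coefficient values are made up for illustration, and the design matrix is assumed to carry a leading column of 1s so that the intercept is estimated too.

```r
# Simulated data; all names and parameter values here are illustrative
set.seed(480)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)

# Design matrix with a leading column of 1s for the intercept
X <- cbind(intercept = 1, x1 = x1, x2 = x2)

# beta_hat = (X'X)^{-1} X'Y, solved as a linear system
# crossprod(X) is X'X and crossprod(X, y) is X'Y
beta_matrix <- drop(solve(crossprod(X), crossprod(X, y)))

# The same estimates from lm()
beta_lm <- coef(lm(y ~ x1 + x2))

# Compare the two sets of estimates side by side
print(rbind(matrix_formula = beta_matrix, lm = beta_lm))
```

The two rows should agree up to floating-point error, which is all lm() is doing under the hood (with a more numerically stable decomposition).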

The Sampling Distribution of $\hat{\beta_j}$

• Each individual $\hat{\beta_j}$ has a sampling distribution:

$\hat{\beta_j} \sim N \left(E[\hat{\beta_j}], \;se(\hat{\beta_j})\right)$

• We want to know its sampling distribution’s:
• Center: $\color{#6A5ACD}{E[\hat{\beta_j}]}$; what is the expected value of our estimator?
• Spread: $\color{#6A5ACD}{se(\hat{\beta_j})}$; how precise or uncertain is our estimator?

The Expected Value of $\hat{\beta_j}$: Bias

Exogeneity and Unbiasedness

• As before, $\mathbb{E}[\hat{\beta_j}]=\beta_j$ when $X_j$ is exogenous (i.e. $cor(X_j, u)=0$)
• We know the true $\mathbb{E}[\hat{\beta_j}]=\beta_j+\underbrace{cor(X_j,u)\frac{\sigma_u}{\sigma_{X_j}}}_{\text{O.V. Bias}}$
• If $X_j$ is endogenous (i.e. $cor(X_j, u)\neq 0$), $\hat{\beta_j}$ contains omitted variable bias
• Let’s “see” an example of omitted variable bias and quantify it with our example

Measuring Omitted Variable Bias I

• Suppose the true population model of a relationship is:

$\color{#047806}{Y_i=\beta_0+\beta_1 X_{1i}+\beta_2 X_{2i}+u_i}$

• What happens when we run a regression and omit $X_{2i}$?
• Suppose we estimate the following omitted regression of just $Y_i$ on $X_{1i}$ (omitting $X_{2i})$:

$\color{#0047AB}{Y_i=\alpha_0+\alpha_1 X_{1i}+\nu_i}$

Measuring Omitted Variable Bias II

• Key Question: are $X_{1i}$ and $X_{2i}$ correlated?
• Run an auxiliary regression of $X_{2i}$ on $X_{1i}$ to see:

$\color{#6A5ACD}{X_{2i}=\delta_0+\delta_1 X_{1i}+\tau_i}$

• If $\color{#6A5ACD}{\delta_1}=0$, then $X_{1i}$ and $X_{2i}$ are not linearly related

• If $|\color{#6A5ACD}{\delta_1}|$ is very big, then $X_{1i}$ and $X_{2i}$ are strongly linearly related

Measuring Omitted Variable Bias III

• Now substitute our auxiliary regression between $X_{2i}$ and $X_{1i}$ into the true model:
• We know $\color{#6A5ACD}{X_{2i}=\delta_0+\delta_1 X_{1i}+\tau_i}$

\begin{align*} Y_i&=\beta_0+\beta_1 X_{1i}+\beta_2 \color{#6A5ACD}{X_{2i}}+u_i \\ Y_i&=\beta_0+\beta_1 X_{1i}+\beta_2 \color{#6A5ACD}{\big(\delta_0+\delta_1 X_{1i}+\tau_i \big)}+u_i \\ Y_i&=(\underbrace{\beta_0+\beta_2 \color{#6A5ACD}{\delta_0}}_{\color{#0047AB}{\alpha_0}})+(\underbrace{\beta_1+\beta_2 \color{#6A5ACD}{\delta_1}}_{\color{#0047AB}{\alpha_1}})\color{#6A5ACD}{X_{1i}}+(\underbrace{\beta_2 \color{#6A5ACD}{\tau_i}+u_i}_{\color{#0047AB}{\nu_i}})\\ \end{align*}

• Now relabel each of the three terms as the OLS estimates $(\alpha$’s) and error $(\nu_i)$ from the omitted regression, so we again have:

$\color{#0047AB}{Y_i=\alpha_0+\alpha_1X_{1i}+\nu_i}$

• Crucially, this means that our OLS estimate for $X_{1i}$ in the omitted regression is:

$\color{#0047AB}{\alpha_1}=\beta_1+\beta_2 \color{#6A5ACD}{\delta_1}$

Measuring Omitted Variable Bias IV

$\color{#0047AB}{\alpha_1}=\color{#047806}{\beta_1}+\color{#D7250E}{\beta_2} \color{#6A5ACD}{\delta_1}$

• The Omitted Regression OLS estimate for $X_{1}$, $(\color{#0047AB}{\alpha_1})$ picks up both:
1. The true effect of $X_{1}$ on $Y$: $\beta_1$
2. The true effect of $X_2$ on $Y$: $\beta_2$…as pulled through the relationship between $X_1$ and $X_2$: $\delta_1$
• Recall our conditions for omitted variable bias from some variable $\mathbf{Z_i}$:
1. $\mathbf{Z_i}$ must be a determinant of $Y_i$ $\implies$ $\beta_2 \neq 0$
2. $\mathbf{Z_i}$ must be correlated with $X_{1i}$ $\implies$ $\delta_1 \neq 0$
• Otherwise, if $Z_i$ does not fit these conditions, $\alpha_1=\beta_1$ and the omitted regression is unbiased!
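
The decomposition $\alpha_1=\beta_1+\beta_2\delta_1$ also holds exactly for the OLS estimates in any given sample, which can be checked by running all three regressions. The sketch below uses simulated data; the variable names and the "true" parameter values are assumptions chosen only for illustration.

```r
# Simulated data; names and parameter values are illustrative assumptions
set.seed(480)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)             # x2 is correlated with x1 (delta_1 != 0)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)  # x2 is a determinant of y (beta_2 != 0)

true_reg    <- lm(y ~ x1 + x2)  # "true" regression
omitted_reg <- lm(y ~ x1)       # "omitted" regression (leaves out x2)
aux_reg     <- lm(x2 ~ x1)      # "auxiliary" regression

beta_1  <- coef(true_reg)[["x1"]]
beta_2  <- coef(true_reg)[["x2"]]
delta_1 <- coef(aux_reg)[["x1"]]
alpha_1 <- coef(omitted_reg)[["x1"]]

# alpha_1 should equal beta_1 + beta_2 * delta_1
print(c(alpha_1          = alpha_1,
        beta_1_plus_bias = beta_1 + beta_2 * delta_1,
        ov_bias          = beta_2 * delta_1))
```

Setting the 0.8 or the -3 in the simulation to 0 switches off one of the two conditions, and the estimated bias term then shrinks toward zero.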

Measuring OVB in Our Class Size Example I

• The “True” Regression $(Y_i$ on $X_{1i}$ and $X_{2i})$

$\color{#047806}{\widehat{\text{Test Score}_i}=686.03-1.10\text{ STR}_i-0.65\text{ \%EL}_i}$

Measuring OVB in Our Class Size Example II

• The “Omitted” Regression $(Y_{i}$ on just $X_{1i})$

$\color{#0047AB}{\widehat{\text{Test Score}_i}=698.93-2.28\text{ STR}_i}$

Measuring OVB in Our Class Size Example III

• The “Auxiliary” Regression $(X_{2i}$ on $X_{1i})$

$\color{#6A5ACD}{\widehat{\text{\%EL}_i}=-19.85+1.81\text{ STR}_i}$

Measuring OVB in Our Class Size Example IV

“True” Regression

$\widehat{\text{Test Score}_i}=686.03 \color{#047806}{-1.10}\text{ STR}_i\color{#D7250E}{-0.65}\text{ \%EL}_i$

“Omitted” Regression

$\widehat{\text{Test Score}_i}=698.93\color{#0047AB}{-2.28}\text{ STR}_i$

“Auxiliary” Regression

$\widehat{\text{\%EL}_i}=-19.85+\color{#6A5ACD}{1.81}\text{ STR}_i$

• Omitted Regression $\alpha_1$ on STR is -2.28

$\color{#0047AB}{\alpha_1}=\color{#047806}{\beta_1}+\color{#D7250E}{\beta_2} \color{#6A5ACD}{\delta_1}$

• The true effect of STR on Test Score: -1.10

• The true effect of %EL on Test Score: -0.65

• The relationship between STR and %EL: 1.81

• So, for the omitted regression:

$\color{#0047AB}{-2.28}=\color{#047806}{-1.10}+\underbrace{\color{#D7250E}{(-0.65)} \color{#6A5ACD}{(1.81)}}_{O.V.Bias=\mathbf{-1.18}}$
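
Spelling out the arithmetic with the rounded coefficients reported in the three regressions (the small gap between the computed value and $-2.28$ is presumably just rounding in the reported estimates):

```latex
\begin{align*}
\alpha_1 &= \beta_1 + \beta_2 \delta_1 \\
&\approx -1.10 + (-0.65)(1.81) \\
&= -1.10 - 1.1765 \\
&= -2.2765 \approx -2.28
\end{align*}
```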

Precision of $\hat{\beta_j}$

Precision of $\hat{\beta_j}$ I

• $\sigma_{\hat{\beta_j}}$; how precise or uncertain are our estimates?

• Variance $\sigma^2_{\hat{\beta_j}}$ or standard error $\sigma_{\hat{\beta_j}}$

Precision of $\hat{\beta_j}$ II

$var(\hat{\beta_j})=\underbrace{\color{#6A5ACD}{\frac{1}{1-R^2_j}}}_{\color{#6A5ACD}{VIF}} \times \frac{(SER)^2}{n \times var(X_j)}$

$se(\hat{\beta_j})=\sqrt{var(\hat{\beta_j})}$

• Variation in $\hat{\beta_j}$ is affected by four things now:
1. Goodness of fit of the model (SER)
• Larger $SER$ $\rightarrow$ larger $var(\hat{\beta_j})$
2. Sample size, n
• Larger $n$ $\rightarrow$ smaller $var(\hat{\beta_j})$
3. Variance of $X_j$
• Larger $var(X_j)$ $\rightarrow$ smaller $var(\hat{\beta_j})$
4. Variance Inflation Factor $\color{#6A5ACD}{\frac{1}{(1-R^2_j)}}$
• Larger $VIF$, larger $var(\hat{\beta_j})$
• This is the only new effect
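
As a rough check, the variance formula can be rebuilt piece by piece and compared with the standard error that lm() reports. This sketch uses simulated data with made-up names and parameter values, and it reads $n \times var(X_j)$ as the sum of squared deviations of $X_j$ from its mean (the population-variance convention; R's var() divides by $n-1$ instead).

```r
# Simulated data; names and parameter values are illustrative assumptions
set.seed(480)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2)
k   <- 2  # number of regressors

# The four ingredients for the coefficient on x1
ser      <- sqrt(sum(resid(fit)^2) / (n - k - 1))  # 1. SER
r2_1     <- summary(lm(x1 ~ x2))$r.squared         # auxiliary R^2 for x1
vif_1    <- 1 / (1 - r2_1)                         # 4. VIF
n_var_x1 <- sum((x1 - mean(x1))^2)                 # 2. and 3. n * var(x1)

se_formula <- sqrt(vif_1 * ser^2 / n_var_x1)
se_lm      <- summary(fit)$coefficients["x1", "Std. Error"]

print(c(formula = se_formula, lm = se_lm))
```

Raising the 0.6 (more multicollinearity) or the error sd (worse fit), or lowering n, should each push the standard error up, in line with the list above.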

VIF and Multicollinearity I

• Two independent (X) variables are multicollinear when they are correlated with each other:

$cor(X_j, X_l) \neq 0 \quad \text{for some } j \neq l$

• Multicollinearity between X variables does not bias OLS estimates
• Remember, we pulled another variable out of $u$ into the regression
• If it were omitted, then it would cause omitted variable bias!
• Multicollinearity does increase the variance of each OLS estimator by

$VIF=\frac{1}{(1-R^2_j)}$

VIF and Multicollinearity II

$VIF=\frac{1}{(1-R^2_j)}$

• $R^2_j$ is the $R^2$ from an auxiliary regression of $X_j$ on all other regressors $(X$’s)
• i.e. proportion of $var(X_j)$ explained by other $X$’s

VIF and Multicollinearity III

Example

Suppose we have a regression with three regressors $(k=3)$:

$Y_i=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+\beta_3X_{3i}+u_i$

• There will be three different $R^2_j$’s, one for each regressor:

\begin{align*} R^2_1 \text{ for } X_{1i}&=\gamma_0+\gamma_1 X_{2i} + \gamma_2 X_{3i} \\ R^2_2 \text{ for } X_{2i}&=\zeta_0+\zeta_1 X_{1i} + \zeta_2 X_{3i} \\ R^2_3 \text{ for } X_{3i}&=\eta_0+\eta_1 X_{1i} + \eta_2 X_{2i} \\ \end{align*}

VIF and Multicollinearity IV

$VIF=\frac{1}{(1-R^2_j)}$

• $R^2_j$ is the $R^2$ from an auxiliary regression of $X_j$ on all other regressors $(X$’s)

• i.e. proportion of $var(X_j)$ explained by other $X$’s
• The $R_j^2$ tells us how much other regressors explain regressor $X_j$

• Key Takeaway: If other $X$ variables explain $X_j$ well (high $R^2_j$), it will be harder to tell how cleanly $X_j \rightarrow Y_i$, and so $var(\hat{\beta_j})$ will be higher

VIF and Multicollinearity V

• Common to calculate the Variance Inflation Factor (VIF) for each regressor:

$VIF=\frac{1}{(1-R^2_j)}$

• VIF quantifies the factor (scalar) by which $var(\hat{\beta_j})$ increases because of multicollinearity
• e.g. VIF of 2, 3, etc. $\implies$ variance increases by 2x, 3x, etc.
• Baseline: $R^2_j=0$ $\implies$ no multicollinearity $\implies VIF = 1$ (no inflation)
• Larger $R^2_j$ $\implies$ larger VIF
• Rule of thumb: $VIF>10$ is problematic
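
With more than two regressors, each VIF needs its own auxiliary regression. A minimal base-R sketch of that loop follows, on simulated data; the three regressors and their relationships are assumptions made up for illustration, not the class data.

```r
# Simulated regressors; names and relationships are illustrative assumptions
set.seed(480)
n  <- 400
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
x3 <- 0.5 * x1 - 0.4 * x2 + rnorm(n)
X  <- data.frame(x1, x2, x3)

# For each regressor j: regress it on all the others, take R^2_j,
# then compute VIF_j = 1 / (1 - R^2_j)
vif_manual <- sapply(names(X), function(j) {
  others <- setdiff(names(X), j)
  aux    <- lm(reformulate(others, response = j), data = X)
  1 / (1 - summary(aux)$r.squared)
})

print(vif_manual)
```

Each value is at least 1, and by the rule of thumb above a regressor would only be flagged once its auxiliary $R^2_j$ exceeds 0.9 (since $1/(1-0.9)=10$).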

VIF and Multicollinearity in Our Example I

• Higher $\%EL$ predicts higher $STR$
• Hard to get a precise marginal effect of $STR$ holding $\%EL$ constant
• Don’t have much data on districts with low STR and high $\%EL$ (and vice versa)!

VIF and Multicollinearity in Our Example II

• Again, consider the correlation between the variables
ca_school %>%
  # Select only the three variables we want (there are many)
  select(str, testscr, el_pct) %>%
  # make a correlation table (all variables must be numeric)
  cor()
               str    testscr     el_pct
str      1.0000000 -0.2263628  0.1876424
testscr -0.2263628  1.0000000 -0.6441237
el_pct   0.1876424 -0.6441237  1.0000000
• $cor(STR, \%EL) = 0.188$

VIF and Multicollinearity in R I

# our multivariate regression
elreg <- lm(testscr ~ str + el_pct,
data = ca_school)

# use the "car" package for VIF function
library("car")

elreg %>% vif()
     str   el_pct
1.036495 1.036495 
• $var(\hat{\beta_1})$ on str increases by 1.036 times (3.6%) due to multicollinearity with el_pct
• $var(\hat{\beta_2})$ on el_pct increases by 1.036 times (3.6%) due to multicollinearity with str

VIF and Multicollinearity in R II

• Let’s calculate VIF manually to see where it comes from:
# run auxiliary regression of x2 on x1
auxreg <- lm(el_pct ~ str,
data = ca_school)

library(broom)
auxreg %>% tidy() # look at reg output

VIF and Multicollinearity in R III

auxreg %>% glance() # look at aux reg stats for R^2
# extract our R-squared from aux regression (R_j^2)

aux_r_sq <- glance(auxreg) %>%
pull(r.squared)

aux_r_sq # look at it
[1] 0.03520966

VIF and Multicollinearity in R IV

# calculate VIF manually

our_vif <- 1 / (1 - aux_r_sq) # VIF formula

our_vif
[1] 1.036495
• Again, multicollinearity between the two $X$ variables inflates the variance on each by 1.036 times
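
With only two regressors, the auxiliary $R^2_j$ is simply the squared correlation between them, so the same VIF can be recovered from the correlation table, using $cor(STR, \%EL) = 0.1876424$:

```latex
\begin{align*}
R^2_j &= \big(cor(STR, \%EL)\big)^2 = (0.1876424)^2 \approx 0.0352 \\
VIF &= \frac{1}{1-R^2_j} \approx \frac{1}{1-0.0352} \approx 1.0365
\end{align*}
```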

Another Example: Expenditures/Student I

Example

What about district expenditures per student?

ca_school %>%
  select(testscr, str, el_pct, expn_stu) %>%
  cor()
            testscr        str      el_pct    expn_stu
testscr   1.0000000 -0.2263628 -0.64412374  0.19127277
str      -0.2263628  1.0000000  0.18764237 -0.61998215
el_pct   -0.6441237  0.1876424  1.00000000 -0.07139604
expn_stu  0.1912728 -0.6199821 -0.07139604  1.00000000

Another Example: Expenditures/Student II

• Higher $spend$ predicts lower $STR$
• Hard to get a precise marginal effect of $STR$ holding $spend$ constant
• Don’t have much data on districts with high STR and high $spend$ (and vice versa)!

Another Example: Expenditures/Student II

Would omitting Expenditures per student cause omitted variable bias?

1. $cor(Test, spend) \neq 0$

2. $cor(STR, spend) \neq 0$

Another Example: Expenditures/Student III

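# regression that adds expenditures per student
reg3 <- lm(testscr ~ str + el_pct + expn_stu, data = ca_school)
vif(reg3)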
     str   el_pct expn_stu
1.680787 1.040031 1.629915 
• Including expn_stu reduces bias but increases variance of $\hat{\beta_1}$ by 1.68x (68%)
• and variance of $\hat{\beta_2}$ by 1.04x (4%)

Multicollinearity Increases Variance

                        Test Scores   Test Scores   Test Scores
Constant                698.93***     686.03***     649.58***
                        (9.47)        (7.41)        (15.21)
Student Teacher Ratio   −2.28***      −1.10***      −0.29
                        (0.48)        (0.38)        (0.48)
Percent ESL Students                  −0.65***      −0.66***
                                      (0.04)        (0.04)
Spending per Student                                0.00***
                                                    (0.00)
n                       420           420           420
$R^2$                   0.05          0.43          0.44
SER                     18.54         14.41         14.28
* p < 0.1, ** p < 0.05, *** p < 0.01

Perfect Multicollinearity

• Perfect multicollinearity is when a regressor is an exact linear function of (an)other regressor(s)

$\widehat{Sales} = \hat{\beta_0}+\hat{\beta_1}\text{Temperature (C)} + \hat{\beta_2}\text{Temperature (F)}$

$\text{Temperature (F)}=32+1.8*\text{Temperature (C)}$

• $cor(\text{temperature (F), temperature (C)})=1$
• $R^2_j=1 \rightarrow VIF=\frac{1}{1-1}$, which is undefined (division by zero), so $var(\hat{\beta_j})$ is undefined (infinite)!
• This is fatal for a regression
• A logical impossibility, always caused by human error
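
To see why this is fatal, here is a small sketch with simulated temperature and sales data (the numbers are made up for illustration). Because one column of the design matrix is an exact linear function of another, the matrix does not have full column rank and the coefficients cannot all be estimated.

```r
# Simulated data; values are illustrative assumptions
set.seed(480)
n      <- 100
temp_c <- runif(n, min = 0, max = 35)
temp_f <- 32 + 1.8 * temp_c          # exact linear function of temp_c
sales  <- 50 + 3 * temp_c + rnorm(n, sd = 5)

# The correlation between the two temperature measures
print(cor(temp_c, temp_f))

# The design matrix has 3 columns; its rank shows how many are
# linearly independent
X <- cbind(1, temp_c, temp_f)
print(qr(X)$rank)

# lm() cannot estimate both temperature coefficients
fit <- lm(sales ~ temp_c + temp_f)
print(coef(fit))
```

The rank comes out below the number of columns, and lm() reports NA for the redundant regressor rather than failing outright.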

Perfect Multicollinearity: Example

Example

$\widehat{TestScore_i} = \hat{\beta_0}+\hat{\beta_1}STR_i +\hat{\beta_2}\%EL+\hat{\beta_3}\%EF$

• $\%EL$: the percentage of students learning English

• $\%EF$: the percentage of students fluent in English

• $\%EF=100-\%EL$

• $|cor(\%EF, \%EL)|=1$

Perfect Multicollinearity: Example II

# generate %EF variable from %EL
ca_school_ex <- ca_school %>%
  mutate(ef_pct = 100 - el_pct)

# get correlation between %EL and %EF
ca_school_ex %>%
  summarize(cor = cor(ef_pct, el_pct))

Perfect Multicollinearity: Example III

mcreg <- lm(testscr ~ str + el_pct + ef_pct,
data = ca_school_ex)
summary(mcreg)

Call:
lm(formula = testscr ~ str + el_pct + ef_pct, data = ca_school_ex)

Residuals:
Min      1Q  Median      3Q     Max
-48.845 -10.240  -0.308   9.815  43.461

Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 686.03225    7.41131  92.566  < 2e-16 ***
str          -1.10130    0.38028  -2.896  0.00398 **
el_pct       -0.64978    0.03934 -16.516  < 2e-16 ***
ef_pct             NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.46 on 417 degrees of freedom
Multiple R-squared:  0.4264,    Adjusted R-squared:  0.4237
F-statistic:   155 on 2 and 417 DF,  p-value: < 2.2e-16
mcreg %>% tidy()
• Note R drops one of the multicollinear regressors (ef_pct) if you include both 🤡

A Summary of Multivariate OLS Estimator Properties

• $\hat{\beta_j}$ on $X_j$ is biased only if there is an omitted variable $(Z)$ such that:
1. $cor(Y,Z)\neq 0$
2. $cor(X_j,Z)\neq 0$
• If $Z$ is included and $X_j$ is collinear with $Z$, this does not cause a bias
• $var[\hat{\beta_j}]$ and $se[\hat{\beta_j}]$ measure precision (or uncertainty) of estimate:

$var[\hat{\beta_j}]=\frac{1}{(1-R^2_j)}*\frac{SER^2}{n \times var[X_j]}$

• VIF from multicollinearity: $\frac{1}{(1-R^2_j)}$
• $R_j^2$ for auxiliary regression of $X_j$ on all other $X$’s
• multicollinearity does not bias $\hat{\beta_j}$ but raises its variance
• perfect multicollinearity if an $X$ is an exact linear function of other $X$’s

(Updated) Measures of Fit

• Again, how well does a linear model fit the data?

• How much variation in $Y_i$ is “explained” by variation in the model $(\hat{Y_i})$?

\begin{align*} Y_i&=\hat{Y_i}+\hat{u_i}\\ \hat{u_i}&= Y_i-\hat{Y_i}\\ \end{align*}

(Updated) Measures of Fit: SER

• Again, the Standard error of the regression (SER) estimates the standard deviation of $u$

$SER=\sqrt{\frac{SSR}{n-\mathbf{k}-1}}$

• A measure of the spread of the observations around the regression line (in units of $Y$), the average “size” of the residual

• Only new change: divided by $n-\color{#6A5ACD}{k}-1$ due to use of $k+1$ degrees of freedom to first estimate $\beta_0$ and then all of the other $\beta$’s for the $k$ number of regressors
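
A quick sketch of the degrees-of-freedom correction, taking SER as the square root of $SSR/(n-k-1)$, which is what R prints as "Residual standard error". The data are simulated, with names and parameter values made up for illustration.

```r
# Simulated data; names and parameter values are illustrative assumptions
set.seed(480)
n  <- 250
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + x1 - 2 * x2 + rnorm(n, sd = 4)

fit <- lm(y ~ x1 + x2)

k   <- length(coef(fit)) - 1     # number of regressors (excluding intercept)
ssr <- sum(resid(fit)^2)         # sum of squared residuals
ser <- sqrt(ssr / (n - k - 1))   # divide by n - k - 1, then take the root

# sigma() extracts R's residual standard error from the fitted model
print(c(manual = ser, sigma = sigma(fit)))
```

Dividing by $n$ instead gives a slightly smaller number; with large samples the difference is negligible.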

(Updated) Measures of Fit: $R^2$

\begin{align*} R^2&=\frac{SSM}{SST}\\ &=1-\frac{SSR}{SST}\\ &=(r_{Y,\hat{Y}})^2 \\ \end{align*}

• Again, $R^2$ is fraction of total variation in $Y_i$ (“total sum of squares”) that is explained by variation in predicted values $(\hat{Y_i})$, i.e. our model (“model sum of squares”)

$R^2 = \frac{var(\hat{Y})}{var(Y)}$

Visualizing $R^2$

• Total Variation in Y: Areas A + D + E + G

$SST = \sum^n_{i=1}(Y_i-\bar{Y})^2$

• Variation in Y explained by X1 and X2: Areas D + E + G

$SSM = \sum^n_{i=1}(\hat{Y_i}-\bar{Y})^2$

• Unexplained variation in Y: Area A

$SSR = \sum^n_{i=1}(\hat{u_i})^2$

Compare with one X variable

$R^2 = \frac{SSM}{SST} = \frac{D+E+G}{\color{red}{A}+D+E+G}$

Visualizing $R^2$

# make a function to calc. sum of sq. devs
sum_sq <- function(x){sum((x - mean(x))^2)}

# find total sum of squares (pull() extracts the single number)
SST <- elreg %>%
  augment() %>%
  summarize(SST = sum_sq(testscr)) %>%
  pull(SST)

# find explained sum of squares
SSM <- elreg %>%
  augment() %>%
  summarize(SSM = sum_sq(.fitted)) %>%
  pull(SSM)

# look at them and divide to get R^2
tribble(
  ~SSM, ~SST, ~R_sq,
  SSM, SST, SSM/SST
) %>%
  knitr::kable()
SSM SST R_sq
64864.3 152109.6 0.4264314

$R^2 = \frac{SSM}{SST} = \frac{D+E+G}{\color{red}{A}+D+E+G}$

(Updated) Measures of Fit: Adjusted $\bar{R}^2$

• Problem: $R^2$ never decreases, and almost always increases, every time a new variable is added (it reduces SSR!)
• Think in the diagram: more area of $Y$ covered by more $X$ variables!
• This does not mean adding a variable improves the fit of the model per se, $R^2$ gets inflated
• We correct for this effect with the adjusted $\bar{R}^2$ which penalizes adding new variables:

$\bar{R}^2 = 1- \underbrace{\frac{n-1}{n-k-1}}_{penalty} \times \frac{SSR}{SST}$

• In the end, recall $R^2$ was never that useful, so don’t worry about the formula
• Large sample sizes $(n)$ make $R^2$ and $\bar{R}^2$ very close
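
To make the penalty concrete, the sketch below computes $R^2$ and $\bar{R}^2$ by hand for two models, the second of which adds a regressor that is pure noise by construction. The data, the variable names, and the helper function fit_stats() are all made up for this illustration.

```r
# Simulated data; names and parameter values are illustrative assumptions
set.seed(480)
n     <- 150
x1    <- rnorm(n)
x2    <- rnorm(n)
noise <- rnorm(n)                        # unrelated to y by construction
y     <- 1 + 2 * x1 - x2 + rnorm(n, sd = 3)

# Hypothetical helper: R^2 and adjusted R^2 from SSR and SST
fit_stats <- function(fit) {
  y_obs <- model.response(model.frame(fit))
  n     <- length(y_obs)
  k     <- length(coef(fit)) - 1         # regressors, excluding intercept
  ssr   <- sum(resid(fit)^2)
  sst   <- sum((y_obs - mean(y_obs))^2)
  c(r_sq     = 1 - ssr / sst,
    adj_r_sq = 1 - (n - 1) / (n - k - 1) * ssr / sst)
}

small <- lm(y ~ x1 + x2)
big   <- lm(y ~ x1 + x2 + noise)

print(rbind(small = fit_stats(small), big = fit_stats(big)))
```

$R^2$ for the bigger model cannot be lower than for the smaller one, while $\bar{R}^2$ may move in either direction depending on whether the fall in SSR outweighs the penalty.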

$\bar{R}^2$ In R

summary(elreg)

Call:
lm(formula = testscr ~ str + el_pct, data = ca_school)

Residuals:
Min      1Q  Median      3Q     Max
-48.845 -10.240  -0.308   9.815  43.461

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 686.03225    7.41131  92.566  < 2e-16 ***
str          -1.10130    0.38028  -2.896  0.00398 **
el_pct       -0.64978    0.03934 -16.516  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.46 on 417 degrees of freedom
Multiple R-squared:  0.4264,    Adjusted R-squared:  0.4237
F-statistic:   155 on 2 and 417 DF,  p-value: < 2.2e-16
glance(elreg)
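• Base $R^2$ (R calls it “Multiple R-squared”) is 0.4264, up from 0.05 in the simple regression on str alone
• Adjusted R-squared $(\bar{R}^2)$ is slightly lower, 0.4237, because of the penalty for the added regressor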

Coefficient Plots (with modelsummary)

library(modelsummary)
modelplot(reg3,  # our regression object
coef_omit = 'Intercept') # don't show intercept

Regression Table (with modelsummary)

                        Simple Model   MV Model 1   MV Model 2
Constant                698.93***      686.03***    649.58***
                        (9.47)         (7.41)       (15.21)
STR                     −2.28***       −1.10***     −0.29
                        (0.48)         (0.38)       (0.48)
% ESL Students                         −0.65***     −0.66***
                                       (0.04)       (0.04)
Spending per Student                                0.00***
                                                    (0.00)
N                       420            420          420
SER                     18.54          14.41        14.28
* p < 0.1, ** p < 0.05, *** p < 0.01
modelsummary(models = list("Simple Model" = school_reg,
"MV Model 1" = elreg,
"MV Model 2" = reg3),
fmt = 2, # round to 2 decimals
output = "html",
coef_rename = c("(Intercept)" = "Constant",
"str" = "STR",
"el_pct" = "% ESL Students",
"expn_stu" = "Spending per Student"),
gof_map = list(
list("raw" = "nobs", "clean" = "N", "fmt" = 0),
#list("raw" = "r.squared", "clean" = "R<sup>2</sup>", "fmt" = 2),
)