2.6 — Inference for Regression
ECON 480 • Econometrics • Fall 2022
Dr. Ryan Safner
Associate Professor of Economics
safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com
\[Y_i = \beta_0+\beta_1 X_i+u_i\]
OLS estimators \(\hat{\beta_0}\) and \(\hat{\beta_1}\) are computed from a finite (specific) sample of data
Our OLS model contains 2 sources of randomness:
Sample \(\color{#6A5ACD}{\xrightarrow{\text{statistical inference}}}\) Population \(\color{#e64173}{\xrightarrow{\text{causal identification}}}\) Unobserved Parameters
Population relationship
\(Y_i = 698.93 - 2.28 X_i + u_i\)
\(Y_i = \beta_0 + \beta_1 X_i + u_i\)
Sample 1: 50 random observations
Population relationship
\(Y_i = 698.93 - 2.28 X_i + u_i\)
Sample relationship
\(\hat{Y}_i = 708.12 - 2.54 X_i\)
Sample 2: 50 random individuals
Population relationship
\(Y_i = 698.93 - 2.28 X_i + u_i\)
Sample relationship
\(\hat{Y}_i = 708.12 - 2.54 X_i\)
Sample 3: 50 random individuals
Population relationship
\(Y_i = 698.93 - 2.28 X_i + u_i\)
Sample relationship
\(\hat{Y}_i = 708.12 - 2.54 X_i\)
Let’s repeat this process 10,000 times!
This exercise is called a (Monte Carlo) simulation; we can run it with the infer package
Averaging the slope estimates across all simulated samples shows that OLS is unbiased: \[\mathbb{E}[\hat{\beta_1}] = \beta_1\]
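A minimal base-R sketch of such a simulation (the "true" parameters below come from the slides' population line, but the regressor range, error scale, and sample size are illustrative assumptions):

```r
set.seed(480)
beta_0 <- 698.93   # assumed "true" intercept (population line from the slides)
beta_1 <- -2.28    # assumed "true" slope
n <- 50            # observations per sample
reps <- 1000       # simulated samples (the slides use 10,000; fewer here for speed)

slopes <- replicate(reps, {
  x <- runif(n, 14, 26)                         # hypothetical regressor values
  y <- beta_0 + beta_1 * x + rnorm(n, sd = 10)  # add random error u_i
  coef(lm(y ~ x))[["x"]]                        # keep the OLS slope estimate
})

mean(slopes)  # averages out very close to beta_1: OLS is unbiased
```

Each replicate draws a fresh sample from the same population, so the spread of `slopes` is the sampling distribution of \(\hat{\beta_1}\).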
But, any individual estimate can miss the mark
This leads to uncertainty about our estimated regression line
Sample \(\color{#6A5ACD}{\xrightarrow{\text{statistical inference}}}\) Population \(\color{#e64173}{\xrightarrow{\text{causal identification}}}\) Unobserved Parameters
\[\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i \color{#6A5ACD}{\xrightarrow{\text{🤞 hopefully 🤞}}} Y_i=\beta_0+\beta_1X_i+u_i\]
Our problem with uncertainty is we don’t know whether our sample estimate is close or far from the unknown population parameter
But we can use our errors to learn how well our model statistics likely estimate the true parameters
Use \(\hat{\beta_1}\) and its standard error, \(se(\hat{\beta_1})\) for statistical inference about true \(\beta_1\)
We have two options…
The infer Package
infer allows you to run through these steps manually to understand the process:
1. specify() a model
2. generate() a bootstrap distribution
3. calculate() the confidence interval
4. visualize() with a histogram (optional)
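The steps above chain together into a single pipeline. A hedged sketch: the data frame `ca_school` and its columns `testscr` and `str` below are simulated stand-ins, not the actual class-size data.

```r
library(dplyr)
library(infer)   # assumes infer is installed

# Simulated stand-in data (not the real CA school-district sample)
set.seed(1)
ca_school <- data.frame(str = runif(100, 14, 26))
ca_school$testscr <- 698.93 - 2.28 * ca_school$str + rnorm(100, sd = 10)

boot_slopes <- ca_school %>%
  specify(testscr ~ str) %>%                      # 1. specify the model
  generate(reps = 1000, type = "bootstrap") %>%   # 2. resample with replacement
  calculate(stat = "slope")                       # 3. slope in each replicate

ci <- boot_slopes %>%
  get_confidence_interval(level = 0.95, type = "percentile")

boot_slopes %>%
  visualize() +                                   # 4. histogram of slopes
  shade_confidence_interval(endpoints = ci)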
👆 Bootstrapped from Our Sample
infer Pipeline: specify()
infer Pipeline: generate()
%>% generate(reps = n, type = "bootstrap")
Now the magic starts, as we run a number of simulated samples
Set the number of reps and set type to "bootstrap"
In the output, replicate gives the "sample" number (1–1000), and each replicate contains its own x and y values (data points)
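Under the hood, each bootstrap replicate simply redraws n rows with replacement from the original sample. A minimal base-R sketch with toy data:

```r
# One bootstrap replicate by hand (toy data, base R only)
set.seed(42)
df <- data.frame(x = 1:5, y = c(2, 4, 5, 4, 6))

idx <- sample(nrow(df), replace = TRUE)  # row indices drawn with replacement
boot_sample <- df[idx, ]                 # same size as df; some rows repeat

coef(lm(y ~ x, data = boot_sample))[["x"]]  # slope for this one replicate
```

Repeating this reps times, and keeping each slope, builds the bootstrap distribution that generate() and calculate() produce for you.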
infer Pipeline: calculate()
%>% calculate(stat = "slope")
infer Pipeline: get_confidence_interval()
%>% get_confidence_interval()
infer Pipeline: visualize()
%>% visualize()
\(\bigg( \big[\) estimate - MOE \(\big]\), \(\big[\) estimate + MOE \(\big] \bigg)\)
Depending on our confidence level, we are essentially looking for the middle \((1-\alpha)\)% of the sampling distribution
This puts \(\alpha\) in the tails; \(\frac{\alpha}{2}\) in each tail
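For example, with a 95% confidence level \((\alpha = 0.05)\), the normal critical values that cut off \(\frac{\alpha}{2} = 0.025\) in each tail can be found with qnorm():

```r
alpha <- 0.05
qnorm(1 - alpha / 2)  # upper critical value, approximately 1.96
qnorm(alpha / 2)      # lower critical value, approximately -1.96
```

These are the "about 2 standard deviations" from the empirical rule, made exact.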
Recall the 68-95-99.7% empirical rule for (standard) normal distributions!
95% of data falls within 2 standard deviations of the mean
Thus, in 95% of samples, the true parameter is likely to fall within about 2 standard deviations of the sample estimate
❌ 95% of the time, the true effect of class size on test score will be between -3.22 and -1.33
❌ We are 95% confident that a randomly selected school district will have an effect of class size on test score between -3.22 and -1.33
❌ The effect of class size on test score is -2.28 95% of the time.
✅ We are 95% confident that in similarly constructed samples, the true effect is between -3.22 and -1.33
base R doesn't show confidence intervals in the lm summary() output; you need the confint() command:

                 2.5 %     97.5 %
(Intercept) 680.32313 717.542779
str          -3.22298  -1.336637
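A hedged sketch of that workflow, using simulated stand-in data rather than the actual school-district sample:

```r
# Simulated stand-ins for testscr and str (not the real CA school data)
set.seed(2)
str_sim <- runif(420, 14, 26)
testscr_sim <- 698.93 - 2.28 * str_sim + rnorm(420, sd = 15)
school_reg <- lm(testscr_sim ~ str_sim)

summary(school_reg)                # coefficients, but no confidence intervals
confint(school_reg, level = 0.95)  # 2.5 % and 97.5 % bounds per coefficient
```

confint() returns a matrix with one row per coefficient and the lower/upper bounds as columns.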
broom's tidy() command can include confidence intervals
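A sketch, assuming broom is installed and fitting a quick simulated model to have an lm object to tidy:

```r
library(broom)  # assumes broom is installed

# A quick simulated fit, just to have an lm object
set.seed(3)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

tidy(fit, conf.int = TRUE, conf.level = 0.95)
# returns a tibble with conf.low and conf.high columns
# alongside estimate, std.error, statistic, and p.value
```

Because tidy() returns a regular data frame, these interval bounds are easy to filter, plot, or join, unlike the printed confint() matrix.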