2.6 — Inference for Regression
ECON 480 • Econometrics • Fall 2022
Dr. Ryan Safner
Associate Professor of Economics
safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com


$Y_i = \beta_0 + \beta_1 X_i + u_i$
OLS estimators ($\hat{\beta}_0$ and $\hat{\beta}_1$) are computed from a finite (specific) sample of data
Our OLS model contains two sources of randomness:
Sample → statistical inference → Population → causal identification → Unobserved Parameters

Population relationship
$Y_i = 698.93 - 2.28 X_i + u_i$
$Y_i = \beta_0 + \beta_1 X_i + u_i$

Sample 1: 50 random observations
Population relationship
$Y_i = 698.93 - 2.28 X_i + u_i$
Sample relationship
$\hat{Y}_i = 708.12 - 2.54 X_i$

Sample 2: 50 random individuals
Population relationship
$Y_i = 698.93 - 2.28 X_i + u_i$
Sample relationship
$\hat{Y}_i = 708.12 - 2.54 X_i$

Sample 3: 50 random individuals
Population relationship
$Y_i = 698.93 - 2.28 X_i + u_i$
Sample relationship
$\hat{Y}_i = 708.12 - 2.54 X_i$
Let’s repeat this process 10,000 times!
This exercise is called a (Monte Carlo) simulation
We can do this in R with the infer package
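A minimal base-R sketch of this Monte Carlo simulation. The population parameters (698.93 and −2.28) and sample size (50) come from the slides; the regressor range and error standard deviation are illustrative assumptions:

```r
# Monte Carlo: draw many samples from a known population, re-run OLS each time
set.seed(256)
beta_0 <- 698.93   # true population intercept (from the slides)
beta_1 <- -2.28    # true population slope (from the slides)
n_samples <- 10000 # number of simulated samples
n_obs <- 50        # observations per sample (as on the slides)

slope_estimates <- replicate(n_samples, {
  X <- runif(n_obs, 14, 26)  # illustrative regressor range (assumption)
  u <- rnorm(n_obs, 0, 10)   # illustrative error SD (assumption)
  Y <- beta_0 + beta_1 * X + u
  coef(lm(Y ~ X))["X"]       # keep the estimated slope from this sample
})

mean(slope_estimates)  # close to -2.28: OLS is unbiased
```

Averaging the 10,000 slope estimates recovers the true $\beta_1$, which is the unbiasedness claim on the next slide.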
$E[\hat{\beta}_1] = \beta_1$
But, any individual estimate can miss the mark
This leads to uncertainty about our estimated regression line

Sample → statistical inference → Population → causal identification → Unobserved Parameters
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \xrightarrow{\text{🤞 hopefully 🤞}} Y_i = \beta_0 + \beta_1 X_i + u_i$
Our problem with uncertainty is we don’t know whether our sample estimate is close or far from the unknown population parameter
But we can use our errors to learn how well our model statistics likely estimate the true parameters
Use $\hat{\beta}_1$ and its standard error, $se(\hat{\beta}_1)$, for statistical inference about the true $\beta_1$
We have two options…
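As a quick check on how the standard error feeds into inference: the statistic column in the regression tables on these slides is just estimate divided by std.error. A sketch using the slide's own numbers:

```r
# The t-statistic reported by lm() is just estimate / standard error
beta1_hat <- -2.279808  # slope estimate from the regression table
se_beta1  <- 0.4798256  # its standard error

beta1_hat / se_beta1    # matches the statistic column (about -4.75)
```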





infer Package
The infer package allows you to run through these steps manually to understand the process:
specify() a model
generate() a bootstrap distribution
calculate() the confidence interval
visualize() with a histogram (optional)
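A minimal sketch of the full pipeline, assuming the infer package is installed and using the built-in mtcars data (mpg on wt) in place of the class-size data:

```r
library(infer)

set.seed(2022)  # for reproducible resamples

# Bootstrap distribution of a regression slope, following the steps above
boot_slopes <- mtcars %>%
  specify(mpg ~ wt) %>%                          # 1. declare the model
  generate(reps = 1000, type = "bootstrap") %>%  # 2. resample 1,000 times
  calculate(stat = "slope")                      # 3. slope in each resample

boot_slopes %>%
  get_confidence_interval(level = 0.95)          # 4. middle 95% of slopes

boot_slopes %>%
  visualize()                                    # optional histogram
```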
infer Package III

Regression on our sample:

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 |
| str | -2.279808 | 0.4798256 | -4.751327 | 2.783307e-06 |

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 671.5164920 | 8.9597708 | 74.947954 | 1.853313e-244 |
| str | -0.9595986 | 0.4521103 | -2.122488 | 3.438415e-02 |

👆 Bootstrapped from Our Sample
infer Pipeline: specify()

data %>%
  specify(y ~ x)

The specify() function is essentially a lm() function for regression (for our purposes).

infer Pipeline: generate()

%>% generate(reps = n, type = "bootstrap")
Now the magic starts, as we run a number of simulated samples
Set the number of reps and set type to "bootstrap"
replicate: the “sample” number (1-1000)
creates x and y values (data points)
infer Pipeline: calculate()

%>% calculate(stat = "slope")

infer Pipeline: get_confidence_interval()

%>% get_confidence_interval()
infer Pipeline: visualize()

%>% visualize()
A confidence interval is ( estimate − MOE , estimate + MOE ), where MOE is the margin of error


Depending on our confidence level, we are essentially looking for the middle $(1-\alpha) \times 100\%$ of the sampling distribution
This puts $\alpha$ in the tails; $\frac{\alpha}{2}$ in each tail
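For example, with $\alpha = 0.05$, the two-tailed critical values cut off 2.5% in each tail of the standard normal:

```r
alpha <- 0.05

# cutoffs leaving alpha/2 = 2.5% in each tail of the standard normal
qnorm(c(alpha / 2, 1 - alpha / 2))
#> [1] -1.959964  1.959964
```

These $\pm 1.96$ cutoffs are the "about 2 standard deviations" in the empirical rule below.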

Recall the 68-95-99.7% empirical rule for (standard) normal distributions!
95% of data falls within 2 standard deviations of the mean
Thus, in 95% of samples, the true parameter is likely to fall within about 2 standard deviations of the sample estimate
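Applying the rule to the slide's regression numbers ($\hat{\beta}_1 = -2.28$, $se(\hat{\beta}_1) = 0.48$):

```r
beta1_hat <- -2.279808  # slope estimate from the regression table
se_beta1  <- 0.4798256  # its standard error

# rough 95% CI: estimate +/- 2 standard errors
beta1_hat + c(-2, 2) * se_beta1
#> [1] -3.239459 -1.320157
```

This is very close to the exact confint() output below (−3.22, −1.34); the small difference is because R uses the t distribution's critical value rather than exactly 2.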

❌ 95% of the time, the true effect of class size on test score will be between -3.22 and -1.33
❌ We are 95% confident that a randomly selected school district will have an effect of class size on test score between -3.22 and -1.33
❌ The effect of class size on test score is -2.28 95% of the time.
✅ We are 95% confident that in similarly constructed samples, the true effect is between -3.22 and -1.33
base R doesn't show confidence intervals in the lm summary() output; we need the confint() command:

                 2.5 %     97.5 %
(Intercept) 680.32313 717.542779
str          -3.22298  -1.336637
broom's tidy() command can include confidence intervals:

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 | 680.32313 | 717.542779 |
| str | -2.279808 | 0.4798256 | -4.751327 | 2.783307e-06 | -3.22298 | -1.336637 |
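A sketch of that call, assuming the broom package is installed and using the built-in mtcars data in place of the class-size data:

```r
library(broom)

# conf.int = TRUE appends conf.low / conf.high columns to tidy()'s output
reg <- lm(mpg ~ wt, data = mtcars)
tidy(reg, conf.int = TRUE, conf.level = 0.95)
```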

