Midterm Concepts
OLS Regression
Bivariate data and associations between variables (e.g. between $X$ and $Y$)
Apparent relationships are best viewed by looking at a scatterplot
Check whether associations are positive/negative, weak/strong, linear/nonlinear, etc.
$Y$: dependent variable; $X$: independent variable
Correlation coefficient ($r_{XY}$) can quantify the strength of an association, but only measures linear associations; values of $|r_{XY}|$ closer to 1 imply stronger correlation (near a perfect straight line)
Correlation does not imply causation! Might be confounding or lurking variables (e.g. $Z$) affecting $X$ and/or $Y$
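A minimal sketch of computing $r_{XY}$ in Python (numpy assumed available; data made up):

```python
import numpy as np

# Made-up bivariate data with a roughly linear association
rng = np.random.default_rng(0)
X = rng.normal(size=200)
Y = 2.0 + 1.5 * X + rng.normal(size=200)

# Sample correlation coefficient; np.corrcoef returns the 2x2 correlation matrix
r_XY = np.corrcoef(X, Y)[0, 1]
print(f"r_XY = {r_XY:.3f}")  # close to 1 here: strong, positive, linear association
```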
Population regression model
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$\beta_1$: the slope between $X$ and $Y$; number of units $Y$ changes from a 1 unit change in $X$
$\beta_0$: the $Y$-intercept; literally, the value of $Y$ when $X = 0$
$u_i$: the error; difference between the actual value of $Y_i$ vs. the predicted value of $Y_i$
Ordinary Least Squares (OLS) regression model
OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ estimate the population regression line from sample data
Minimize the sum of squared residuals (SSR): $\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
OLS regression line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
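A by-hand sketch of these estimators on made-up data (numpy assumed available):

```python
import numpy as np

# Made-up sample; the "true" values beta0 = 2.0 and beta1 = 1.5 are invented for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=200)
Y = 2.0 + 1.5 * X + rng.normal(size=200)

# Slope: sample covariance of X and Y divided by sample variance of X
beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
# Intercept: makes the regression line pass through (X-bar, Y-bar)
beta0_hat = Y.mean() - beta1_hat * X.mean()

Y_hat = beta0_hat + beta1_hat * X   # fitted values on the OLS regression line
u_hat = Y - Y_hat                   # residuals; OLS minimizes sum(u_hat**2)
print(beta0_hat, beta1_hat, np.sum(u_hat ** 2))
```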
Measures of Fit
$R^2$: fraction of total variation in $Y$ explained by variation in $X$ according to the model
$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
Where $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$, and $SSR = \sum_{i=1}^{n} \hat{u}_i^2$
Standard error of the regression (or residuals), SER: average size of $\hat{u}_i$, i.e. average distance between points and the regression line
$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$
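A sketch computing $R^2$ and SER from the same kind of by-hand fit (made-up data):

```python
import numpy as np

# Same made-up data and by-hand OLS fit as the earlier sketch
rng = np.random.default_rng(1)
X = rng.normal(size=200)
Y = 2.0 + 1.5 * X + rng.normal(size=200)
beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()
u_hat = Y - (beta0_hat + beta1_hat * X)

n = len(Y)
TSS = np.sum((Y - Y.mean()) ** 2)   # total variation in Y
SSR = np.sum(u_hat ** 2)            # variation left unexplained (squared residuals)
R2 = 1 - SSR / TSS                  # fraction of variation in Y explained by X
SER = np.sqrt(SSR / (n - 2))        # typical size of a residual (n - 2 degrees of freedom)
print(f"R^2 = {R2:.3f}, SER = {SER:.3f}")
```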
Sampling Distribution of $\hat{\beta}_1$
Mean of OLS estimator $E[\hat{\beta}_1]$ & Bias: Endogeneity & Exogeneity
$X$ is exogenous if it is not correlated with the error term $u$
Equivalently, knowing $X$ should tell us nothing about $u$: $E[u \mid X] = 0$ (zero conditional mean assumption)
If $X$ is exogenous, the OLS estimate of $\beta_1$ is unbiased
$X$ is endogenous if it is correlated with the error term $u$
If $X$ is endogenous, the OLS estimate of $\beta_1$ is biased: $E[\hat{\beta}_1] \neq \beta_1$
Can measure strength and direction (+ or -) of bias from the size and sign of $\mathrm{corr}(X, u)$
Note if unbiased, $E[\hat{\beta}_1] = \beta_1$, so the estimate equals the true $\beta_1$ on average across samples
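A toy simulation (all numbers invented) contrasting the exogenous and endogenous cases:

```python
import numpy as np

# Toy simulation: compare the OLS slope when X is exogenous vs. endogenous.
# All numbers (true_beta1 = 1.5, the 0.8 loading of u on X, etc.) are made up.
rng = np.random.default_rng(2)
true_beta1 = 1.5
exog_estimates, endog_estimates = [], []

for _ in range(2000):
    n = 100
    u = rng.normal(size=n)
    X_exog = rng.normal(size=n)               # uncorrelated with u -> exogenous
    X_endog = rng.normal(size=n) + 0.8 * u    # correlated with u -> endogenous
    for X, store in ((X_exog, exog_estimates), (X_endog, endog_estimates)):
        Y = 2.0 + true_beta1 * X + u
        b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
        store.append(b1)

print(np.mean(exog_estimates))    # ~1.5: unbiased when X is exogenous
print(np.mean(endog_estimates))   # well above 1.5: positive bias since corr(X, u) > 0
```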
Assumptions about u
The mean of the errors is 0
The variance of the errors is constant over all values of $X$ (homoskedasticity)
Errors are not correlated across observations: $u_i$ and $u_j$ are uncorrelated for $i \neq j$ (no autocorrelation)
There is no correlation between $X$ and $u$, i.e. the model is exogenous
Precision of OLS estimator $\hat{\beta}_1$
$SE(\hat{\beta}_1)$ measures uncertainty/variability of the estimate
Affected by three factors:
Model fit (SER)
Sample size, $n$
Variation in $X$
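A rough numerical sketch of the three factors, using the homoskedasticity-only formula $SE(\hat{\beta}_1) = SER / \sqrt{\sum_i (X_i - \bar{X})^2}$ on made-up data:

```python
import numpy as np

# Sketch of how SE(beta1-hat) responds to the three factors, using the
# homoskedasticity-only formula SE = SER / sqrt(sum (X_i - X-bar)^2).
# All sample sizes, spreads, and noise levels below are made up.
rng = np.random.default_rng(3)

def se_beta1(n, x_sd, noise_sd):
    X = rng.normal(scale=x_sd, size=n)
    Y = 2.0 + 1.5 * X + rng.normal(scale=noise_sd, size=n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)
    ser = np.sqrt(np.sum(resid ** 2) / (n - 2))
    return ser / np.sqrt(np.sum((X - X.mean()) ** 2))

print(se_beta1(n=100, x_sd=1.0, noise_sd=1.0))  # baseline
print(se_beta1(n=400, x_sd=1.0, noise_sd=1.0))  # larger sample -> smaller SE
print(se_beta1(n=100, x_sd=3.0, noise_sd=1.0))  # more variation in X -> smaller SE
print(se_beta1(n=100, x_sd=1.0, noise_sd=0.3))  # better fit (smaller SER) -> smaller SE
```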
Heteroskedasticity & Homoskedasticity
Homoskedastic errors (constant $\mathrm{var}(u_i \mid X_i)$) have the same variance over all values of $X$
Heteroskedastic errors ($\mathrm{var}(u_i \mid X_i)$ varies with $X_i$) have different variance over values of $X$
Heteroskedasticity does not bias our estimates, but makes the usual standard errors wrong, often understating them (inflating $t$-statistics and significance!)
Can correct for heteroskedasticity by using robust standard errors
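A sketch of this correction using statsmodels (assuming it is installed; the data and the HC1 variant are illustrative choices):

```python
import numpy as np
import statsmodels.api as sm

# Made-up heteroskedastic data: the error variance grows with |X|
rng = np.random.default_rng(4)
X = rng.normal(size=500)
u = rng.normal(size=500) * (0.5 + np.abs(X))
Y = 2.0 + 1.5 * X + u

X_design = sm.add_constant(X)  # add the intercept column

usual = sm.OLS(Y, X_design).fit()                  # homoskedasticity-only SEs
robust = sm.OLS(Y, X_design).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

print(usual.bse)    # usual standard errors, understated for this data
print(robust.bse)   # robust SEs are larger, so t-statistics are no longer inflated
```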
Hypothesis Testing of $\beta_1$
$H_0: \beta_1 = \beta_{1,0}$, often $\beta_{1,0} = 0$
Two sided alternative: $H_1: \beta_1 \neq \beta_{1,0}$
One sided alternatives: $H_1: \beta_1 > \beta_{1,0}$ or $H_1: \beta_1 < \beta_{1,0}$
$t$-statistic: $t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$
Compare $|t|$ against critical value $t^*$, or compute the $p$-value as usual
Confidence intervals (95%): $\hat{\beta}_1 \pm 1.96 \times SE(\hat{\beta}_1)$
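A sketch of the two-sided test and 95% CI on made-up data (numpy and scipy assumed available):

```python
import numpy as np
from scipy import stats

# Sketch: test H0: beta1 = 0 against a two-sided alternative and build a 95% CI,
# using the same made-up data and by-hand OLS fit as the earlier sketches.
rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=n)
Y = 2.0 + 1.5 * X + rng.normal(size=n)

beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()
u_hat = Y - (beta0_hat + beta1_hat * X)
SER = np.sqrt(np.sum(u_hat ** 2) / (n - 2))
se_beta1 = SER / np.sqrt(np.sum((X - X.mean()) ** 2))  # homoskedasticity-only SE

t_stat = (beta1_hat - 0) / se_beta1                    # t-statistic under H0: beta1 = 0
p_value = 2 * (1 - stats.norm.cdf(abs(t_stat)))        # two-sided p-value (normal approx.)
ci_95 = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)
print(t_stat, p_value, ci_95)
```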