ECON 480 — Econometrics

2.4 — Goodness of Fit and Bias — Appendix

Deriving the OLS Estimators

The population linear regression model is:

$$Y_i=\beta_0+\beta_1 X_i+u_i$$

The errors ($u_i$) are unobserved, but for candidate values of $\hat{\beta}_0$ and $\hat{\beta}_1$, we can compute the residuals. Algebraically, each residual is:

$$\hat{u}_i=Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i$$

Recall that our goal is to find the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared errors (SSE):

$$SSE=\sum_{i=1}^n \hat{u}_i^2$$

So our minimization problem is:

$$\min_{\hat{\beta}_0,\hat{\beta}_1} \sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)^2$$

Using calculus, we take the partial derivatives and set them equal to 0 to find the minimum. The first order conditions (FOCs) are:

$$\begin{aligned}
\frac{\partial SSE}{\partial \hat{\beta}_0} &= -2\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)=0 \\
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)X_i=0
\end{aligned}$$
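
Before solving the FOCs by hand, we can verify numerically that this minimization problem has the solution we are about to derive. Below is a minimal R sketch (the simulated `x`, `y`, and coefficient values are illustrative assumptions) that minimizes the SSE with `optim()` and compares the result to `lm()`:

```r
# A minimal sketch: minimize the SSE numerically and compare to lm().
# The data-generating values (intercept 2, slope 0.5) are illustrative.
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)
y <- 2 + 0.5 * x + rnorm(100)

# SSE as a function of candidate values c(beta0_hat, beta1_hat)
sse <- function(b) sum((y - b[1] - b[2] * x)^2)

optim(par = c(0, 0), fn = sse)$par  # numerical minimum
coef(lm(y ~ x))                     # closed-form OLS estimates
```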

Finding $\hat{\beta}_0$

Working with the first FOC, divide both sides by $-2$:

$$\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)=0$$

Then expand the summation across all terms and divide by $n$:

$$\underbrace{\frac{1}{n}\sum_{i=1}^n Y_i}_{\bar{Y}}-\underbrace{\frac{1}{n}\sum_{i=1}^n \hat{\beta}_0}_{\hat{\beta}_0}-\underbrace{\frac{1}{n}\sum_{i=1}^n \hat{\beta}_1 X_i}_{\hat{\beta}_1 \bar{X}}=0$$

Note the first term is $\bar{Y}$, the second is $\hat{\beta}_0$, and the third is $\hat{\beta}_1\bar{X}$.¹

So we can rewrite as:

$$\bar{Y}-\hat{\beta}_0-\hat{\beta}_1\bar{X}=0$$

Rearranging:

$$\hat{\beta}_0=\bar{Y}-\hat{\beta}_1\bar{X}$$

Finding $\hat{\beta}_1$

To find $\hat{\beta}_1$, take the second FOC and divide by $-2$:

$$\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)X_i=0$$

Substitute in our formula for $\hat{\beta}_0$:

$$\sum_{i=1}^n \left(Y_i-\left[\bar{Y}-\hat{\beta}_1\bar{X}\right]-\hat{\beta}_1 X_i\right)X_i=0$$

Combining similar terms:

$$\sum_{i=1}^n \left(\left[Y_i-\bar{Y}\right]-\left[X_i-\bar{X}\right]\hat{\beta}_1\right)X_i=0$$

Distribute $X_i$ and expand the terms into the subtraction of two sums (pulling out $\hat{\beta}_1$ as a constant in the second sum):

$$\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i-\hat{\beta}_1\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i=0$$

Move the second term to the right-hand side:

$$\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i=\hat{\beta}_1\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i$$

Divide to keep just $\hat{\beta}_1$ on the right:

$$\frac{\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i}{\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i}=\hat{\beta}_1$$

Note that from the rules about summation operators:

$$\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i=\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)$$

and:

$$\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i=\sum_{i=1}^n \left(X_i-\bar{X}\right)\left(X_i-\bar{X}\right)=\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$$

Plug in these two facts:

$$\frac{\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}=\hat{\beta}_1$$
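
These formulas are easy to compute directly. A minimal R sketch, reusing the simulated `x` and `y` from the chunk above:

```r
# A minimal sketch: the closed-form OLS formulas, applied to the
# simulated x and y from the previous chunk.
beta1_hat <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)  # matches coef(lm(y ~ x))
cov(x, y) / var(x)       # equivalent for beta1_hat: the 1/(n-1)'s cancel
```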

Algebraic Properties of OLS Estimators

The OLS residuals $\hat{u}_i$ and predicted values $\hat{Y}_i$ are chosen by the minimization problem to satisfy:

  1. The expected value (average) of the residuals is 0:

$$E(\hat{u}_i)=\frac{1}{n}\sum_{i=1}^n \hat{u}_i=0$$

  2. The covariance between $X$ and the residuals is 0:

$$\hat{\sigma}_{X,u}=0$$

Note the first two properties imply strict exogeneity. That is, this is only a valid model if $X$ and $u$ are not correlated.

  3. The expected predicted value of $Y$ is equal to the expected value of $Y$:

$$\bar{\hat{Y}}=\frac{1}{n}\sum_{i=1}^n \hat{Y}_i=\bar{Y}$$

  4. The total sum of squares is equal to the explained sum of squares plus the sum of squared errors:

$$\begin{aligned}
TSS &= ESS+SSE \\
\sum_{i=1}^n \left(Y_i-\bar{Y}\right)^2 &= \sum_{i=1}^n \left(\hat{Y}_i-\bar{Y}\right)^2+\sum_{i=1}^n \hat{u}_i^2
\end{aligned}$$

Recall $R^2$ is $\frac{ESS}{TSS}$, or equivalently, $1-\frac{SSE}{TSS}$.

  5. The regression line passes through the point $(\bar{X},\bar{Y})$, i.e. the mean of $X$ and the mean of $Y$.
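
Each of these properties can be checked numerically. A minimal R sketch on a model fit to simulated data (the data-generating values are illustrative assumptions):

```r
# A minimal sketch: verify the five algebraic properties of OLS.
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)
y <- 2 + 0.5 * x + rnorm(100)

model <- lm(y ~ x)
u_hat <- residuals(model)
y_hat <- fitted(model)

mean(u_hat)            # 1. ~0 (up to floating-point error)
cov(x, u_hat)          # 2. ~0
mean(y_hat) - mean(y)  # 3. ~0

TSS <- sum((y - mean(y))^2)
ESS <- sum((y_hat - mean(y))^2)
SSE <- sum(u_hat^2)
TSS - (ESS + SSE)      # 4. ~0
ESS / TSS              # R^2, equal to 1 - SSE/TSS

predict(model, data.frame(x = mean(x))) - mean(y)  # 5. ~0
```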

Bias in $\hat{\beta}_1$

Begin with the formula we derived for $\hat{\beta}_1$:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Recall from Rule 6 of summations that we can rewrite the numerator as:

$$\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)=\sum_{i=1}^n Y_i\left(X_i-\bar{X}\right)$$

$$\hat{\beta}_1=\frac{\sum_{i=1}^n Y_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

We know the true population relationship is expressed as:

$$Y_i=\beta_0+\beta_1 X_i+u_i$$

Substituting this in for $Y_i$ in the expression for $\hat{\beta}_1$ above:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n \left(\beta_0+\beta_1 X_i+u_i\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Breaking apart the sums in the numerator:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n \beta_0\left(X_i-\bar{X}\right)+\sum_{i=1}^n \beta_1 X_i\left(X_i-\bar{X}\right)+\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

We can simplify this expression using Rules 4 and 5 of summations:

  1. The first term in the numerator, $\sum_{i=1}^n \beta_0\left(X_i-\bar{X}\right)$, has the constant $\beta_0$, which can be pulled out of the summation. This leaves the summation of deviations from the mean, which add up to 0 as per Rule 4:

$$\sum_{i=1}^n \beta_0\left(X_i-\bar{X}\right)=\beta_0\sum_{i=1}^n \left(X_i-\bar{X}\right)=\beta_0(0)=0$$

  2. The second term in the numerator, $\sum_{i=1}^n \beta_1 X_i\left(X_i-\bar{X}\right)$, has the constant $\beta_1$, which can be pulled out of the summation. Additionally, Rule 5 tells us $\sum_{i=1}^n X_i\left(X_i-\bar{X}\right)=\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$:

$$\sum_{i=1}^n \beta_1 X_i\left(X_i-\bar{X}\right)=\beta_1\sum_{i=1}^n X_i\left(X_i-\bar{X}\right)=\beta_1\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$$

When placed back in the numerator of the fraction, we can see this term simplifies to just $\beta_1$:

$$\frac{\beta_1\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}=\beta_1\times\frac{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}=\beta_1$$

Thus, we are left with:

$$\hat{\beta}_1=\beta_1+\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Now, take the expectation of both sides:

$$E[\hat{\beta}_1]=E\left[\beta_1+\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}\right]$$

We can break this up using the properties of expectations. First, recall $E[a+b]=E[a]+E[b]$, so we can break apart the two terms.

$$E[\hat{\beta}_1]=E[\beta_1]+E\left[\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}\right]$$

Second, the true population value of $\beta_1$ is a constant, so $E[\beta_1]=\beta_1$.

Third, since we assume $X$ is also "fixed" and not random, the variation in $X$, $\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$, in the denominator, is just a constant, and can be brought outside the expectation.

$$E[\hat{\beta}_1]=\beta_1+\frac{E\left[\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)\right]}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Thus, the properties of the equation are primarily driven by the expectation $E\left[\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)\right]$. We now turn to this term.

Use the properties of summation operators to expand the numerator term:

$$\begin{aligned}
\hat{\beta}_1 &= \beta_1+\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2} \\
\hat{\beta}_1 &= \beta_1+\frac{\sum_{i=1}^n \left(u_i-\bar{u}\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}
\end{aligned}$$

Now divide the numerator and denominator of the second term by $n$. This gives us the covariance between $X$ and $u$ in the numerator and the variance of $X$ in the denominator, based on their respective definitions.

$$\begin{aligned}
\hat{\beta}_1 &= \beta_1+\frac{\frac{1}{n}\sum_{i=1}^n \left(u_i-\bar{u}\right)\left(X_i-\bar{X}\right)}{\frac{1}{n}\sum_{i=1}^n \left(X_i-\bar{X}\right)^2} \\
\hat{\beta}_1 &= \beta_1+\frac{cov(X,u)}{var(X)} \\
\hat{\beta}_1 &= \beta_1+\frac{s_{X,u}}{s_X^2}
\end{aligned}$$

By the Zero Conditional Mean assumption of OLS, $s_{X,u}=0$.

Alternatively, we can express the bias in terms of correlation instead of covariance:

$$E[\hat{\beta}_1]=\beta_1+\frac{cov(X,u)}{var(X)}$$

From the definition of correlation:

$$\begin{aligned}
cor(X,u) &= \frac{cov(X,u)}{s_X s_u} \\
cor(X,u)\,s_X s_u &= cov(X,u)
\end{aligned}$$

Plugging this in:

$$\begin{aligned}
E[\hat{\beta}_1] &= \beta_1+\frac{cov(X,u)}{var(X)} \\
E[\hat{\beta}_1] &= \beta_1+\frac{\left[cor(X,u)\,s_X s_u\right]}{s_X^2} \\
E[\hat{\beta}_1] &= \beta_1+cor(X,u)\frac{s_u}{s_X}
\end{aligned}$$
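
A minimal Monte Carlo sketch of this bias in R: the errors are built to be correlated with $X$ (the 0.8 loading is an illustrative assumption), so the estimates center not on $\beta_1$ but on $\beta_1+\frac{cov(X,u)}{var(X)}$:

```r
# A minimal sketch: when cor(X, u) != 0, beta1_hat is biased.
# Here u = 0.8 * x + noise, so cov(X, u) = 0.8 while var(X) = 1,
# and the estimates should center near 0.5 + 0.8 = 1.3.
set.seed(42)
estimates <- replicate(5000, {
  x <- rnorm(100)
  u <- 0.8 * x + rnorm(100)  # endogeneity by construction
  y <- 2 + 0.5 * x + u
  coef(lm(y ~ x))[2]
})
mean(estimates)  # ~1.3, not the true beta_1 = 0.5
```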

Proof of the Unbiasedness of $\hat{\beta}_1$

Begin with the equation:²

$$\hat{\beta}_1=\frac{\sum Y_i X_i}{\sum X_i^2}$$

Substitute for $Y_i$:

$$\hat{\beta}_1=\frac{\sum \left(\beta_1 X_i+u_i\right)X_i}{\sum X_i^2}$$

Distribute $X_i$ in the numerator:

$$\hat{\beta}_1=\frac{\sum \left(\beta_1 X_i^2+u_i X_i\right)}{\sum X_i^2}$$

Separate the sum into additive pieces:

$$\hat{\beta}_1=\frac{\sum \beta_1 X_i^2}{\sum X_i^2}+\frac{\sum u_i X_i}{\sum X_i^2}$$

$\beta_1$ is a constant, so we can pull it out of the first sum:

$$\hat{\beta}_1=\beta_1\frac{\sum X_i^2}{\sum X_i^2}+\frac{\sum u_i X_i}{\sum X_i^2}$$

Simplifying the first term, we are left with:

$$\hat{\beta}_1=\beta_1+\frac{\sum u_i X_i}{\sum X_i^2}$$

Now if we take expectations of both sides:

$$E[\hat{\beta}_1]=E[\beta_1]+E\left[\frac{\sum u_i X_i}{\sum X_i^2}\right]$$

$\beta_1$ is a constant, so the expectation of $\beta_1$ is itself.

$$E[\hat{\beta}_1]=\beta_1+E\left[\frac{\sum u_i X_i}{\sum X_i^2}\right]$$

Using the properties of expectations, we can pull out $\frac{1}{\sum X_i^2}$ as a constant:

$$E[\hat{\beta}_1]=\beta_1+\frac{1}{\sum X_i^2}E\left[\sum u_i X_i\right]$$

Again using the properties of expectations, we can put the expectation inside the summation operator (the expectation of a sum is the sum of expectations):

$$E[\hat{\beta}_1]=\beta_1+\frac{1}{\sum X_i^2}\sum E\left[u_i X_i\right]$$

Under the exogeneity condition, the correlation between $X_i$ and $u_i$ is 0, so $E[u_i X_i]=0$ and the entire second term drops out:

$$E[\hat{\beta}_1]=\beta_1$$
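
A matching Monte Carlo sketch in R (simulated data; following footnote 2, the intercept is dropped with `y ~ x - 1`): with exogenous errors, the estimates average out to the true $\beta_1$:

```r
# A minimal sketch: with exogenous errors, beta1_hat is unbiased.
# The model matches the simplified version above with beta_0 = 0,
# estimated as a regression through the origin (y ~ x - 1).
set.seed(42)
estimates <- replicate(5000, {
  x <- rnorm(100)
  u <- rnorm(100)  # drawn independently of x: exogeneity holds
  y <- 0.5 * x + u
  coef(lm(y ~ x - 1))[1]
})
mean(estimates)  # ~0.5, the true beta_1
```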

Footnotes

  1. From the rules about summation operators, we define the mean of a random variable $X$ as $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$. The mean of a constant, like $\beta_0$ or $\beta_1$, is itself.

  2. Admittedly, this is a simplified version where $\hat{\beta}_0=0$, but there is no loss of generality in the results.
