ECON 480 — Econometrics

2.4 — Goodness of Fit and Bias — Appendix

Deriving the OLS Estimators

The population linear regression model is:

$$Y_i=\beta_0+\beta_1 X_i+u_i$$

The errors ($u_i$) are unobserved, but for candidate values of $\hat{\beta}_0$ and $\hat{\beta}_1$, we can compute the residuals. Algebraically, each residual is:

$$\hat{u}_i=Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i$$

Recall that our goal is to find the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared errors (SSE):

$$SSE=\sum_{i=1}^n \hat{u}_i^2$$

So our minimization problem is:

$$\min_{\hat{\beta}_0,\hat{\beta}_1} \sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)^2$$

Using calculus, we take the partial derivatives and set them equal to 0 to find the minimum. The first order conditions (FOCs) are:

$$\begin{aligned}
\frac{\partial SSE}{\partial \hat{\beta}_0} &= -2\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)=0 \\
\frac{\partial SSE}{\partial \hat{\beta}_1} &= -2\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)X_i=0
\end{aligned}$$
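
Before solving the FOCs by hand, we can verify numerically that this minimization problem has the solution we are about to derive. Below is a minimal R sketch (the simulated `x`, `y`, and coefficient values are illustrative assumptions) that minimizes the SSE with `optim()` and compares the result to `lm()`:

```r
# A minimal sketch: minimize the SSE numerically and compare to lm().
# The data-generating values (intercept 2, slope 0.5) are illustrative.
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)
y <- 2 + 0.5 * x + rnorm(100)

# SSE as a function of candidate values c(beta0_hat, beta1_hat)
sse <- function(b) sum((y - b[1] - b[2] * x)^2)

optim(par = c(0, 0), fn = sse)$par  # numerical minimum
coef(lm(y ~ x))                     # closed-form OLS estimates
```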

Finding $\hat{\beta}_0$

Working with the first FOC, divide both sides by $-2$:

$$\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)=0$$

Then expand the summation across all terms and divide by $n$:

$$\underbrace{\frac{1}{n}\sum_{i=1}^n Y_i}_{\bar{Y}}-\underbrace{\frac{1}{n}\sum_{i=1}^n \hat{\beta}_0}_{\hat{\beta}_0}-\underbrace{\frac{1}{n}\sum_{i=1}^n \hat{\beta}_1 X_i}_{\hat{\beta}_1 \bar{X}}=0$$

Note the first term is $\bar{Y}$, the second is $\hat{\beta}_0$, and the third is $\hat{\beta}_1\bar{X}$.¹

So we can rewrite as:

$$\bar{Y}-\hat{\beta}_0-\hat{\beta}_1\bar{X}=0$$

Rearranging:

$$\hat{\beta}_0=\bar{Y}-\hat{\beta}_1\bar{X}$$

Finding $\hat{\beta}_1$

To find $\hat{\beta}_1$, take the second FOC and divide by $-2$:

$$\sum_{i=1}^n \left(Y_i-\hat{\beta}_0-\hat{\beta}_1 X_i\right)X_i=0$$

Substitute in our formula for $\hat{\beta}_0$:

$$\sum_{i=1}^n \left(Y_i-\left[\bar{Y}-\hat{\beta}_1\bar{X}\right]-\hat{\beta}_1 X_i\right)X_i=0$$

Combining similar terms:

$$\sum_{i=1}^n \left(\left[Y_i-\bar{Y}\right]-\left[X_i-\bar{X}\right]\hat{\beta}_1\right)X_i=0$$

Distribute $X_i$ and expand the terms into the subtraction of two sums (pulling out $\hat{\beta}_1$ as a constant in the second sum):

$$\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i-\hat{\beta}_1\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i=0$$

Move the second term to the right-hand side:

$$\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i=\hat{\beta}_1\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i$$

Divide to keep just $\hat{\beta}_1$ on the right:

$$\frac{\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i}{\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i}=\hat{\beta}_1$$

Note that from the rules about summation operators:

$$\sum_{i=1}^n \left[Y_i-\bar{Y}\right]X_i=\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)$$

and:

$$\sum_{i=1}^n \left[X_i-\bar{X}\right]X_i=\sum_{i=1}^n \left(X_i-\bar{X}\right)\left(X_i-\bar{X}\right)=\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$$

Plug in these two facts:

$$\frac{\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}=\hat{\beta}_1$$
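
These formulas are easy to compute directly. A minimal R sketch, reusing the simulated `x` and `y` from the chunk above:

```r
# A minimal sketch: the closed-form OLS formulas, applied to the
# simulated x and y from the previous chunk.
beta1_hat <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)  # matches coef(lm(y ~ x))
cov(x, y) / var(x)       # equivalent for beta1_hat: the 1/(n-1)'s cancel
```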

Algebraic Properties of OLS Estimators

The OLS residuals $\hat{u}_i$ and predicted values $\hat{Y}_i$ are chosen by the minimization problem to satisfy:

  1. The expected value (average) of the residuals is 0:

$$E(\hat{u}_i)=\frac{1}{n}\sum_{i=1}^n \hat{u}_i=0$$

  2. The covariance between $X$ and the residuals is 0:

$$\hat{\sigma}_{X,u}=0$$

Note the first two properties imply strict exogeneity. That is, this is only a valid model if $X$ and $u$ are not correlated.

  3. The expected predicted value of $Y$ is equal to the expected value of $Y$:

$$\bar{\hat{Y}}=\frac{1}{n}\sum_{i=1}^n \hat{Y}_i=\bar{Y}$$

  4. The total sum of squares is equal to the explained sum of squares plus the sum of squared errors:

$$\begin{aligned}
TSS &= ESS+SSE \\
\sum_{i=1}^n \left(Y_i-\bar{Y}\right)^2 &= \sum_{i=1}^n \left(\hat{Y}_i-\bar{Y}\right)^2+\sum_{i=1}^n \hat{u}_i^2
\end{aligned}$$

Recall $R^2$ is $\frac{ESS}{TSS}$, or equivalently, $1-\frac{SSE}{TSS}$.

  5. The regression line passes through the point $(\bar{X},\bar{Y})$, i.e. the mean of $X$ and the mean of $Y$.
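
Each of these properties can be checked numerically. A minimal R sketch on a model fit to simulated data (the data-generating values are illustrative assumptions):

```r
# A minimal sketch: verify the five algebraic properties of OLS.
set.seed(42)
x <- rnorm(100, mean = 10, sd = 2)
y <- 2 + 0.5 * x + rnorm(100)

model <- lm(y ~ x)
u_hat <- residuals(model)
y_hat <- fitted(model)

mean(u_hat)            # 1. ~0 (up to floating-point error)
cov(x, u_hat)          # 2. ~0
mean(y_hat) - mean(y)  # 3. ~0

TSS <- sum((y - mean(y))^2)
ESS <- sum((y_hat - mean(y))^2)
SSE <- sum(u_hat^2)
TSS - (ESS + SSE)      # 4. ~0
ESS / TSS              # R^2, equal to 1 - SSE/TSS

predict(model, data.frame(x = mean(x))) - mean(y)  # 5. ~0
```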

Bias in $\hat{\beta}_1$

Begin with the formula we derived for $\hat{\beta}_1$:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Recall from Rule 6 of summations that we can rewrite the numerator as:

$$\sum_{i=1}^n \left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right)=\sum_{i=1}^n Y_i\left(X_i-\bar{X}\right)$$

$$\hat{\beta}_1=\frac{\sum_{i=1}^n Y_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

We know the true population relationship is expressed as:

$$Y_i=\beta_0+\beta_1 X_i+u_i$$

Substituting this in for $Y_i$ in the expression for $\hat{\beta}_1$ above:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n \left(\beta_0+\beta_1 X_i+u_i\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Breaking apart the sums in the numerator:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n \beta_0\left(X_i-\bar{X}\right)+\sum_{i=1}^n \beta_1 X_i\left(X_i-\bar{X}\right)+\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

We can simplify this expression using Rules 4 and 5 of summations:

  1. The first term in the numerator, $\sum_{i=1}^n \beta_0\left(X_i-\bar{X}\right)$, has the constant $\beta_0$, which can be pulled out of the summation. This leaves the summation of deviations from the mean, which add up to 0 as per Rule 4:

$$\sum_{i=1}^n \beta_0\left(X_i-\bar{X}\right)=\beta_0\sum_{i=1}^n \left(X_i-\bar{X}\right)=\beta_0(0)=0$$

  2. The second term in the numerator, $\sum_{i=1}^n \beta_1 X_i\left(X_i-\bar{X}\right)$, has the constant $\beta_1$, which can be pulled out of the summation. Additionally, Rule 5 tells us $\sum_{i=1}^n X_i\left(X_i-\bar{X}\right)=\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$:

$$\sum_{i=1}^n \beta_1 X_i\left(X_i-\bar{X}\right)=\beta_1\sum_{i=1}^n X_i\left(X_i-\bar{X}\right)=\beta_1\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$$

When placed back in the numerator of the fraction, we can see this term simplifies to just $\beta_1$:

$$\frac{\beta_1\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}=\beta_1\times\frac{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}=\beta_1$$

Thus, we are left with:

$$\hat{\beta}_1=\beta_1+\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Now, take the expectation of both sides:

$$E[\hat{\beta}_1]=E\left[\beta_1+\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}\right]$$

We can break this up using the properties of expectations. First, recall $E[a+b]=E[a]+E[b]$, so we can break apart the two terms.

$$E[\hat{\beta}_1]=E[\beta_1]+E\left[\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}\right]$$

Second, the true population value of $\beta_1$ is a constant, so $E[\beta_1]=\beta_1$.

Third, since we assume $X$ is also "fixed" and not random, the variation in $X$, $\sum_{i=1}^n \left(X_i-\bar{X}\right)^2$, in the denominator, is just a constant, and can be brought outside the expectation.

$$E[\hat{\beta}_1]=\beta_1+\frac{E\left[\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)\right]}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}$$

Thus, the properties of the equation are primarily driven by the expectation $E\left[\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)\right]$. We now turn to this term.

Use the properties of summation operators to expand the numerator term:

$$\begin{aligned}
\hat{\beta}_1 &= \beta_1+\frac{\sum_{i=1}^n u_i\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2} \\
\hat{\beta}_1 &= \beta_1+\frac{\sum_{i=1}^n \left(u_i-\bar{u}\right)\left(X_i-\bar{X}\right)}{\sum_{i=1}^n \left(X_i-\bar{X}\right)^2}
\end{aligned}$$

Now divide the numerator and denominator of the second term by $n$. This gives us the covariance between $X$ and $u$ in the numerator and the variance of $X$ in the denominator, based on their respective definitions.

$$\begin{aligned}
\hat{\beta}_1 &= \beta_1+\frac{\frac{1}{n}\sum_{i=1}^n \left(u_i-\bar{u}\right)\left(X_i-\bar{X}\right)}{\frac{1}{n}\sum_{i=1}^n \left(X_i-\bar{X}\right)^2} \\
\hat{\beta}_1 &= \beta_1+\frac{cov(X,u)}{var(X)} \\
\hat{\beta}_1 &= \beta_1+\frac{s_{X,u}}{s_X^2}
\end{aligned}$$

By the Zero Conditional Mean assumption of OLS, $s_{X,u}=0$.

Alternatively, we can express the bias in terms of correlation instead of covariance:

$$E[\hat{\beta}_1]=\beta_1+\frac{cov(X,u)}{var(X)}$$

From the definition of correlation:

$$\begin{aligned}
cor(X,u) &= \frac{cov(X,u)}{s_X s_u} \\
cor(X,u)\,s_X s_u &= cov(X,u)
\end{aligned}$$

Plugging this in:

$$\begin{aligned}
E[\hat{\beta}_1] &= \beta_1+\frac{cov(X,u)}{var(X)} \\
E[\hat{\beta}_1] &= \beta_1+\frac{\left[cor(X,u)\,s_X s_u\right]}{s_X^2} \\
E[\hat{\beta}_1] &= \beta_1+cor(X,u)\frac{s_u}{s_X}
\end{aligned}$$
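
A minimal Monte Carlo sketch of this bias in R: the errors are built to be correlated with $X$ (the 0.8 loading is an illustrative assumption), so the estimates center not on $\beta_1$ but on $\beta_1+\frac{cov(X,u)}{var(X)}$:

```r
# A minimal sketch: when cor(X, u) != 0, beta1_hat is biased.
# Here u = 0.8 * x + noise, so cov(X, u) = 0.8 while var(X) = 1,
# and the estimates should center near 0.5 + 0.8 = 1.3.
set.seed(42)
estimates <- replicate(5000, {
  x <- rnorm(100)
  u <- 0.8 * x + rnorm(100)  # endogeneity by construction
  y <- 2 + 0.5 * x + u
  coef(lm(y ~ x))[2]
})
mean(estimates)  # ~1.3, not the true beta_1 = 0.5
```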

Proof of the Unbiasedness of $\hat{\beta}_1$

Begin with the equation:²

$$\hat{\beta}_1=\frac{\sum Y_i X_i}{\sum X_i^2}$$

Substitute for $Y_i$:

$$\hat{\beta}_1=\frac{\sum \left(\beta_1 X_i+u_i\right)X_i}{\sum X_i^2}$$

Distribute $X_i$ in the numerator:

$$\hat{\beta}_1=\frac{\sum \left(\beta_1 X_i^2+u_i X_i\right)}{\sum X_i^2}$$

Separate the sum into additive pieces:

$$\hat{\beta}_1=\frac{\sum \beta_1 X_i^2}{\sum X_i^2}+\frac{\sum u_i X_i}{\sum X_i^2}$$

$\beta_1$ is a constant, so we can pull it out of the first sum:

$$\hat{\beta}_1=\beta_1\frac{\sum X_i^2}{\sum X_i^2}+\frac{\sum u_i X_i}{\sum X_i^2}$$

Simplifying the first term, we are left with:

$$\hat{\beta}_1=\beta_1+\frac{\sum u_i X_i}{\sum X_i^2}$$

Now if we take expectations of both sides:

$$E[\hat{\beta}_1]=E[\beta_1]+E\left[\frac{\sum u_i X_i}{\sum X_i^2}\right]$$

$\beta_1$ is a constant, so the expectation of $\beta_1$ is itself.

$$E[\hat{\beta}_1]=\beta_1+E\left[\frac{\sum u_i X_i}{\sum X_i^2}\right]$$

Using the properties of expectations, we can pull out $\frac{1}{\sum X_i^2}$ as a constant:

$$E[\hat{\beta}_1]=\beta_1+\frac{1}{\sum X_i^2}E\left[\sum u_i X_i\right]$$

Again using the properties of expectations, we can put the expectation inside the summation operator (the expectation of a sum is the sum of expectations):

$$E[\hat{\beta}_1]=\beta_1+\frac{1}{\sum X_i^2}\sum E\left[u_i X_i\right]$$

Under the exogeneity condition, the correlation between $X_i$ and $u_i$ is 0, so $E[u_i X_i]=0$ and the entire second term drops out:

$$E[\hat{\beta}_1]=\beta_1$$
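
A matching Monte Carlo sketch in R (simulated data; following footnote 2, the intercept is dropped with `y ~ x - 1`): with exogenous errors, the estimates average out to the true $\beta_1$:

```r
# A minimal sketch: with exogenous errors, beta1_hat is unbiased.
# The model matches the simplified version above with beta_0 = 0,
# estimated as a regression through the origin (y ~ x - 1).
set.seed(42)
estimates <- replicate(5000, {
  x <- rnorm(100)
  u <- rnorm(100)  # drawn independently of x: exogeneity holds
  y <- 0.5 * x + u
  coef(lm(y ~ x - 1))[1]
})
mean(estimates)  # ~0.5, the true beta_1
```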

Footnotes

  1. From the rules about summation operators, we define the mean of a random variable $X$ as $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$. The mean of a constant, like $\beta_0$ or $\beta_1$, is itself.

  2. Admittedly, this is a simplified version where $\hat{\beta}_0=0$, but there is no loss of generality in the results.
