4.3 — Nonlinearity & Transformations

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Contents

Nonlinear Effects

Polynomial Models

Quadratic Model

Logarithmic Models

Linear-Log Model

Log-Linear Model

Log-Log Model

Standardizing & Comparing Across Units

Joint Hypothesis Testing

Nonlinear Effects

Linear Regression

  • OLS is commonly known as “linear regression” as it fits a straight line to data points

  • Often, data and relationships between variables may not be linear

Linear Regression

\[\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}\]

\[\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}\]

\[\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}\]

Sources of Nonlinearities

  • Effect of \(X_1 \rightarrow Y\) might be nonlinear if:
  1. \(X_1 \rightarrow Y\) is different for different levels of \(X_1\)
    • e.g. diminishing returns: \(\uparrow X_1\) increases \(Y\) at a decreasing rate
    • e.g. increasing returns: \(\uparrow X_1\) increases \(Y\) at an increasing rate
  2. \(X_1 \rightarrow Y\) is different for different levels of \(X_2\)
    • e.g. interaction effects (last lesson)

Nonlinearities Alter Marginal Effects

  • Linear:

\[Y=\hat{\beta_0}+\hat{\beta_1}X\]

  • marginal effect (slope), \((\hat{\beta_1}) = \frac{\Delta Y}{\Delta X}\) is constant for all \(X\)

Nonlinearities Alter Marginal Effects

  • Polynomial:

\[Y=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2\]

  • Marginal effect, “slope” \(\left(\neq \hat{\beta_1}\right)\) depends on the value of \(X\)!

Nonlinearities Alter Marginal Effects

  • Interaction Effect:

\[\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X_1+\hat{\beta_2}X_2+\hat{\beta_3}X_1 \times X_2\]

  • Marginal effect, “slope” depends on the value of \(X_2\)!

  • Easy example: if \(X_2\) is a dummy variable:

    • \(X_2=0\) (control) vs. \(X_2=1\) (treatment)

Polynomial Models

Polynomial Functions of \(X\) I

  • Linear

\[\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X\]

  • Quadratic

\[\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2\]

  • Cubic

\[\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3\]

  • Quartic

\[\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3+\hat{\beta_4}X^4\]

Polynomial Functions of \(X\) II

\[\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2 + \cdots + \hat{\beta_{\color{#e64173}{r}}} X_i^{\color{#e64173}{r}}\]

  • Where \(\color{#e64173}{r}\) is the highest power \(X_i\) is raised to
    • quadratic: \(\color{#e64173}{r=2}\)
    • cubic: \(\color{#e64173}{r=3}\)
  • The graph of an \(r\)th-degree polynomial function has at most \((r-1)\) bends
  • Just another multivariate OLS regression model!

Quadratic Model

Quadratic Model

\[\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2\]

  • Quadratic model has \(X\) and \(X^2\) variables in it (yes, need both!)
  • How to interpret coefficients (betas)?
    • \(\beta_0\) as “intercept” and \(\beta_1\) as “slope” makes no sense 🧐
    • \(\beta_1\) as the effect of \(X_i \rightarrow Y_i\) holding \(X_i^2\) constant?? (impossible: we can’t change \(X_i\) while holding \(X_i^2\) fixed)
  • Estimate marginal effects by calculating predicted \(\hat{Y_i}\) for different levels of \(X_i\)

Quadratic Model: Calculating Marginal Effects

\[\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2\]

  • What is the marginal effect of \(\Delta X_i \rightarrow \Delta Y_i\)?
  • Take the derivative of \(Y_i\) with respect to \(X_i\):

\[\frac{\partial \, Y_i}{\partial \, X_i} = \hat{\beta_1}+2\hat{\beta_2} X_i\]

  • Marginal effect of a 1 unit change in \(X_i\) is a \(\color{#6A5ACD}{\left(\hat{\beta_1}+2\hat{\beta_2} X_i \right)}\) unit change in \(Y\)

Quadratic Model: Example I

Example

\[\widehat{\text{Life Expectancy}_i} = \hat{\beta_0}+\hat{\beta_1} \, \text{GDP per capita}_i+\hat{\beta_2}\, \text{GDP per capita}^2_i\]

  • Use gapminder package and data
library(gapminder)

Quadratic Model: Example II

  • The coefficients will be awkwardly scaled if we use GDP per capita in dollars, so let’s transform gdpPercap to be in $1,000’s
gapminder <- gapminder %>%
  mutate(GDP_t = gdpPercap/1000)

gapminder %>% head() # look at it

Quadratic Model: Example III

  • Let’s also create a squared term, gdp_sq
gapminder <- gapminder %>%
  mutate(GDP_sq = GDP_t^2)

gapminder %>% head() # look at it

Quadratic Model: Example IV

  • Can “manually” run a multivariate regression with GDP_t and GDP_sq
library(broom)
reg1 <- lm(lifeExp ~ GDP_t + GDP_sq, data = gapminder)

reg1 %>% tidy()

Quadratic Model: Example IV

  • OR use GDP_t and wrap the squared term in the I() function to transform the variable inside the regression: I(GDP_t^2)
reg1_alt <- lm(lifeExp ~ GDP_t + I(GDP_t^2), data = gapminder)

reg1_alt %>% tidy()

Quadratic Model: Example V

\[\widehat{\text{Life Expectancy}_i} = 50.52+1.55 \, \text{GDP}_i - 0.02\, \text{GDP}^2_i\]

  • Positive effect \((\hat{\beta_1}>0)\), with diminishing returns \((\hat{\beta_2}<0)\)

  • Marginal effect of GDP on Life Expectancy depends on initial value of GDP!

Quadratic Model: Example VI

  • Marginal effect of GDP on Life Expectancy:

\[\begin{align*} \frac{\partial \, Y}{\partial \; X} &= \hat{\beta_1}+2\hat{\beta_2} X_i\\ \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &\approx 1.55+2(-0.02) \, \text{GDP}\\ &\approx \color{#e64173}{1.55-0.04 \, \text{GDP}}\\ \end{align*}\]

Quadratic Model: Example VII

\[\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.04 \, \text{GDP}\]

Marginal effect of GDP if GDP \(=5\) ($ thousand):

\[\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\text{GDP}\\ &= 1.55-0.04(5)\\ &= 1.55-0.20\\ &=1.35\\ \end{align*}\]

  • i.e. for every additional $1 (thousand) in GDP per capita, average life expectancy increases by 1.35 years

Quadratic Model: Example VIII

\[\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.04 \, \text{GDP}\]

Marginal effect of GDP if GDP \(=25\) ($ thousand):

\[\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\text{GDP}\\ &= 1.55-0.04(25)\\ &= 1.55-1.00\\ &=0.55\\ \end{align*}\]

  • i.e. for every additional $1 (thousand) in GDP per capita, average life expectancy increases by 0.55 years

Quadratic Model: Example X

\[\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.04 \, \text{GDP}\]

Marginal effect of GDP if GDP \(=50\) ($ thousand):

\[\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\text{GDP}\\ &= 1.55-0.04(50)\\ &= 1.55-2.00\\ &=-0.45\\ \end{align*}\]

  • i.e. for every additional $1 (thousand) in GDP per capita, average life expectancy decreases by 0.45 years

Quadratic Model: Example XI

\[\begin{align*}\widehat{\text{Life Expectancy}_i} &= 50.52+1.55 \, \text{GDP per capita}_i - 0.02\, \text{GDP per capita}^2_i \\ \frac{\partial \, \text{Life Expectancy}}{d \, \text{GDP}} &= 1.55-0.04\text{GDP} \\ \end{align*}\]

| Initial GDP per capita | Marginal Effect |
|---|---|
| $5,000 | \(1.35\) years |
| $25,000 | \(0.55\) years |
| $50,000 | \(-0.45\) years |
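We can verify these by-hand calculations in R (a minimal sketch, assuming reg1 estimated on the earlier slide is still in memory; values will differ slightly since the coefficients above are rounded):

# marginal effect of GDP on Life Expectancy, using reg1's estimated coefficients
b1 <- coef(reg1)["GDP_t"]   # coefficient on the linear term
b2 <- coef(reg1)["GDP_sq"]  # coefficient on the squared term
b1 + 2 * b2 * c(5, 25, 50)  # marginal effects at GDP = $5k, $25k, $50k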

Quadratic Model: Example XII

Code
ggplot(data = gapminder)+
  aes(x = GDP_t,
      y = lifeExp)+
  geom_point(color = "blue", alpha=0.5)+
  stat_smooth(method = "lm",
              formula = y ~ x + I(x^2),
              color = "green")+ 
  geom_vline(xintercept = c(5,25,50),
             linetype = "dashed",
             color = "red", size = 1)+
  scale_x_continuous(labels = scales::dollar,
                     breaks = seq(0,120,10))+
  scale_y_continuous(breaks = seq(0,100,10),
                     limits = c(0,100))+
  labs(x = "GDP per Capita (in Thousands)",
       y = "Life Expectancy (Years)")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=16)

Quadratic Model: Maxima and Minima I

  • For a polynomial model, we can also find the predicted maximum or minimum of \(\hat{Y_i}\)
  • A quadratic model has a single global maximum or minimum (1 bend)
  • By calculus, a minimum or maximum occurs where:

\[\begin{align*} \frac{ \partial \, Y_i}{\partial \, X_i} &=0\\ \beta_1 + 2\beta_2 X_i &= 0\\ 2\beta_2 X_i&= -\beta_1\\ X_i^*&=-\frac{\beta_1}{2\beta_2}\\ \end{align*}\]

Quadratic Model: Maxima and Minima II

\[\begin{align*} GDP_i^*&=-\frac{\beta_1}{2\beta_2}\\ GDP_i^*&=-\frac{(1.55)}{2(-0.015)}\\ GDP_i^*& \approx 51.67\\ \end{align*}\]
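A quick way to get this in R (a sketch, again assuming reg1 from above; it uses the unrounded coefficient on the squared term, roughly \(-0.015\), rather than the \(-0.02\) rounded earlier):

# turning point: GDP* = -beta_1 / (2 * beta_2)
b1 <- coef(reg1)["GDP_t"]
b2 <- coef(reg1)["GDP_sq"]
-b1 / (2 * b2) # approximately 51.67 ($ thousands)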

Quadratic Model: Maxima and Minima III

Code
ggplot(data = gapminder)+
  aes(x = GDP_t,
      y = lifeExp)+
  geom_point(color = "blue", alpha=0.5)+
  stat_smooth(method = "lm",
              formula = y ~ x + I(x^2),
              color = "green")+
  geom_vline(xintercept=51.67, linetype="dashed", color="red", size = 1)+
  geom_label(x=51.67, y=90, label="$51.67", color="red")+
  scale_x_continuous(labels = scales::dollar,
                     breaks = seq(0,120,10))+
  scale_y_continuous(breaks = seq(0,100,10),
                     limits = c(0,100))+
  labs(x = "GDP per Capita (in Thousands)",
       y = "Life Expectancy (Years)")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=16)

Determining If Polynomials Are Necessary I

  • Is the quadratic term necessary?
  • Determine if \(\hat{\beta_2}\) (on \(X_i^2)\) is statistically significant:
    • \(H_0: \beta_2=0\)
    • \(H_a: \beta_2 \neq 0\)
  • Statistically significant \(\implies\) we should keep the quadratic model
    • If we only ran a linear model, it would be incorrect!

Determining If Polynomials Are Necessary II

  • Should we keep going up in polynomials?

\[\color{#6A5ACD}{\widehat{\text{Life Expectancy}_i} = \hat{\beta_0}+\hat{\beta_1} GDP_i+\hat{\beta_2}GDP^2_i+\hat{\beta_3}GDP_i^3}\]
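One way to check (a sketch, assuming the gapminder data and broom from the earlier slides) is to add a cubic term with I() and test its coefficient:

reg_cubic <- lm(lifeExp ~ GDP_t + I(GDP_t^2) + I(GDP_t^3), data = gapminder)
reg_cubic %>% tidy() # is the coefficient on the cubic term statistically significant?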

Determining If Polynomials Are Necessary III

  • In general, you should have a compelling theoretical reason why data or relationships should “change direction” multiple times

  • Or clear data patterns that have multiple “bends”

  • Recall, we care more about accurately measuring the causal effect of \(X \rightarrow Y\), rather than getting the most accurate prediction possible for \(\hat{Y}\)

Determining If Polynomials Are Necessary IV

  • \(\hat{\beta_3}\) is statistically significant…
  • …but can we really think of a good reason to complicate the model?

If You Kept Going…

  • It takes until a 9th-degree polynomial for one of the terms to become insignificant…

  • …but does this make the model better? more interpretable?

  • A famous problem of overfitting

If You Kept Going…Visually

A 4th-degree polynomial

A 9th-degree polynomial

A 14th-degree polynomial

Strategy for Polynomial Model Specification

  1. Are there good theoretical reasons for relationships changing (e.g. increasing/decreasing returns)?
  2. Plot your data: does a straight line fit well enough?
  3. Specify a polynomial function of a higher power (start with 2) and estimate OLS regression
  4. Use \(t\)-test to determine if higher-power term is significant
  5. Interpret effect of change in \(X\) on \(Y\)
  6. Repeat steps 3-5 as necessary (if there are good theoretical reasons)

Logarithmic Models

Linear Regression

\[\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}\]

\[\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}\]

\[\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}\]

Logarithmic Models

  • Another useful model for nonlinear data is the logarithmic model
    • We transform either \(X\), \(Y\), or both by taking the (natural) logarithm
  • Logarithmic model has two additional advantages
    1. We can easily interpret coefficients as percentage changes or elasticities
    2. Useful economic shape: diminishing returns (production functions, utility functions, etc)

The Natural Logarithm

  • The exponential function, \(Y=e^X\) or \(Y=\exp(X)\), where the base is \(e=2.71828...\)

  • The natural logarithm is its inverse, \(Y=\ln(X)\)

The Natural Logarithm: Review I

  • Exponents are defined as

\[\color{#6A5ACD}{b}^{\color{#e64173}{n}}=\underbrace{\color{#6A5ACD}{b} \times \color{#6A5ACD}{b} \times \cdots \times \color{#6A5ACD}{b}}_{\color{#e64173}{n} \text{ times}}\]

  • where base \(\color{#6A5ACD}{b}\) is multiplied by itself \(\color{#e64173}{n}\) times
  • Example: \(\color{#6A5ACD}{2}^{\color{#e64173}{3}}=\underbrace{\color{#6A5ACD}{2} \times \color{#6A5ACD}{2} \times \color{#6A5ACD}{2}}_{\color{#e64173}{n=3}}=\color{#314f4f}{8}\)
  • Logarithms are the inverse, defined as the exponents in the expressions above

\[\text{If } \color{#6A5ACD}{b}^{\color{#e64173}{n}}=\color{#314f4f}{y}\text{, then }log_{\color{#6A5ACD}{b}}(\color{#314f4f}{y})=\color{#e64173}{n}\]

  • \(\color{#e64173}{n}\) is the number you must raise \(\color{#6A5ACD}{b}\) to in order to get \(\color{#314f4f}{y}\)
  • Example: \(log_{\color{#6A5ACD}{2}}(\color{#314f4f}{8})=\color{#e64173}{3}\)

The Natural Logarithm: Review II

  • Logarithms can have any base, but common to use the natural logarithm \((\ln)\) with base \(\mathbf{e=2.71828...}\)

\[\text{If } e^n=y\text{, then } \ln(y)=n\]

The Natural Logarithm: Properties

  • Natural logs have a lot of useful properties:
    1. \(\ln(\frac{1}{x})=-\ln(x)\)
    2. \(\ln(ab)=\ln(a)+\ln(b)\)
    3. \(\ln(\frac{x}{a})=\ln(x)-\ln(a)\)
    4. \(\ln(x^a)=a \, \ln(x)\)
    5. \(\frac{d \, \ln \, x}{d \, x} = \frac{1}{x}\)

The Natural Logarithm: Example

  • Most useful property: for small change in \(x\), \(\Delta x\):

\[\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}\]

Example

Let \(x=100\) and \(\Delta x =1\), relative change is:

\[\frac{\Delta x}{x} = \frac{(101-100)}{100} = 0.01 \text{ or }1\%\]

  • The logged difference:

\[\ln(101)-\ln(100) = 0.00995 \approx 1\%\]

  • This allows us to very easily interpret coefficients as percent changes or elasticities
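A quick check of this example in R:

log(101) - log(100)  # difference in (natural) logs: about 0.00995
(101 - 100) / 100    # exact relative change: 0.01, i.e. 1%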

Elasticity

  • An elasticity between any two variables, \(\epsilon_{Y,X}\) describes the responsiveness (in %) of one variable \((Y)\) to a change in another \((X)\)

\[\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}\]

  • The numerator is the relative change in \(Y\); the denominator is the relative change in \(X\)
  • Interpretation: a 1% change in \(X\) will cause a \(\epsilon_{Y,X}\)% change in \(Y\)

Math FYI: Cobb Douglas Functions and Logs

  • One of the (many) reasons why economists love Cobb-Douglas functions:

\[Y=AL^{\alpha}K^{\beta}\]

  • Taking logs, relationship becomes linear:

\[\ln(Y)=\ln(A)+\alpha \ln(L)+ \beta \ln(K)\]

  • With data on \((Y, L, K)\) and linear regression, can estimate \(\alpha\) and \(\beta\)
    • \(\alpha\): elasticity of \(Y\) with respect to \(L\)
      • A 1% change in \(L\) will lead to an \(\alpha\)% change in \(Y\)
    • \(\beta\): elasticity of \(Y\) with respect to \(K\)
      • A 1% change in \(K\) will lead to a \(\beta\)% change in \(Y\)

Math FYI: Cobb Douglas Functions and Logs

Example

\[Y=2L^{0.75}K^{0.25}\]

  • Taking logs:

\[\ln Y=\ln 2+0.75 \ln L + 0.25 \ln K\]

  • A 1% change in \(L\) will yield a 0.75% change in output \(Y\)

  • A 1% change in \(K\) will yield a 0.25% change in output \(Y\)
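As an illustration (a hypothetical simulation, not part of the original example), we can generate Cobb-Douglas data and recover the elasticities with a log-log regression:

set.seed(480)
L <- runif(1000, 1, 100)                              # hypothetical labor input
K <- runif(1000, 1, 100)                              # hypothetical capital input
Y <- 2 * L^0.75 * K^0.25 * exp(rnorm(1000, 0, 0.05))  # output with multiplicative noise
cd_reg <- lm(log(Y) ~ log(L) + log(K))
coef(cd_reg) # estimated slopes should be close to 0.75 and 0.25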

Logarithms in R I

  • The log() function can easily take the logarithm
gapminder <- gapminder %>%
  mutate(loggdp = log(gdpPercap)) # log GDP per capita

gapminder %>% head() # look at it

Logarithms in R II

  • Note, log() by default is the natural logarithm \(\ln()\), i.e. base \(e\)
    • Can change base with e.g. log(x, base = 5)
    • Some common built-in logs: log10, log2
log10(100)
[1] 2
log2(16)
[1] 4
log(19683, base=3)
[1] 9

Logarithms in R III

  • Note when running a regression, you can pre-transform the data into logs (as I did above), or just add log() around a variable in the regression

Types of Logarithmic Models

  • Three types of log regression models, depending on which variables we log
  1. Linear-log model: \(Y_i=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)
  2. Log-linear model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1X_i\)
  3. Log-log model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)

Linear-Log Model

Linear-Log Model: Interpretation

  • Linear-log model has an independent variable \((X)\) that is logged

\[\begin{align*} Y&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\Delta Y}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}\]

  • Marginal effect of \(\mathbf{X \rightarrow Y}\): a 1% change in \(X \rightarrow\) a \(\frac{\beta_1}{100}\) unit change in \(Y\)

Linear-Log Model in R

\[\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i\]

  • A 1% change in GDP \(\rightarrow\) a \(\frac{8.41}{100}=\) 0.0841 year increase in Life Expectancy
  • A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.0841)=\) 2.1025 year decrease in Life Expectancy
  • A 100% rise in GDP \(\rightarrow\) a \((100 \times 0.0841)=\) 8.4100 year increase in Life Expectancy
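The regression behind these estimates was presumably run along these lines (a sketch, assuming the loggdp variable created earlier; the name lin_log_reg matches the model used in the comparison table later):

lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)
lin_log_reg %>% tidy()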

Linear-Log Model Graph (Linear X-Axis)

Code
ggplot(data = gapminder)+
  aes(x = gdpPercap,
      y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm",
              formula = y ~ log(x),
              color = "orange")+ 
  scale_x_continuous(labels = scales::dollar,
                     breaks = seq(0,120000,20000))+
  scale_y_continuous(breaks = seq(0,100,10),
                     limits = c(0,100))+
  labs(x = "GDP per Capita",
       y = "Life Expectancy (Years)")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 16)

Linear-Log Model Graph (Log X-Axis)

Code
ggplot(data = gapminder)+
  aes(x = loggdp,
      y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm",
              formula = y ~ log(x),
              color = "orange")+ 
  scale_y_continuous(breaks = seq(0,100,10),
                     limits = c(0,100))+
  labs(x = "Log GDP per Capita",
       y = "Life Expectancy (Years)")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 16)

Log-Linear Model

Log-Linear Model: Interpretation

  • Log-linear model has the dependent variable \((Y)\) logged

\[\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 X\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\Delta X}\\ \end{align*}\]

  • Marginal effect of \(\mathbf{X \rightarrow Y}\): a 1 unit change in \(X \rightarrow\) a \(\beta_1 \times 100\) % change in \(Y\)

Log-Linear Model in R (Preliminaries)

  • We will again get awkwardly scaled coefficients if we use GDP per capita directly, so let’s again transform gdpPercap to be in $1,000s, calling it gdp_t

  • Then log LifeExp

gapminder <- gapminder %>%
  mutate(gdp_t = gdpPercap/1000, # first make GDP/capita in $1000s
         loglife = log(lifeExp)) # take the log of LifeExp
gapminder %>% head() # look at it

Log-Linear Model in R

\[\widehat{\ln\text{Life Expectancy}}_i=3.967+0.013 \, \text{GDP}_i\]

  • A $1 (thousand) change in GDP \(\rightarrow\) a \(0.013 \times 100\%=\) 1.3% increase in Life Expectancy
  • A $25 (thousand) fall in GDP \(\rightarrow\) a \((-25 \times 1.3\%)=\) 32.5% decrease in Life Expectancy
  • A $100 (thousand) rise in GDP \(\rightarrow\) a \((100 \times 1.3\%)=\) 130% increase in Life Expectancy
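The corresponding regression is presumably along these lines (a sketch, assuming gdp_t and loglife created above; the name log_lin_reg matches the model compared later):

log_lin_reg <- lm(loglife ~ gdp_t, data = gapminder)
log_lin_reg %>% tidy()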

Log-Linear Model Graph

Code
ggplot(data = gapminder)+
  aes(x = gdp_t,
      y = loglife)+ 
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  scale_x_continuous(labels = scales::dollar,
                     breaks = seq(0,120,20))+
  labs(x = "GDP per Capita ($ Thousands)",
       y = "Log Life Expectancy")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 16)

Log-Log Model

Log-Log Model

  • Log-log model has both variables \((X \text{ and } Y)\) logged

\[\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}\]

  • Marginal effect of \(\mathbf{X \rightarrow Y}\): a 1% change in \(X \rightarrow\) a \(\beta_1\) % change in \(Y\)

  • \(\beta_1\) is the elasticity of \(Y\) with respect to \(X\)!

Log-Log Model in R

\[\widehat{\ln \text{Life Expectancy}}_i=2.864+0.147 \, \ln \text{GDP}_i\]

  • A 1% change in GDP \(\rightarrow\) a 0.147% increase in Life Expectancy
  • A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.147\%)=\) 3.675% decrease in Life Expectancy
  • A 100% rise in GDP \(\rightarrow\) a \((100 \times 0.147\%)=\) 14.7% increase in Life Expectancy
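The corresponding regression is presumably along these lines (a sketch, assuming loggdp and loglife from earlier; the name log_log_reg matches the model compared later):

log_log_reg <- lm(loglife ~ loggdp, data = gapminder)
log_log_reg %>% tidy()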

Log-Log Model Graph

Code
ggplot(data = gapminder)+
  aes(x = loggdp,
      y = loglife)+ 
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  labs(x = "Log GDP per Capita",
       y = "Log Life Expectancy")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 16)

Comparing Log Models I

| Model | Equation | Interpretation |
|---|---|---|
| Linear-Log | \(Y=\beta_0+\beta_1 \color{#e64173}{\ln X}\) | 1% change in \(X \rightarrow \frac{\hat{\beta_1}}{100}\) unit change in \(Y\) |
| Log-Linear | \(\color{#e64173}{\ln Y}=\beta_0+\beta_1X\) | 1 unit change in \(X \rightarrow \hat{\beta_1}\times 100\)% change in \(Y\) |
| Log-Log | \(\color{#e64173}{\ln Y}=\beta_0+\beta_1\color{#e64173}{\ln X}\) | 1% change in \(X \rightarrow \hat{\beta_1}\)% change in \(Y\) |
  • Hint: the variable that gets logged changes in percent terms, the linear variable (not logged) changes in unit terms
    • Going from units \(\rightarrow\) percent: multiply by 100
    • Going from percent \(\rightarrow\) units: divide by 100

Comparing Models II

Code
library(modelsummary)
modelsummary(models = list("Life Exp." = lin_log_reg,
                           "Log Life Exp." = log_lin_reg,
                           "Log Life Exp." = log_log_reg),
             fmt = 2, # round to 2 decimals
             output = "html",
             coef_rename = c("(Intercept)" = "Constant",
                             "gdp_t" = "GDP per capita ($1,000s)",
                             "loggdp" = "Log GDP per Capita"),
             gof_map = list(
               list("raw" = "nobs", "clean" = "n", "fmt" = 0),
               #list("raw" = "r.squared", "clean" = "R<sup>2</sup>", "fmt" = 2),
               list("raw" = "adj.r.squared", "clean" = "Adj. R<sup>2</sup>", "fmt" = 2),
               list("raw" = "rmse", "clean" = "SER", "fmt" = 2)
             ),
             escape = FALSE,
             stars = c('*' = .1, '**' = .05, '***' = 0.01)
)
|  | Life Exp. | Log Life Exp. | Log Life Exp. |
|---|---|---|---|
| Constant | −9.10*** | 3.97*** | 2.86*** |
|  | (1.23) | (0.01) | (0.02) |
| Log GDP per Capita | 8.41*** |  | 0.15*** |
|  | (0.15) |  | (0.00) |
| GDP per capita ($1,000s) |  | 0.01*** |  |
|  |  | (0.00) |  |
| n | 1704 | 1704 | 1704 |
| Adj. R² | 0.65 | 0.30 | 0.61 |
| SER | 7.62 | 0.19 | 0.14 |

* p < 0.1, ** p < 0.05, *** p < 0.01
  • The models are in very different units, so how do we choose?
    1. Compare intuition
    2. Compare \(R^2\)’s
    3. Compare graphs

Comparing Models III

| Linear-Log | Log-Linear | Log-Log |
|---|---|---|
| \(\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\) | \(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i\) | \(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\) |
| \(R^2=0.65\) | \(R^2=0.30\) | \(R^2=0.61\) |

When to Log?

  • In practice, the following types of variables are usually logged:
    • Variables that must always be positive (prices, sales, market values)
    • Very large numbers (population, GDP)
    • Variables we want to talk about as percentage changes or growth rates (money supply, population, GDP)
    • Variables that have diminishing returns (output, utility)
    • Variables that have nonlinear scatterplots
  • Avoid logs for:
    • Variables that are less than one, decimals, 0, or negative
    • Categorical variables (season, gender, political party)
    • Time variables (year, week, day)

Standardizing & Comparing Across Units

Comparing Coefficients of Different Units I

\[\hat{Y_i}=\beta_0+\beta_1 X_1+\beta_2 X_2\]

  • We often want to compare coefficients to see which variable \(X_1\) or \(X_2\) has a bigger effect on \(Y\)

  • What if \(X_1\) and \(X_2\) are different units?

Example

\[\begin{align*} \widehat{\text{Salary}_i}&=\beta_0+\beta_1\, \text{Batting average}_i+\beta_2\, \text{Home runs}_i\\ \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36}\, \text{Home runs}_i\\ \end{align*}\]

Comparing Coefficients of Different Units II

  • An easy way is to standardize the variables (i.e. take the \(Z\)-score):

\[X_Z=\frac{X_i-\overline{X}}{sd(X)}\]

  • Note doing this will make the constant 0, as both distributions of \(X\) and \(Y\) are now centered at 0.

Comparing Coefficients of Different Units: Example

| Variable | Mean | Std. Dev. |
|---|---|---|
| Salary | $2,024,616 | $2,764,512 |
| Batting Average | 0.267 | 0.031 |
| Home Runs | 12.11 | 10.31 |

\[\begin{align*}\scriptsize \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36} \, \text{Home runs}_i\\ \widehat{\text{Salary}_Z}&=\text{0.00}+\text{0.14} \, \text{Batting average}_Z+\text{0.48} \, \text{Home runs}_Z\\ \end{align*}\]

  • Marginal effects on \(Y\) (in standard deviations of \(Y\)) from 1 standard deviation change in \(X\):
  • \(\hat{\beta_1}\): a 1 standard deviation increase in Batting Average increases Salary by 0.14 standard deviations

\[0.14 \times \$2,764,512=\$387,032\]

  • \(\hat{\beta_2}\): a 1 standard deviation increase in Home Runs increases Salary by 0.48 standard deviations

\[0.48 \times \$2,764,512=\$1,326,966\]

Standardizing in R

| Variable | Mean | SD |
|---|---|---|
| LifeExp | 59.47 | 12.92 |
| gdpPercap | $7215.32 | $9857.46 |

  • Use the scale() command inside the mutate() function to standardize a variable
Code
gapminder <- gapminder %>%
  mutate(life_Z = scale(lifeExp),
         gdp_Z = scale(gdpPercap))

std_reg <- lm(life_Z ~ gdp_Z, data = gapminder)
tidy(std_reg)
  • A 1 standard deviation increase in gdpPercap will increase lifeExp by 0.584 standard deviations \((0.584 \times 12.92 = 7.55\) years)

Rescaling: Visually

Code
ggplot(data = gapminder)+
  aes(x = gdpPercap,
      y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  labs(x = "GDP per Capita",
       y = "Life Expectancy (Years)")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 16)

Rescaling: Visually

Code
ggplot(data = gapminder)+
  aes(x = gdp_Z,
      y = life_Z)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_hline(yintercept = 0)+
  geom_vline(xintercept = 0)+
    labs(x = "GDP per Capita (Standardized)",
       y = "Life Expectancy (Standardized)")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 16)

Rescaling: Visually

  • Both \(X\) and \(Y\) now have means of 0 and sd of 1
Code
gapminder %>%
  summarize(mean_gdp = mean(gdp_Z), sd_gdp = sd(gdp_Z), mean_life = mean(life_Z), sd_life = sd(life_Z)) %>%
  round(1)

Joint Hypothesis Testing

Joint Hypothesis Testing I

Example

Return again to:

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Male}_i+\hat{\beta_2}\text{Northeast}_i+\hat{\beta_3}\,\text{Midwest}_i+\hat{\beta_4}\,\text{South}_i\]

  • Maybe region doesn’t affect wages at all?
  • \(H_0: \beta_2=0, \, \beta_3=0, \, \beta_4=0\)
  • This is a joint hypothesis (of multiple parameters) to test

Joint Hypothesis Testing II

  • A joint hypothesis tests against the null hypothesis of a value for multiple parameters:

\[\mathbf{H_0: \beta_1= \beta_2=0}\]

i.e. the hypothesis that multiple regressors are equal to zero (have no causal effect on the outcome)

  • Our alternative hypothesis is that:

\[H_1: \text{ either } \beta_1\neq0\text{ or } \beta_2\neq0\text{ or both}\]

or simply, that \(H_0\) is not true

Types of Joint Hypothesis Tests

  1. \(H_0\): \(\beta_1=\beta_2=0\)
    • Testing against the claim that multiple variables don’t matter
    • Useful under high multicollinearity between variables
    • \(H_a\): at least one parameter \(\neq\) 0
  2. \(H_0\): \(\beta_1=\beta_2\)
    • Testing whether two variables matter the same
    • Variables must be the same units
    • \(H_a: \beta_1 (\neq, <, \text{ or }>) \beta_2\)
  3. \(H_0:\) ALL \(\beta\)’s \(=0\)
    • The “Overall F-test”
    • Testing against claim that regression model explains NO variation in \(Y\)

Joint Hypothesis Tests: F-statistic

  • The F-statistic is the test-statistic used to test joint hypotheses about regression coefficients with an F-test
  • This involves comparing two models:
    1. Unrestricted model: regression with all coefficients
    2. Restricted model: regression under null hypothesis (coefficients equal hypothesized values)
  • The \(F\)-test is a form of analysis of variance (ANOVA)
    • essentially tests whether \(R^2\) increases statistically significantly as we go from the restricted model\(\rightarrow\)unrestricted model
  • \(F\) has its own distribution, with two sets of degrees of freedom

Joint Hypothesis F-test: Example I

Example

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Male}_i+\hat{\beta_2}\text{Northeast}_i+\hat{\beta_3}\,\text{Midwest}_i+\hat{\beta_4}\,\text{South}_i\]

  • \(H_0: \beta_2=\beta_3=\beta_4=0\)
  • \(H_a\): \(H_0\) is not true (at least one \(\beta_i \neq 0\))

Joint Hypothesis F-test: Example II

Example

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Male}_i+\hat{\beta_2}\text{Northeast}_i+\hat{\beta_3}\,\text{Midwest}_i+\hat{\beta_4}\,\text{South}_i\]

  • Unrestricted model:

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Male}_i+\hat{\beta_2}\text{Northeast}_i+\hat{\beta_3}\,\text{Midwest}_i+\hat{\beta_4}\,\text{South}_i\]

  • Restricted model:

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Male}_i\]

  • \(F\)-test: does going from restricted to unrestricted model statistically significantly improve \(R^2\)?

Calculating the F-statistic

\[F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}\]

  • \(\color{#e64173}{R^2_u}\): the \(R^2\) from the unrestricted model (all variables)

  • \(\color{#6A5ACD}{R^2_r}\): the \(R^2\) from the restricted model (null hypothesis)

  • \(q\): number of restrictions (number of \(\beta's=0\) under null hypothesis)

  • \(k\): number of \(X\) variables in unrestricted model (all variables)

  • \(F\) has two sets of degrees of freedom:

    • \(q\) for the numerator, \((n-k-1)\) for the denominator

Calculating the F-statistic

\[F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(R^2_u-R^2_r)}{q}\right)}{\left(\displaystyle\frac{(1-R^2_u)}{(n-k-1)}\right)}\]

  • Key takeaway: The bigger the difference between \((R^2_u-R^2_r)\), the greater the improvement in fit by adding variables, the larger the \(F\)!

  • This formula is (believe it or not) actually a simplified version (assuming homoskedasticity)

    • I give you this formula to build your intuition of what F is measuring

F-test Example I

  • We’ll use the wooldridge package’s wage1 data again
# load in data from wooldridge package
library(wooldridge)
wages <- wage1

# run regressions
unrestricted_reg <- lm(wage ~ female + northcen + west + south, data = wages)
restricted_reg <- lm(wage ~ female, data = wages)

F-test Example II

  • Unrestricted model:

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i+\hat{\beta_2}\,\text{North Central}_i+\hat{\beta_3}\,\text{West}_i+\hat{\beta_4}\,\text{South}_i\]

  • Restricted model:

\[\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i\]

  • \(H_0: \beta_2 = \beta_3 = \beta_4 =0\)

  • \(q = 3\) restrictions (F numerator df)

  • \(n-k-1 = 526-4-1=521\) (F denominator df)
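We could also compute the (homoskedasticity-only) F-statistic by hand from the formula above (a sketch, using the two \(R^2\)’s from the regressions just estimated):

r2_u <- summary(unrestricted_reg)$r.squared # unrestricted R^2
r2_r <- summary(restricted_reg)$r.squared   # restricted R^2
q <- 3                                      # number of restrictions
df_denom <- 526 - 4 - 1                     # n - k - 1
F_stat <- ((r2_u - r2_r) / q) / ((1 - r2_u) / df_denom)
F_stat
pf(F_stat, df1 = q, df2 = df_denom, lower.tail = FALSE) # p-value of the F-test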

F-test Example III

  • We can use the car package’s linearHypothesis() command to run an \(F\)-test:
    • first argument: name of the (unrestricted) regression
    • second argument: vector of variable names (in quotes) you are testing
# load car package for additional regression tools
library(car) 
# F-test
linearHypothesis(unrestricted_reg, c("northcen", "west", "south")) 
  • \(p\)-value on \(F\)-test \(<0.05\), so we can reject \(H_0\)
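An equivalent approach in base R is to compare the two nested models directly with anova(), which runs the same nested-model \(F\)-test:

# compare restricted vs. unrestricted models
anova(restricted_reg, unrestricted_reg)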

All F-test I


Call:
lm(formula = wage ~ female + northcen + west + south, data = wages)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.3269 -2.0105 -0.7871  1.1898 17.4146 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.5654     0.3466  21.827   <2e-16 ***
female       -2.5652     0.3011  -8.520   <2e-16 ***
northcen     -0.5918     0.4362  -1.357   0.1755    
west          0.4315     0.4838   0.892   0.3729    
south        -1.0262     0.4048  -2.535   0.0115 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.443 on 521 degrees of freedom
Multiple R-squared:  0.1376,    Adjusted R-squared:  0.131 
F-statistic: 20.79 on 4 and 521 DF,  p-value: 6.501e-16
  • Last line of regression output from summary() is an All F-test
    • \(H_0:\) all \(\beta's=0\)
      • the regression explains no variation in \(Y\)
    • Calculates an F-statistic that, if high enough, is significant (p-value \(<0.05)\) enough to reject \(H_0\)

All F-test II

  • Alternatively, if you use broom instead of summary():
    • glance() command makes table of regression summary statistics
    • tidy() only shows coefficients
glance(unrestricted_reg)
  • statistic is the All F-test statistic; p.value next to it is the p-value from that F-test