4.3 — Nonlinearity & Transformations

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

# Nonlinear Effects

## Linear Regression

• OLS is commonly known as “linear regression” as it fits a straight line to data points

• Often, data and relationships between variables may not be linear

## Linear Regression

$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$

$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$

$\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}$

## Sources of Nonlinearities

• Effect of $X_1 \rightarrow Y$ might be nonlinear if:
1. $X_1 \rightarrow Y$ is different for different levels of $X_1$
• e.g. diminishing returns: $\uparrow X_1$ increases $Y$ at a decreasing rate
• e.g. increasing returns: $\uparrow X_1$ increases $Y$ at an increasing rate
2. $X_1 \rightarrow Y$ is different for different levels of $X_2$
• e.g. interaction effects (last lesson)

## Nonlinearities Alter Marginal Effects

• Linear:

$Y=\hat{\beta_0}+\hat{\beta_1}X$

• marginal effect (slope) $\hat{\beta_1} = \frac{\Delta Y}{\Delta X}$ is constant for all values of $X$

## Nonlinearities Alter Marginal Effects

• Polynomial:

$Y=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2$

• Marginal effect, “slope” $\left(\neq \hat{\beta_1}\right)$ depends on the value of $X$!

## Nonlinearities Alter Marginal Effects

• Interaction Effect:

$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X_1+\hat{\beta_2}X_2+\hat{\beta_3}X_1 \times X_2$

• Marginal effect, “slope” depends on the value of $X_2$!

• Easy example: if $X_2$ is a dummy variable:

• $X_2=0$ (control) vs. $X_2=1$ (treatment)

# Polynomial Models

## Polynomial Functions of $X$ I

• Linear

$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X$

• Quadratic

$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2$

• Cubic

$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3$

• Quartic

$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3+\hat{\beta_4}X^4$
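• FYI: instead of typing out each power term by hand (or with the I() operator shown later), R's poly() function can generate all of them at once; a minimal sketch, not how the examples below are estimated:

library(gapminder)
# raw = TRUE gives ordinary powers (x, x^2, ...), not orthogonal polynomials
quad_fit  <- lm(lifeExp ~ poly(gdpPercap, 2, raw = TRUE), data = gapminder)
cubic_fit <- lm(lifeExp ~ poly(gdpPercap, 3, raw = TRUE), data = gapminder)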

## Polynomial Functions of $X$ II

$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2 + \cdots + \hat{\beta_{\color{#e64173}{r}}} X_i^{\color{#e64173}{r}}$

• where $\color{#e64173}{r}$ is the highest power $X_i$ is raised to
• quadratic: $\color{#e64173}{r=2}$
• cubic: $\color{#e64173}{r=3}$
• The graph of an $r$th-degree polynomial function has up to $(r-1)$ bends
• Just another multivariate OLS regression model!

$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2$

• Quadratic model has $X$ and $X^2$ variables in it (yes, need both!)
• How to interpret coefficients (betas)?
• $\beta_0$ as “intercept” and $\beta_1$ as “slope” makes no sense 🧐
• $\beta_1$ as the effect of $X_i \rightarrow Y_i$ holding $X_i^2$ constant?? (Impossible: $X_i^2$ changes whenever $X_i$ does!)
• Estimate marginal effects by calculating predicted $\hat{Y_i}$ for different levels of $X_i$
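• For instance, a minimal sketch of that predicted-values approach (df, y, and x are hypothetical placeholders, not data from this lecture):

# hypothetical data frame df with outcome y and regressor x
quad_reg <- lm(y ~ x + I(x^2), data = df)
# predicted Y at x = 10 and x = 11; their difference approximates
# the marginal effect of one more unit of x near x = 10
diff(predict(quad_reg, newdata = data.frame(x = c(10, 11))))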

## Quadratic Model: Calculating Marginal Effects

$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2$

• What is the marginal effect of $\Delta X_i \rightarrow \Delta Y_i$?
• Take the derivative of $Y_i$ with respect to $X_i$:

$\frac{\partial \, Y_i}{\partial \, X_i} = \hat{\beta_1}+2\hat{\beta_2} X_i$

• Marginal effect of a 1 unit change in $X_i$ is a $\color{#6A5ACD}{\left(\hat{\beta_1}+2\hat{\beta_2} X_i \right)}$ unit change in $Y$
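• In R, this derivative is easy to wrap in a small function; a sketch, where b1 and b2 stand in for estimated coefficients (here, the estimates from the example that follows):

# marginal effect of X in a quadratic model: b1 + 2*b2*X
marginal_effect <- function(x, b1, b2) b1 + 2 * b2 * x
marginal_effect(x = c(5, 25, 50), b1 = 1.55, b2 = -0.02) # 1.35, 0.55, -0.45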

## Quadratic Model: Example I

Example

$\widehat{\text{Life Expectancy}_i} = \hat{\beta_0}+\hat{\beta_1} \, \text{GDP per capita}_i+\hat{\beta_2}\, \text{GDP per capita}^2_i$

• Use gapminder package and data
library(gapminder)

• These coefficients will be very large, so let's transform gdpPercap to be in $1,000s

gapminder <- gapminder %>%
mutate(GDP_t = gdpPercap/1000)
gapminder %>% head() # look at it

## Quadratic Model: Example II

• Let's also create a squared term, GDP_sq

gapminder <- gapminder %>%
mutate(GDP_sq = GDP_t^2)
gapminder %>% head() # look at it

## Quadratic Model: Example III

• Can “manually” run a multivariate regression with GDP_t and GDP_sq

library(broom)
reg1 <- lm(lifeExp ~ GDP_t + GDP_sq, data = gapminder)
reg1 %>% tidy()

## Quadratic Model: Example IV

• OR use GDP_t and add the I() operator to transform the variable in the regression, I(GDP_t^2)

reg1_alt <- lm(lifeExp ~ GDP_t + I(GDP_t^2), data = gapminder)
reg1_alt %>% tidy()

## Quadratic Model: Example V

$\widehat{\text{Life Expectancy}_i} = 50.52+1.55 \, \text{GDP}_i - 0.02\, \text{GDP}^2_i$

• Positive effect $(\hat{\beta_1}>0)$, with diminishing returns $(\hat{\beta_2}<0)$
• Marginal effect of GDP on Life Expectancy depends on the initial value of GDP!

## Quadratic Model: Example VI

• Marginal effect of GDP on Life Expectancy:

\begin{align*} \frac{\partial \, Y}{\partial \, X} &= \hat{\beta_1}+2\hat{\beta_2} X_i\\ \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &\approx 1.55+2(-0.02) \, \text{GDP}\\ &\approx \color{#e64173}{1.55-0.04 \, \text{GDP}}\\ \end{align*}

## Quadratic Model: Example VII

$\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.04 \, \text{GDP}$

Marginal effect of GDP if GDP $=5$ (thousands of dollars):

\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\text{GDP}\\ &= 1.55-0.04(5)\\ &= 1.55-0.20\\ &=1.35\\ \end{align*}

• i.e. for every additional $1,000 in GDP per capita, average life expectancy increases by 1.35 years

## Quadratic Model: Example VIII

$\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.04 \, \text{GDP}$

Marginal effect of GDP if GDP $=25$ (thousands of dollars):

\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\text{GDP}\\ &= 1.55-0.04(25)\\ &= 1.55-1.00\\ &=0.55\\ \end{align*}

• i.e. for every additional $1,000 in GDP per capita, average life expectancy increases by 0.55 years

## Quadratic Model: Example IX

$\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.04 \, \text{GDP}$

Marginal effect of GDP if GDP $=50$ (thousands of dollars):

\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\text{GDP}\\ &= 1.55-0.04(50)\\ &= 1.55-2.00\\ &=-0.45\\ \end{align*}

• i.e. for every additional $1,000 in GDP per capita, average life expectancy decreases by 0.45 years

## Quadratic Model: Example X

\begin{align*}\widehat{\text{Life Expectancy}_i} &= 50.52+1.55 \, \text{GDP per capita}_i - 0.02\, \text{GDP per capita}^2_i \\ \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.04\,\text{GDP} \\ \end{align*}

| Initial GDP per capita | Marginal Effect |
|---|---|
| \$5,000 | 1.35 years |
| \$25,000 | 0.55 years |
| \$50,000 | $-0.45$ years |

Code
ggplot(data = gapminder)+
aes(x = GDP_t,
y = lifeExp)+
geom_point(color = "blue", alpha=0.5)+
stat_smooth(method = "lm",
formula = y ~ x + I(x^2),
color = "green")+
geom_vline(xintercept = c(5,25,50),
linetype = "dashed",
color = "red", size = 1)+
scale_x_continuous(labels = scales::dollar,
breaks = seq(0,120,10))+
scale_y_continuous(breaks = seq(0,100,10),
limits = c(0,100))+
labs(x = "GDP per Capita (in Thousands)",
y = "Life Expectancy (Years)")+
theme_bw(base_family = "Fira Sans Condensed",
base_size=16)

## Quadratic Model: Maxima and Minima I

• For a polynomial model, we can also find the predicted maximum or minimum of $\hat{Y_i}$
• A quadratic model has a single global maximum or minimum (1 bend)
• By calculus, a minimum or maximum occurs where:

\begin{align*} \frac{ \partial \, Y_i}{\partial \, X_i} &=0\\ \beta_1 + 2\beta_2 X_i &= 0\\ 2\beta_2 X_i&= -\beta_1\\ X_i^*&=-\frac{\beta_1}{2\beta_2}\\ \end{align*}

## Quadratic Model: Maxima and Minima II

\begin{align*} GDP_i^*&=-\frac{\beta_1}{2\beta_2}\\ GDP_i^*&=-\frac{(1.55)}{2(-0.015)}\\ GDP_i^*& \approx 51.67\\ \end{align*}
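• (The $-0.015$ here is presumably the unrounded estimate of $\hat{\beta_2}$, which displays as $-0.02$ when rounded.) The same turning point can be pulled straight from the fitted reg1 object; a quick sketch:

# turning point X* = -b1 / (2*b2), using reg1's estimated coefficients
b <- coef(reg1)
-b["GDP_t"] / (2 * b["GDP_sq"]) # about 51.67 (thousands of dollars)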

## Quadratic Model: Maxima and Minima III

Code
ggplot(data = gapminder)+
aes(x = GDP_t,
y = lifeExp)+
geom_point(color = "blue", alpha=0.5)+
stat_smooth(method = "lm",
formula = y ~ x + I(x^2),
color = "green")+
geom_vline(xintercept=51.67, linetype="dashed", color="red", size = 1)+
geom_label(x=51.67, y=90, label="51.67", color="red")+
scale_x_continuous(labels = scales::dollar,
breaks = seq(0,120,10))+
scale_y_continuous(breaks = seq(0,100,10),
limits = c(0,100))+
labs(x = "GDP per Capita (in Thousands)",
y = "Life Expectancy (Years)")+
theme_bw(base_family = "Fira Sans Condensed",
base_size=16)

## Determining If Polynomials Are Necessary I

• Is the quadratic term necessary?
• Determine if $\hat{\beta_2}$ (on $X_i^2$) is statistically significant:
• $H_0: \beta_2=0$
• $H_a: \beta_2 \neq 0$
• Statistically significant $\implies$ we should keep the quadratic model
• If we only ran a linear model, it would be incorrect!

## Determining If Polynomials Are Necessary II

• Should we keep going up in polynomials?

$\color{#6A5ACD}{\widehat{\text{Life Expectancy}_i} = \hat{\beta_0}+\hat{\beta_1} GDP_i+\hat{\beta_2}GDP^2_i+\hat{\beta_3}GDP_i^3}$

## Determining If Polynomials Are Necessary III

• In general, you should have a compelling theoretical reason why data or relationships should “change direction” multiple times
• Or clear data patterns that have multiple “bends”
• Recall, we care more about accurately measuring the causal effect of $X \rightarrow Y$ than getting the most accurate prediction possible for $\hat{Y}$

## Determining If Polynomials Are Necessary IV

• $\hat{\beta_3}$ is statistically significant…
• …but can we really think of a good reason to complicate the model?

## If You Kept Going…

• It takes until a 9th-degree polynomial for one of the terms to become insignificant…
• …but does this make the model better? More interpretable?
• A famous problem of overfitting

## If You Kept Going…Visually

[Figure: a 4th-degree polynomial fit]

[Figure: a 9th-degree polynomial fit]

[Figure: a 14th-degree polynomial fit]

## Strategy for Polynomial Model Specification

1. Are there good theoretical reasons for relationships changing (e.g. increasing/decreasing returns)?
2. Plot your data: does a straight line fit well enough?
3. Specify a polynomial function of a higher power (start with 2) and estimate the OLS regression
4. Use a $t$-test to determine if the higher-power term is significant
5. Interpret the effect of a change in $X$ on $Y$
6. Repeat steps 3-5 as necessary (if there are good theoretical reasons)

# Logarithmic Models

## Linear Regression

$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$

$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$

$\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}$

## Logarithmic Models

• Another useful model for nonlinear data is the logarithmic model
• We transform either $X$, $Y$, or both by taking the (natural) logarithm
• The logarithmic model has two additional advantages:
1. We can easily interpret coefficients as percentage changes or elasticities
2. Useful economic shape: diminishing returns (production functions, utility functions, etc.)
## The Natural Logarithm

• The exponential function, $Y=e^X$ or $Y=\exp(X)$, where base $e=2.71828...$
• The natural logarithm is its inverse, $Y=\ln(X)$

## The Natural Logarithm: Review I

• Exponents are defined as

$\color{#6A5ACD}{b}^{\color{#e64173}{n}}=\underbrace{\color{#6A5ACD}{b} \times \color{#6A5ACD}{b} \times \cdots \times \color{#6A5ACD}{b}}_{\color{#e64173}{n} \text{ times}}$

• where base $\color{#6A5ACD}{b}$ is multiplied by itself $\color{#e64173}{n}$ times
• Example: $\color{#6A5ACD}{2}^{\color{#e64173}{3}}=\underbrace{\color{#6A5ACD}{2} \times \color{#6A5ACD}{2} \times \color{#6A5ACD}{2}}_{\color{#e64173}{n=3}}=\color{#314f4f}{8}$
• Logarithms are the inverse, defined as the exponents in the expressions above:

$\text{If } \color{#6A5ACD}{b}^{\color{#e64173}{n}}=\color{#314f4f}{y}\text{, then }\log_{\color{#6A5ACD}{b}}(\color{#314f4f}{y})=\color{#e64173}{n}$

• $\color{#e64173}{n}$ is the number you must raise $\color{#6A5ACD}{b}$ to in order to get $\color{#314f4f}{y}$
• Example: $\log_{\color{#6A5ACD}{2}}(\color{#314f4f}{8})=\color{#e64173}{3}$

## The Natural Logarithm: Review II

• Logarithms can have any base, but it is most common to use the natural logarithm $(\ln)$ with base $\mathbf{e=2.71828...}$:

$\text{If } e^n=y\text{, then } \ln(y)=n$

## The Natural Logarithm: Properties

• Natural logs have a lot of useful properties:
1. $\ln(\frac{1}{x})=-\ln(x)$
2. $\ln(ab)=\ln(a)+\ln(b)$
3. $\ln(\frac{x}{a})=\ln(x)-\ln(a)$
4. $\ln(x^a)=a \, \ln(x)$
5. $\frac{d \, \ln \, x}{d \, x} = \frac{1}{x}$

## The Natural Logarithm: Example

• Most useful property: for a small change in $x$, $\Delta x$:

$\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$

Example

Let $x=100$ and $\Delta x =1$; the relative change is:

$\frac{\Delta x}{x} = \frac{(101-100)}{100} = 0.01 \text{ or } 1\%$

• The logged difference: $\ln(101)-\ln(100) = 0.00995 \approx 1\%$
• This allows us to very easily interpret coefficients as percent changes or elasticities

## Elasticity

• An elasticity between any two variables, $\epsilon_{Y,X}$, describes the responsiveness (in %) of one variable $(Y)$ to a change in another $(X)$:

$\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}$

• The numerator is the relative change in $Y$; the denominator is the relative change in $X$
• Interpretation: a 1% change in $X$ will cause an $\epsilon_{Y,X}$% change in $Y$

## Math FYI: Cobb-Douglas Functions and Logs

• One of the (many) reasons why economists love Cobb-Douglas functions:

$Y=AL^{\alpha}K^{\beta}$

• Taking logs, the relationship becomes linear:

$\ln(Y)=\ln(A)+\alpha \ln(L)+ \beta \ln(K)$

• With data on $(Y, L, K)$ and linear regression, we can estimate $\alpha$ and $\beta$:
• $\alpha$: elasticity of $Y$ with respect to $L$; a 1% change in $L$ will lead to an $\alpha$% change in $Y$
• $\beta$: elasticity of $Y$ with respect to $K$; a 1% change in $K$ will lead to a $\beta$% change in $Y$

## Math FYI: Cobb-Douglas Functions and Logs

Example

$Y=2L^{0.75}K^{0.25}$

• Taking logs: $\ln Y=\ln 2+0.75 \ln L + 0.25 \ln K$
• A 1% change in $L$ will yield a 0.75% change in output $Y$
• A 1% change in $K$ will yield a 0.25% change in output $Y$

## Logarithms in R I

• The log() function can easily take the logarithm

gapminder <- gapminder %>%
mutate(loggdp = log(gdpPercap)) # log GDP per capita
gapminder %>% head() # look at it
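• A quick numeric check of the log-difference approximation from the Natural Logarithm example above:

log(101) - log(100) # logged difference: about 0.00995
(101 - 100) / 100   # relative change: exactly 0.01, i.e. 1%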
## Logarithms in R II

• Note: log() by default is the natural logarithm $\ln()$, i.e. base $e$
• Can change the base with e.g. log(x, base = 5)
• Some common built-in logs: log10(), log2()

log10(100)
[1] 2
log2(16)
[1] 4
log(19683, base = 3)
[1] 9

## Logarithms in R III

• Note: when running a regression, you can pre-transform the data into logs (as I did above), or just add log() around a variable in the regression

## Types of Logarithmic Models

• Three types of log regression models, depending on which variables we log:
1. Linear-log model: $Y_i=\beta_0+\beta_1 \color{#e64173}{\ln X_i}$
2. Log-linear model: $\color{#e64173}{\ln Y_i}=\beta_0+\beta_1X_i$
3. Log-log model: $\color{#e64173}{\ln Y_i}=\beta_0+\beta_1 \color{#e64173}{\ln X_i}$

# Linear-Log Model

## Linear-Log Model: Interpretation

• The linear-log model has an independent variable $(X)$ that is logged

\begin{align*} Y&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\Delta Y}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}

• Marginal effect of $\mathbf{X \rightarrow Y}$: a 1% change in $X \rightarrow$ a $\frac{\beta_1}{100}$ unit change in $Y$

## Linear-Log Model in R

$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$

• A 1% change in GDP $\rightarrow$ a $\frac{8.41}{100}=$ 0.0841 year increase in Life Expectancy
• A 25% fall in GDP $\rightarrow$ a $(-25 \times 0.0841)=$ 2.1025 year decrease in Life Expectancy
• A 100% rise in GDP $\rightarrow$ a $(100 \times 0.0841)=$ 8.4100 year increase in Life Expectancy

## Linear-Log Model Graph (Linear X-Axis)

Code
ggplot(data = gapminder)+
aes(x = gdpPercap,
y = lifeExp)+
geom_point(color = "blue", alpha = 0.5)+
geom_smooth(method = "lm",
formula = y ~ log(x),
color = "orange")+
scale_x_continuous(labels = scales::dollar,
breaks = seq(0,120000,20000))+
scale_y_continuous(breaks = seq(0,100,10),
limits = c(0,100))+
labs(x = "GDP per Capita",
y = "Life Expectancy (Years)")+
theme_bw(base_family = "Fira Sans Condensed",
base_size = 16)

## Linear-Log Model Graph (Log X-Axis)

Code
ggplot(data = gapminder)+
aes(x = loggdp, # x is already logged, so the fit is a straight line
y = lifeExp)+
geom_point(color = "blue", alpha = 0.5)+
geom_smooth(method = "lm",
color = "orange")+
scale_y_continuous(breaks = seq(0,100,10),
limits = c(0,100))+
labs(x = "Log GDP per Capita",
y = "Life Expectancy (Years)")+
theme_bw(base_family = "Fira Sans Condensed",
base_size = 16)

# Log-Linear Model

## Log-Linear Model: Interpretation

• The log-linear model has the dependent variable $(Y)$ logged

\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 X\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\Delta X}\\ \end{align*}

• Marginal effect of $\mathbf{X \rightarrow Y}$: a 1 unit change in $X \rightarrow$ a $\beta_1 \times 100$% change in $Y$

## Log-Linear Model in R (Preliminaries)

• We will again have very large/small coefficients if we deal with GDP directly, so again let's transform gdpPercap into $1,000s, calling it gdp_t

• Then log lifeExp

gapminder <- gapminder %>%
mutate(gdp_t = gdpPercap/1000, # first make GDP/capita in $1,000s
loglife = log(lifeExp)) # take the log of lifeExp
gapminder %>% head() # look at it

## Log-Linear Model in R

$\widehat{\ln\text{Life Expectancy}}_i=3.967+0.013 \, \text{GDP}_i$

• A $1,000 increase in GDP per capita $\rightarrow$ a $0.013 \times 100\%=$ 1.3% increase in Life Expectancy
• A $25,000 fall in GDP per capita $\rightarrow$ a $(-25 \times 1.3\%)=$ 32.5% decrease in Life Expectancy
• A $100,000 rise in GDP per capita $\rightarrow$ a $(100 \times 1.3\%)=$ 130% increase in Life Expectancy
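• For reference, a sketch of how the three model objects compared in the modelsummary() call below (lin_log_reg, log_lin_reg, log_log_reg) can be estimated from the transformed variables created above:

lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)  # linear-log
log_lin_reg <- lm(loglife ~ gdp_t, data = gapminder)   # log-linear
log_log_reg <- lm(loglife ~ loggdp, data = gapminder)  # log-log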

## Log-Linear Model Graph

Code
ggplot(data = gapminder)+
aes(x = gdp_t,
y = loglife)+
geom_point(color = "blue", alpha = 0.5)+
geom_smooth(method = "lm", color = "orange")+
scale_x_continuous(labels = scales::dollar,
breaks = seq(0,120,20))+
labs(x = "GDP per Capita (Thousands)", y = "Log Life Expectancy")+ theme_bw(base_family = "Fira Sans Condensed", base_size = 16) # Log-Log Model ## Log-Log Model • Log-log model has both variables $(X \text{ and } Y)$ logged \begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*} • Marginal effect of $\mathbf{X \rightarrow Y}$: a 1% change in $X \rightarrow$ a $\beta_1$ % change in $Y$ • $\beta_1$ is the elasticity of $Y$ with respect to $X$! ## Log-Log Model in R $\widehat{\text{ln Life Expectancy}}_i=2.864+0.147 \, \text{ln GDP}_i$ • A 1% change in GDP $\rightarrow$ a 0.147% increase in Life Expectancy • A 25% fall in GDP $\rightarrow$ a $(-25 \times 0.147\%)=$ 3.675% decrease in Life Expectancy • A 100% rise in GDP $\rightarrow$ a $(100 \times 0.147\%)=$ 14.7% increase in Life Expectancy ## Log-Log Model Graph Code ggplot(data = gapminder)+ aes(x = loggdp, y = loglife)+ geom_point(color = "blue", alpha = 0.5)+ geom_smooth(method = "lm", color = "orange")+ labs(x = "Log GDP per Capita", y = "Log Life Expectancy")+ theme_bw(base_family = "Fira Sans Condensed", base_size = 16) ## Comparing Log Models I Model Equation Interpretation Linear-Log $Y=\beta_0+\beta_1 \color{#e64173}{\ln X}$ 1% change in $X \rightarrow \frac{\hat{\beta_1}}{100}$ unit change in $Y$ Log-Linear $\color{#e64173}{\ln Y}=\beta_0+\beta_1X$ 1 unit change in $X \rightarrow \hat{\beta_1}\times 100$% change in $Y$ Log-Log $\color{#e64173}{\ln Y}=\beta_0+\beta_1\color{#e64173}{\ln X}$ 1% change in $X \rightarrow \hat{\beta_1}$% change in $Y$ • Hint: the variable that gets logged changes in percent terms, the linear variable (not logged) changes in unit terms • Going from units $\rightarrow$ percent: multiply by 100 • Going from percent $\rightarrow$ units: divide by 100 ## Comparing Models II Code library(modelsummary) modelsummary(models = list("Life Exp." = lin_log_reg, "Log Life Exp." = log_lin_reg, "Log Life Exp." = log_log_reg), fmt = 2, # round to 2 decimals output = "html", coef_rename = c("(Intercept)" = "Constant", "gdp_t" = "GDP per capita (1,000s)",
"loggdp" = "Log GDP per Capita"),
gof_map = list(
list("raw" = "nobs", "clean" = "n", "fmt" = 0),
#list("raw" = "r.squared", "clean" = "R<sup>2</sup>", "fmt" = 2),
list("raw" = "rmse", "clean" = "SER", "fmt" = 2)
),
escape = FALSE,
stars = c('*' = .1, '**' = .05, '***' = 0.01)
)
|  | Life Exp. | Log Life Exp. | Log Life Exp. |
|---|---|---|---|
| Constant | −9.10*** | 3.97*** | 2.86*** |
|  | (1.23) | (0.01) | (0.02) |
| Log GDP per Capita | 8.41*** |  | 0.15*** |
|  | (0.15) |  | (0.00) |
| GDP per capita (1,000s) |  | 0.01*** |  |
|  |  | (0.00) |  |
| n | 1704 | 1704 | 1704 |
| Adj. $R^2$ | 0.65 | 0.30 | 0.61 |
| SER | 7.62 | 0.19 | 0.14 |

* p < 0.1, ** p < 0.05, *** p < 0.01

• The models are in very different units; how do we choose?
1. Compare intuition
2. Compare $R^2$'s
3. Compare graphs

## Comparing Models III

| Linear-Log | Log-Linear | Log-Log |
|---|---|---|
| $\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}$ | $\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i$ | $\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}$ |
| $R^2=0.65$ | $R^2=0.30$ | $R^2=0.61$ |

## When to Log?

• In practice, the following types of variables are usually logged:
• Variables that must always be positive (prices, sales, market values)
• Very large numbers (population, GDP)
• Variables we want to talk about as percentage changes or growth rates (money supply, population, GDP)
• Variables that have diminishing returns (output, utility)
• Variables that have nonlinear scatterplots
• Avoid logs for:
• Variables that are less than one, decimals, 0, or negative
• Categorical variables (season, gender, political party)
• Time variables (year, week, day)

# Standardizing & Comparing Across Units

## Comparing Coefficients of Different Units I

$\hat{Y_i}=\beta_0+\beta_1 X_1+\beta_2 X_2$

• We often want to compare coefficients to see which variable, $X_1$ or $X_2$, has a bigger effect on $Y$
• What if $X_1$ and $X_2$ are in different units?

Example

\begin{align*} \widehat{\text{Salary}_i}&=\beta_0+\beta_1\, \text{Batting average}_i+\beta_2\, \text{Home runs}_i\\ \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36}\, \text{Home runs}_i\\ \end{align*}

## Comparing Coefficients of Different Units II

• An easy way is to standardize the variables (i.e. take the $Z$-score):

$X_Z=\frac{X_i-\overline{X}}{sd(X)}$

• Note doing this will make the constant 0, as the distributions of both $X$ and $Y$ are now centered at 0

## Comparing Coefficients of Different Units: Example

| Variable | Mean | Std. Dev. |
|---|---|---|
| Salary | \$2,024,616 | \$2,764,512 |
| Batting Average | 0.267 | 0.031 |
| Home Runs | 12.11 | 10.31 |

\begin{align*}\widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36} \, \text{Home runs}_i\\ \widehat{\text{Salary}_Z}&=\text{0.00}+\text{0.14} \, \text{Batting average}_Z+\text{0.48} \, \text{Home runs}_Z\\ \end{align*}

• Marginal effects on $Y$ (in standard deviations of $Y$) from a 1 standard deviation change in $X$:
• $\hat{\beta_1}$: a 1 standard deviation increase in Batting Average increases Salary by 0.14 standard deviations $(0.14 \times \$2,764,512 = \$387,032)$
• $\hat{\beta_2}$: a 1 standard deviation increase in Home Runs increases Salary by 0.48 standard deviations $(0.48 \times \$2,764,512 = \$1,326,966)$

## Standardizing in R

| Variable | Mean | SD |
|---|---|---|
| lifeExp | 59.47 | 12.92 |
| gdpPercap | \$7,215.32 | \$9,857.46 |
• Use the scale() command inside the mutate() function to standardize a variable
Code
gapminder <- gapminder %>%
mutate(life_Z = scale(lifeExp),
gdp_Z = scale(gdpPercap))

std_reg <- lm(life_Z ~ gdp_Z, data = gapminder)
tidy(std_reg)
• A 1 standard deviation increase in gdpPercap will increase lifeExp by 0.584 standard deviations (i.e. $0.584 \times 12.92 \approx 7.55$ years)
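• Note that scale() returns a one-column matrix; if you prefer a plain numeric vector, you can standardize by hand, as in this sketch (life_Z2 is a hypothetical name for illustration):

gapminder <- gapminder %>%
mutate(life_Z2 = (lifeExp - mean(lifeExp)) / sd(lifeExp))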

## Rescaling: Visually

Code
ggplot(data = gapminder)+
aes(x = gdpPercap,
y = lifeExp)+
geom_point(color = "blue", alpha = 0.5)+
labs(x = "GDP per Capita",
y = "Life Expectancy (Years)")+
theme_bw(base_family = "Fira Sans Condensed",
base_size = 16)

## Rescaling: Visually

Code
ggplot(data = gapminder)+
aes(x = gdp_Z,
y = life_Z)+
geom_point(color = "blue", alpha = 0.5)+
geom_hline(yintercept = 0)+
geom_vline(xintercept = 0)+
labs(x = "GDP per Capita (Standardized)",
y = "Life Expectancy (Standardized)")+
theme_bw(base_family = "Fira Sans Condensed",
base_size = 16)

## Rescaling: Visually

• Both $X$ and $Y$ now have means of 0 and sd of 1
Code
gapminder %>%
summarize(mean_gdp = mean(gdp_Z),
sd_gdp = sd(gdp_Z),
mean_life = mean(life_Z),
sd_life = sd(life_Z)) %>%
round(1)

# Joint Hypothesis Testing

## Joint Hypothesis Testing I

Example

Return again to:

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i+\hat{\beta_2}\,\text{North Central}_i+\hat{\beta_3}\,\text{West}_i+\hat{\beta_4}\,\text{South}_i$

• Maybe region doesn’t affect wages at all?
• $H_0: \beta_2=0, \, \beta_3=0, \, \beta_4=0$
• This is a joint hypothesis (of multiple parameters) to test

## Joint Hypothesis Testing II

• A joint hypothesis tests against the null hypothesis of a value for multiple parameters:

$\mathbf{H_0: \beta_1= \beta_2=0}$

the hypotheses that multiple regressors are equal to zero (have no causal effect on the outcome)

• Our alternative hypothesis is that:

$H_1: \text{ either } \beta_1\neq0\text{ or } \beta_2\neq0\text{ or both}$

or simply, that $H_0$ is not true

## Types of Joint Hypothesis Tests

1. $H_0$: $\beta_1=\beta_2=0$
• Testing against the claim that multiple variables don't matter
• Useful under high multicollinearity between variables
• $H_a$: at least one parameter $\neq 0$
2. $H_0$: $\beta_1=\beta_2$
• Testing whether two variables matter the same
• Variables must be in the same units
• $H_a: \beta_1 \, (\neq, <, \text{ or }>) \, \beta_2$
3. $H_0:$ ALL $\beta$'s $=0$
• The “Overall F-test”
• Testing against the claim that the regression model explains NO variation in $Y$
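• The first two types can be written directly as hypothesis strings for car's linearHypothesis(); a sketch with a hypothetical model my_reg and regressors x1, x2 (the third type is reported automatically at the bottom of summary() output):

library(car)
linearHypothesis(my_reg, c("x1 = 0", "x2 = 0")) # H0: beta1 = beta2 = 0
linearHypothesis(my_reg, "x1 = x2")             # H0: beta1 = beta2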

## Joint Hypothesis Tests: F-statistic

• The F-statistic is the test-statistic used to test joint hypotheses about regression coefficients with an F-test
• This involves comparing two models:
1. Unrestricted model: regression with all coefficients
2. Restricted model: regression under null hypothesis (coefficients equal hypothesized values)
• $F$ is an analysis of variance (ANOVA)
• essentially tests whether $R^2$ increases statistically significantly as we go from the restricted model$\rightarrow$unrestricted model
• $F$ has its own distribution, with two sets of degrees of freedom

## Joint Hypothesis F-test: Example I

Example

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i+\hat{\beta_2}\,\text{North Central}_i+\hat{\beta_3}\,\text{West}_i+\hat{\beta_4}\,\text{South}_i$

• $H_0: \beta_2=\beta_3=\beta_4=0$
• $H_a$: $H_0$ is not true (at least one $\beta_i \neq 0$)

## Joint Hypothesis F-test: Example II

Example

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i+\hat{\beta_2}\,\text{North Central}_i+\hat{\beta_3}\,\text{West}_i+\hat{\beta_4}\,\text{South}_i$

• Unrestricted model:

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i+\hat{\beta_2}\,\text{North Central}_i+\hat{\beta_3}\,\text{West}_i+\hat{\beta_4}\,\text{South}_i$

• Restricted model:

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i$

• $F$-test: does going from restricted to unrestricted model statistically significantly improve $R^2$?

## Calculating the F-statistic

$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$

• $\color{#e64173}{R^2_u}$: the $R^2$ from the unrestricted model (all variables)

• $\color{#6A5ACD}{R^2_r}$: the $R^2$ from the restricted model (null hypothesis)

• $q$: number of restrictions (number of $\beta's=0$ under null hypothesis)

• $k$: number of $X$ variables in unrestricted model (all variables)

• $F$ has two sets of degrees of freedom:

• $q$ for the numerator, $(n-k-1)$ for the denominator

## Calculating the F-statistic

$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(R^2_u-R^2_r)}{q}\right)}{\left(\displaystyle\frac{(1-R^2_u)}{(n-k-1)}\right)}$

• Key takeaway: The bigger the difference between $(R^2_u-R^2_r)$, the greater the improvement in fit by adding variables, the larger the $F$!

• This formula is (believe it or not) actually a simplified version (assuming homoskedasticity)

• I give you this formula to build your intuition of what F is measuring

## F-test Example I

• We’ll use the wooldridge package’s wage1 data again
# load in data from wooldridge package
library(wooldridge)
wages <- wage1

# run regressions
unrestricted_reg <- lm(wage ~ female + northcen + west + south, data = wages)
restricted_reg <- lm(wage ~ female, data = wages)

## F-test Example II

• Unrestricted model:

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i+\hat{\beta_2}\,\text{North Central}_i+\hat{\beta_3}\,\text{West}_i+\hat{\beta_4}\,\text{South}_i$

• Restricted model:

$\widehat{\text{Wage}}_i=\hat{\beta_0}+\hat{\beta_1} \, \text{Female}_i$

• $H_0: \beta_2 = \beta_3 = \beta_4 =0$

• $q = 3$ restrictions (F numerator df)

• $n-k-1 = 526-4-1=521$ (F denominator df)
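• With those pieces, we can compute the F-statistic by hand from the two models' $R^2$'s; a sketch using the simplified (homoskedasticity-assuming) formula above, which should match the linearHypothesis() output below:

# R-squared from each model
r2_u <- summary(unrestricted_reg)$r.squared # unrestricted
r2_r <- summary(restricted_reg)$r.squared   # restricted
q <- 3            # number of restrictions
n <- 526; k <- 4  # observations; regressors in unrestricted model
F_stat <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))
F_stat
pf(F_stat, df1 = q, df2 = n - k - 1, lower.tail = FALSE) # p-value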

## F-test Example III

• We can use the car package’s linearHypothesis() command to run an $F$-test:
• first argument: name of the (unrestricted) regression
• second argument: vector of variable names (in quotes) you are testing
# load car package for additional regression tools
library(car)
# F-test
linearHypothesis(unrestricted_reg, c("northcen", "west", "south")) 
• $p$-value on $F$-test $<0.05$, so we can reject $H_0$
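• Base R's anova() runs the same nested-model comparison as an F-test, without the car package:

anova(restricted_reg, unrestricted_reg)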

## All F-test I


Call:
lm(formula = wage ~ female + northcen + west + south, data = wages)

Residuals:
Min      1Q  Median      3Q     Max
-6.3269 -2.0105 -0.7871  1.1898 17.4146

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.5654     0.3466  21.827   <2e-16 ***
female       -2.5652     0.3011  -8.520   <2e-16 ***
northcen     -0.5918     0.4362  -1.357   0.1755
west          0.4315     0.4838   0.892   0.3729
south        -1.0262     0.4048  -2.535   0.0115 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.443 on 521 degrees of freedom
Multiple R-squared:  0.1376,    Adjusted R-squared:  0.131
F-statistic: 20.79 on 4 and 521 DF,  p-value: 6.501e-16
• Last line of regression output from summary() is an All F-test
• $H_0:$ all $\beta's=0$
• the regression explains no variation in $Y$
• Calculates an F-statistic that, if high enough, is significant (p-value $<0.05)$ enough to reject $H_0$

## All F-test II

• Alternatively, if you use broom instead of summary():
• glance() command makes table of regression summary statistics
• tidy() only shows coefficients
glance(unrestricted_reg)
• statistic is the All F-test statistic; p.value next to it is the p-value from that F-test