1.1 — Introduction to Econometrics

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

About Me

  • Ph.D (Economics) — George Mason University, 2015

  • B.A. (Economics) — University of Connecticut, 2011

  • 7th year teaching at Hood

  • Specializations:

    • Law and Economics
    • Austrian Economics
  • Research interests:

    • modeling innovation & economic growth
    • political economy & economic history of intellectual property

What’s Keeping Me Busy

What is Econometrics?

Why Everyone, Yes Everyone, Should Learn Statistics

We’re Not So Good at Statistics: Votes I

  • Votes in the U.S. House of Representatives in favor of passing the Civil Rights Act of 1964:
         Democrat      Republican
Overall  61%           80%
  • A larger share of Republicans (80%) than Democrats (61%) voted for passage

We’re Not So Good at Statistics: Votes II

  • Votes in the U.S. House of Representatives in favor of passing the Civil Rights Act of 1964:
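         Democrat      Republican
North    94%           85%
         (145/154)     (138/162)
South    7%            0%
         (7/94)        (0/10)
Overall  61%           80%
         (152/248)     (138/172)
  • A larger proportion of Democrats (\(\frac{94}{248}\), 38%) than Republicans (\(\frac{10}{172}\), 6%) were from the South

  • Within each region, Democrats voted for passage at a higher rate than Republicans, yet overall they voted for it at a lower rate

  • Because southern Democrats made up 38% of all Democrats, their 7% rate dragged down the Democrats’ overall percentage far more than southern Republicans’ 0% rate (only 6% of Republicans) dragged down theirs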
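The reversal can be checked directly from the vote counts in the table. Below is a minimal base-R sketch; the object and column names (`votes`, `yes`, `total`, and so on) are illustrative choices, not part of any original analysis.

```r
# Vote counts from the table: yes votes and total members, by party and region
votes <- data.frame(
  party  = c("Democrat", "Democrat", "Republican", "Republican"),
  region = c("North", "South", "North", "South"),
  yes    = c(145, 7, 138, 0),
  total  = c(154, 94, 162, 10)
)

# Within each region, Democrats voted yes at a higher rate than Republicans
votes$pct_yes <- round(100 * votes$yes / votes$total)
print(votes)

# Pooling the two regions reverses the comparison
overall <- aggregate(cbind(yes, total) ~ party, data = votes, FUN = sum)
overall$pct_yes <- round(100 * overall$yes / overall$total)
print(overall)

# Share (%) of each party's members who came from the South
south <- votes[votes$region == "South", ]
south_share <- round(100 * south$total / overall$total)
print(south_share)
```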

We’re Not So Good at Statistics: Kidney Stones

  • Suppose you suffer from kidney stones and your doctor offers you treatment A or treatment B

  • In clinical trials, Treatment A was effective for a higher percentage of patients with large stones and a higher percentage of patients with small stones

  • Treatment B was effective for a larger percentage of patients overall than treatment A

  • Wait, what?

We’re Not So Good at Statistics: Kidney Stones

From a real medical study:

                Treatment A    Treatment B
Small Stones    93%            87%
                (81/87)        (234/270)
Large Stones    73%            69%
                (192/263)      (55/80)
Overall         78%            83%
                (273/350)      (289/350)

C R Charig, D R Webb, S R Payne, and J E Wickham, 1986, “Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy,” Br Med J (Clin Res Ed) 292(6524): 879–882.

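  • The makeup of the two treatment groups (i.e., who gets A vs. B) is very different: most Treatment A patients had large stones (263 of 350), while most Treatment B patients had small stones (270 of 350)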

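  • A lurking variable in the study is the severity of the case: doctors tended to give treatment B for less severe cases (small stones), which had higher success rates under either treatment, so B’s overall advantage reflects easier cases rather than a better treatment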
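The same arithmetic applies to the Charig et al. (1986) counts. A minimal base-R sketch, again with illustrative object names, that recomputes the within-size and overall success rates and the share of each treatment's patients who had large stones:

```r
# Successful treatments and total patients, from the table
stones <- data.frame(
  treatment = c("A", "A", "B", "B"),
  size      = c("Small", "Large", "Small", "Large"),
  success   = c(81, 192, 234, 55),
  patients  = c(87, 263, 270, 80)
)

# Success rate (%) within each stone size: A beats B for both sizes
stones$rate <- round(100 * stones$success / stones$patients)
print(stones)

# Overall success rate (%): B beats A once stone sizes are pooled
overall <- aggregate(cbind(success, patients) ~ treatment, data = stones, FUN = sum)
overall$rate <- round(100 * overall$success / overall$patients)
print(overall)

# Share (%) of each treatment's patients who had the harder-to-treat large stones
large <- stones[stones$size == "Large", ]
large_share <- round(100 * large$patients / overall$patients)
print(large_share)
```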

Simpson’s Paradox

Simpson’s Paradox: The correlation between two variables can change (even reverse!) when additional variables are considered
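Simpson’s paradox also shows up in regression. The sketch below simulates hypothetical data (nothing here comes from a real study) in which \(y\) falls with \(x\) inside each of two groups, but the group with higher \(x\) also has much higher \(y\). Ignoring the group gives a positive slope; adding the group as a control recovers the negative within-group slope.

```r
set.seed(480)
n <- 1000
g <- rbinom(n, 1, 0.5)              # hypothetical grouping variable (0 or 1)
x <- rnorm(n) + 5 * g               # group 1 has much higher x on average
y <- -0.5 * x + 10 * g + rnorm(n)   # within each group, y falls as x rises

pooled     <- coef(lm(y ~ x))[["x"]]      # ignores the group: slope is positive
controlled <- coef(lm(y ~ x + g))[["x"]]  # holds the group fixed: slope near -0.5
print(c(pooled = pooled, controlled = controlled))
```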

We’re Not So Good at Statistics: Smoking

  • 1964: the U.S. Surgeon General issued a report claiming that cigarette smoking causes lung cancer

  • The evidence was based primarily on correlations between cigarette smoking and lung cancer

We’re Not So Good at Statistics: Smoking

  • Tobacco companies attacked the report, naturally

We’re Not So Good at Statistics: Smoking

Ronald A. Fisher

1890-1962

We’re Not So Good at Statistics: Smoking

  • There could be a confounding variable (a “smoking gene”) that causes both lung cancer and the urge to smoke

  • If so, the decision to smoke or not would have no impact on lung cancer!

  • The correlation between smoking and cancer would be spurious!
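A short simulation can make Fisher’s argument concrete. It builds, purely for illustration, a hypothetical world in which a “smoking gene” raises both the chance of smoking and the risk of lung cancer while smoking itself has no effect at all (Fisher’s conjecture, not what later evidence showed). All probabilities are made-up assumptions. A naive regression of cancer on smoking still finds a positive association, which disappears once the gene is held fixed.

```r
set.seed(1964)
n <- 20000
gene   <- rbinom(n, 1, 0.3)                 # hypothetical "smoking gene"
smoke  <- rbinom(n, 1, 0.2 + 0.5 * gene)    # the gene raises the chance of smoking
cancer <- rbinom(n, 1, 0.05 + 0.25 * gene)  # the gene raises cancer risk; smoking never enters

# Naive comparison: smokers appear more likely to get cancer
naive <- coef(lm(cancer ~ smoke))[["smoke"]]

# Controlling for the confounder: the "effect" of smoking is about zero
adjusted <- coef(lm(cancer ~ smoke + gene))[["smoke"]]

print(c(naive = naive, adjusted = adjusted))
```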

Correlation Does Not Imply Causation

  • The lesson of every intro statistics class ever

XKCD: Correlation

Correlation Does Not Imply Causation

Spurious Correlations

Correlation Does Not Imply Causation…

  • It’s always good to be skeptical of causal claims

  • But this is actually where econometrics shines

Econometrics

  • Econometrics is the application of statistical tools to quantify economic relationships in the real world

  • Uses real data to

    • test economic hypotheses
    • quantitatively estimate the magnitude of relationships between economic variables
    • forecast future events

Econometrics and Causal Inference

  • What sets econometrics apart from mere statistics (or uses of statistics in other disciplines) is its role in causal inference

  • We can, with proper tools and interpretations, make quantitative causal claims

    • about the effects of individual choices
    • about the effects of policy interventions
    • about the impact of political institutions
    • about economic history and economic development
    • etc…

Causal Inference: Examples

A 50% increase in police presence in a metropolitan area lowers crime rates by 15%, on average [1]

Being an incumbent in office raises the probability of re-election by 40-45 percentage points [2]

European cities with at least one printing press in 1500 were at least 29% more likely to become Protestant by 1600 [3]

Example 1: Education

Example

  • Does reducing class sizes improve student performance?
  • A policy-relevant tradeoff with a budget constraint
  • What is the precise effect of class size on performance?
  • Is the improvement worth the cost of hiring more teachers and building more schools?

Example 2: Discrimination in Lending

Example

  • Is there racial discrimination in home mortgage lending?
  • Boston Fed data: 28% of African-American mortgage applicants were denied, compared to only 9% of White applicants
  • Is this gap due to other factors, such as credit history or income, or to discrimination purely because of race?

Example 3: Public Health and Public Finance

Example

  • How much do state cigarette taxes reduce smoking rates?
  • Econ 101: raise price \(\rightarrow\) lower quantity consumed
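  • What is the price elasticity of demand for cigarettes?
  • How much tax revenue will this generate?
  • Probably: \(Taxes \rightarrow Smokers\) (higher taxes reduce smoking)
  • Maybe?: \(Taxes \leftarrow Smokers\) (reverse causality: states with fewer smokers may face less political resistance to raising cigarette taxes)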
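To see how an elasticity estimate answers the revenue question, here is a back-of-the-envelope sketch. Every number in it (price, existing tax, tax increase, elasticity, packs sold) is hypothetical, and it assumes the tax increase is fully passed on to consumers and uses the linear approximation \(\%\Delta Q \approx \varepsilon \times \%\Delta P\).

```r
# All inputs are hypothetical, chosen only to illustrate the arithmetic
price_before <- 6.00   # price per pack ($)
tax_before   <- 2.00   # existing tax per pack ($)
tax_increase <- 1.00   # new tax per pack ($), assumed fully passed on to the price
elasticity   <- -0.4   # assumed price elasticity of demand
packs_before <- 1e6    # packs sold per year before the tax change

# Elasticity = %change in quantity / %change in price (linear approximation)
pct_change_price <- tax_increase / price_before
pct_change_qty   <- elasticity * pct_change_price
packs_after      <- packs_before * (1 + pct_change_qty)

# Tax revenue = tax per pack x packs sold
revenue_before <- tax_before * packs_before
revenue_after  <- (tax_before + tax_increase) * packs_after

print(c(pct_change_qty = pct_change_qty,
        revenue_before = revenue_before,
        revenue_after  = revenue_after))
```

With these made-up inputs, sales fall by about 6.7% while tax revenue rises from $2.0 million to about $2.8 million; a credible estimate of \(\varepsilon\) is what the causal question above is after.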

About This Course

Real Talk: The Math

Real Talk: Difficulty

  • This will be one of the hardest courses you take at Hood
  • There will be moments where you have no idea WTF is going on 🤯 (this is normal)
  • But this is one of the best courses you can take at Hood
  • Yes, you can still get an A

This Class Is

  • Economics: take your preexisting intuition and models for causal inference
  • Statistics: add regression and statistical inference
  • Computer Programming: using R and RStudio for analyzing and presenting data

Old School Statistics Courses

  • \(\bar{x} = \frac{1}{n} \displaystyle\sum^n_{i=1} x_i\)

  • \(s_x = \displaystyle \sqrt{\frac{1}{n-1} \sum^n_{i=1} (x_i-\bar{x})^2}\)

  • \(r_{xy}= \displaystyle \frac{\displaystyle\sum^n_{i=1}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\displaystyle\sum^n_{i=1}(x_i-\bar{x})^2\sum^n_{i=1}(y_i-\bar{y})^2}}\)

  • Use pre-cleaned “toy” data, if at all

Hip New “Data Science” Courses

mean(x)
sd(x)
cor(x, y)
  • Import, tidy, and manipulate raw data from scratch (like real life!)
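The “old school” formulas and the one-line R functions compute the same things. A quick check on a small made-up pair of vectors: R’s `sd()` divides by \(n-1\) (the sample standard deviation), so the hand computation does too; dividing by \(n\) instead gives the slightly smaller population version.

```r
x <- c(2, 4, 4, 5, 7, 9)   # made-up data
y <- c(1, 3, 2, 5, 6, 8)
n <- length(x)

# Mean, sample standard deviation, and correlation, written out by hand
x_bar <- sum(x) / n
y_bar <- sum(y) / n
s_x   <- sqrt(sum((x - x_bar)^2) / (n - 1))
r_xy  <- sum((x - x_bar) * (y - y_bar)) /
  sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))

# Compare with the built-in functions
all.equal(c(x_bar, s_x, r_xy), c(mean(x), sd(x), cor(x, y)))  # TRUE
```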

Prerequisites

  • Officially (Courses):
    • ECON 205
    • ECON 206
    • ECON 305 or ECON 306
    • MATH 112 or ECMG 212
  • Math Skills:
    • Basic algebra
    • Probability-ish
    • Statistics-ish
  • Computer Science Skills:
    • None 🤖

What You’ll Get Out of This Class

By the end of this semester, you will:

  1. understand how to evaluate statistical and empirical claims;
  2. use the fundamental models of causal inference and research design;
  3. gather, analyze, and communicate with real data in R.

This Class Opens Doors

Building Industry-Demanded Data Science Skills

LinkedIn 2020 Emerging Jobs Report

R Can Be Used for Data Science

Two Types of Uses For Econometrics

\[\color{orange}{Y}=\color{teal}{f}(\color{purple}{X})\]

  1. Causal inference: estimate \(\color{teal}{\hat{f}}\) to determine how changes in \(\color{purple}{X}\) cause changes in \(\color{orange}{Y}\)
    • Care more about accurately estimating \(\color{teal}{f}\) than about getting an accurate \(\color{orange}{\hat{Y}}\)
    • Measure the causal effect of \(X \mapsto Y\)
    • Primarily regression-based
  2. Prediction: predict \(\color{orange}{\hat{Y}}\) using an estimated \(\color{teal}{f}\)
    • Care more about getting \(\color{orange}{\hat{Y}}\) as accurate as possible; \(\color{teal}{f}\) can remain an unknown “black box”
    • Use for forecasting, classification, etc.
    • Less regression, more machine-learning methods
  • More and more “data science” focuses on the second…but
  • We care (in this class at least) only about the first…because
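Both uses can start from the same fitted model. In this minimal sketch the data are simulated with a known \(f\) (\(Y = 3 + 2X + \text{noise}\), an assumption made only for illustration), so the slope from `lm()` can be compared with the truth. Reading the slope as a causal effect works here only because \(X\) is randomly generated by construction; with real data, that is precisely what has to be argued.

```r
set.seed(42)
n <- 500
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n)   # true f (assumed for the simulation): slope of 2

fit <- lm(y ~ x)

# Causal-inference use: focus on the estimated f, here the slope on x
slope <- coef(fit)[["x"]]
print(slope)

# Prediction use: focus on Y-hat for new values of X
y_hat <- predict(fit, newdata = data.frame(x = c(1, 5)))
print(y_hat)
```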

Causal Inference — Economists’ Comparative Advantage

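  • Machine learning and artificial intelligence are “dumb” [1]: they find patterns (correlations) in data but cannot, on their own, tell which patterns are causal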
  • With the right models and research designs, we can say “X causes Y” and quantify it!
  • Economists are in a unique position to make causal claims that mere statistics cannot

Causal Inference — Economists’ Comparative Advantage

“[T]he field of economics has spent decades developing a toolkit aimed at investigating empirical relationships, focusing on techniques to help understand which correlations speak to a causal relationship and which do not. This comes up all the time — does Uber Express Pool grow the full Uber user base, or simply draw in users from other Uber products? Should eBay advertise on Google, or does this simply syphon off people who would have come through organic search anyway? Are African-American Airbnb users rejected on the basis of their race? These are just a few of the countless questions that tech companies are grappling with, investing heavily in understanding the extent of a causal relationship.”

Building Good Workflow Habits

  • I will show you the tools to make your workflow:
    • Reproducible
    • Computer- and Human-Readable (!)
    • Automated
    • All in one program

For Example

library(gapminder)   # gapminder data and country_colors
library(dplyr)       # %>% and filter()
library(ggplot2)     # ggplot() and friends
library(gganimate)   # transition_time() and ease_aes()

gapminder %>%
  filter(continent != "Oceania") %>%
  ggplot(aes(x = gdpPercap,
             y = lifeExp,
             color = country,
             size = pop)) +
  geom_point(alpha = 0.3) +
  scale_x_log10(breaks = c(1000, 10000, 100000),
                labels = scales::dollar) +
  scale_size(range = c(0.5, 12)) +
  scale_color_manual(values = gapminder::country_colors) +
  labs(x = "GDP/Capita",
       y = "Life Expectancy (Years)",
       caption = "Source: Hans Rosling's gapminder.org",
       title = "Income & Life Expectancy - {frame_time}") +
  facet_wrap(~continent) +
  guides(color = "none", size = "none") +               # hide the very long legends
  theme_minimal(base_family = "Fira Sans Condensed") +  # font must be installed
  transition_time(year) +                               # animate over years
  ease_aes("linear")

Assignments

  • Research project:
    • Come up with a testable research question
    • Find data
    • Analyze data
    • Present your results (in writing and verbally)
  • Homeworks
  • Midterm and final exam
Count  Assignment            Percent
1      Research Project      30%
n      Homeworks (Average)   25%
1      Midterm               20%
1      Final                 25%
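The final course grade is a weighted average of the four components. A minimal sketch, assuming hypothetical scores out of 100 and that the homework component is the simple average of all homework scores (as “(Average)” suggests):

```r
# Weights from the grading table
weights <- c(project = 0.30, homework = 0.25, midterm = 0.20, final = 0.25)

# Hypothetical scores out of 100
hw_scores <- c(85, 95, 100)
scores <- c(project = 90, homework = mean(hw_scores), midterm = 80, final = 88)

# Weighted average, matching components by name
course_grade <- sum(weights * scores[names(weights)])
print(course_grade)
```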

Logistics

  • Office hours: MW 1:30-2:30 PM & by appt

    • Office: 114 Rosenstock
  • Slack channel

  • See the resources page for tips for success and more helpful resources

Your Textbooks

You Can Do This.

Tips for Success in This Course

  • Take notes. On paper. Really.

  • Work together on assignments and study together.

  • Ask questions and come to office hours. Don’t struggle in silence; you are not alone!

  • The biggest skill you are developing is learning how to learn [1]

  • See the reference page for more

Course Website

metricsF22.classes.ryansafner.com

Roadmap for the Semester

For Next Class