1.1 — Introduction to Econometrics

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

About Me

Ph.D (Economics) — George Mason University, 2015
B.A. (Economics) — University of Connecticut, 2011
7th year teaching at Hood
Specializations:
- Law and Economics
- Austrian Economics
Research interests
- modeling innovation & economic growth
- political economy & economic history of intellectual property

What’s Keeping Me Busy

What is Econometrics?

Why Everyone, Yes Everyone, Should Learn Statistics

We’re Not So Good at Statistics: Votes I

Votes in the U.S. House of Representatives in favor of passing the Civil Rights Act of 1964:

Democrat	Republican
61%	80%

On average, Republicans tended to vote for passage more than Democrats

We’re Not So Good at Statistics: Votes

Votes in the U.S. House of Representatives in favor of passing the Civil Rights Act of 1964:

	Democrat	Republican
North	94%	85%
	(145/154)	(138/162)
South	7%	0%
	(7/94)	(0/10)
Overall	61%	80%
	(152/248)	(138/172)

A larger proportion of Democrats (\(\frac{94}{248}\), 38%) than Republicans (\(\frac{10}{172}\), 6%) were from the South
The 7% of southern Democrats voting for the Act dragged down the Democrats’ overall percentage more than the 0% of southern Republicans
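
The arithmetic behind this can be written out as a weighted average, using only the counts in the table above. Each party's overall rate is its Northern and Southern rates weighted by the share of its members from each region:

\[\begin{aligned}
\text{Democrats: } \frac{152}{248} &= \frac{154}{248}\cdot\frac{145}{154} + \frac{94}{248}\cdot\frac{7}{94} \approx 0.62(0.94) + 0.38(0.07) \approx 0.61\\
\text{Republicans: } \frac{138}{172} &= \frac{162}{172}\cdot\frac{138}{162} + \frac{10}{172}\cdot\frac{0}{10} \approx 0.94(0.85) + 0.06(0.00) \approx 0.80
\end{aligned}\]

Democrats have the higher rate within each region, but they put far more weight (0.38 vs. 0.06) on the low Southern rate.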

We’re Not So Good at Statistics: Kidney Stones

Suppose you suffer from kidney stones and your doctor offers you treatment A or treatment B
In clinical trials, Treatment A was effective for a higher percentage of patients with large stones and a higher percentage of patients with small stones
Treatment B was effective for a larger percentage of patients overall than treatment A
Wait, what?

We’re Not So Good at Statistics: Kidney Stones

From a real medical study:

	Treatment A	Treatment B
Small Stones	93%	87%
	(81/87)	(234/270)
Large Stones	73%	69%
	(192/263)	(55/80)
Overall	78%	83%
	(273/350)	(289/350)

C R Charig, D R Webb, S R Payne, and J E Wickham, 1986, “Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy,” Br Med J (Clin Res Ed) 292(6524): 879–882.

The sizes of the two groups (i.e. who gets A vs B) are very different

A lurking variable in the study is the severity of the case: doctors tended to give treatment B for less severe cases

Simpson’s Paradox

Simpson’s Paradox: The correlation between two variables can change (even reverse!) when additional variables are considered
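
As a rough check, the reversal in the kidney stone table can be reproduced in a few lines of base R. This is only a sketch: the counts are copied from the table, while the object and column names (`stones`, `pooled`, `rate`) are made up for illustration.

```r
# Successes and patients for each treatment and stone size (Charig et al., 1986)
stones <- data.frame(
  treatment = c("A", "A", "B", "B"),
  size      = c("Small", "Large", "Small", "Large"),
  success   = c(81, 192, 234, 55),
  patients  = c(87, 263, 270, 80)
)

# Success rate within each stone size: A is higher for both sizes
stones$rate <- stones$success / stones$patients
print(stones)

# Success rate pooled over stone sizes: B is higher overall
pooled <- aggregate(cbind(success, patients) ~ treatment,
                    data = stones, FUN = sum)
pooled$rate <- pooled$success / pooled$patients
print(pooled)
```

Nothing changes between the two printouts except whether stone size is held fixed, which is the "additional variable" in the definition.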

We’re Not so Good at Statistics: Smoking

1964: U.S. Surgeon General issued a report claiming that cigarette smoking causes lung cancer
Evidence based primarily on correlations between cigarette smoking and lung cancer

We’re Not so Good at Statistics: Smoking

Tobacco companies attacked the report, naturally

We’re Not so Good at Statistics: Smoking

Ronald A. Fisher

1890–1962

But so did R. A. Fisher, the “father of modern statistics”

We’re Not so Good at Statistics: Smoking

There could be a confounding variable (“smoking gene”) that causes both lung cancer and the urge to smoke
Would imply: decision to smoke or not would have no impact on lung cancer!
Correlation between smoking and cancer is spurious!
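
Fisher's hypothetical can be sketched as a simulation. Every number below is invented purely for illustration (the 30% gene frequency, the smoking and cancer probabilities, the seed), and the simulated world is built so that smoking has no effect at all on cancer.

```r
set.seed(480)  # arbitrary seed so the simulation is repeatable
n <- 100000

# Hypothetical confounder: 30% of people carry the "smoking gene"
gene <- rbinom(n, size = 1, prob = 0.3)

# The gene raises the probability of smoking...
smoker <- rbinom(n, size = 1, prob = ifelse(gene == 1, 0.7, 0.2))

# ...and the probability of cancer; smoking itself plays NO role here
cancer <- rbinom(n, size = 1, prob = ifelse(gene == 1, 0.15, 0.02))

# Pooled over everyone, smoking and cancer are positively correlated
cor_all <- cor(smoker, cancer)

# Holding the gene fixed, the correlation is approximately zero
cor_gene1 <- cor(smoker[gene == 1], cancer[gene == 1])
cor_gene0 <- cor(smoker[gene == 0], cancer[gene == 0])

print(c(all = cor_all, gene1 = cor_gene1, gene0 = cor_gene0))
```

In this made-up world the pooled correlation is spurious: it comes entirely from the gene, which is why a correlation alone could not settle the argument.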

Correlation Does Not Imply Causation

The goal of every intro statistics class ever

XKCD: Correlation

Correlation Does Not Imply Causation

Spurious Correlations

Correlation Does Not Imply Causation…

It’s always good to be skeptical of causal claims
But this is actually where econometrics shines

Econometrics

Econometrics is the application of statistical tools to quantify economic relationships in the real world
Uses real data to
- test economic hypotheses
- quantitatively estimate the magnitude of relationships between economic variables
- forecast future events

Econometrics and Causal Inference

What sets econometrics apart from mere statistics (or uses of statistics in other disciplines) is its role in causal inference
We can, with proper tools and interpretations, make quantitative causal claims
- about the effects of individual choices
- about the effects of policy interventions
- about the impact of political institutions
- about economic history and economic development
- etc…

Causal Inference: Examples

A 50% increase in police presence in a metropolitan area lowers crime rates by 15%, on average¹

Being an incumbent in office raises the probability of re-election by 40-45 percentage points²

European cities with at least one printing press in 1500 were at least 29% more likely to become Protestant by 1600³

Example 1: Education

Example

Does reducing class sizes improve student performance?

A policy-relevant tradeoff with a budget constraint
What is the precise effect of class size on performance?
Is it worth hiring new teachers and building more schools?

Example 2: Discrimination in Lending

Example

Is there racial discrimination in home mortgage lending?

Boston Fed: 28% of African-American applicants are denied mortgages compared to only 9% of White applicants
Is this due to factors such as credit history, income, or discrimination purely because of race?

Example 3: Public Health and Public Finance

Example

How much do state cigarette taxes reduce smoking rates?

Econ 101: raise price \(\rightarrow\) lower quantity consumed
What is the price elasticity of demand for smoking?
How much tax revenue will this generate?
Probably: \(Taxes \rightarrow Smokers\)
Maybe?: \(Taxes \leftarrow Smokers\)

About This Course

Real Talk: The Math

Real Talk: Difficulty

This will be one of the hardest courses you take at Hood
There will be moments where you have no idea WTF is going on 🤯 (this is normal)
But this is one of the best courses you can take at Hood
Yes, you can still get an A

This Class Is

Economics: take your preexisting intuition and models for causal inference
Statistics: add regression and statistical inference
Computer Programming: using R and R Studio for analyzing and presenting data

Old School Statistics Courses

\(\bar{x} = \frac{1}{n} \displaystyle\sum^n_{i=1} x_i\)
\(\sigma_x = \displaystyle \sqrt{\frac{1}{n} \sum^n_{i=1} (x_i-\bar{x})^2}\)
\(r_{xy}= \displaystyle \frac{\displaystyle\sum^n_{i=1}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\displaystyle\sum^n_{i=1}(x_i-\bar{x})^2\sum^n_{i=1}(y_i-\bar{y})^2}}\)
Use pre-cleaned “toy” data, if at all

Hip New “Data Science” Courses

mean(x)
sd(x)
cor(x, y)

Import, tidy, and manipulate raw data from scratch (like real life!)
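
The two styles can be compared directly. The sketch below uses a tiny made-up dataset (`x` and `y` are not course data) to compute the formulas by hand and check them against the built-in functions. One caveat it makes visible: R's `sd()` divides by \(n-1\) (the sample standard deviation), whereas the formula for \(\sigma_x\) above divides by \(n\), so the two differ slightly.

```r
# Made-up toy data, for illustration only
x <- c(2, 4, 4, 5, 7, 8)
y <- c(1, 3, 2, 5, 6, 9)
n <- length(x)

# Mean: sum divided by number of observations
mean_by_hand <- sum(x) / n

# Standard deviation, dividing by n (the formula shown above)
sd_pop <- sqrt(sum((x - mean(x))^2) / n)

# Standard deviation, dividing by n - 1 (what sd() computes)
sd_sample <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Correlation coefficient
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Compare with the built-in functions
all.equal(mean_by_hand, mean(x))
all.equal(sd_sample, sd(x))
all.equal(r_by_hand, cor(x, y))
```

All three comparisons return `TRUE`, while `sd_pop` comes out a little smaller than `sd(x)`.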

Prerequisites

Officially (Courses):
- ECON 205
- ECON 206
- ECON 305 or ECON 306
- MATH 112 or ECMG 212

Math Skills:
- Basic algebra
- Probability-ish
- Statistics-ish

Computer Science Skills:
- None 🤖

What You’ll Get Out of This Class

By the end of this semester, you will:

understand how to evaluate statistical and empirical claims;
use the fundamental models of causal inference and research design;
gather, analyze, and communicate with real data in R.

This Class Opens Doors

Building Industry-Demanded Data Science Skills

Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
— Josh Wills (@josh_wills) May 3, 2012

Building Industry-Demanded Data Science Skills

Harvard Business Review

LinkedIn 2018 Emerging Jobs Report

Building Industry-Demanded Data Science Skills

LinkedIn 2020 Emerging Jobs Report

R Can Be Used for Data Science

Two Types of Uses For Econometrics

\[\color{orange}{Y}=\color{teal}{f}(\color{purple}{X})\]

Causal inference: estimate \(\color{teal}{\hat{f}}\) to determine how changes in \(\color{purple}{X}\) cause changes in \(\color{orange}{Y}\)

Care more about accurately estimating \(\color{teal}{f}\) than getting an accurate \(\color{orange}{\hat{Y}}\)
Measure the causal effect of \(X \mapsto Y\)
primarily regression-based

Prediction: predict \(\color{orange}{\hat{Y}}\) using an estimated \(\color{teal}{f}\)

Care more about getting \(\color{orange}{\hat{Y}}\) as accurate as possible, \(\color{teal}{f}\) is an unknown “black-box”
Use for forecasting, classification, etc.
less regression, more machine-learning methods
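
The two uses can be sketched with one regression on simulated data. Everything here is an assumption made for illustration: the "true" \(f\) is set to \(Y = 2 + 3X + \text{noise}\), and \(X\) is generated independently of the noise, which is the only reason the estimated slope can be read causally in this example.

```r
set.seed(480)  # arbitrary seed so the simulation is repeatable
n <- 10000

# Hypothetical data-generating process: Y = 2 + 3X + noise
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# Estimate f with a linear regression
fit <- lm(y ~ x)

# Causal-inference use: how much does Y change when X rises by 1?
slope_hat <- coef(fit)[["x"]]
print(slope_hat)

# Prediction use: what Y do we expect at particular values of X?
y_hat <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)))
print(y_hat)
```

The same fitted object serves both purposes; what differs is whether the estimated slope or the predicted values are the object of interest.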

More and more “data science” focuses on the second…but

We care (in this class at least) only about the first…because

Causal Inference — Economists’ Comparative Advantage

Machine learning and artificial intelligence are “dumb”¹
With the right models and research designs, we can say “X causes Y” and quantify it!
Economists are in a unique position to make causal claims that mere statistics cannot

Causal Inference — Economists’ Comparative Advantage

Harvard Business Review

“[T]he field of economics has spent decades developing a toolkit aimed at investigating empirical relationships, focusing on techniques to help understand which correlations speak to a causal relationship and which do not. This comes up all the time — does Uber Express Pool grow the full Uber user base, or simply draw in users from other Uber products? Should eBay advertise on Google, or does this simply syphon off people who would have come through organic search anyway? Are African-American Airbnb users rejected on the basis of their race? These are just a few of the countless questions that tech companies are grappling with, investing heavily in understanding the extent of a causal relationship.”

Building Good Workflow Habits

I will show you the tools to make your workflow:
- Reproducible
- Computer- and Human-Readable (!)
- Automated
- All in one program

library(tidyverse)  # provides %>%, filter(), and ggplot2
library(gapminder)  # gapminder data and the country_colors palette
library(gganimate)  # transition_time() and ease_aes()

gapminder %>%
  filter(continent != "Oceania") %>%
  ggplot(aes(x = gdpPercap,
             y = lifeExp,
             color = country,
             size = pop)) +
  geom_point(alpha = 0.3) +
  # log scale for income, labeled in dollars
  scale_x_log10(breaks = c(1000, 10000, 100000),
                labels = scales::dollar) +
  scale_size(range = c(0.5, 12)) +
  scale_color_manual(values = gapminder::country_colors) +
  labs(x = "GDP/Capita",
       y = "Life Expectancy (Years)",
       caption = "Source: Hans Rosling's gapminder.org",
       title = "Income & Life Expectancy - {frame_time}") +
  facet_wrap(~continent) +
  guides(color = "none", size = "none") +  # hide both legends
  theme_minimal(base_family = "Fira Sans Condensed") +  # font must be installed
  transition_time(year) +  # one animation frame per year
  ease_aes("linear")

Assignments

Research project:
- Come up with a testable research question
- Find data
- Analyze data
- Present your results (in writing and verbally)
HWs
Midterm, Final exam

	Assignment	Percent
1	Research Project	30%
n	Homeworks (Average)	25%
1	Midterm	20%
1	Final	25%

Logistics

Office hours: MW 1:30-2:30 PM & by appt
- Office: 114 Rosenstock
Slack channel
See the resources page for tips for success and more helpful resources

Your Textbooks

You Can Do This.

Tips for Success in This Course

Take notes. On paper. Really.
Work together on assignments and study together.
Ask questions and come to office hours. Don’t struggle in silence; you are not alone!
The biggest skill you are developing is learning how to learn¹
See the reference page for more

Course Website

metricsF22.classes.ryansafner.com

Roadmap for the Semester

For Next Class

Take the preliminary survey on statistics and software
Register for R Studio Cloud
(Optional but highly recommended) Install R and R Studio on your computer