2.1 — Data 101 & Descriptive Statistics

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Summary of diamonds by cut
cut	n	frequency	percent
Fair	1610	0.0298480	2.98
Good	4906	0.0909529	9.10
Very Good	12082	0.2239896	22.40
Premium	13791	0.2556730	25.57
Ideal	21551	0.3995365	39.95
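
The frequency and percent columns follow from the counts alone: each count is divided by the total number of observations. Below is a minimal Python sketch of that arithmetic, using only the counts in the table above. The variable names are illustrative, and the course materials may compute this differently.

```python
# Counts of diamonds by cut, copied from the table above.
counts = {
    "Fair": 1610,
    "Good": 4906,
    "Very Good": 12082,
    "Premium": 13791,
    "Ideal": 21551,
}

total = sum(counts.values())  # total number of observations

# Relative frequency = count / total; percent = 100 * frequency.
summary = {
    cut: {
        "n": n,
        "frequency": n / total,
        "percent": round(100 * n / total, 2),
    }
    for cut, n in counts.items()
}

for cut, row in summary.items():
    print(f"{cut}\t{row['n']}\t{row['frequency']:.7f}\t{row['percent']:.2f}")
```

The frequencies sum to 1 by construction, which is a quick check that no category was dropped.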

id	name	age	sex	income
1	John	23	Male	41000
2	Emile	18	Male	52600
3	Natalya	28	Female	48000
4	Lakisha	31	Female	60200
5	Cheng	36	Male	81900
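
As an illustration of the descriptive statistics this deck covers, here is a minimal Python sketch that computes the sample mean, median, and sample standard deviation of the income column above, one step at a time. The step breakdown is an assumption about how such a calculation is usually presented, not a copy of the slides' own code.

```python
import math

# Income column from the table above (five individuals).
income = [41000, 52600, 48000, 60200, 81900]
n = len(income)

# Sample mean: sum of the observations divided by n.
mean = sum(income) / n

# Median: middle value of the sorted data (n is odd here).
median = sorted(income)[n // 2]

# Sample standard deviation, step by step.
deviations = [x - mean for x in income]   # 1. deviation of each value from the mean
squared = [d ** 2 for d in deviations]    # 2. square each deviation
variance = sum(squared) / (n - 1)         # 3. sum and divide by n - 1 (sample variance)
sd = math.sqrt(variance)                  # 4. square root gives the standard deviation

print(f"mean={mean:.0f} median={median} sd={sd:.2f}")
```

The mean exceeds the median here because the largest income (81900) pulls the mean upward, while the median depends only on the middle observation.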

Year	GDP	Unemployment	CPI
1950	8.2	0.06	100
1960	9.9	0.04	118
1970	10.2	0.08	130
1980	12.4	0.08	190
1985	13.6	0.06	196

City	Year	Murders	Population	UR
Philadelphia	1986	5	3.700	8.7
Philadelphia	1990	8	4.200	7.2
D.C.	1986	2	0.250	5.4
D.C.	1990	10	0.275	5.5
New York	1986	3	6.400	9.6
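
A panel tracks multiple units over multiple periods, so each observation is identified by a (unit, period) pair. The Python sketch below shows one way to index the rows above by city and year. The units of Population and the meaning of UR are not stated in the table, so the values are carried through untouched, and all names are illustrative.

```python
# Rows of the panel table above: (city, year, murders, population, UR).
panel = [
    ("Philadelphia", 1986, 5, 3.700, 8.7),
    ("Philadelphia", 1990, 8, 4.200, 7.2),
    ("D.C.", 1986, 2, 0.250, 5.4),
    ("D.C.", 1990, 10, 0.275, 5.5),
    ("New York", 1986, 3, 6.400, 9.6),
]

# Index each observation by its (unit, period) pair.
by_city_year = {
    (city, year): {"murders": murders, "population": population, "ur": ur}
    for city, year, murders, population, ur in panel
}

# Holding the unit fixed gives a time series for that city.
philadelphia = {
    year: row for (city, year), row in by_city_year.items() if city == "Philadelphia"
}

# Holding the period fixed gives a cross-section for that year.
cross_1986 = {
    city: row for (city, year), row in by_city_year.items() if year == 1986
}

print(sorted(philadelphia), sorted(cross_1986))
```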

scores	n
71	2
0	1
62	1
66	1
74	1
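
The mode is the value with the highest count. Here is a minimal Python sketch that finds it using only the rows shown above. The full data set may contain more scores than are listed here, and the names are illustrative.

```python
# Score counts exactly as listed above (score -> number of occurrences).
score_counts = {71: 2, 0: 1, 62: 1, 66: 1, 74: 1}

# The mode is every score whose count equals the largest count.
max_count = max(score_counts.values())
modes = [score for score, count in score_counts.items() if count == max_count]

print(modes, max_count)
```

Returning a list rather than a single value keeps the sketch correct for multi-modal data, where several scores tie for the highest count.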

Variable	Obs	Min	Q1	Median	Q3	Max	Mean	Std. Dev.
cty	234	9	14	17	19	35	16.86	4.26
cyl	234	4	4	6	8	8	5.89	1.61
hwy	234	12	18	24	27	44	23.44	5.95
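
The five-number columns in the table above are enough to compute two measures of dispersion, the range and the interquartile range (IQR). The Python sketch below does so and also applies the conventional 1.5 × IQR rule for flagging potential outliers. That rule is assumed here as the standard boxplot convention and is not taken from the table itself.

```python
# (min, Q1, median, Q3, max) for each variable, copied from the table above.
five_number = {
    "cty": (9, 14, 17, 19, 35),
    "cyl": (4, 4, 6, 8, 8),
    "hwy": (12, 18, 24, 27, 44),
}


def spread(minimum, q1, median, q3, maximum):
    """Range, IQR, and 1.5 * IQR fences from a five-number summary."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "range": maximum - minimum,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # If the extreme value lies beyond a fence, at least one outlier exists.
        "has_low_outlier": minimum < lower_fence,
        "has_high_outlier": maximum > upper_fence,
    }


dispersion = {name: spread(*values) for name, values in five_number.items()}

for name, stats in dispersion.items():
    print(name, stats)
```

Under this rule, a variable whose maximum lies above its upper fence has at least one high outlier, although the summary alone cannot say how many.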

Title Slide
Contents
The Two Big Problems with Data
Two Big Problems with Data
Identification Problem: Endogeneity
Identification Problem: Endogeneity
Identification Problem: Endogeneity
Inference Problem: Randomness
The Two Problems: Where We’re Heading…Ultimately
Data 101
Data 101
Data 101
Categorical Variables
Categorical Variables: Visualizing I
Categorical Variables: Visualizing II
Categorical Data: Pie Charts
Categorical Data: Pie Charts
Categorical Data: Alternatives to Pie Charts I
Categorical Data: Alternatives to Pie Charts II
Categorical Data: Alternatives to Pie Charts III
Quantitative Data I
Discrete Data
Continuous Data
Spreadsheets
Spreadsheets: Indexing
Spreadsheets: Notation
Datasets: Cross-Sectional
Datasets: Time-Series
Datasets: Panel
Descriptive Statistics
Variables and Distributions
Two Branches of Statistics
Histogram
Histogram: Bin Size
Histogram: Example
Histogram: Example
Descriptive Statistics
Measures of Center
Mode
Mode
Multi-Modal Distributions
Symmetry and Skew I
Symmetry and Skew I
Outliers
Arithmetic Mean (Population)
Arithmetic Mean (Sample)
Arithmetic Mean (Sample)
Arithmetic Mean: Affected by Outliers
Median
Mean, Median, and Outliers
Mean, Median, Symmetry, & Skew I
Mean, Median, Symmetry, & Skew II
Mean, Median, Symmetry, & Skew III
Measures of Dispersion
Range
Five Number Summary I
Five Number Summary II
Boxplot I
Boxplot Comparisons I
Boxplot Comparisons II
Aside: Making Nice Summary Tables I
Aside: Making Nice Summary Tables II
Aside: Making Nice Summary Tables III
Measures of Dispersion: Deviations
Variance (Population)
Standard Deviation (Population)
Variance (Sample)
Standard Deviation (Sample)
Sample Standard Deviation: Example
The Steps to Calculate sd(), Coded I
The Steps to Calculate sd(), Coded II
The Steps to Calculate sd(), Coded III
Sample Standard Deviation: You Try
Descriptive Statistics: Populations vs. Samples