1.3 — Data Visualization

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Graphics and Statistics

  • Admittedly, we still need to cover basic descriptive statistics and data fundamentals

    • continuous, discrete, cross-sectional, time series, panel data
    • mean, median, variance, standard deviation
    • random variables, distributions, PDFs, Z-scores
    • bargraphs, boxplots, histograms, scatterplots
  • All of this is coming in 2 weeks as we return to statistics and econometric theory

  • But let’s start with the fun stuff right away, even if you don’t fully know the reasons: data visualization

Our Data Source

  • For our examples, we’ll use the mpg dataset that ships with the ggplot2 package
library(ggplot2)

head(mpg)
# A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
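
Before plotting, it can help to check the dataset's size and column types. A minimal sketch using only base R functions on mpg:

```r
library(ggplot2)

dim(mpg)           # number of rows and columns
names(mpg)         # variable names
str(mpg$hwy)       # highway miles per gallon, stored as integers
unique(mpg$class)  # the vehicle classes we will color by later
```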

ggplot2 and the tidyverse

ggplot2

  • ggplot2 is perhaps the most popular package in R and a core element of the tidyverse

  • gg stands for a grammar of graphics

  • Produces powerful, attractive graphics that are highly customizable and reproducible, but it takes some time to learn

  • Many of the “cool graphics” you’ve seen in the New York Times, fivethirtyeight, the Economist, Vox, etc. are built on the grammar of graphics

ggplot: All Your Figure are Belong to Us

Source: fivethirtyeight

Source: BBC’s bbplot

Why Go gg?

Hadley Wickham

Chief Scientist, R Studio

“The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.”

Source

The Grammar of Graphics (gg)

  • This is a true grammar

  • We don’t talk about specific chart types

    • In Excel, you hunt through a menu of chart types and reshape your data to fit the one you pick
  • Instead we talk about specific chart components

The Grammar of Graphics (gg) I

  • Any graphic can be built from the same components:
    1. Data to be drawn from
    2. Aesthetic mappings from data to some visual marking
    3. Geometric objects on the plot
    4. Scales define the range of values
    5. Coordinates to organize location
    6. Labels describe the scale and markings
    7. Facets group into subplots
    8. Themes style the plot elements
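
The eight components above can all appear in one plot. The following is a sketch rather than a recommended recipe: the axis limits, the labels, and the choice of theme are arbitrary assumptions made only so that every component shows up once.

```r
library(ggplot2)

p_all <- ggplot(data = mpg) +                  # 1. data
  aes(x = displ, y = hwy, color = class) +     # 2. aesthetic mappings
  geom_point() +                               # 3. geometric objects
  scale_y_continuous(limits = c(0, 50)) +      # 4. scales
  coord_cartesian() +                          # 5. coordinates
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG") +                    # 6. labels
  facet_wrap(~year) +                          # 7. facets
  theme_minimal()                              # 8. theme

p_all
```

Each of these layers is covered one at a time in the rest of the slides.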

The Grammar of Graphics (gg): All at Once

All in One Command

Produces plot output in viewer

  • Does not save plot (if done in console)

    • Save with Export menu in viewer
  • Adding a layer means rerunning the whole command to produce a new plot

  • Perfectly fine if it’s a code chunk in a Quarto document!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()+
  geom_smooth()

The Grammar of Graphics (gg): As R Objects

Saving as an object

  • Saves your plot as an R object

  • Does not show in viewer

    • Execute the name of your object to see it
  • Can add layers by calling the original plot name

# make and save plot as p
p <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()

p # view plot

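# to add a layer...
p + geom_smooth() # shows the new plot without changing p

# ...then either overwrite p with the new plot:
p <- p + geom_smooth()

# ...or keep p and save the new plot as a separate object (choose one;
# running both would add the smooth layer to p2 twice):
p2 <- p + geom_smooth()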
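
Because a saved plot is an ordinary R object, it can also be written to a file from code rather than through the Export menu. A minimal sketch using ggplot2's ggsave(); the file location and the dimensions are arbitrary choices for illustration:

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# write to a temporary PDF; replace out_file with a path of your own
out_file <- tempfile(fileext = ".pdf")
ggsave(filename = out_file, plot = p, width = 6, height = 4) # inches by default

file.exists(out_file)
```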

Plot Layers

The Grammar of Graphics (gg): Tidy Data

Data

ggplot(data = mpg)

Data is the data frame our plot draws from. As part of the tidyverse, ggplot2 expects data to be “tidy”:

  1. Each variable forms a column

  2. Each observation forms a row

  3. Each type of observational unit forms a table

Note: Data “tidiness” is the core element of all tidyverse packages. Much more on all of this next class.
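
To make the three rules concrete, here is a small sketch with made-up numbers (the data frame and its values are hypothetical). It assumes the tidyr package is installed and uses its pivot_longer() function to reshape a "wide" table into a tidy one.

```r
library(tidyr)

# hypothetical "wide" data: one column per year, so the variable
# "year" is hidden in the column names
wide <- data.frame(
  country  = c("A", "B", "C"),
  gdp_2000 = c(1.0, 2.0, 3.0),
  gdp_2010 = c(1.5, 2.5, 3.5)
)

# tidy version: one column per variable (country, year, gdp),
# one row per country-year observation
tidy <- pivot_longer(wide,
                     cols = c(gdp_2000, gdp_2010),
                     names_to = "year",
                     names_prefix = "gdp_",
                     values_to = "gdp")
tidy
```

In the tidy version, year can be mapped to an aesthetic like any other variable.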

gg: Data Layer

Data

ggplot(data = mpg)

  • Add a layer with + at the end of a line (never at the beginning!)

  • Style recommendation: start a new line after each + to improve legibility!

  • We will build a plot layer-by-layer

gg: Mapping Aesthetics

Data

Aesthetics

+aes(...)

Aesthetics map data to visual elements or parameters

  • displ → x

  • hwy → y

  • class → color (or shape, size, etc.)

aes(x = displ,
    y = hwy,
    color = class)
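
To see what this mapping does, we can attach it to a plot and inspect the values ggplot2 computes. This is a sketch: a geom_point() layer is added here only so there is something to draw, and layer_data() is the ggplot2 helper that returns a layer's computed data.

```r
library(ggplot2)

p_aes <- ggplot(data = mpg) +
  aes(x = displ,        # engine displacement -> horizontal position
      y = hwy,          # highway mpg -> vertical position
      color = class) +  # vehicle class -> point color
  geom_point()

p_aes

# one distinct color is assigned per vehicle class
length(unique(layer_data(p_aes)$colour))
length(unique(mpg$class))
```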

gg: Geoms I

Data

Aesthetics

Geoms

+geom_*(...)

Geometric objects displayed on the plot

  • What geoms you should use depends on what you want to show:

Type        geom
Point       geom_point()
Line        geom_line(), geom_path()
Bar         geom_bar(), geom_col()
Histogram   geom_histogram()
Regression  geom_smooth()
Boxplot     geom_boxplot()
Text        geom_text()
Density     geom_density()

gg: Geoms II

##  [1] "geom_abline"     "geom_area"       "geom_bar"        "geom_bin2d"     
##  [5] "geom_blank"      "geom_boxplot"    "geom_col"        "geom_contour"   
##  [9] "geom_count"      "geom_crossbar"   "geom_curve"      "geom_density"   
## [13] "geom_density_2d" "geom_density2d"  "geom_dotplot"    "geom_errorbar"  
## [17] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"        "geom_histogram" 
## [21] "geom_hline"      "geom_jitter"     "geom_label"      "geom_line"      
## [25] "geom_linerange"  "geom_map"        "geom_path"       "geom_point"     
## [29] "geom_pointrange" "geom_polygon"    "geom_qq"         "geom_qq_line"   
## [33] "geom_quantile"   "geom_raster"     "geom_rect"       "geom_ribbon"    
## [37] "geom_rug"        "geom_segment"    "geom_sf"         "geom_sf_label"  
## [41] "geom_sf_text"    "geom_smooth"     "geom_spoke"      "geom_step"      
## [45] "geom_text"       "geom_tile"       "geom_violin"     "geom_vline"

See http://ggplot2.tidyverse.org/reference for many more options

Or just start typing geom_ in R Studio!
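
A listing of the available geoms can also be produced from within R. One way to do it is sketched below; this is an assumption, since the slides do not show the command that was used.

```r
library(ggplot2)

# list every exported ggplot2 object whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)
length(geoms) # the count depends on the installed ggplot2 version
```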

Let’s Make a Plot!

ggplot(data = mpg)

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()

More Geoms

Data

Aesthetics

Geoms

+geom_*(...)

geom_*(aes, data, stat, position)

  • data: geoms can have their own data
    • has to map onto global coordinates
  • aes: geoms can have their own aesthetics
    • inherits global aesthetics by default
    • different geoms have different available aesthetics
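
As a sketch of layer-specific data and aesthetics, the plot below draws all cars in gray and then overlays only the two-seaters. The two_seaters subset and the colors are choices made here for illustration.

```r
library(ggplot2)

# hypothetical subset used only by the second layer
two_seaters <- subset(mpg, class == "2seater")

p_own <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +        # global aesthetics
  geom_point(color = "gray60") +   # uses the global data and aes
  geom_point(data = two_seaters,   # layer-specific data
             aes(shape = drv),     # layer-specific aesthetic, added to x and y
             color = "red", size = 3)

p_own

nrow(layer_data(p_own, 1)) # all cars
nrow(layer_data(p_own, 2)) # only the two-seaters
```

The second layer still inherits x and y from the global aes(), which is why its data must contain displ and hwy.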

More Geoms II

Data

Aesthetics

Geoms

+geom_*(...)

geom_*(aes, data, stat, position)

  • stat: some geoms statistically transform data
    • geom_histogram() uses stat_bin() to group observations into bins
  • position: some adjust location of objects
    • dodge, stack, jitter
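
A short sketch of both arguments at work. The bin count and the variables plotted are arbitrary choices; layer_data() is used only to show what each stat computed.

```r
library(ggplot2)

# stat: geom_bar() counts the rows in each class for us (stat_count())
bar <- ggplot(data = mpg) +
  aes(x = class) +
  geom_bar()

# the computed bar heights match a manual count
layer_data(bar)$count
as.vector(table(mpg$class))

# stat: geom_histogram() bins a continuous variable (stat_bin())
histo <- ggplot(data = mpg) +
  aes(x = hwy) +
  geom_histogram(bins = 10)

sum(layer_data(histo)$count) # every car lands in exactly one bin

# position: jitter nudges overlapping points apart by a small random amount
jit <- ggplot(data = mpg) +
  aes(x = cyl, y = hwy) +
  geom_point(position = "jitter")
```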

Our Plot

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class,
      y = hwy)+
  geom_boxplot()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class)+
  geom_bar()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class,
      fill = drv)+
  geom_bar()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class,
      fill = drv)+
  geom_bar(position = "dodge")

Back to the Original (and Saving It)

# save plot as p
p <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()

p # show plot

gg: Facets I

Data

Aesthetics

Geoms

Facets

+ facet_wrap()

+ facet_grid()

p + facet_wrap(~year)

gg: Facets II

Data

Aesthetics

Geoms

Facets

+ facet_wrap()

+ facet_grid()

p + facet_grid(cyl ~ year)
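
The difference between the two facet functions can be checked in code. In this sketch, p_simple is a pared-down stand-in for p (points only), and layer_data() is used to see which panel each point lands in.

```r
library(ggplot2)

p_simple <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# facet_wrap(): one variable, panels wrapped into rows and columns as needed
by_year <- p_simple + facet_wrap(~year)

# facet_grid(): rows by one variable, columns by another
by_cyl_year <- p_simple + facet_grid(cyl ~ year)

# the PANEL column records which panel each point is drawn in
length(unique(layer_data(by_year)$PANEL))
```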

gg: Labels

Data

Aesthetics

Geoms

Facets

+ labs()

(p <- p + facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class"))

gg: Scales I

Data

Aesthetics

Geoms

Facets

Scales

+ scale_*_*()

scale_<aes>_<type>()

  • <aes>: parameter to adjust

  • <type>: type of parameter

  • Discrete x-axis: scale_x_discrete()

  • Continuous y-axis: scale_y_continuous()

  • Rescale x-axis to log: scale_x_log10()

  • Use different color palette: scale_fill_discrete(), scale_color_manual()
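
A sketch that combines several of these scale functions in one plot. The break positions and the three colors are arbitrary choices for illustration.

```r
library(ggplot2)

p_scaled <- ggplot(data = mpg) +
  aes(x = displ, y = hwy, color = drv) +
  geom_point() +
  scale_x_log10() +                                  # x-axis on a log10 scale
  scale_y_continuous(breaks = seq(10, 50, 10)) +     # y-axis ticks every 10 mpg
  scale_color_manual(values = c("4" = "darkorange",  # one color per drive type
                                "f" = "steelblue",
                                "r" = "forestgreen"))

p_scaled

# after the log scale is applied, the plotted x values are log10(displ)
range(layer_data(p_scaled)$x)
range(log10(mpg$displ))
```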

gg: Scales II

Data

Aesthetics

Geoms

Facets

Scales

+ scale_*_*()

p + scale_x_continuous(breaks = seq(0, 10, 2),
                       limits = c(0,7.5),
                       expand = c(0,0)
)

gg: Scales III

Data

Aesthetics

Geoms

Facets

Scales

+ scale_*_*()

p + scale_x_continuous(breaks = seq(0, 10, 2),
                       limits = c(0,7.5),
                       expand = c(0,0)
                       ) + 
  scale_color_viridis_d()

gg: Themes I

Data

Aesthetics

Geoms

Facets

Scales

Themes

+ theme_*()

Themes change the appearance of plot decorations (things not mapped to data)

  • Some themes that come with ggplot2:
    • + theme_bw()
    • + theme_dark()
    • + theme_gray()
    • + theme_minimal()
    • + theme_light()
    • + theme_classic()

gg: Themes II

Data

Aesthetics

Geoms

Facets

Scales

Themes

+ theme_*()

Themes change the appearance of plot decorations (things not mapped to data)

  • Many parameters we could customize

  • Global options: line, rect, text, title

  • axis: x-, y-, or other axis title, ticks, lines

  • legend: plot legends for fill or color

  • panel: actual plot area

  • plot: whole image

  • strip: facet labels
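
As a sketch, here is one theme() call that touches one element from each group above. The specific settings are arbitrary choices, not recommendations.

```r
library(ggplot2)

p_themed <- ggplot(data = mpg) +
  aes(x = displ, y = hwy, color = class) +
  geom_point() +
  facet_wrap(~year) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),             # global: all text
    axis.title = element_text(face = "bold"),   # axis: both axis titles
    legend.position = "bottom",                 # legend: where it sits
    panel.grid.minor = element_blank(),         # panel: drop minor grid lines
    plot.title = element_text(hjust = 0.5),     # plot: center the title
    strip.text = element_text(face = "italic")  # strip: facet labels
  )

p_themed
```

Settings in theme() are applied on top of the complete theme added before it, so theme_minimal() should come first.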

gg: Themes III

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_minimal()

gg: Themes IV

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_minimal()+
  theme(text = element_text(family = "Fira Sans"))

gg: Themes V

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_minimal()+
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")

gg: Themes VI

Data

Aesthetics

Geoms

Facets

Scales

Themes

+ theme_*()

  • ggthemes package adds some other nice themes
# install if you don't have it
# install.packages("ggthemes")
library("ggthemes") # load package

gg: Themes VII

library(ggthemes)
ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_economist()+
  theme(text = element_text(family = "Fira Sans"))

gg: Themes VIII

library(ggthemes)
ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_fivethirtyeight()+
  theme(text = element_text(family = "Fira Sans"))

Some Troubleshooting

Global vs. Local Aesthetic Mappings

  • aes() can go in the base (data) layer and/or in individual geom_*() layers
  • All geoms inherit the global aes from the base layer unless overridden
# ALL GEOMS will map data to colors
ggplot(data = mpg, aes(x = displ,
                       y = hwy,
                       color = class))+
  geom_point()+
  geom_smooth()

# ONLY points will map data to colors
ggplot(data = mpg, aes(x = displ,
                       y = hwy))+
  geom_point(aes(color = class))+
  geom_smooth()
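
The effect of inheritance can be counted directly. This sketch swaps in method = "lm" for geom_smooth() (an assumption made here so that small classes still get a stable fit; the slides use the default smoother) and counts the fitted lines in each plot.

```r
library(ggplot2)

# color mapped globally: geom_smooth() inherits it and fits one line per class
global_map <- ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# color mapped only in geom_point(): geom_smooth() fits a single line
local_map <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# count the separate fitted lines in the second layer of each plot
length(unique(layer_data(global_map, 2)$group))
length(unique(layer_data(local_map, 2)$group))
```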

Mapped vs. Set Aesthetics

  • aesthetics such as size and color can be mapped from data or set to a single value
  • Map inside of aes(), set outside of aes()
# Point colors are mapped from class data
ggplot(data = mpg, aes(x = displ,
                       y = hwy))+
  geom_point(aes(color = class))+
  geom_smooth()

# Point colors are all set to red; the smooth line is set to blue
ggplot(data = mpg, aes(x = displ,
                       y = hwy))+
  geom_point(color = "red")+
  geom_smooth(color = "blue")
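
A common mistake is putting a literal color inside aes(). The sketch below contrasts the two cases and uses layer_data() to show which color actually gets drawn.

```r
library(ggplot2)

# set: "blue" outside aes() is a literal color
set_blue <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# mapped by mistake: "blue" inside aes() is treated as data, a one-value
# variable, so ggplot2 picks a color from its default palette and adds a legend
mapped_blue <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "blue"))

unique(layer_data(set_blue)$colour)    # "blue"
unique(layer_data(mapped_blue)$colour) # a palette color, not "blue"
```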

Go Crazy I


# I did some (hidden) data work before this to build county_full!
library(ggthemes) # for theme_map()
ggplot(data = county_full,
       mapping = aes(x = long, y = lat,
                     fill = pop_dens,
                     group = group))+
  geom_polygon(color = "gray90", linewidth = 0.05)+ # use size = 0.05 in ggplot2 < 3.4.0
  coord_equal()+
  scale_fill_brewer(palette = "Blues",
                    labels = c("0-10", "10-50", "50-100", "100-500",
                               "500-1,000", "1,000-5,000", ">5,000"))+
  labs(fill = "Population per\nsquare mile")+
  theme_map()+
  guides(fill = guide_legend(nrow = 1))+
  theme(legend.position = "bottom")

Go Crazy II


library(dplyr)     # for %>% and filter()
library(ggplot2)
library(gapminder)
library(gganimate)
gapminder %>%
  filter(continent != "Oceania") %>%
  ggplot(aes(x = gdpPercap,
             y = lifeExp,
             color = country,
             size = pop))+
  geom_point(alpha = 0.3)+
  scale_x_log10(breaks = c(1000, 10000, 100000),
                labels = scales::dollar)+
  scale_size(range = c(0.5, 12))+
  scale_color_manual(values = gapminder::country_colors)+
  labs(x = "GDP/Capita",
       y = "Life Expectancy (Years)",
       caption = "Source: Hans Rosling's gapminder.org",
       title = "Income & Life Expectancy - {frame_time}")+
  facet_wrap(~continent)+
  guides(color = "none", size = "none")+ # hide both legends
  theme_minimal(base_family = "Fira Sans Condensed")+
  transition_time(year)+
  ease_aes("linear")

Reference: R Studio Makes Great “Cheat Sheet”s!

RStudio: ggplot2 Cheat Sheet

Reference

On ggplot2

  • R Studio’s ggplot2 Cheat Sheet
  • ggplot2’s website reference section
  • Hadley Wickham’s R for Data Science book chapter on ggplot2
  • STHDA’s be awesome in ggplot2
  • r-statistic’s top 50 ggplot2 visualizations

On data visualization

  • Kieran Healy’s Data Visualization: A Practical Guide
  • Claus Wilke’s Fundamentals of Data Visualization
  • PolicyViz Better Presentations
  • Karl Broman’s How to Display Data Badly
  • I Want Hue
