1.3 — Data Visualization

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Graphics and Statistics

  • Admittedly, we still need to cover basic descriptive statistics and data fundamentals

    • continuous, discrete, cross-sectional, time series, panel data
    • mean, median, variance, standard deviation
    • random variables, distributions, PDFs, Z-scores
    • bar graphs, boxplots, histograms, scatterplots
  • All of this is coming in 2 weeks as we return to statistics and econometric theory

  • But let’s start with the fun stuff right away, even if you don’t fully know the reasons: data visualization

Our Data Source

  • For our examples, we’ll use the mpg dataset from the ggplot2 package
library(ggplot2)

head(mpg)
# A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

ggplot2 and the tidyverse

ggplot2

  • ggplot2 is perhaps the most popular package in R and a core element of the tidyverse

  • “gg” stands for the grammar of graphics

  • Produces powerful, attractive graphics that are highly customizable and reproducible, but it comes with a learning curve

  • Many of the “cool graphics” you’ve seen in the New York Times, FiveThirtyEight, the Economist, Vox, etc. are built on the grammar of graphics

ggplot: All Your Figure are Belong to Us

Source: BBC’s bbplot

Why Go gg?

Hadley Wickham

Chief Scientist, RStudio

“The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.”

Source

The Grammar of Graphics (gg)

  • This is a true grammar

  • We don’t talk about specific chart types

    • which you have to hunt for in Excel, then reshape your data to fit
  • Instead, we talk about the components every chart is made of

The Grammar of Graphics (gg) I

  • Any graphic can be built from the same components:
    1. Data to be drawn from
    2. Aesthetic mappings from data to some visual marking
    3. Geometric objects on the plot
    4. Scales define the range of values
    5. Coordinates to organize location
    6. Labels describe the scale and markings
    7. Facets group into subplots
    8. Themes style the plot elements

The Grammar of Graphics (gg) II

  • Any graphic can be built from the same components:
    1. data to be drawn from
    2. aesthetic mappings from data to some visual marking
    3. geometric objects on the plot
    4. scale define the range of values
    5. coordinates to organize location
    6. labels describe the scale and markings
    7. facet group into subplots
    8. theme style the plot elements

The Grammar of Graphics (gg): All at Once

All in One Command

  • Produces plot output in the viewer

  • Does not save the plot (if run in the console)

    • Save it with the Export menu in the viewer
  • Adding layers requires rerunning the whole code for a new plot

  • Perfectly fine if it’s a code chunk in a Quarto document!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()+
  geom_smooth()

The Grammar of Graphics (gg): As R Objects

Saving as an object

  • Saves your plot as an R object

  • Does not show in viewer

    • Execute the name of your object to see it
  • Can add layers by calling the original plot name

# make and save plot as p
p <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()

p # view plot

# to add a layer...
p + geom_smooth() # shows the new plot

p2 <- p + geom_smooth() # saves as a new object, p is unchanged
p <- p + geom_smooth() # overwrites p
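To save a plot from code instead of using the Export menu, ggplot2 provides ggsave(). A minimal sketch, assuming you want a PDF; the file name and the 6 x 4 inch size are illustrative choices, and the file goes to a temporary folder here so the example leaves your project untouched:

```r
library(ggplot2)

# build and store the plot as an object
p <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()

# ggsave() picks the file type from the extension (.pdf, .png, ...);
# width and height are in inches by default
out_file <- file.path(tempdir(), "displ_hwy.pdf") # hypothetical file name
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

In your own work you would usually give a plain file name such as "displ_hwy.png", which saves into your working directory.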

Plot Layers

The Grammar of Graphics (gg): Tidy Data

Data

ggplot(data = mpg)

Data is the data frame our plot draws from. As part of the tidyverse, ggplot2 requires data to be “tidy”:

  1. Each variable forms a column

  2. Each observation forms a row

  3. Each type of observational unit forms a table
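To see what “tidy” means in practice, here is a small sketch with made-up numbers for two hypothetical countries, A and B. The untidy version spreads one variable (the value) across year columns. tidyr’s pivot_longer() reshapes it so that each variable is a column and each country-year observation is a row:

```r
library(tibble)
library(tidyr)

# UNTIDY: the years are stored as column names, not as a variable
untidy <- tibble(
  country = c("A", "B"),   # hypothetical countries
  `1999`  = c(100, 200),   # made-up values
  `2008`  = c(150, 250)
)

# TIDY: one column per variable (country, year, value), one row per observation
tidy <- pivot_longer(untidy,
                     cols = c(`1999`, `2008`),
                     names_to = "year",
                     values_to = "value")
tidy
```

Note that year comes out as a character column, because it came from column names.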

gg: Data Layer

Data

ggplot(data = mpg)

  • Add a layer with + at the end of a line (never at the beginning!)

  • Style recommendation: start a new line after each + to improve legibility!

  • We will build a plot layer-by-layer

gg: Mapping Aesthetics I

Data

Aesthetics

+aes(...)

Aesthetics map data to visual elements or parameters

gg: Mapping Aesthetics II

Data

Aesthetics

+aes(...)

Aesthetics map data to visual elements or parameters

  • displ

  • hwy

  • class

gg: Mapping Aesthetics III

Data

Aesthetics

+aes(...)

Aesthetics map data to visual elements or parameters

  • displ → x

  • hwy → y

  • class → color (or shape, size, etc.)

gg: Mapping Aesthetics IV

Data

Aesthetics

+aes(...)

Aesthetics map data to visual elements or parameters

gg: Mapping Aesthetics V

Data

Aesthetics

+aes(...)

Aesthetics map data to visual elements or parameters

aes(x = displ,
    y = hwy,
    color = class)
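Color is not the only aesthetic. A sketch mapping two other mpg variables to shape and size. Pairing drv (3 drive types) with shape and cyl (number of cylinders) with size is an illustrative choice:

```r
library(ggplot2)

# drv -> shape (one symbol per drive type), cyl -> size (bigger = more cylinders)
p_shape <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy,
      shape = drv,
      size = cyl)+
  geom_point()

p_shape
```

Shape works best for a categorical variable with only a few values, since ggplot2’s default shape palette has just six symbols.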

gg: Geoms I

Data

Aesthetics

Geoms

+geom_*(...)

Geometric objects displayed on the plot

gg: Geoms II

Data

Aesthetics

Geoms

+geom_*(...)

Geometric objects displayed on the plot

  • Which geom you should use depends on what you want to show:
Type         geom
Point        geom_point()
Line         geom_line(), geom_path()
Bar          geom_bar(), geom_col()
Histogram    geom_histogram()
Regression   geom_smooth()
Boxplot      geom_boxplot()
Text         geom_text()
Density      geom_density()
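As a quick sketch of two geoms from this table that the later slides don’t demonstrate, here is the distribution of highway mileage shown first as a histogram and then as a density curve. The bin width of 2 mpg is an illustrative choice:

```r
library(ggplot2)

# histogram: counts of cars in 2-mpg-wide bins of highway mileage
p_hist <- ggplot(data = mpg)+
  aes(x = hwy)+
  geom_histogram(binwidth = 2)

# density: a smoothed version of the same distribution
p_dens <- ggplot(data = mpg)+
  aes(x = hwy)+
  geom_density()

p_hist
p_dens
```

Both need only an x aesthetic: the y-axis (count or density) is computed by the geom itself.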

gg: Geoms III

Data

Aesthetics

Geoms

+geom_*(...)

Geometric objects displayed on the plot

##  [1] "geom_abline"     "geom_area"       "geom_bar"        "geom_bin2d"     
##  [5] "geom_blank"      "geom_boxplot"    "geom_col"        "geom_contour"   
##  [9] "geom_count"      "geom_crossbar"   "geom_curve"      "geom_density"   
## [13] "geom_density_2d" "geom_density2d"  "geom_dotplot"    "geom_errorbar"  
## [17] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"        "geom_histogram" 
## [21] "geom_hline"      "geom_jitter"     "geom_label"      "geom_line"      
## [25] "geom_linerange"  "geom_map"        "geom_path"       "geom_point"     
## [29] "geom_pointrange" "geom_polygon"    "geom_qq"         "geom_qq_line"   
## [33] "geom_quantile"   "geom_raster"     "geom_rect"       "geom_ribbon"    
## [37] "geom_rug"        "geom_segment"    "geom_sf"         "geom_sf_label"  
## [41] "geom_sf_text"    "geom_smooth"     "geom_spoke"      "geom_step"      
## [45] "geom_text"       "geom_tile"       "geom_violin"     "geom_vline"

See http://ggplot2.tidyverse.org/reference for many more options

gg: Geoms IV

Data

Aesthetics

Geoms

+geom_*(...)

Geometric objects displayed on the plot

Or just start typing geom_ in R Studio!

Let’s Make a Plot!

ggplot(data = mpg)

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
      geom_point()

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
      geom_point(aes(color = class))

Let’s Make a Plot!

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
      geom_point(aes(color = class))+
      geom_smooth()

More Geoms

Data

Aesthetics

Geoms

+geom_*(...)

geom_*(aes, data, stat, position)

  • data: geoms can have their own data
    • has to map onto global coordinates
  • aes: geoms can have their own aesthetics
    • inherits global aesthetics by default
    • different geoms have different available aesthetics

More Geoms II

Data

Aesthetics

Geoms

+geom_*(...)

geom_*(aes, data, stat, position)

  • stat: some geoms statistically transform data
    • geom_histogram() uses stat_bin() to group observations into bins
  • position: some geoms adjust the location of objects
    • dodge, stack, jitter
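Two more position adjustments, sketched below: "fill" (like stack, but every bar rescaled to height 1) and jitter. The jitter width, height, and seed values are illustrative choices:

```r
library(ggplot2)

# position = "fill": stack the bars, then rescale each to height 1,
# so the y-axis shows the share of each drive type within a class
p_fill <- ggplot(data = mpg)+
  aes(x = class,
      fill = drv)+
  geom_bar(position = "fill")

# jitter: cty and hwy are whole numbers, so many points sit on top of each other;
# position_jitter() nudges each point randomly (seed makes the nudges repeatable)
p_jit <- ggplot(data = mpg)+
  aes(x = cty,
      y = hwy)+
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 1))

p_fill
p_jit
```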

Our Plot

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
      geom_point(aes(color = class))+
      geom_smooth()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class,
      y = hwy)+
      geom_boxplot()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class)+
      geom_bar()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class,
      fill = drv)+
      geom_bar()

Change Our Plot

ggplot(data = mpg)+
  aes(x = class,
      fill = drv)+
      geom_bar(position = "dodge")

Back to the Original (and Saving It)

# save plot as p
p <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()

p # show plot

gg: Facets I

Data

Aesthetics

Geoms

Facets

+ facet_wrap()

+ facet_grid()

p + facet_wrap(~year)

gg: Facets II

Data

Aesthetics

Geoms

Facets

+ facet_wrap()

+ facet_grid()

p + facet_grid(cyl ~ year)
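Two other common facet options, sketched on a fresh base plot. facet_wrap() can control its layout with ncol (or nrow), and facet_grid() can split along only one dimension using vars(). Faceting by class and by drv here is an illustrative choice:

```r
library(ggplot2)

p_base <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()

# one panel per vehicle class, wrapped into 4 columns
p_wrap <- p_base + facet_wrap(~class, ncol = 4)

# rows only: one row of panels per drive type, no column variable
p_grid <- p_base + facet_grid(rows = vars(drv))

p_wrap
p_grid
```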

gg: Labels

Data

Aesthetics

Geoms

Facets

+ labs()

(p <- p + facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class"))

gg: Scales I

Data

Aesthetics

Geoms

Facets

Scales

+ scale_*_*()

scale_<aes>_<type>()

  • <aes>: parameter to adjust

  • <type>: type of parameter

  • Discrete x-axis: scale_x_discrete()

  • Continuous y-axis: scale_y_continuous()

  • Rescale x-axis to log: scale_x_log10()

  • Use a different color palette: scale_fill_discrete(), scale_color_manual()
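A sketch of two scales from this list. scale_color_manual() lets you pick the colors (and legend labels) for each value of a discrete variable yourself, and scale_x_log10() puts displacement on a log scale. The specific colors and label wording are illustrative choices:

```r
library(ggplot2)

# drv takes the values "4", "f", "r"; breaks, values, and labels line up by position
p_manual <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy,
      color = drv)+
  geom_point()+
  scale_color_manual(breaks = c("4", "f", "r"),
                     values = c("darkgreen", "steelblue", "firebrick"),
                     labels = c("4-wheel", "Front", "Rear"))

p_manual

# same plot, x-axis on a log10 scale
p_manual + scale_x_log10()
```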

gg: Scales II

Data

Aesthetics

Geoms

Facets

Scales

+ scale_*_*()

p + scale_x_continuous(breaks = seq(0, 10, 2),
                       limits = c(0,7.5),
                       expand = c(0,0)
)

gg: Scales III

Data

Aesthetics

Geoms

Facets

Scales

+ scale_*_*()

p + scale_x_continuous(breaks = seq(0, 10, 2),
                       limits = c(0,7.5),
                       expand = c(0,0)
                       ) + 
  scale_color_viridis_d()

gg: Themes I

Data

Aesthetics

Geoms

Facets

Scales

Themes

+ theme_*()

Themes change the appearance of plot decorations (things not mapped to data)

  • Some themes that come with ggplot2:
    • + theme_bw()
    • + theme_dark()
    • + theme_gray()
    • + theme_minimal()
    • + theme_light()
    • + theme_classic()

gg: Themes II

Data

Aesthetics

Geoms

Facets

Scales

Themes

+ theme_*()

Themes change the appearance of plot decorations (things not mapped to data)

  • Many parameters we could customize

  • Global options: line, rect, text, title

  • axis: x-, y-, or other axis title, ticks, lines

  • legend: plot legends for fill or color

  • panel: actual plot area

  • plot: whole image

  • strip: facet labels
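A sketch of how those component names turn into theme() arguments: each argument names a component (axis., panel., strip., plot.) and a part of it, and takes an element_*() function. The particular tweaks below (bold axis titles, no minor gridlines, larger facet labels, centered title) are illustrative choices:

```r
library(ggplot2)

p_theme <- ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point()+
  facet_wrap(~year)+
  labs(title = "Car Mileage and Displacement")+
  theme_minimal()+
  theme(axis.title = element_text(face = "bold"),  # axis: bold axis titles
        panel.grid.minor = element_blank(),       # panel: remove minor gridlines
        strip.text = element_text(size = 12),     # strip: larger facet labels
        plot.title = element_text(hjust = 0.5))   # plot: center the title

p_theme
```

Putting theme() after theme_minimal() matters: a complete theme added later would overwrite these tweaks.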

gg: Themes III

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_minimal()

gg: Themes IV

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_minimal()+
  theme(text = element_text(family = "Fira Sans"))

gg: Themes V

ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_minimal()+
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")

gg: Themes VI

Data

Aesthetics

Geoms

Facets

Scales

Themes

+ theme_*()

  • ggthemes package adds some other nice themes
# install if you don't have it
# install.packages("ggthemes")
library("ggthemes") # load package

gg: Themes VII

library(ggthemes)
ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_economist()+
  theme(text = element_text(family = "Fira Sans"))

gg: Themes VIII

library(ggthemes)
ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))+
  geom_smooth()+
  facet_wrap(~year)+
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+
  theme_fivethirtyeight()+
  theme(text = element_text(family = "Fira Sans"))

Some Troubleshooting

Global vs. Local Aesthetic Mappings

  • aes() can go in the base (data) layer and/or in individual geom_*() layers
  • All geoms will inherit global aes from data layer unless overridden
# ALL GEOMS will map data to colors
ggplot(data = mpg, aes(x = displ,
                       y = hwy,
                       color = class))+
  geom_point()+
  geom_smooth()

# ONLY points will map data to colors
ggplot(data = mpg, aes(x = displ,
                       y = hwy))+
  geom_point(aes(color = class))+
  geom_smooth()

Mapped vs. Set Aesthetics

  • Aesthetics such as size and color can be mapped from data or set to a single value
  • Map inside of aes(), set outside of aes()
# Point colors are mapped from class data
ggplot(data = mpg, aes(x = displ,
                       y = hwy))+
  geom_point(aes(color = class))+
  geom_smooth()

# Point colors are all set to red, the smoothed line to blue
ggplot(data = mpg, aes(x = displ,
                       y = hwy))+
  geom_point(color = "red")+
  geom_smooth(color = "blue")

Go Crazy I

# I did some (hidden) data work before this! 
ggplot(data = county_full,
       mapping = aes(x = long, y = lat,
                     fill = pop_dens,
                     group = group))+
  geom_polygon(color = "gray90", size = 0.05)+
  coord_equal()+
  scale_fill_brewer(palette = "Blues",
                    labels = c("0-10", "10-50", "50-100", "100-500",
                               "500-1,000", "1,000-5,000", ">5,000"))+
  labs(fill = "Population per\nsquare mile")+
  theme_map()+
  guides(fill = guide_legend(nrow = 1))+
  theme(legend.position = "bottom")

Go Crazy II

library(gapminder)
library(gganimate)
library(dplyr) # for %>% and filter()
gapminder %>%
  filter(continent != "Oceania") %>%
  ggplot(aes(x = gdpPercap,
             y = lifeExp,
             color = country,
             size = pop))+
  geom_point(alpha = 0.3)+
  scale_x_log10(breaks = c(1000, 10000, 100000),
                labels = scales::dollar)+
  scale_size(range = c(0.5, 12))+
  scale_color_manual(values = gapminder::country_colors)+
  labs(x = "GDP/Capita",
       y = "Life Expectancy (Years)",
       caption = "Source: Hans Rosling's gapminder.org",
       title = "Income & Life Expectancy - {frame_time}")+
  facet_wrap(~continent)+
  guides(color = "none", size = "none")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  transition_time(year)+
  ease_aes("linear")

Reference: RStudio Makes Great Cheat Sheets!

RStudio: ggplot2 Cheat Sheet

Reference

On ggplot2

On data visualization