1.4 — Data Wrangling

(Answer Key)

Author

Ryan Safner

Published

September 6, 2022

Required Packages

First, install the following two packages by running install.packages("tidyverse") and install.packages("gapminder") in the console below.1 Alternatively, RStudio will probably show a yellow banner at the top of this file saying the packages need to be installed; clicking Install there does the same thing. Don’t install any package inside an R chunk in this document: a package only needs to be installed once, from the console, while code in a chunk runs again every time the document is run or rendered.

Then, load the packages by running the chunk below (click its green play button):

library("tidyverse") # my friend and yours
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library("gapminder") # for dataset

gapminder <- gapminder # save a copy of the data (a tibble, which is a type of data frame) in the global environment

Warm Up to dplyr with gapminder Again

Question 1

Let’s look at the data again by running the following chunk. glimpse() is a souped-up tidyverse version of str(). The chunk also introduces the pipe operator %>%, which passes the object on its left in as the first argument of the function on its right.

gapminder %>% 
  glimpse()
Rows: 1,704
Columns: 6
$ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
$ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
$ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
$ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
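To see what the pipe does, the sketch below (assuming only that dplyr and gapminder are installed) checks that piping an object into a function gives the same result as calling the function on it directly. The use of nrow() and ncol(), and the names piped, nested, chained, and inside_out, are illustrations chosen here, not part of the exercise.

```r
library(dplyr)      # re-exports the %>% pipe from magrittr
library(gapminder)  # provides the gapminder tibble

# x %>% f() is the same call as f(x): the left side becomes f's first argument
piped  <- gapminder %>% nrow()
nested <- nrow(gapminder)
stopifnot(identical(piped, nested))

# Longer chains read top to bottom instead of inside out
chained    <- gapminder %>% select(year, country) %>% ncol()
inside_out <- ncol(select(gapminder, year, country))
stopifnot(identical(chained, inside_out))

piped    # 1704, matching the "Rows: 1,704" line from glimpse()
chained  # 2
```

The nested version has to be read from the innermost call outward, which is why the commands below are all written as pipes.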

Question 2

Now select() only the variables year, lifeExp, and country.

gapminder %>% 
  select(year, lifeExp, country)
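One detail worth checking: select() returns the columns in the order you list them, and it never drops rows. A quick sketch (the name picked is arbitrary):

```r
library(dplyr)
library(gapminder)

picked <- gapminder %>%
  select(year, lifeExp, country)  # listed order becomes the column order

names(picked)  # "year" "lifeExp" "country"
nrow(picked)   # 1704: select() works on columns only, so every row survives
```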

Question 3

Now select() all variables except pop.

gapminder %>%
  select(-pop)
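The minus sign also drops several columns at once if you negate a c() of names. A sketch (no_pop and slim are illustrative names, and dropping gdpPercap as well is only for the example):

```r
library(dplyr)
library(gapminder)

no_pop <- gapminder %>% select(-pop)                # drop one column
slim   <- gapminder %>% select(-c(pop, gdpPercap))  # drop several at once

names(no_pop)  # "country" "continent" "year" "lifeExp" "gdpPercap"
names(slim)    # "country" "continent" "year" "lifeExp"
```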

Question 4

rename() the variable continent to cont.

gapminder %>%
  rename(cont = continent)
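rename() uses new_name = old_name, which is easy to get backwards. It can also rename several columns in one call, and it leaves every column in its original position. A sketch; life_exp is a hypothetical second new name, not one the exercise asks for:

```r
library(dplyr)
library(gapminder)

renamed <- gapminder %>%
  rename(cont = continent,    # new_name = old_name
         life_exp = lifeExp)  # hypothetical extra rename in the same call

names(renamed)  # "country" "cont" "year" "life_exp" "pop" "gdpPercap"
```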

Question 5

arrange() the data by year.

gapminder %>%
  arrange(year)

Question 6

Now arrange() by year, but in descending order.

gapminder %>%
  arrange(desc(year))
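desc() reverses the sort for the column it wraps. For a numeric column like year, negating the column sorts the same way; the sketch below checks that on gapminder (newest_first and negated are illustrative names):

```r
library(dplyr)
library(gapminder)

newest_first <- gapminder %>% arrange(desc(year))
negated      <- gapminder %>% arrange(-year)  # works for numeric columns only

newest_first$year[1]  # 2007, the latest year in the data
stopifnot(identical(newest_first$year, negated$year))
```

desc() is the more general choice, since it also handles character and factor columns such as country, where a minus sign does not produce a sensible sort.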

Question 7

Now arrange() by year, then by lifeExp within each year.

gapminder %>%
  arrange(year, lifeExp)
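With two keys, arrange() sorts by year first and uses lifeExp only to break ties within the same year, which is the same rule base R's order() follows. The sketch below compares the two on the lifeExp column (by_year_life and base_rows are illustrative names):

```r
library(dplyr)
library(gapminder)

by_year_life <- gapminder %>% arrange(year, lifeExp)

# Base R: order() returns row positions sorted by year, then by lifeExp
base_rows <- order(gapminder$year, gapminder$lifeExp)

# Same sequence of lifeExp values either way
stopifnot(identical(by_year_life$lifeExp, gapminder$lifeExp[base_rows]))

# The first row is the lowest life expectancy in the earliest year (1952)
lowest_1952 <- min(gapminder$lifeExp[gapminder$year == 1952])
stopifnot(by_year_life$year[1] == 1952L, by_year_life$lifeExp[1] == lowest_1952)
```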

Question 8

Let’s try subsetting some rows. filter() observations with pop greater than 1 billion (9 zeros).

gapminder %>%
  filter(pop > 1000000000)
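Counting nine zeros is error-prone; R also accepts scientific notation, so 1e9 is the same number. A sketch checking that both spellings keep the same rows (big and big_long are illustrative names):

```r
library(dplyr)
library(gapminder)

1e9 == 1000000000  # TRUE

big      <- gapminder %>% filter(pop > 1e9)
big_long <- gapminder %>% filter(pop > 1000000000)
stopifnot(identical(big, big_long))

big %>% distinct(country)  # the countries that ever passed 1 billion people
```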

Question 9

Redo the command from Question 8, but within that subset, keep only India.

gapminder %>%
  filter(pop > 1000000000,
         country == "India")
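Separating conditions with a comma inside filter() keeps only rows where all of them are true, so it is equivalent to joining them with & or to chaining two filter() calls. A sketch comparing the three (with_comma, with_and, and chained are throwaway names):

```r
library(dplyr)
library(gapminder)

with_comma <- gapminder %>% filter(pop > 1e9, country == "India")
with_and   <- gapminder %>% filter(pop > 1e9 & country == "India")
chained    <- gapminder %>% filter(pop > 1e9) %>% filter(country == "India")

stopifnot(identical(with_comma, with_and), identical(with_comma, chained))
```

For "either/or" conditions, use | instead, e.g. country == "India" | country == "China".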

Question 10

Let’s pipe a bunch of commands together. select() only year, gdpPercap, and country; filter() to the year 1997 and to countries with a gdpPercap greater than 20,000; then arrange() them alphabetically.

gapminder %>%
  select(year, gdpPercap, country) %>%
  filter(year == 1997,
         gdpPercap > 20000) %>%
  arrange(country)
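In this pipeline, filtering before or after select() gives the same result, because the filter only uses columns that select() keeps. The second half of the sketch shows the catch: once select() drops a column, later steps cannot refer to it. The continent == "Europe" condition is hypothetical, added only to trigger that error.

```r
library(dplyr)
library(gapminder)

select_first <- gapminder %>%
  select(year, gdpPercap, country) %>%
  filter(year == 1997, gdpPercap > 20000) %>%
  arrange(country)

filter_first <- gapminder %>%
  filter(year == 1997, gdpPercap > 20000) %>%
  select(year, gdpPercap, country) %>%
  arrange(country)

stopifnot(identical(select_first, filter_first))

# continent was dropped by select(), so filter() can no longer find it
dropped_col_error <- tryCatch(
  gapminder %>%
    select(year, gdpPercap, country) %>%
    filter(continent == "Europe"),
  error = function(e) "error: continent not found"
)
dropped_col_error
```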

Question 11

Make a new variable with mutate() called GDP, which is equal to gdpPercap * pop.

gapminder %>%
  mutate(GDP = gdpPercap * pop)
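Two things about mutate() are worth seeing directly: a new column can be reused later in the same call, and nothing is saved back into gapminder unless you assign the result with <-. A sketch (with_gdp and GDP_bil, meaning GDP in billions, are hypothetical names):

```r
library(dplyr)
library(gapminder)

with_gdp <- gapminder %>%
  mutate(GDP = gdpPercap * pop,
         GDP_bil = GDP / 1e9)  # reuses GDP, created one line above

ncol(with_gdp)               # 8: both new columns are added at the end
"GDP" %in% names(gapminder)  # FALSE: the original data is unchanged
```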

Question 12

Make a new variable that is pop in millions.

gapminder %>%
  mutate(pop_mil = pop / 1000000)
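Rescaling is ordinary arithmetic applied to every row. Checking the first row, Afghanistan in 1952, whose population appears in the glimpse() output above, makes the unit change concrete (in_millions is an illustrative name):

```r
library(dplyr)
library(gapminder)

in_millions <- gapminder %>%
  mutate(pop_mil = pop / 1e6)  # 1e6 is the same number as 1000000

in_millions$pop[1]      # 8425333 (Afghanistan, 1952)
in_millions$pop_mil[1]  # 8.425333
```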