Problem Set 1

Author

Answer Key

Published

September 21, 2022

The Popularity of Baby Names

Install and load the package babynames. Run ?babynames to get help and see what the data includes. Also, don’t forget to load tidyverse!

# write your code here! 
# install.packages("babynames") # install for first use 
# Note I've "commented" out some of these commands (with a #) so they do not run when I run this chunk or render this document

library(babynames) # load for data
library(tidyverse) # load for data wrangling

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Question 1

Part A

What are the top 5 boys' names for 2017, and what percent (note: not the proportion!) of overall names is each?

# write your code here! 

# save as a new tibble
top_5_boys_2017 <- babynames %>% # take data
  filter(sex == "M", # filter by males
         year == 2017) %>% # and for 2017
  arrange(desc(n)) %>% # arrange in largest-to-smallest order of n (number)
  slice(1:5) %>% # optional, look only at first 5 rows; head(n=5) also works
  mutate(percent = round(prop*100, 2)) # also optional, make a percent variable rounded to 2 decimals

# look at our new tibble
top_5_boys_2017


year <dbl>	sex <chr>	name <chr>	n <int>	prop <dbl>	percent <dbl>
2017	M	Liam	18728	0.00953909	0.95
2017	M	Noah	18326	0.00933433	0.93
2017	M	William	14904	0.00759134	0.76
2017	M	James	14232	0.00724906	0.72
2017	M	Logan	13974	0.00711764	0.71

The top 5 names are

top_5_boys_2017 %>%
  select(name,percent) %>%
  knitr::kable() # for nicer table in rendered document

name	percent
Liam	0.95
Noah	0.93
William	0.76
James	0.72
Logan	0.71

Alternatively, you could just write what you found manually into an object like:

top_5_boys_2017_alt <- c("Liam", "Noah", "William", "James", "Logan")

top_5_boys_2017_alt

[1] "Liam"    "Noah"    "William" "James"   "Logan"

# you could alternatively add a command, 
# %>% pull(name) to the first chunk of code, 
# and it would do the same thing, but we'd want to save it, 
# for example:

top_5_boys_2017_alt <- babynames %>%
  filter(sex=="M",
         year==2017) %>%
  arrange(desc(n)) %>% 
  slice(1:5) %>%
  mutate(percent = round(prop*100, 2)) %>%
  pull(name)
  
top_5_boys_2017_alt

[1] "Liam"    "Noah"    "William" "James"   "Logan"
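To see why pull(name) gives back the plain character vector printed above, while select(name) would not, here is a minimal sketch on a made-up three-row tibble. The object toy and its counts are invented for illustration and are not taken from babynames.

```r
library(dplyr)

# hypothetical toy data, not real babynames counts
toy <- tibble(
  name = c("Liam", "Noah", "William"),
  n    = c(30, 20, 10)
)

# select() keeps the result as a one-column tibble
names_tbl <- toy %>%
  select(name)

# pull() extracts the column itself as a plain character vector
names_vec <- toy %>%
  pull(name)

class(names_tbl) # a tibble / data frame
class(names_vec) # "character"
```

A plain vector like names_vec is what you would normally put on the right-hand side of %in% when filtering.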

Part B

What are the top 5 girls names for 2017, and what percent of overall names is each?

# write your code here! 
# save as a new tibble
top_5_girls_2017 <- babynames %>% # take data
  filter(sex == "F", # filter by females
         year == 2017) %>% # and for 2017
  arrange(desc(n)) %>% # arrange in largest-to-smallest order of n (number)
  slice(1:5) %>% # optional, look only at first 5 rows; head(., n=5) also works
  mutate(percent = round(prop*100, 2)) # also optional, make a percent variable rounded to 2 decimals

# look at our new tibble
top_5_girls_2017


year <dbl>	sex <chr>	name <chr>	n <int>	prop <dbl>	percent <dbl>
2017	F	Emma	19738	0.01052750	1.05
2017	F	Olivia	18632	0.00993760	0.99
2017	F	Ava	15902	0.00848152	0.85
2017	F	Isabella	15100	0.00805377	0.81
2017	F	Sophia	14831	0.00791029	0.79

The top 5 names are

top_5_girls_2017 %>%
  select(name,percent) %>%
  knitr::kable()

name	percent
Emma	1.05
Olivia	0.99
Ava	0.85
Isabella	0.81
Sophia	0.79

Alternatively, you could just write what you found manually into an object like:

top_5_girls_2017_alt <- c("Emma", "Olivia", "Ava", "Isabella", "Sophia")

Question 2

Make two barplots of these top 5 names, one for each sex. Map aesthetics x to name and y to prop [or percent, if you made that variable, as I did.] and use geom_col (since you are declaring a specific y, otherwise you could just use geom_bar() and just an x.)
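The difference between geom_bar() and geom_col() can be sketched on a tiny made-up dataset: geom_bar() counts rows for you, while geom_col() expects the bar heights to already exist as a column. The tibble babies below is hypothetical, with one row per baby.

```r
library(dplyr)
library(ggplot2)

# hypothetical toy data: one row per baby
babies <- tibble(name = c("Ava", "Ava", "Ava", "Emma", "Emma"))

# geom_bar() only needs x; it counts the rows for each name itself
p_bar <- ggplot(data = babies)+
  aes(x = name)+
  geom_bar()

# geom_col() needs a y that is already computed, so count first
name_counts <- babies %>%
  count(name)

p_col <- ggplot(data = name_counts)+
  aes(x = name,
      y = n)+
  geom_col()
```

Both plots should show the same bars. Since babynames already stores n and prop for each name, geom_col() is the natural choice there.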

# write your code here! 
ggplot(data = top_5_boys_2017)+
  aes(x = reorder(name, n), #note this reorders the x variable from small to large n
      y = percent, # you can use prop if you didn't make a percent variable
      fill = name)+ # optional color!
  geom_col()+
  
  # all of the above is sufficient, now I'm just making it pretty
  scale_y_continuous(labels = function(x){paste0(x, "%")}, # add percent signs
                     breaks = seq(from = 0, # make line breaks every 0.25%
                                  to = 1,
                                  by = 0.25),
                     limits = c(0,1), # limit axis to between 0 and 1
                     expand = c(0,0))+ # don't let it go beyond this
  labs(x = "Name",
       y = "Percent of All Boys With Name",
       title = "Most Popular Boys' Names in 2017",
       fill = "Boy's Name",
       caption = "Source: SSA")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16)+
  coord_flip()+ # flip axes to make horizontal!
  scale_fill_viridis_d(option = "viridis")+ # use viridis discrete color palette
  theme(legend.position = "none") # hide legend

ggplot(data = top_5_girls_2017)+
  aes(x = reorder(name, n), #note this reorders the x variable from small to large n
      y = percent, # you can use prop if you didn't make a percent variable
      fill = name)+ # optional color!
  geom_col()+
  # all of the above is sufficient, now I'm just making it pretty
  scale_y_continuous(labels = function(x){paste0(x, "%")}, # add percent signs
                     breaks = seq(from = 0, # make line breaks every 0.25%
                                  to = 1.25,
                                  by = 0.25),
                     limits = c(0,1.3), # limit axis to between 0 and 1.3
                     expand = c(0,0))+ # don't let it go beyond this
  labs(x = "Name",
       y = "Percent of All Girls With Name",
       title = "Most Popular Girls' Names in 2017",
       fill = "Girl's Name",
       caption = "Source: SSA")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16)+
  coord_flip()+ # flip axes to make horizontal!
  scale_fill_viridis_d(option = "viridis")+ # use viridis discrete color palette
  theme(legend.position = "none") # hide legend

If you had gone the alternate route by saving an object of names (like I did above with top_5_boys_2017_alt and top_5_girls_2017_alt), you could filter the data using the %in% operator to use for your data layer of each plot.

Note you can also simply pipe your wrangling code into ggplot(), since the first layer is the data source:

babynames %>%
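  filter(sex == "M", # still need the right sex
         year == 2017, # and the right year
         name %in% top_5_boys_2017_alt) %>% # keep only the names in our vector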
  ggplot()+ # this pipes the above into the data layer
  # the rest of the plot code...
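As a minimal sketch of what %in% does inside filter(), here it is on a made-up tibble. The vector top_names and the tibble toy are hypothetical stand-ins for the objects used above.

```r
library(dplyr)

top_names <- c("Liam", "Noah") # hypothetical vector of names to keep

toy <- tibble(
  name = c("Liam", "Noah", "Ryan", "Liam"),
  year = c(2016, 2017, 2017, 2017)
)

# %in% checks each name against the vector and returns TRUE or FALSE
toy$name %in% top_names

# filter() keeps the TRUE rows; other conditions can be combined with it
kept <- toy %>%
  filter(name %in% top_names,
         year == 2017)

kept
```

Note that %in% only matches on name, so without the year condition the 2016 row for Liam would be kept as well.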

Question 3

Find your name. [If your name isn’t in there 😟, pick a random name.] count by sex how many babies since 1880 were named your name. [Hint: if you do only this, you’ll get the number of rows (years) there are in the data. You want to add the number of babies in each row (n), so inside count, add , wt = n to weight the count by n.] Also create a variable for the percent of each sex.

# write your code here! 
babynames %>%
  filter(name == "Ryan") %>%
  count(sex, wt = n) %>%
  mutate(percent = round((n/sum(n)*100),2))


sex <chr>	n <int>	percent <dbl>
F	22910	2.42
M	924877	97.58
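To make the hint about wt = n concrete, here is a small sketch on invented numbers. Plain count(sex) counts rows, count(sex, wt = n) adds up n within each sex, and the weighted version is equivalent to a group_by() followed by summarize(). The tibble toy is hypothetical.

```r
library(dplyr)

# hypothetical toy data: one row per year and sex
toy <- tibble(
  year = c(2000, 2001, 2000, 2001),
  sex  = c("F", "F", "M", "M"),
  n    = c(10, 20, 300, 400)
)

# unweighted: number of rows (years) for each sex
rows_per_sex <- toy %>%
  count(sex)

# weighted: total babies for each sex
babies_per_sex <- toy %>%
  count(sex, wt = n)

# the same thing written out the long way
babies_alt <- toy %>%
  group_by(sex) %>%
  summarize(n = sum(n))
```

Here rows_per_sex has 2 rows counted for each sex, while babies_per_sex has 30 for F and 700 for M.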

Question 4

Make a line graph of the number of babies with your name over time, colored by sex.

# write your code here! 

# first wrangle data
babynames %>%
  filter(name == "Ryan") %>%

  # now we pipe into ggplot
  ggplot()+
  aes(x = year,
      y = n,
      color = sex)+
  geom_line(size = 1)+
  scale_color_manual(values = c("F" = "#e64173", # make my own colors
                                "M" = "#0047AB"))+
  labs(x = "Year",
       y = "Number of Babies",
       title = "Popularity of Babies Named 'Ryan'",
       color = "Sex",
       caption = "Source: SSA")+
  theme_classic(base_family = "Fira Sans Condensed", base_size=16)

Question 5

Part A

Find the most common name for boys by year between 1980-2017. [Hint: you’ll want to first group_by(year). Once you’ve got all the right conditions, you’ll get a table with a lot of data. You only want to keep just the 1st row of each grouped-year’s data, so add %>% slice(1).]

# write your code here! 

babynames %>%
  group_by(year) %>% # we want one observation per year
  filter(sex == "M",
         year > 1979) %>% # or year >= 1980
  arrange(desc(n)) %>% # start with largest n first
  slice(1) # take first row only


year <dbl>	sex <chr>	name <chr>	n <int>	prop <dbl>
1980	M	Michael	68693	0.03703079
1981	M	Michael	68765	0.03692247
1982	M	Michael	68228	0.03615445
1983	M	Michael	67995	0.03649110
1984	M	Michael	67736	0.03610228
1985	M	Michael	64906	0.03373805
1986	M	Michael	64205	0.03342343
1987	M	Michael	63647	0.03264834
1988	M	Michael	64133	0.03204521
1989	M	Michael	65382	0.03120182
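The pipeline above works because arrange() ignores grouping by default (it sorts the whole table), and slice(1) then takes the first row within each year. A possible alternative, sketched here on invented numbers, is slice_max(), which picks the largest n within each group in one step. The tibble toy is hypothetical.

```r
library(dplyr)

# hypothetical toy data: two names in each of two years
toy <- tibble(
  year = c(1980, 1980, 1981, 1981),
  name = c("Michael", "Jason", "Michael", "Jason"),
  n    = c(50, 40, 30, 45)
)

top_by_year <- toy %>%
  group_by(year) %>% # one group per year
  slice_max(n, n = 1, with_ties = FALSE) %>% # row with the largest n in each group
  ungroup()

top_by_year
```

In slice_max(), the first n is the column to rank by and n = 1 is the number of rows to keep; with_ties = FALSE guarantees exactly one row per year.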

Part B

Now do the same for girls.

# write your code here! 

babynames %>%
  group_by(year) %>% # we want one observation per year
  filter(sex == "F",
         year > 1979) %>% # or year >= 1980
  arrange(desc(n)) %>% # start with largest n first
  slice(1) # take first row only


year <dbl>	sex <chr>	name <chr>	n <int>	prop <dbl>
1980	F	Jennifer	58376	0.03278886
1981	F	Jennifer	57049	0.03190242
1982	F	Jennifer	57115	0.03148593
1983	F	Jennifer	54342	0.03036962
1984	F	Jennifer	50561	0.02804442
1985	F	Jessica	48346	0.02619098
1986	F	Jessica	52674	0.02854888
1987	F	Jessica	55991	0.02988050
1988	F	Jessica	51538	0.02680669
1989	F	Jessica	47885	0.02403998

Question 6

Now let’s graph the evolution of the most common names since 1880.

Part A

First, find out what are the top 5 overall most popular names for boys and for girls in the data. [Hint: first group_by(name).] You may want to create two objects, each with these top 5 names as character elements.

# write your code here! 

babynames %>%
  group_by(name) %>% # we want one row per name
  filter(sex == "M") %>%
  summarize(total = sum(n)) %>% # add up all of the n's for all years for each name
  arrange(desc(total)) %>% # list largest total first
  slice(1:5)


name <chr>	total <int>
James	5150472
John	5115466
Robert	4814815
Michael	4350824
William	4102604

# make a vector of the names (we'll need this for our graph below)
top_boys_names <- c("James", "John", "Robert", "Michael", "William")

# you could alternatively add a command, 
# %>% pull(name) to the first chunk of code, 
# and it would do the same thing, but we'd want to save it, 
# for example:

babynames %>%
  group_by(name) %>% # we want one row per name
  filter(sex == "M") %>%
  summarize(total = sum(n)) %>% # add up all of the n's for all years for each name
  arrange(desc(total)) %>% # list largest total first
  slice(1:5) %>%
  pull(name)

[1] "James"   "John"    "Robert"  "Michael" "William"

babynames %>%
  group_by(name) %>% # we want one row per name
  filter(sex == "F") %>%
  summarize(total = sum(n)) %>% # add up all of the n's for all years for each name
  arrange(desc(total)) %>% # list largest total first
  slice(1:5)


name <chr>	total <int>
Mary	4123200
Elizabeth	1629679
Patricia	1571692
Jennifer	1466281
Linda	1452249

# make a vector of the names (we'll need this for our graph below)
top_girls_names <- c("Mary", "Elizabeth", "Patricia", "Jennifer", "Linda")

Part B

Now make two line graphs of these 5 names over time, one for boys, and one for girls. [Hint: you’ll first want to subset the data to use for your data layer in the plot. First group_by(year) and also make sure you only use the names you found in Part A. Try using the %in% operator to do this.]

# write your code here! 

babynames %>%
  group_by(year) %>%
  filter(sex == "M",
         name %in% top_boys_names) %>%
  ggplot()+
  aes(x = year,
      y = prop,
      color = name)+
  geom_line(size = 1)+
  labs(x = "Year",
       y = "Proportion of Babies with Name",
       title = "Most Popular Boys Names Since 1880",
       color = "Boy's Name",
       caption = "Source: SSA")+
  theme_classic(base_family = "Fira Sans Condensed", base_size = 16)

babynames %>%
  group_by(year) %>%
  filter(sex == "F",
         name %in% top_girls_names) %>%
  ggplot()+
  aes(x = year,
      y = prop,
      color = name)+
  geom_line(size = 1)+
  labs(x = "Year",
       y = "Proportion of Babies with Name",
       title = "Most Popular Girls Names Since 1880",
       color = "Girl's Name",
       caption = "Source: SSA")+
  theme_classic(base_family = "Fira Sans Condensed", base_size = 16)

Political and Economic Freedom Around the World

For the remaining questions, we’ll look at the relationship between Economic Freedom and Political Freedom in countries around the world today. Our data for economic freedom comes from the Fraser Institute, and our data for political freedom comes from Freedom House.

Question 7

Download these two datasets that I’ve cleaned up a bit: [If you want a challenge, try downloading them from the websites and cleaning them up yourself!]

Below is a brief description of the variables I’ve put in each dataset:

Econ Freedom

Variable	Description
`year`	Year
`ISO`	Three-letter country code
`country`	Name of the country
`ef_index`	Total economic freedom index (0 - least to 10 - most)
`rank`	Rank of the country in terms of economic freedom
`continent`	Continent the country is in

Pol Freedom

Variable	Description
`country`	Name of the country
`C/T`	Whether the location is a country (c) or territory (t)
`year`	Year
`status`	Whether the location is Free (F), Partly Free (PF) or Not Free (NF)
`fh_score`	Total political freedom index (0 - least to 100 - most)

Import and save them each as an object using my_df_name <- read_csv("name_of_the_file.csv"). I suggest one as econ and the other as pol, but it’s up to you. Look at each object you’ve created.

# write your code here! 

# import data with read_csv from readr

# note these file paths assume you have these files right in your working directory

econ <- read_csv("econ_freedom.csv")

Rows: 4050 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): ISO, country, continent
dbl (3): year, ef_index, rank

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

pol <- read_csv("pol_freedom.csv")

Rows: 1885 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, C/T, status
dbl (2): year, fh_score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# look at each dataframe
econ


year <dbl>	ISO <chr>	country <chr>	ef_index <dbl>	rank <dbl>	continent <chr>
2018	ALB	Albania	7.80	26	Europe
2018	DZA	Algeria	4.97	157	Africa
2018	AGO	Angola	4.75	159	Africa
2018	ARG	Argentina	5.78	144	Americas
2018	ARM	Armenia	7.92	18	Asia
2018	AUS	Australia	8.23	5	Oceania
2018	AUT	Austria	7.80	26	Europe
2018	AZE	Azerbaijan	6.37	112	Asia
2018	BHS	Bahamas, The	7.62	39	Americas
2018	BHR	Bahrain	7.16	70	Asia

pol


country <chr>	C/T <chr>	year <dbl>	status <chr>	fh_score <dbl>
Abkhazia	t	2021	PF	40
Afghanistan	c	2021	NF	27
Albania	c	2021	PF	66
Algeria	c	2021	NF	32
Andorra	c	2021	F	93
Angola	c	2021	NF	31
Antigua and Barbuda	c	2021	F	85
Argentina	c	2021	F	84
Armenia	c	2021	PF	55
Australia	c	2021	F	97

Question 8

Now let’s join them together so that we can have a single dataset to work with. You can learn more about this in the 1.4 slides. Since both datasets have both country and year (spelled exactly the same in both!), we can use these two variables as a key to combine observations. Run the following code (substituting whatever you want to name your objects):

freedom <- left_join(econ, pol, # join pol tibble to econ tibble
                     by = c("country", "year")) # keys to match variables between two tibbles!

Take a look at freedom to make sure it appears to have worked.

# write your code here! 
freedom


year <dbl>	ISO <chr>	country <chr>	ef_index <dbl>	rank <dbl>
2018	ALB	Albania	7.80	26
2018	DZA	Algeria	4.97	157
2018	AGO	Angola	4.75	159
2018	ARG	Argentina	5.78	144
2018	ARM	Armenia	7.92	18
2018	AUS	Australia	8.23	5
2018	AUT	Austria	7.80	26
2018	AZE	Azerbaijan	6.37	112
2018	BHS	Bahamas, The	7.62	39
2018	BHR	Bahrain	7.16	70
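One way to check that a join "appears to have worked" is to look for rows that found no match. The sketch below uses two invented tibbles (econ_toy and pol_toy, with made-up countries and scores) to show that left_join() keeps every row of the first table and fills in NA where the keys do not match, and that anti_join() lists exactly those unmatched rows.

```r
library(dplyr)

# hypothetical toy versions of the two datasets
econ_toy <- tibble(
  country  = c("Freedonia", "Freedonia", "Sylvania, The"),
  year     = c(2017, 2018, 2018),
  ef_index = c(7.7, 7.8, 7.6)
)

pol_toy <- tibble(
  country  = c("Freedonia", "Sylvania"), # note the different spelling of Sylvania
  year     = c(2018, 2018),
  fh_score = c(68, 91)
)

# left_join keeps all 3 rows of econ_toy; unmatched rows get NA for fh_score
joined <- left_join(econ_toy, pol_toy,
                    by = c("country", "year"))

# anti_join shows the rows of econ_toy with no match in pol_toy
unmatched <- anti_join(econ_toy, pol_toy,
                       by = c("country", "year"))
```

In this toy case, one row is unmatched because pol_toy has no 2017 data and another because the country name is spelled differently. Either cause would show up as NA in the joined data.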

Question 9

Part A

Make a barplot of the 10 countries with the highest Economic Freedom index score in 2018. You may want to find this first and save it as an object to use for your plot’s data layer. Use geom_col() since we will map ef_index to y. If you want to order the bars, set x = fct_reorder(ISO, desc(ef_index)) to reorder ISO (or country, if you prefer) by EF score in descending order.

# write your code here! 

# grab the top 10 countries by ef in 2018
ef_10 <- freedom %>%
  filter(year == 2018) %>%
  arrange(desc(ef_index)) %>%
  slice(1:10)

# look at it just to check
ef_10


year <dbl>	ISO <chr>	country <chr>	ef_index <dbl>	rank <dbl>	continent <chr>
2018	HKG	Hong Kong SAR, China	8.94	1	Asia
2018	SGP	Singapore	8.65	2	Asia
2018	NZL	New Zealand	8.53	3	Oceania
2018	CHE	Switzerland	8.43	4	Europe
2018	AUS	Australia	8.23	5	Oceania
2018	USA	United States	8.22	6	Americas
2018	MUS	Mauritius	8.21	7	Africa
2018	GEO	Georgia	8.18	8	Asia
2018	CAN	Canada	8.17	9	Americas
2018	IRL	Ireland	8.13	10	Europe

# now plot it
ggplot(data = ef_10)+
  aes(x = fct_reorder(ISO, desc(ef_index)), # reorder ISO by ef in order
      y = ef_index)+
  geom_col(aes(fill = continent))+ # coloring is optional
  
  # above is sufficient, now let's just make it prettier
  geom_text(aes(label = ef_index), # add the score onto the bar
            vjust = 1.2, # adjust it vertically
            color = "white")+
  scale_y_continuous(breaks = seq(0,10,2),
                     limits = c(0,10),
                     expand = c(0,0)
                     )+
  labs(x = "Country",
       y = "Economic Freedom Score",
       title = "Top 10 Countries by Economic Freedom",
       caption = "Source: Fraser Institute",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold",
                                  size = rel(1.5))
        )
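To see what fct_reorder() is doing to the x axis, here is a small sketch with made-up codes and scores. By default a character variable is ordered alphabetically; fct_reorder() instead orders the levels by another variable. The vectors iso and score are hypothetical.

```r
library(forcats)

iso   <- c("AAA", "BBB", "CCC") # hypothetical country codes
score <- c(7.5, 9.0, 8.2)       # hypothetical scores

# default factor order is alphabetical
levels(factor(iso))

# reorder the levels from highest score to lowest
by_score <- fct_reorder(iso, score, .desc = TRUE)
levels(by_score)

# wrapping the score in dplyr's desc(), as in the plot code, gives the same order
by_score_alt <- fct_reorder(iso, dplyr::desc(score))
levels(by_score_alt)
```

ggplot2 draws discrete axes in level order, so reordering the levels is what reorders the bars.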

Part B

Make a barplot of the 10 countries with the highest Freedom House index score in 2018, similar to what you did for Part A.

# write your code here! 

# grab the top 10 countries by fh in 2018
pf_10 <- freedom %>%
  filter(year == 2018) %>%
  arrange(desc(fh_score)) %>%
  slice(1:10)

# look at it just to check
pf_10


year <dbl>	ISO <chr>	country <chr>	ef_index <dbl>	rank <dbl>	continent <chr>	C/T <chr>	status <chr>
2018	FIN	Finland	7.76	29	Europe	c	F
2018	NOR	Norway	7.60	43	Europe	c	F
2018	SWE	Sweden	7.58	46	Europe	c	F
2018	CAN	Canada	8.17	9	Americas	c	F
2018	NLD	Netherlands	7.82	24	Europe	c	F
2018	AUS	Australia	8.23	5	Oceania	c	F
2018	LUX	Luxembourg	7.75	31	Europe	c	F
2018	NZL	New Zealand	8.53	3	Oceania	c	F
2018	URY	Uruguay	7.25	66	Americas	c	F
2018	DNK	Denmark	8.10	11	Europe	c	F

# now plot it
ggplot(data = pf_10)+
  aes(x = fct_reorder(ISO, desc(fh_score)),
      y = fh_score)+
  geom_col(aes(fill = continent))+ # coloring is optional
  # above is sufficient, now let's just make it prettier
  geom_text(aes(label = fh_score), # add the score onto the bar
            vjust = 1.2, # adjust it vertically
            color = "white")+
  scale_y_continuous(breaks = seq(0,100,20),
                     limits = c(0,100),
                     expand = c(0,0))+
  labs(x = "Country",
       y = "Political Freedom Score",
       title = "Top 10 Countries by Political Freedom",
       caption = "Source: Freedom House",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )

Question 10

Now make a scatterplot of Political freedom (fh_score as y) on Economic Freedom (ef_index as x) in the year 2018, and color by continent.

# write your code here! 

# note I'm going to save the plot as an object called p, for next question
p <- freedom %>%
  filter(year == 2018) %>%
  ggplot()+
  aes(x = ef_index,
      y = fh_score)+
  # doing just geom_point() is fine, but since there's a lot of overlap, here are some things I like to do:
  geom_point(aes(fill = continent), # fill the points with color by continent
             alpha = 0.9, # make points slightly transparent
             color = "white", # outline the points with a white border
             pch = 21, # this shape has an outline and a fill color
             size = 3)+
  scale_x_continuous(breaks = seq(0,10,2),
                     limits = c(0,10),
                     expand = c(0,0))+
  scale_y_continuous(breaks = seq(0,100,20),
                     limits = c(0,105),
                     expand = c(0,0))+
  labs(x = "Economic Freedom Score",
       y = "Political Freedom Score",
       caption = "Sources: Fraser Institute, Freedom House",
       title = "Economic Freedom & Political Freedom",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )

# look at plot
p

Warning: Removed 13 rows containing missing values (geom_point).
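That warning means ggplot2 dropped rows it could not draw, typically because x or y is missing (or falls outside the scale limits). A minimal sketch of how you might count and inspect such rows before plotting, using an invented tibble toy with one missing score:

```r
library(dplyr)

# hypothetical toy data with one missing political freedom score
toy <- tibble(
  country  = c("A", "B", "C"),
  ef_index = c(7.1, 6.4, 8.0),
  fh_score = c(80, NA, 95)
)

# how many rows are missing either variable?
n_missing <- toy %>%
  filter(is.na(ef_index) | is.na(fh_score)) %>%
  nrow()

# keep only the rows that can actually be plotted
complete <- toy %>%
  filter(!is.na(ef_index),
         !is.na(fh_score))
```

Running the same kind of filter on the real data for 2018 would show which countries are being left off the scatterplot.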

Question 11

Save your plot from Question 10 as an object, and add a new layer where we will highlight a few countries. Pick a few countries (I suggest using the ISO code) and create a new object filtering the data to only include these countries (again the %in% operator will be most helpful here).

Additionally, install and load a package called "ggrepel", which will adjust labels so they do not overlap on a plot.

Then, add the following layer to your plot:

geom_label_repel(data = countries, # or whatever object name you created
                 aes(x = ef_index,
                     y = fh_score,
                     label = ISO, # show ISO as label (you could do country instead)
                     color = continent),
                 alpha = 0.5, # make it a bit transparent
                 box.padding = 0.75, # control how far labels are from points
                 show.legend = FALSE) # don't want this to add to the legend

This should highlight these countries on your plot.

library(ggrepel)

# pick some countries
some_countries <- freedom %>%
  filter(year==2018,
         country %in% c("United States",
                        "United Kingdom",
                        "Sweden",
                        "China",
                        "Singapore",
                        "Russian Federation",
                        "Korea, Rep.",
                        "Hong Kong SAR, China"))

# write your code here! 

p + geom_label_repel(data = some_countries, # or whatever object name you created
                     aes(x = ef_index,
                         y = fh_score,
                         label = ISO, # show ISO as label (you could do country instead)
                         color = continent),
                     alpha = 0.75, # make it a bit transparent
                     box.padding = 0.75, # control how far labels are from points
                     show.legend = F) # don't want this to add to the legend

Warning: Removed 13 rows containing missing values (geom_point).

Question 12

Let’s just look only at the United States and see how it has fared in both measures of freedom over time. filter() the data to look only at the United States (its ISO is "USA"). Use both a geom_point() layer and a geom_path() layer, which will connect the dots over time. Let’s also see this by labeling the years with an additional layer geom_text_repel(aes(label = year)).

# write your code here! 

# save plot as us
us <- freedom %>%
  filter(ISO == "USA") %>%
  ggplot()+
  aes(x = ef_index,
      y = fh_score)+
  geom_point(color = "red")+
  geom_path(color = "red")+
  geom_text_repel(aes(label = year),
                  color = "red")+
  scale_x_continuous(breaks = seq(8,8.5,0.05),
                     limits = c(8,8.5),
                     expand = c(0,0))+
  scale_y_continuous(breaks = seq(85,95,1),
                     limits = c(85,95),
                     expand = c(0,0))+
  labs(x = "Economic Freedom Score",
       y = "Political Freedom Score",
       caption = "Sources: Fraser Institute, Freedom House",
       title = "U.S. Political & Economic Freedom, 2013-2018")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )

# look at it
us

Warning: Removed 19 rows containing missing values (geom_point).

Warning: Removed 19 row(s) containing missing values (geom_path).

Warning: Removed 19 rows containing missing values (geom_text_repel).

Note that the way I zoomed in on the scales, these look like pretty dramatic changes!

If we maintain the full perspective, the change appears minor. Be very careful how you present your analysis!

us +
  # force scales to show full range of 0-10 for x, 0-100 for y
  scale_x_continuous(breaks = seq(0,10,1),
                     limits = c(0,10),
                     expand = c(0,0)
                     )+
  scale_y_continuous(breaks = seq(0,100,10),
                     limits = c(0,100),
                     expand = c(0,0)
                     )

Scale for 'x' is already present. Adding another scale for 'x', which will
replace the existing scale.

Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.

Warning: Removed 19 rows containing missing values (geom_point).

Warning: Removed 19 row(s) containing missing values (geom_path).

Warning: Removed 19 rows containing missing values (geom_text_repel).
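One detail about the geom_path() layer used in this question: it connects points in the order the rows appear in the data, whereas geom_line() connects them in order of the x variable. So for the path to trace the years in sequence, the data should be sorted by year. A minimal sketch with invented scores (the tibble toy is hypothetical and deliberately out of order):

```r
library(dplyr)
library(ggplot2)

# hypothetical toy data, with years out of order
toy <- tibble(
  year     = c(2015, 2013, 2014),
  ef_index = c(8.2, 8.0, 8.3),
  fh_score = c(90, 93, 92)
)

# sort by year so the path runs 2013 -> 2014 -> 2015
toy_sorted <- toy %>%
  arrange(year)

p_path <- ggplot(data = toy_sorted)+
  aes(x = ef_index,
      y = fh_score)+
  geom_point()+
  geom_path() # connects rows in the order they appear in the data
```

If the data are already stored in year order, as they appear to be here, the arrange() step changes nothing, but it makes the intent explicit.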

Knit and Submit!

When you are done, click the Render button. Because the yaml header currently says format: html, this will produce an html webpage, which should automatically open for your review.

Notice in the Files pane in RStudio (by default, the lower right one), there should now be a document called 01-problem-set.html (or, if you changed the filename, another name ending in .html). This is the webpage. Find this file on your computer (or download it from Rstudio.cloud by clicking the checkbox in front of the file in the Files pane and then going to More -> Export...) and send this file.

If you want to make a PDF, install the package “tinytex” and then run tinytex::install_tinytex() to install a LaTeX distribution.

Then delete the lines in the yaml header that say format: html: self-contained: TRUE, and add a simple line that says format: pdf. Clicking Render will now produce a PDF, show it, and save it as a new file in the Files pane.

Either way, send me your output file, html or pdf (or, if you like, Word), so long as it shows the input code and output of every chunk. I have set it by default to do this, with echo: true in the yaml header.

Don’t forget to add your name to the author part of the header!