Warning

This assignment is due by class on Wednesday September 21.

The PDF is useful if you want to print out the problem set and write on it. The R Project is a zipped `.zip` file which contains a `.qmd` file to write answers in, and the data, all in a logical working directory. (See this resource for help unzipping files). You can also just write an `.R` file in the project if you don’t want to use markdown. If you use the cloud project, I have already installed `tidyverse` and `tinytex` (to produce pdfs).

# The Popularity of Baby Names

Install and load the package `babynames`. Get help for `?babynames` to see what the data includes.

## Question 1

### Part A

What are the top 5 boys names for 2017, and what percent (note not the proportion!) of overall names is each?

### Part B

What are the top 5 girls names for 2017, and what percent of overall names is each?

## Question 2

Make two barplots of these top 5 names, one for each sex. Map `aes`thetics `x` to `name` and `y` to `prop` [or `percent`, if you made that variable, as I did.] and use `geom_col` (since you are declaring a specific `y`, otherwise you could just use `geom_bar()` and just an `x`.)

## Question 3

Find your name. [If your name isn’t in there 😟, pick a random name.] `count` by `sex` how many babies since 1880 were named your name. [Hint: if you do only this, you’ll get the number of rows (years) there are in the data. You want to add the number of babies in each row (`n`), so inside `count`, add `, wt = n` to weight the count by `n`.] Also create a variable for the percent of each sex.

## Question 4

Make a line graph of the number of babies with your name over time, `color`ed by `sex`.

## Question 5

### Part A

Find the most common name for boys by year between 1980-2017. [Hint: you’ll want to first `group_by(year)`. Once you’ve got all the right conditions, you’ll get a table with a lot of data. You only want to `slice(1)` to keep just the 1st row of each grouped-year’s data.]

### Part B

Now do the same for girls.

## Question 6

Now let’s graph the evolution of the most common names since 1880.

### Part A

First, find out what are the top 10 overall most popular names for boys and for girls in the data. [Hint: first `group_by(name)`.] You may want to create two objects, each with these top 5 names as character elements.

### Part B

Now make two `line`graphs of these 5 names over time, one for boys, and one for girls. [Hint: you’ll first want to subset the data to use for your `data` layer in the plot. First `group_by(year)` and also make sure you only use the names you found in Part A. Try using the `%in%` command to do this.]

# Political and Economic Freedom Around the World

For the remaining questions, we’ll look at the relationship between Economic Freedom and Political Freedom in countries around the world today. Our data for economic freedom comes from the Fraser Institute, and our data for political freedom comes from Freedom House.

## Question 7

Below is a brief description of the variables I’ve put in each dataset:

### Econ Freedom

Variable Description
`year` Year
`ISO` Three-letter country code
`country` Name of the country
`ef_index` Total economic freedom index (0 - least to 100 - most)
`rank` Rank of the country in terms of economic freedom
`continent` Continent the country is in

### Pol Freedom

Variable Description
`country` Name of the country
`C/T` Whether the location is a country (C) or territory (T)
`year` Year
`status` Whether the location is Free (F), Partly Free (F) or Not Free (NF)
`fh_score` Total political freedom index (0 - least to 100 - most)

Import and save them each as an object using `my_df_name <- read_csv("name_of_the_file.csv")`. I suggest one as `econ` and the other as `pol`, but it’s up to you. Look at each object you’ve created.

## Question 8

Now let’s join them together so that we can have a single dataset to work with. You can learn more about this in the 1.4 slides. Since both datasets have both `country` and `year` variables (spelled exactly the same in both!), we can use these two variables as a `key` to combine observations. Run the following code (substituting whatever you are naming your objects):

``````freedom <- left_join(econ, pol, # join pol tibble to econ tibble
by = c("country", "year")) # keys to match variables between two tibbles!``````

Take a look at `freedom` to make sure it appears to have worked.

## Question 9

### Part A

Make a barplot of the 10 countries with the highest Economic Freedom index score in 2018. You may want to find this first and save it as an object to use for your plot’s `data` layer. Use `geom_col()` since we will map `ef_index` to `y`. If you want to order the bars, set `x = fct_reorder(ISO, desc(ef_index))` to reorder `ISO` (or `country`, if you prefer) by EF score in descending order.

### Part B

Make a barplot of the 10 countries with the highest Freedom House index score in 2018, similar to what you did for Part A.

## Question 10

Now make a scatterplot of Political freedom (`fh_score` as `y`) on Economic Freedom (`ef_index` as `x`) in the year 2018, and `color` by `continent`.

## Question 11

Save your plot from Question 10 as an object, and add a new layer where we will highlight a few countries. Pick a few countries (I suggest using the `ISO` code) and create a new object `filtering` the data to only include these countries (again the `%in%` command will be most helpful here).

Additionally, install and load a package called `"ggrepel"`, which will adjust labels so they do not overlap on a plot.

``````geom_label_repel(data = countries, # or whatever object name you created
aes(x = ef_index,
y = fh_score,
label = ISO, # show ISO as label (you could do country instead)
color = continent),
alpha = 0.5, # make it a bit transparent
box.padding = 0.75, # control how far labels are from points
show.legend = F) # don't want this to add to the legend``````

This should highlight these countries on your plot.

## Question 12

Let’s just look only at the United States and see how it has fared in both measures of freedom over time. `filter()` the data to look only at the United States (its `ISO` is `"USA"`). Use both a `geom_point()` layer and a `geom_path()` layer, which will connect the dots over time. Let’s also see this by labeling the years with an additional layer `geom_text_repel(aes(label = year))`.