Install and load the package babynames. Run ?babynames to see what the data includes. Also, don’t forget to load tidyverse!
# write your code here!

# install.packages("babynames") # install for first use
# Note I've "commented" out some of these commands (with a #) so they do not run when I run this chunk or render this document

library(babynames) # load for data
library(tidyverse) # load for data wrangling
What are the top 5 boys' names for 2017, and what percent (note: not the proportion!) of overall names is each?
# write your code here!

# save as a new tibble
top_5_boys_2017 <- babynames %>% # take data
  filter(sex == "M", # filter by males
         year == 2017) %>% # and for 2017
  arrange(desc(n)) %>% # arrange in largest-to-smallest order of n (number)
  slice(1:5) %>% # optional, look only at first 5 rows; head(n = 5) also works
  mutate(percent = round(prop * 100, 2)) # also optional, make a percent variable rounded to 2 decimals

# look at our new tibble
top_5_boys_2017
The top 5 names are
top_5_boys_2017 %>%
  select(name, percent) %>%
  knitr::kable() # for nicer table in rendered document
| name    | percent |
|:--------|--------:|
| Liam    |    0.95 |
| Noah    |    0.93 |
| William |    0.76 |
| James   |    0.72 |
| Logan   |    0.71 |
Alternatively, you could just write what you found manually into an object like:
# you could alternatively add a command,
# %>% pull(name) to the first chunk of code,
# and it would do the same thing, but we'd want to save it,
# for example:
top_5_boys_2017_alt <- babynames %>%
  filter(sex == "M", year == 2017) %>%
  arrange(desc(n)) %>%
  slice(1:5) %>%
  mutate(percent = round(prop * 100, 2)) %>%
  pull(name)

top_5_boys_2017_alt
[1] "Liam" "Noah" "William" "James" "Logan"
Part B
What are the top 5 girls' names for 2017, and what percent of overall names is each?
# write your code here!

# save as a new tibble
top_5_girls_2017 <- babynames %>% # take data
  filter(sex == "F", # filter by females
         year == 2017) %>% # and for 2017
  arrange(desc(n)) %>% # arrange in largest-to-smallest order of n (number)
  slice(1:5) %>% # optional, look only at first 5 rows; head(., n = 5) also works
  mutate(percent = round(prop * 100, 2)) # also optional, make a percent variable rounded to 2 decimals

# look at our new tibble
top_5_girls_2017
Make two bar plots of these top 5 names, one for each sex. Map aesthetics x to name and y to prop (or percent, if you made that variable, as I did), and use geom_col() since you are declaring a specific y; otherwise you could use geom_bar() with just an x.
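To make the geom_col() versus geom_bar() distinction concrete, here is a minimal sketch on a made-up three-row data frame. The object toy_top, its names, and its percent values are hypothetical and are not part of babynames. It assumes ggplot2 is installed and uses layer_data() to look at the bar heights each geom computes.

```r
library(ggplot2)

# Hypothetical toy data (not from babynames): one row per name
toy_top <- data.frame(
  name = c("Ava", "Ben", "Cal"),
  percent = c(0.9, 0.7, 0.5)
)

# geom_col(): bar height is taken from the y aesthetic you supply
p_col <- ggplot(data = toy_top)+
  aes(x = name, y = percent)+
  geom_col()

# geom_bar(): only x is supplied, so bar height is the number of rows per name
p_bar <- ggplot(data = toy_top)+
  aes(x = name)+
  geom_bar()

# pull out the computed bar heights without drawing the plots
col_heights <- layer_data(p_col)$y
bar_heights <- layer_data(p_bar)$y

col_heights
bar_heights
```

Because each name appears in only one row of the top-5 tibbles, geom_bar() with just an x would draw every bar with the same height, which is why geom_col() is the better fit here.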
# write your code here!

ggplot(data = top_5_boys_2017)+
  aes(x = reorder(name, n), # note this reorders the x variable from small to large n
      y = percent, # you can use prop if you didn't make a percent variable
      fill = name)+ # optional color!
  geom_col()+
  # all of the above is sufficient, now I'm just making it pretty
  scale_y_continuous(labels = function(x){paste0(x, "%")}, # add percent signs
                     breaks = seq(from = 0, # make line breaks every 0.25%
                                  to = 1,
                                  by = 0.25),
                     limits = c(0, 1), # limit axis to between 0 and 1
                     expand = c(0, 0))+ # don't let it go beyond this
  labs(x = "Name",
       y = "Percent of All Boys With Name",
       title = "Most Popular Boys' Names in 2017",
       fill = "Boy's Name",
       caption = "Source: SSA")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)+
  coord_flip()+ # flip axes to make horizontal!
  scale_fill_viridis_d(option = "viridis")+ # use viridis discrete color palette ("default" is not a valid option name)
  theme(legend.position = "none") # hide legend
ggplot(data = top_5_girls_2017)+
  aes(x = reorder(name, n), # note this reorders the x variable from small to large n
      y = percent, # you can use prop if you didn't make a percent variable
      fill = name)+ # optional color!
  geom_col()+
  # all of the above is sufficient, now I'm just making it pretty
  scale_y_continuous(labels = function(x){paste0(x, "%")}, # add percent signs
                     breaks = seq(from = 0, # make line breaks every 0.25%
                                  to = 1.25,
                                  by = 0.25),
                     limits = c(0, 1.3), # limit axis to between 0 and 1.3
                     expand = c(0, 0))+ # don't let it go beyond this
  labs(x = "Name",
       y = "Percent of All Girls With Name",
       title = "Most Popular Girls' Names in 2017",
       fill = "Girl's Name",
       caption = "Source: SSA")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)+
  coord_flip()+ # flip axes to make horizontal!
  scale_fill_viridis_d(option = "viridis")+ # use viridis discrete color palette ("default" is not a valid option name)
  theme(legend.position = "none") # hide legend
If you had gone the alternate route by saving a vector of names (as I did above with top_5_boys_2017_alt; a top_5_girls_2017_alt can be made the same way), you could filter the data with the %in% operator and use the result as the data layer of each plot.
Note you can also simply pipe your wrangling code into ggplot(), since the first layer is the data source:
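babynames %>%
  filter(name %in% top_5_boys_2017_alt, # keep only the top 5 names...
         sex == "M", # ...for boys...
         year == 2017) %>% # ...in 2017 (otherwise every year and both sexes come along)
  ggplot()+ # this pipes the above into the data layer
  # the rest of the plot code...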
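If %in% is new to you, this small base R sketch shows what it returns on its own. The candidates vector is a hypothetical example; top_names simply retypes the five names printed earlier.

```r
# the five names found above, typed in by hand
top_names <- c("Liam", "Noah", "William", "James", "Logan")

# hypothetical names to check against that vector
candidates <- c("Liam", "Ryan", "Noah")

# %in% returns one TRUE/FALSE per element on the left-hand side
is_top <- candidates %in% top_names
is_top

# that logical vector is what filter() uses to decide which rows to keep
kept <- candidates[is_top]
kept
```

Inside filter(), name %in% top_names does the same thing row by row, keeping only rows whose name appears in the vector.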
Question 3
Find your name. [If your name isn’t in there 😟, pick a random name.] Use count() by sex to find how many babies since 1880 were given your name. [Hint: if you do only this, you’ll get the number of rows (years) there are in the data. You want to add up the number of babies in each row (n), so inside count(), add wt = n to weight the count by n.] Also create a variable for the percent of each sex.
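Here is a minimal sketch of the weighted count described in the hint, run on a made-up tibble so the arithmetic is easy to check. The object toy_names and its numbers are hypothetical; with the real data you would start from babynames instead. It reads "percent of each sex" as each sex's share of all babies with the name, which is one reasonable interpretation, and it names the count column total (my choice) to keep it distinct from the original n.

```r
library(dplyr)

# Hypothetical toy data in the same shape as babynames (made-up counts)
toy_names <- tibble(
  year = c(2000, 2000, 2001, 2001),
  sex  = c("F", "M", "F", "M"),
  name = "Ryan",
  n    = c(10, 90, 20, 80)
)

# unweighted: counts rows (one per year for each sex), not babies
rows_by_sex <- toy_names %>%
  filter(name == "Ryan") %>%
  count(sex, name = "rows")

# weighted: adds up n within each sex, then converts to a percent of the total
ryan_by_sex <- toy_names %>%
  filter(name == "Ryan") %>%
  count(sex, wt = n, name = "total") %>%
  mutate(percent = round(total / sum(total) * 100, 2))

rows_by_sex
ryan_by_sex
```

In the toy data the unweighted count gives 2 rows per sex, while the weighted count gives 30 and 170 babies, or 15% and 85%.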
Make a line graph of the number of babies with your name over time, colored by sex.
# write your code here!

# first wrangle data
babynames %>%
  filter(name == "Ryan") %>%
  # now we pipe into ggplot
  ggplot()+
  aes(x = year,
      y = n,
      color = sex)+
  geom_line(linewidth = 1)+ # linewidth replaces size for lines in ggplot2 3.4.0 and later
  scale_color_manual(values = c("F" = "#e64173", # make my own colors
                                "M" = "#0047AB"))+
  labs(x = "Year",
       y = "Number of Babies",
       title = "Popularity of Babies Named 'Ryan'",
       color = "Sex",
       caption = "Source: SSA")+
  theme_classic(base_family = "Fira Sans Condensed", base_size = 16)