Data Sources and Suggestions

List of Public Datasets, Data Sources, and R APIs

Built-in Datasets

General Databases of Datasets

Good R Packages for Getting Data in R Format [2]

Below are packages written by and for R users that link up with the API of key data sets for easy use in R. Each link goes to the documentation and description of each package.

Don’t forget to install each package first [3] and then load it with library().
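
As a minimal sketch of that install-then-load step, the snippet below checks which packages are still missing before installing anything. The helper name not_installed() is hypothetical (it is not part of any package), and the two package names are just examples taken from the list below.

```r
# Hypothetical helper (not part of any package): returns the subset of
# `pkgs` that is not yet installed on this machine.
not_installed <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example: check two packages from the list below.
to_install <- not_installed(c("wbstats", "fredr"))

# Install only what is missing (needs an internet connection), then load.
# Uncomment these lines when you are ready to run them:
# install.packages(to_install)
# library(wbstats)
```

Installing happens once per machine; library() has to be called again in every new R session.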

  • owidR for importing data from Our World in Data
  • wbstats provides access to all the data available on the World Bank API, which is basically everything on their website. The World Bank keeps track of many country-level indicators over time.
  • tidycensus gives you access to data from the US Census and the American Community Survey. These are the largest high-quality data sets you’ll find of cross-sectional data on individual people in the US. You’ll need to get a (free) API key from the website (or ask me for mine).
  • fredr gets data from the Federal Reserve’s Economic Database (FRED). You’ll need to get a (free) API key from the website (or ask me for mine).
  • tidyquant gets data from a number of financial sources (including FRED).
  • icpsrdata downloads data from the Inter-university Consortium for Political and Social Research (you’ll need an account and a keycode). ICPSR is a database of datasets from published social science papers for the purposes of reproducibility.
  • NHANES uses data from the US National Health and Nutrition Examination Survey.
  • ipumsr has census data from all around the world, in addition to the US census, American Community Survey, and Current Population Survey. If you’re doing international micro work, look at IPUMS. It’s also the easiest way to get the Current Population Survey (CPS), which is very popular for labor economics. Unfortunately ipumsr won’t get the data from within R; you’ll have to make your own data extract on the IPUMS website and download it. But ipumsr will read that file into R and preserve things like names and labels.
  • education-data-package-r [4] is the Urban Institute’s data on educational institutions in the US, including colleges (in IPEDS) and K-12 schools (in CCD). This package also has data on county-level poverty rates from SAIPE.
  • psidR is the Panel Study of Income Dynamics. This study doesn’t just follow people over their lifetimes, it follows their children too, generationally! A great source for studying how things follow families through generations.
  • atus is the American Time Use Survey, which is a large cross-sectional data set with information on how people spend their time.
  • Rilostat uses data from the International Labour Organization. This contains lots of different statistics on labor, like employment, wage gaps, etc., generally aggregated to the national level and changing over time.
  • democracyData [5] is a great “package for accessing and manipulating existing measures of democracy.”
  • politicaldata provides useful functions for obtaining commonly-used data in political analysis and political science, including from sources such as the Comparative Agendas Project (which provides data on politics and policy from 20+ countries), the MIT Election and Data Science Lab, and FiveThirtyEight.
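
To give a feel for how a few of these packages are used, here is a rough sketch that pulls one series each from wbstats, fredr, and tidycensus. It assumes those three packages are installed and that your API keys are stored in environment variables named FRED_API_KEY and CENSUS_API_KEY. The helper api_key_from_env() and the fetch_*() wrappers are hypothetical names made up for this illustration, and the indicator, series, and variable codes are just examples; check each package's documentation for details.

```r
# Functions are called as pkg::fun(), so nothing is loaded (and no request
# is sent) until you actually call one of the fetch_*() functions.

# Hypothetical helper: read an API key from an environment variable and
# stop with a clear message if it is not set.
api_key_from_env <- function(var) {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Set ", var, " first, e.g. in your .Renviron file.", call. = FALSE)
  }
  key
}

# World Bank (no key needed): GDP per capita in constant US$, two countries.
fetch_gdp_per_capita <- function() {
  wbstats::wb_data(
    indicator  = "NY.GDP.PCAP.KD",
    country    = c("USA", "MEX"),
    start_date = 2000,
    end_date   = 2020
  )
}

# FRED (key needed): monthly US unemployment rate since 2000.
fetch_unemployment <- function() {
  fredr::fredr_set_key(api_key_from_env("FRED_API_KEY"))
  fredr::fredr(
    series_id         = "UNRATE",
    observation_start = as.Date("2000-01-01")
  )
}

# US Census (key needed): median household income by state from the ACS.
fetch_median_income <- function() {
  tidycensus::census_api_key(api_key_from_env("CENSUS_API_KEY"))
  tidycensus::get_acs(
    geography = "state",
    variables = "B19013_001",
    year      = 2021
  )
}
```

Calling, for example, gdp <- fetch_gdp_per_capita() should return a data frame you can pass straight to dplyr or ggplot2. Keeping keys in environment variables rather than in your script means you can share your code without sharing your key.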

Below is a list of good data sources, organized by the type of topic you might want to write about [6]:

Key Data Sources

By Topic

Footnotes

  1. Note: You should use these more for playing around with in R to boost your data wrangling skills. These should not be used for your projects in most circumstances.

  2. Some of these come from Nick Huntington-Klein’s excellent list.

  3. install.packages("name_of_package")

  4. Note that you will need to install the devtools package first, and then install the package directly from GitHub with the command devtools::install_github('UrbanInstitute/education-data-package-r')

  5. Note that you will need to install the devtools package first, and then install the package directly from GitHub with the command devtools::install_github('xmarquez/democracyData')

  6. Some of these come from various sources, including https://github.com/awesomedata/awesome-public-datasets#economics