Data Sources and Suggestions
List of Public Datasets, Data Sources, and R APIs
Built-in Datasets
- A near-comprehensive list of all existing data sets built into R or R packages1
General Databases of Datasets
Good R Packages for Getting Data in R Format2
Below are packages written by and for R users that link up with the API of key data sets for easy use in R. Each link goes to the documentation and description of each package.
Don’t forget to install3 each package first and then load it with library().
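As a minimal sketch of that install-then-load routine, the snippet below checks which packages are still missing before installing anything. The helper name `missing_packages` and the three packages in `wanted` are illustrative choices, not a fixed recipe; swap in whichever packages you actually need.

```r
# Packages you plan to use (an illustrative selection).
wanted <- c("wbstats", "fredr", "tidycensus")

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded,
# which in practice means they are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(wanted)

# Install only what is missing; guarded so it runs only in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Installing happens once per computer; loading happens in every script, e.g.:
# library(wbstats)
```

Checking first avoids re-downloading packages every time a script is run.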
- owidR for importing data from Our World in Data
- wbstats provides access to all the data available on the World Bank API, which is basically everything on their website. The World Bank keeps track of many country-level indicators over time.
- tidycensus gives you access to data from the US Census and the American Community Survey. These are the largest high-quality data sets you’ll find of cross-sectional data on individual people in the US. You’ll need to get a (free) API key from the website (or ask me for mine).
- fredr gets data from the Federal Reserve’s Economic Database (FRED). You’ll need to get a (free) API key from the website (or ask me for mine).
- tidyquant gets data from a number of financial sources (including fredr).
- icpsrdata downloads data from the Inter-university Consortium for Political and Social Research (you’ll need an account and a keycode). ICPSR is a database of datasets from published social science papers for the purposes of reproducibility.
- NHANES uses data from the US National Health and Nutrition Examination Survey.
- ipumsr has census data from all around the world, in addition to the US census, American Community Survey, and Current Population Survey. If you’re doing international micro work, look at IPUMS. It’s also the easiest way to get the Current Population Survey (CPS), which is very popular for labor economics. Unfortunately ipumsr won’t get the data from within R; you’ll have to make your own data extract on the IPUMS website and download it. But ipumsr will read that file into R and preserve things like names and labels.
- education-data-package-r4 is the Urban Institute’s data on educational institutions in the US, including colleges (in IPEDS) and K-12 schools (in CCD). This package also has data on county-level poverty rates from SAIPE.
- psidR is the Panel Study of Income Dynamics. This study doesn’t just follow people over their lifetimes, it follows their children too, generationally! A great source for studying how things follow families through generations.
- atus is the American Time Use Survey, which is a large cross-sectional data set with information on how people spend their time.
- Rilostat uses data from the International Labor Organization. This contains lots of different statistics on labor, like employment, wage gaps, etc., generally aggregated to the national level and changing over time.
- democracyData5 is a great “package for accessing and manipulating existing measures of democracy.”
- politicaldata provides useful functions for obtaining commonly-used data in political analysis and political science, including from sources such as the Comparative Agendas Project (which provides data on politics and policy from 20+ countries), the MIT Election and Data Science Lab, and FiveThirtyEight.
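To give a feel for how a few of the packages above are used, here is a rough sketch that wraps three of them in small functions. The wrapper names, the indicator code `NY.GDP.PCAP.CD` (GDP per capita in current US dollars), the series ID `UNRATE` (the US unemployment rate), and the `FRED_API_KEY` environment variable are illustrative assumptions; the World Bank and FRED calls need an internet connection, and the IPUMS reader needs an extract you have already downloaded.

```r
# Sketch only: each wrapper stops with an error if its package is not installed.

# World Bank indicator via wbstats (no API key needed).
get_gdp_per_capita <- function(countries, start_year, end_year) {
  stopifnot(requireNamespace("wbstats", quietly = TRUE))
  wbstats::wb_data(
    indicator  = "NY.GDP.PCAP.CD",  # illustrative indicator code
    country    = countries,
    start_date = start_year,
    end_date   = end_year
  )
}

# FRED series via fredr; assumes your key is stored in FRED_API_KEY.
get_unemployment_rate <- function(since = as.Date("2000-01-01")) {
  stopifnot(requireNamespace("fredr", quietly = TRUE))
  key <- Sys.getenv("FRED_API_KEY")
  stopifnot(nzchar(key))
  fredr::fredr_set_key(key)
  fredr::fredr(series_id = "UNRATE", observation_start = since)
}

# IPUMS extract via ipumsr: pass the path to the DDI (.xml) file downloaded
# with the extract; the data file should sit in the same folder.
read_ipums_extract <- function(ddi_path) {
  stopifnot(requireNamespace("ipumsr", quietly = TRUE))
  ddi <- ipumsr::read_ipums_ddi(ddi_path)
  ipumsr::read_ipums_micro(ddi)
}

# Example calls (run these yourself once the packages are installed):
# gdp    <- get_gdp_per_capita(c("US", "MX"), 2000, 2020)
# unrate <- get_unemployment_rate()
# cps    <- read_ipums_extract("path/to/your_extract.xml")  # hypothetical path
```

Each call returns a data frame, so the results can go straight into the usual dplyr and ggplot2 workflow.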
Below is a list of good data sources depending on the types of topics you might be interested in writing on:6
Key Data Sources
- Coronavirus Data: Johns Hopkins CSSE Covid-19 data (definitive), Our World in Data, New York Times Covid data, covdata R package, Tidy Covid data
- IPUMS (Integrated Public Use Microdata Series)
- ICPSR (Inter-university Consortium for Political and Social Research)
By Topic
- Quality of Government Data has an extremely wide range of data sources pertaining to measures of institutions. The data itself can be found here.
- National and State Accounts Data: Bureau of Economic Analysis
- Labor Market and Price Data: Bureau of Labor Statistics
- Macroeconomic Data: Federal Reserve Economic Data (FRED), World Development Indicators (World Bank), Penn World Table
- International Data: NationMaster.com, Doing Business, CIESIN
- Census Data: U.S. Census Bureau
- Sports Data: Spotrac, Rodney Fort’s Sports Data
- Data Clearing House: Stat USA, Fedstats, Statistical Abstract of the United States, Resources for Economists
- Political and Social Data: ICPSR, Federal Election Commission, Poole and Rosenthal Roll Call Data (Voting ideology), Archigos Data on Political Leaders, Library of Congress: Thomas (Legislation), Iowa Electronic Markets (Prediction Markets)
- War and Violence Data: Correlates of War
- State Level Data: Correlates of State Policy
- Health Data: Centers for Disease Control, CDC Wonder System
- Crime Data: Bureau of Justice Statistics
- Education Data: National Center for Education Statistics
- Environmental Data: EPA
- Religion Data: American Religion Data Archive (ARDA)
- Financial Data: Financial Data Finder
- Philanthropy Data: The Urban Institute
Footnotes
1. Note: You should use these more for playing around with in R to boost your data wrangling skills. These should not be used for your projects in most circumstances.
2. Some of these come from Nick Huntington-Klein’s excellent list.
3. install.packages("name_of_package")
4. Note: you will need to install the devtools package first, and then install the package directly from GitHub with the command devtools::install_github('UrbanInstitute/education-data-package-r')
5. Note: you will need to install the devtools package first, and then install the package directly from GitHub with the command devtools::install_github('xmarquez/democracyData')
6. Some of these come from various sources, including https://github.com/awesomedata/awesome-public-datasets#economics