This guide outlines some useful workflows for pulling data sets commonly used by the Urban Institute.
library(tidycensus)
library(tidycensus) by Kyle Walker (complete intro here) is the best tool in R for accessing several Census Bureau data sets, including the American Community Survey, through the Census Bureau API. The package returns tidy data frames and, with geometry = TRUE, attaches the matching Census shapefile geometry to each row.
Here is a simple example for one state with shapefiles:
library(tidyverse)
library(purrr)
library(tidycensus)

# pull median household income and shapefiles for Census tracts in Alabama
get_acs(geography = "tract",
        variables = "B19013_001",
        state = "01",
        year = 2015,
        geometry = TRUE,
        progress = FALSE)
Simple feature collection with 1181 features and 5 fields (with 1 geometry empty)
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -88.47323 ymin: 30.22333 xmax: -84.88908 ymax: 35.00803
Geodetic CRS: NAD83
First 10 features:
         GEOID                                          NAME   variable
1  01003010500     Census Tract 105, Baldwin County, Alabama B19013_001
2  01003011501  Census Tract 115.01, Baldwin County, Alabama B19013_001
3  01009050500      Census Tract 505, Blount County, Alabama B19013_001
4  01015981901 Census Tract 9819.01, Calhoun County, Alabama B19013_001
5  01025957700     Census Tract 9577, Clarke County, Alabama B19013_001
6  01025958002  Census Tract 9580.02, Clarke County, Alabama B19013_001
7  01031011000      Census Tract 110, Coffee County, Alabama B19013_001
8  01033020500     Census Tract 205, Colbert County, Alabama B19013_001
9  01037961200      Census Tract 9612, Coosa County, Alabama B19013_001
10 01039961700  Census Tract 9617, Covington County, Alabama B19013_001
   estimate   moe                       geometry
1     41944  8100 MULTIPOLYGON (((-87.80249 3...
2     41417 14204 MULTIPOLYGON (((-87.71719 3...
3     40055  8054 MULTIPOLYGON (((-86.75735 3...
4        NA    NA MULTIPOLYGON (((-86.01323 3...
5     32708  4806 MULTIPOLYGON (((-88.1805 31...
6     29048 14759 MULTIPOLYGON (((-87.98623 3...
7     44732  7640 MULTIPOLYGON (((-85.92018 3...
8     49052  6543 MULTIPOLYGON (((-87.76733 3...
9     31957  9954 MULTIPOLYGON (((-86.46069 3...
10    32697  6021 MULTIPOLYGON (((-86.6998 31...
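The GEOID column in this output is a concatenation of FIPS codes. For Census tracts it has 11 characters: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The following is a minimal base R sketch of pulling those pieces apart, using three GEOIDs copied from the output above. The helper name split_tract_geoid() is hypothetical and is not part of library(tidycensus).

```r
# Hypothetical helper: split 11-character tract GEOIDs into FIPS components.
# Assumes every GEOID is tract-level (2 state + 3 county + 6 tract digits).
split_tract_geoid <- function(geoid) {
  stopifnot(is.character(geoid), all(nchar(geoid) == 11))
  data.frame(
    GEOID = geoid,
    state = substr(geoid, 1, 2),
    county = substr(geoid, 3, 5),
    tract = substr(geoid, 6, 11),
    stringsAsFactors = FALSE
  )
}

# GEOIDs copied from the first three rows of the output above
geoids <- c("01003010500", "01003011501", "01009050500")
parts <- split_tract_geoid(geoids)
```

Keeping the components as character strings preserves the leading zeros, which matters when the codes are later used for joins or filters.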
Smaller geographies like Census tracts can only be pulled state-by-state. This example demonstrates how to iterate across FIPS codes to pull Census tracts for multiple states. The process is as follows:
1. Pick the variables of interest
2. Create a vector of state FIPS codes for the states of interest
3. Create a custom function that works on a single state FIPS code
4. Iterate the function along the vector of state FIPS codes with map_df() from library(purrr)
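The vector of state FIPS codes should hold two-character strings, because codes below 10 carry a leading zero (Alabama is "01", not 1). If the codes arrive as integers, for example from a spreadsheet, one way to restore the padding is sprintf(). This is a small sketch; the variable name fips_int is only illustrative.

```r
# State FIPS codes stored as integers lose their leading zeros
fips_int <- c(1L, 2L, 4L) # alabama, alaska, arizona

# sprintf() with "%02d" pads each code back to two characters
state_fips <- sprintf("%02d", fips_int)
```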
Here is an example that pulls median household income at the Census tract level for multiple states:
# variables of interest
vars <- c(
  "B19013_001" # median household income estimate
)

# states of interest: alabama, alaska, arizona
state_fips <- c("01", "02", "04")

# create a custom function that works for one state
get_income <- function(state_fips) {

  income_data <- get_acs(geography = "tract",
                         variables = vars,
                         state = state_fips,
                         year = 2015)

  return(income_data)

}

# iterate the function
map_df(.x = state_fips, # iterate along the vector of state fips codes
       .f = get_income) # apply get_income() to each fips_code
# A tibble: 2,874 × 5
   GEOID       NAME                                        varia…¹ estim…²   moe
   <chr>       <chr>                                       <chr>     <dbl> <dbl>
 1 01001020100 Census Tract 201, Autauga County, Alabama   B19013…   61838 11900
 2 01001020200 Census Tract 202, Autauga County, Alabama   B19013…   32303 13538
 3 01001020300 Census Tract 203, Autauga County, Alabama   B19013…   44922  5629
 4 01001020400 Census Tract 204, Autauga County, Alabama   B19013…   54329  7003
 5 01001020500 Census Tract 205, Autauga County, Alabama   B19013…   51965  6935
 6 01001020600 Census Tract 206, Autauga County, Alabama   B19013…   63092  9585
 7 01001020700 Census Tract 207, Autauga County, Alabama   B19013…   34821  7867
 8 01001020801 Census Tract 208.01, Autauga County, Alaba… B19013…   73728  2447
 9 01001020802 Census Tract 208.02, Autauga County, Alaba… B19013…   60063  8602
10 01001020900 Census Tract 209, Autauga County, Alabama   B19013…   41287  7857
# … with 2,864 more rows, and abbreviated variable names ¹variable, ²estimate
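map_df() still works, but purrr 1.0.0 marks it as superseded in favor of map() followed by list_rbind(). The sketch below shows that pattern with a stand-in function so it runs without an API key or network access. fake_get_income() is hypothetical and returns placeholder rows, not Census data; in practice get_income() would take its place.

```r
library(purrr)

state_fips <- c("01", "02", "04")

# Hypothetical stand-in for get_income(): returns a one-row placeholder
# data frame per state instead of calling the Census API
fake_get_income <- function(state_fips) {
  data.frame(state = state_fips, stringsAsFactors = FALSE)
}

# map() returns a list of data frames; list_rbind() stacks them by row
combined <- list_rbind(map(state_fips, fake_get_income))
```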
library(tidycensus) works well with library(tidyverse) and enables access to geospatial data, but it is limited to only some Census Bureau data sets. The next package has less functionality but allows for accessing any data available on the Census API.
library(censusapi)
library(censusapi) by Hannah Recht (complete intro here) can access any published table that is accessible through the Census Bureau API. A full listing is available here.
Here is a simple example that pulls median household income and its margin of error for Census tracts in Alabama:
library(tidyverse)
library(purrr)
library(censusapi)

vars <- c(
  "B19013_001E", # median household income estimate
  "B19013_001M"  # median household income margin of error
)

getCensus(name = "acs/acs5",
          key = Sys.getenv("CENSUS_API_KEY"),
          vars = vars,
          region = "tract:*",
          regionin = "state:01",
          vintage = 2015) %>%
  as_tibble()
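The call above reads the API key from the CENSUS_API_KEY environment variable, and Sys.getenv() returns an empty string when that variable is not set. A quick check before the request can make a missing key easier to diagnose. This is a sketch only; has_census_key() is a hypothetical helper, not a library(censusapi) function.

```r
# Hypothetical helper: TRUE when CENSUS_API_KEY is set to a non-empty value
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  message("CENSUS_API_KEY is not set; add it to ~/.Renviron and restart R.")
}
```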
Smaller geographies like Census tracts can only be pulled state-by-state. This example demonstrates how to iterate across FIPS codes to pull Census tracts for multiple states. The process is as follows:
1. Pick the variables of interest
2. Create a vector of state FIPS codes for the states of interest
3. Create a custom function that works on a single state FIPS code
4. Iterate the function along the vector of state FIPS codes with map_df() from library(purrr)
Here is an example that pulls median household income at the Census tract level for multiple states:
# variables of interest
vars <- c(
  "B19013_001E", # median household income estimate
  "B19013_001M"  # median household income margin of error
)

# states of interest: alabama, alaska, arizona
state_fips <- c("01", "02", "04")

# create a custom function that works for one state
get_income <- function(state_fips) {

  income_data <- getCensus(name = "acs/acs5",
                           key = Sys.getenv("CENSUS_API_KEY"),
                           vars = vars,
                           region = "tract:*",
                           regionin = paste0("state:", state_fips),
                           vintage = 2015)

  return(income_data)

}

# iterate the function
map_df(.x = state_fips, # iterate along the vector of state fips codes
       .f = get_income) %>% # apply get_income() to each fips_code
  as_tibble()
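Unlike the library(tidycensus) output, this result has no single GEOID column. The sketch below assumes the geography comes back as separate character columns named state, county, and tract, and pastes them into an 11-character tract GEOID so the two sources can be joined. The data frame is a hand-built stand-in whose values mirror the first two tracts in the library(tidycensus) output earlier in this guide; it is not live API output.

```r
# Hand-built stand-in for getCensus() output (assumed column layout)
income_data <- data.frame(
  state = c("01", "01"),
  county = c("001", "001"),
  tract = c("020100", "020200"),
  B19013_001E = c(61838, 32303),
  B19013_001M = c(11900, 13538),
  stringsAsFactors = FALSE
)

# Concatenate the geography columns into an 11-character tract GEOID
income_data$GEOID <- paste0(income_data$state,
                            income_data$county,
                            income_data$tract)
```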