This guide is a brief introduction to mapping sub-county geographies with R. Counties can use this guide to understand the distribution of useful metrics at the census tract, zip code tabulation area (ZCTA), and public use microdata area (PUMA) levels. First we walk you through setting up the necessary tools, including R and RStudio, to create maps. Then, we explain how to work with geospatial data and how it differs from other types of data. Finally, we present examples of maps at the census tract, ZCTA, and PUMA levels.


Step 1: Setting up the Environment

After completing all of the Action items in this step, you will have all of the tools you need to begin making maps.

R and RStudio

Brief

  • R is a free, open-source software for statistical computing and data science.
    • Action: Download and install R here.
  • RStudio is a free, open-source integrated development environment (IDE) that runs on top of R. R users almost exclusively open RStudio and rarely directly open R.
    • Action: Download and install RStudio here.

Extended


RStudio Projects

Brief

  • Creating an RStudio project for each data analysis project allows you to keep all files associated with that project together, and serves as an easy way to tell R where to load/save files.
    • Action: Now that you have installed R and RStudio, download the template and double click template.Rproj to get started.

Extended

  • The command for setting working directories in R is setwd(), and the command for checking what the current working directory is getwd(). However, setwd() relies on absolute file paths. An absolute file path is a string that contains the root element and the complete directory list to locate a file. This can cause problems with loading and saving your files if that path ever changes, such as when reorganizing directories on your machine or when collaborating with people using different machines. RStudio projects make all file paths relative to the root directory, specified by a file with the extension .Rproj.

  • Read more about RStudio Projects and .Rproj files below:


Running R Code

Brief

RStudio has four main panels: code editor (top left), R console (bottom left), environment and history (top right), and files, plots, packages, help, and viewer pane (bottom right).

There are two main ways to run R code:

  • Type code to the right of the > (called the “prompt”) in the R console and hit enter. Running code this way does not create a long-term document of your code.

  • Type code into the R script in the code editor panel. Highlight the desired code and click Run in the top right of the code editor panel, or type Ctrl/command+enter. Scripts can be saved, so they are the best way to document your code for future use and reference.

Extended

No extended version.


R Packages

Brief

  • Packages in R are collections of functions and other code that extend the capabilities of base R. Packages only need to be installed once per machine, but must be loaded once per session. Packages are installed using install.packages("package_name") and then loaded into R using library(package_name). The packages we use in this guide are library(tidyverse), library(here), library(tidycensus), library(crsuggest), library(sf), and library(tigris).
    • Action: Install these packages by un-commenting the first six lines of code in example.R that start with install.packages and running those six lines of code.
      • Note: Comments start with a #. When executing code, R will ignore anything that starts with #. You can add or remove #s to multiple lines of code by selecting the lines you want to comment using the cursor and using the key combination command + shift + C on Mac or control + shift + C on Windows.
    • Action: Since packages only need to be installed once per machine, you can comment out these lines again after installing.
# install.packages("tidyverse")
# install.packages("here")
# install.packages("tidycensus")
# install.packages("crsuggest")
# install.packages("sf")
# install.packages("tigris")

library(tidyverse)
library(tidycensus)
library(crsuggest)
library(sf)
library(tigris)

Extended

More information about these packages is included in later sections of this guide. * library(ggplot2) - part of the tidyverse ecosystem, ggplot2 is a package for creating data visualizations including maps.

  • library(sf) - plots simple features (sf), which is a standardized way to encode spatial vector data. This function automatically references the geometry column and makes it simple to combine point, line, and polygon data.

  • library(tigris) - downloads and provides TIGER/Line shapefiles from the US Census Bureau. TIGER stands for Topologically Integrated Geographic Encoding and Referencing.

  • library(tidycensus) - can return simple feature geometry for geographic units along with variables from the decennial US Census or American Community survey. By setting geometry = TRUE in a tidycensus function call, tidycensus will use the tigris package to retrieve the corresponding geographic dataset from the US Census Bureau and pre-merge it with the tabular data obtained from the Census API.

  • library(crsuggest) - attempts to match an input spatial dataset with corresponding coordinate reference systems that will work well for mapping and/or spatial analysis.


Census API Key

Brief

  • An application programming interface (API) is a connection between computers or between computer programs that allows two applications to talk to each other. An API allows applications to access data and interact with external software components.

  • An application programming interface (API) key is a unique code used to identify and authenticate an application or user. API keys are often used to track and control how the API is being used.

  • The Census Data API gives the public access to raw statistical data from various Census Bureau data programs. It is an efficient way to query data directly from Census Bureau servers. A Census API Key is required to use tidycensus.

    • Action: Request a Census API Key here.
    • Action: After receiving your Census API Key, copy and paste it into <paste key here> in each census_api_key.R located in the tract_template, zcta_template, and puma_template folders (pasting three times total). Be sure to delete the < >, but keep the " " that are already included.
# RUN THIS IN census_api_key.R
census_api_key("<paste key here>")
# RUN THIS IN example.R
source(here::here("census_api_key.R"))

Extended

You can also install your API key with census_api_key("your-key-string", install = TRUE). To then obtain your keystring, you can use library(dotenv) and Sys.getenv(<key name in .env file>).

  • Read more about API’s and API keys here.
  • Read more about the Census Data API here.


Why map with R?

Brief

R can have a steeper learning curve than point-and-click tools, such as QGIS or ArcGIS, for geospatial analysis and mapping. However, creating maps in R has many advantages including reproducibility, iteration, easy updates, an expansive ecosystem, and no cost



Extended

  1. Reproducibility: By creating maps with R code, you can easily share the outputs and the code that generated the output with collaborators, allowing them to replicate your work and catch errors easily.

  2. Iteration: With point and click software like ArcGIS, making 50 maps would be 50 times the work/time. But using R, we can easily make make many iterations of the same map with a few changes to the code.

  3. Easy Updates: Writing code provides a roadmap for others (and future you!) to quickly update parts of the map as needed. Say for example a collaborator wanted to change the legend colors of 50 state maps. With R, this is possible in just a few seconds!

  4. An Expansive ecosystem: There are several R packages that make it very easy to get spatial data, create static and interactive maps, and perform spatial analyses. This feature rich package ecosystem which all play nice together is frankly unmatched by other programming languages and even point and click tools like QGIS and ArcGIS. Some of these R packages include:

  5. Cost: Most point-and-click tools for geospatial analysis are proprietary and expensive. R is free open-source software. The software and most of its packages can be used for free by anyone for almost any use case.



Step 2: Geospatial Data

After this section, you will have a general understanding of basic geospatial data principles and how geospatial data differs from other types of data you may have worked with.

Geographic Entities

Brief

  • Census tracts are small, relatively permanent statistical subdivisions of a county that average about 4,000 inhabitants. The minimum census tract population is 1,200 inhabitants and the maximum population is about 8,000 inhabitants. Census tracts are uniquely numbered in each county with a numeric code. There were 84,414 census tracts in the United States (50 states and DC) in 2020.

  • Public Use Microdata Areas (PUMAs) partition each state into geographic areas containing no fewer than 100,000 people each and must maintain that population throughout the decade. PUMAs nest within states and may not cross state boundaries, but may cross county boundaries. For the 2010 census, there were 2,378 PUMAs in the United States (including islands and territories)

  • Zip Code Tabulation Areas (ZCTAs) are generalized areal representations of the United States Postal Service Zip Code service areas. ZCTAs may cross county boundaries. There were 33,642 ZCTAs in the United States (50 states and DC) in 2020.

Extended

No extended version.


Importing Spatial Data

Brief

We load cartographic boundary files using two R packages called library(tidycensus) and library(tigris).

library(tidycensus) sometimes requires the use of a Census API Key. See the APIs and Census API Key of Step 1 for information on how to obtain and install your API Key.

Below is an example of using the get_acs() function from library(tidycensus) to get median household income from the Census in Fairfax County, VA, along with cartographic boundary files.

# Retrieve data from tidycensus
tc_tracts <- get_acs(
  geography = "tract",
  variables = c(medincome = "B19013_001"),
  state = 51,
  county = 059,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
)
  • <- is the “assignment operator” — an object created on the right side of an assignment operator is assigned to a name on the left side of the assignment operator. We use the get_acs() function to pull our data, and assign that data to an object called tc_tracts.

  • "B19013_001" is the Census variable ID for median household income. Variable IDs are required to fetch data from the Census APIs. There are thousands of variables available across the different Census datasets. The tidycensus() package offers the load_variables() function for obtaining a dataset of these variables to search through. load_variables() works as follows:

v19 <- load_variables(2019, "acs1", cache = TRUE)
  • After using tidycensus() enough, you may begin to remember that "B19013_001" is the variable ID for median household income and no longer need to search for it with the above code. See here and here for more information on searching for variables in tidycensus().

Extended

library(tigris) downloads and provides TIGER/Line shapefiles from the US Census Bureau. TIGER stands for Topologically Integrated Geographic Encoding and Referencing.

library(tigris) includes all standard census geographies including but not limited to states, counties, census tracts, PUMAs, ZCTAs, school districts, and congressional districts.

By default, library(tigris) will download very large high-resolution TIGER line boundary files. Their larger size slows down processing time, and because they follow legal boundaries, some objects like bodies of water will not be visible. Cartographic boundary files are quicker to download and follow the US coastline, which better aligns with maps that we’re used to looking at. The argument cb = TRUE pulls the cartographic boundary files, but setting the argument to cb = FALSE will pull the TIGER line files.

library(tidycensus) was created by the creator of library(tigris) but has two major differences: first, library(tidycensus) allows you to import Census data, including demographic variables, simultaneously with the geographic data. simply include geometry = TRUE in the get_acs() function to pull the shapes data as sf; second, it only provides cartographic boundary files that are smaller, quicker to load, more familiar than TIGER/Line shapefiles as explained above.

Both library(tigris) and library(tidycensus) have a year parameter that determines the year of the data obtained in functions like tidycensus::get_acs() or tigris::counties(). This parameter currently defaults to 2019 for get_acs() and 2020 for tigris functions. tidycensus::get_acs() also notably defaults to pulling 5-year ACS data rather than single year files. We recommend reading the documentation for these functions to understand the parameter options and their default values.

  • Geospatial data come in various formats. Different formats require different approaches to load into R:
    • Shapefiles (.shp) - proprietary file format created by ESRI, the company that creates ArcGIS, and that is composed of three or more binary files. The st_read() function reads shapefiles into R in the sf format. Point the function at a file ending in .shp to read the data.
    • GeoJSON (.geojson) - open source file type for storing geospatial data in plain text. The st_read() function also reads GeoJSON data. Point the function at a file ending in .geojson to read the data.
    • CSV (.csv) - file type common for storing point data that has longitude and latitude columns. To load point data from a .csv file into R, first read the file using the read_csv function. Then, use the st_as_sf() function to convert it into an sf object by specifying the columns with longitude and latitude using the coords argument: e.g., `st_as_sf(data, coords = c(“lon”, “lat”))


FIPS codes

Brief

Federal Information Processing Series (FIPS) are numeric codes that identify US states and counties. US states are identified by a 2-digit number and US counties are identified by a 3-digit number. For example, the FIPS code for Virginia is 51 and the FIPS code for Fairfax County is 059. The FIPS code for Fairfax County, VA is thus 059. FIPS codes are used to identify states and counties of interest in packages like library(tidycensus) and library(tigris).

Extended

No extended version.


Spatial Joins and Appending Spatial Info to Your Data

Brief

  • Sometimes you will need to join and append spatial data. For instance, PUMAs and ZCTAs cross county lines, so you cannot pull them at the county level from library(tidycensus). If you only want to map a specific county though, you can join your ZCTA or PUMA level data to a county, keeping only those ZCTAs or PUMAs within that county.
# Pull county shapefiles to filter ZCTA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  select(COUNTYFP, GEOID, geometry) %>%
  filter(COUNTYFP == "059")

# Now we can join the ZCTAs to counties, and then filter to the county we're interested in
my_data <- sf::st_join(tc_zcta, fairfax_county, join = st_intersects) %>% 
  filter(COUNTYFP == "059")
  • %>% is the “pipe operator” and comes from library(tidyverse). It passes the output of one function as the input of another function.

  • select() keeps only the specified columns.

  • filter() keeps only the rows that satisfy the condition inside the parentheses.

Extended

No extended version.


Coordinate Reference Systems

Brief

  • All spatial data has a coordinate reference system (CRS), which specifies how to identify a location on Earth.

  • It is important for all spatial datasets that you’re working with to be in the same CRS so that all maps and spatial operations are accurate. We recommend using crsuggest::suggest_crs() to find an appropriate CRS for your dataset and coord_sf(crs = ) to change the CRS.

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(tc_tracts, limit = 1)

Extended

Creating a map requires transforming the earth from its spherical shape (3D) to a planar shape (2D). A coordinate reference system (CRS) defines how the two-dimensional projected map relates to real places on the three-dimensional earth. When using multiple different geospatial datasets for mapping, they should have the same CRS prior to mapping.

We recommend using the State Plane Coordinate System (SPCS), which is only used in the United States, for mapping at the county-level.

  • You can also use the st_crs() function to find the CRS. The CRS code is located at the end in ID[authority, SRID]. You can change the CRS with st_transform().

  • Read more about coordinate reference systems here.

  • Read more about the State Plane Coordinate System here.

  • Read more about the crsuggest package here.


Creating maps using ggplot2

Brief

The package library(ggplot2) (also contained in library(tidyverse), so when you load library(tidyverse) you are also loading library(ggplot2)) creates graphics including maps.

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by ZCTA",
       fill = "Median Household Income") +
  theme_void()
  • The code above is explained in “Step 3: Examples”
# Save plot as a .png file
ggsave("zcta_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
  • ggsave() is a function from library(ggplot2) for saving a plot. It defaults to saving the last plot that was displayed using the size of the current graphics device. The defaults can be changed with arguments such as plot =, width =, and height =.

Extended

No extended version.


Types of Geospatial Data

Brief

Geospatial data may consist of points (a single longitude/latitude), lines (a sequence of points), or polygons (a sequence of lines that outline a shape).



Extended

library(sf) stores geospatial data in a geometry column within R dataframes.

The geom_sf function from the library_sf() package plots sf data and automatically references the geometry column which stores the geospatial data.

`

Step 3: Examples

After working through these examples, you will have an understanding of several methods for making maps at different geography levels.

Example 1 - Tract using tidycensus

In this example, we show you how to create a census tract-level map with an example of using data from tidycensus(). The tidycensus() package was designed to facilitate the process of acquiring and working with US Census Bureau population data in the R environment. We can use the power of R to quickly pull this data, along with attributed to help facilitate mapping and spatial analysis.

Brief

# Load packages
library(tidyverse)
library(tidycensus)
library(crsuggest)
library(sf)
library(tigris)

# Tell R where to find your census API key so that you can use tidycensus
source(here::here("census_api_key.R"))

# Retrieve census tract level data from tidycensus
tc_tracts <- get_acs(
  geography = "tract",
  variables = c(medincome = "B19013_001"),
  state = 51,
  county = 059,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
)

# Pull county shapefile to include county border in map
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  filter(COUNTYFP == "059")

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(tc_tracts, limit = 1)

# Create map
ggplot() +
  geom_sf(data = tc_tracts,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by census tract",
       fill = "Median Household Income") +
  theme_void()

# Save plot
ggsave("tract_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")

Extended

# Load packages
library(tidyverse)
library(tidycensus)
library(crsuggest)
library(sf)
library(tigris)

# Tell R where to find your census API key so that you can use tidycensus
source(here::here("census_api_key.R"))

# Retrieve census tract level data from tidycensus
tc_tracts <- get_acs(
  geography = "tract",
  variables = c(medincome = "B19013_001"),
  state = 51,
  county = 059,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
)
  • <- is the “assignment operator” — an object created on the right side of an assignment operator is assigned to a name on the left side of the assignment operator. We use the get_acs() function to pull our data, and assign that data to an object called tc_tracts.

  • geography — specifies the geography of your data. See here for all geographies available.

  • variables — specifies which variable or variables you want to pull from the Census. medincome is the name we give the variable, and "B19013_001" is the variable ID that we’re pulling.

  • state and county — optional arguments for the state and/or county for which you are requesting data. State/county names and FIPS codes are both accepted, but we recommend FIPS codes to avoid spelling errors.

  • year — the year of the ACS sample (Census data) and defaults to 2019.

  • geometry — if TRUE, returns simple feature geometry in the geometry column. The simple feature geometry data is what allows us to create maps.

  • progress_bar — if FALSE, prevents the progress of the data pull from being displayed. This creates a cleaner looking output.

# Pull county shapefile to include county border in map
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  filter(COUNTYFP == "059")
  • cb — By default, library(tigris) will download very large high-resolution TIGER line boundary files. Their larger size slows down processing time, and because they follow legal boundaries, some objects like bodies of water will not be visible. Cartographic boundary files are quicker to download and follow the US coastline, which better aligns with maps that we’re used to looking at. The argument cb = TRUE pulls the cartographic boundary files, but setting the argument to cb = FALSE will pull the TIGER line files.

  • %>% is the “pipe operator” and comes from library(tidyverse). It passes the output of one function as the input of another function.

  • filter() — keeps only the rows that satisfy the condition inside the parentheses. We set the name of the county FIPS variable (COUNTYFP) equal to "059", the county FIPS code for Fairfax County, VA, to keep only that county from a data set that originally included all counties in Virginia.

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(tc_tracts, limit = 1)
  • suggest_crs() - a function that takes a spatial dataset (tc_tracts in this example) and suggests suitable coordinate reference systems that could be used for CRS transformations.

  • limit — an argument to the above function that specified how many results to return. We set limit = 1 to only return the first and best suggested CRS.

# Create map
ggplot() +
  geom_sf(data = tc_tracts,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by census tract",
       fill = "Median Household Income") +
  theme_void()

  • geom_sf() — a function used to visualize simple feature (sf) objects and thus create maps.

    • data — specifies the data to be displayed.

    • mapping — set of aesthetic mappings created by the function aes(). Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms.

    • fill — tells aes() which variable to “fill” in the map geographies with. We want to map median household income, which is called estimate in our data, so we set fill = estimate.

    • color — specifies the color of the boundary lines on the map. We set the census tract lines equal to "white" and the county line for Fairfax County to "magenta".

    • size — specifies the size of the boundary lines on the map. We set the census tract lines equal to 0.1 and the county line for Fairfax County to 0.4 so that the county boundary displays larger than the census tract boundaries.

  • coord_sf() — a function that ensures that all layers use a common CRS. We set the crs argument equal to the crs_gcs column of the recommended_crs dataset, specified by recommended_crs$crs_gcs.

  • scale_fill_gradientn() — creates a gradient with “n” number of colors. We set the colors argument of this function equal to two colors using their hex code values c("#132B43", "#56B1F7"), so it creates a 2-color gradient with these colors. Hex codes are a 6-character (hexadecimal) format for identifying and displaying colors that are used in other programming languages and applications including HTML.The colors argument also accepts some color names such as “blue” or “red” but hex codes can be more precise and allow for a wider variety of colors to be used. See here for more information on hex color codes. To use a different set of colors, simply replace the hex codes in the vector.

    • labels — specifies the labels for the fill values used in the legend. We use the scales package to specify the unit as dollar with scales::dollar.
  • labs() — a function to create labels for the plot.

    • title — creates a title for the plot.

    • fill — creates a title for the legend.

  • theme_void() — Removes grid lines and gray background from the plot.

# Save plot
ggsave("tract_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
  • ggsave() — a function for saving a plot. It defaults to saving the last plot that was displayed using the size of the current graphics device. The defaults can be changed with arguments such as plot =, width =, and height =.


Example 2 - ZCTA using tidycensus

In this example, we show you how to create a ZCTA-level map with an example of pulling data from tidycensus. We use tigris to get shapefiles for a single county and join them to our tidycensus data.

Brief

# Retrieve data from tidycensus
tc_zcta <- get_acs(
  geography = "zcta",
  variables = c(medincome = "B19013_001"),
  state = 51,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
) %>%
  rename(zcta = GEOID)

# Pull county shapefiles to filter ZCTA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  filter(COUNTYFP == "059")

# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(tc_zcta, fairfax_county, join = st_intersects) %>% 
  filter(COUNTYFP == "059")

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by ZCTA",
       fill = "Median Household Income") +
  theme_void()

# Save plot
ggsave("zcta_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")

Extended

# Retrieve data from tidycensus
tc_zcta <- get_acs(
  geography = "zcta",
  variables = c(medincome = "B19013_001"),
  state = 51,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
) 

# Pull county shapefiles to filter ZCTA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  filter(COUNTYFP == "059")

# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(x = tc_zcta, y = fairfax_county, join = st_intersects) %>% 
  filter(COUNTYFP == "059")
  • st_join() - a function for spatial joins, or joining spatial data.

    • x and y are both objects of class sf, and are the two data sets that we want to join.

    • join = st_intersects — identifies if x and y share any space and keeps the observations of x that share space with y. Here, it keeps the ZCTAs that are in or that border Fairfax County.

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by ZCTA",
       fill = "Median Household Income") +
  theme_void()

# Save plot
ggsave("zcta_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")


Example 3 - PUMA using tidycensus

In this example, we show you how to create a PUMA-level map with an example of using data from tidycensus. We use tigris to get shapefiles for a single county and join them to our tidycensus data.

Brief

# Retrieve PUMA data
tc_puma <- get_acs(
  geography = "public use microdata area",
  variables = c(medincome = "B19013_001"),
  state = 51,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
)


# Pull county shapefiles to filter PUMA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>% 
  filter(COUNTYFP == "059")

# Now we can join the pumas to counties, and then filter to the county we're interested in
pumas_of_interest <- sf::st_join(fairfax_county, tc_puma, join = st_intersects) %>%
  pull(GEOID.y)

my_data <- tc_puma %>%
  filter(GEOID %in% pumas_of_interest)

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by PUMA",
       fill = "Median Household Income") +
  theme_void()

# Save plot
ggsave("puma_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")

Extended

# Retrieve PUMA data
tc_puma <- get_acs(
  geography = "public use microdata area",
  variables = c(medincome = "B19013_001"),
  state = 51,
  year = 2019,
  geometry = TRUE,
  progress_bar = FALSE
)

# Pull county shapefiles to filter PUMA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>% 
  filter(COUNTYFP == "059")

# Now we can join the pumas to counties, and then filter to the county we're interested in
pumas_of_interest <- sf::st_join(fairfax_county, tc_puma, join = st_intersects) %>%
  pull(GEOID.y)

my_data <- tc_puma %>%
  filter(GEOID %in% pumas_of_interest)

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = estimate),
          color = "white",
          size = 0.1) +
  geom_sf(data = fairfax_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Median household income in Fairfax County, VA by PUMA",
       fill = "Median Household Income") +
  theme_void()

# Save plot
ggsave("puma_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")


Example 4 - PUMA and loading in original data

In this example, we show you how to create a PUMA-level map with an example of loading in your own original data and joining shapefiles to that data using tigris.

Brief

# Load in original data
ss_inc <- read_csv(here::here("mean_ssinc_oh_2019.csv"))

# Pull the spatial data from tigris
pumas <- tigris::pumas(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  rename(puma = PUMACE10) %>%
  transmute(puma = as.numeric(puma))

# Join the puma shapefiles to our original puma-level data so we can map it
ss_inc_puma <- left_join(pumas, ss_inc, by = "puma")

# Pull county shapefiles to filter PUMA observations to a single county of interest
franklin_county <- tigris::counties(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>% 
  filter(COUNTYFP == "049")

# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(ss_inc_puma, franklin_county, join = st_intersects) %>% 
  filter(COUNTYFP == "049")

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = mean_ssinc),
          color = "white",
          size = 0.1) +
  geom_sf(data = franklin_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Mean Supplemental Security Income in Franklin County, OH, by PUMA",
       fill = "Mean Supplemental Security Income") +
  theme_void()

# Save plot
ggsave("puma_own_data.png", plot = last_plot(), width = 8, height = 8, units = "in")



Extended

# Load in original data
ss_inc <- read_csv(here::here("mean_ssinc_oh_2019.csv"))

# Pull the spatial data from tigris
pumas <- tigris::pumas(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
  rename(puma = PUMACE10) %>%
  transmute(puma = as.numeric(puma))
  • to join data, which we do in the next step, the variable that we want to join by must have the same name and be of the same type. The variable containing PUMA information from the data that we read in the first step is called puma and is of type numeric. After pulling PUMA data from library(tigris), we use rename and transmute to change its name and type so that we can join it to our original data.

  • transmute() — adds new variables and drops existing ones. as.numeric() coerces its argument to numeric type. We are creating a new version of the puma variable that is numeric.

# Join the puma shapefiles to our original puma-level data so we can map it
ss_inc_puma <- left_join(x = pumas, y = ss_inc, by = "puma")
  • left_join() returns all rows from the first argument x (in this case pumas_proj), and all columns from the first two arguments x and y(in this case, pumas_proj and ss_inc respectively).
# Pull county shapefiles to filter PUMA observations to a single county of interest
franklin_county <- tigris::counties(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>% 
  filter(COUNTYFP == "049")

# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(ss_inc_puma, franklin_county, join = st_intersects) %>% 
  filter(COUNTYFP == "049")

# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)

# Create map
ggplot() +
  geom_sf(data = my_data,
          mapping = aes(fill = mean_ssinc),
          color = "white",
          size = 0.1) +
  geom_sf(data = franklin_county,
          fill = NA,
          color = "magenta",
          size = 0.4) +
  coord_sf(crs = recommended_crs$crs_gcs) +
  scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
                      labels = scales::dollar) +
  labs(title = "Mean Supplemental Security Income in Franklin County, OH, by PUMA",
       fill = "Mean Supplemental Security Income") +
  theme_void()

# Save plot
ggsave("puma_own_data.png", plot = last_plot(), width = 8, height = 8, units = "in")



Bibliography and References

The Comprehensive R Archive Network. https://cran.r-project.org/

RStudio. https://www.rstudio.com/products/rstudio/download/

United States Census Bureau. https://api.census.gov/data/key_signup.html

Hadley Wickham and Garrett Grolemund (2017). R For Data Science https://r4ds.had.co.nz/

Aaron Williams and Kyle Ueyama. Urban Institute Intro to R https://urbaninstitute.github.io/r-at-urban/intro-to-r.html#Introduction

Hadley Wickham and Garrett Grolemund (2017). R For Data Science, Chapter 8 https://r4ds.had.co.nz/workflow-projects.html

RStudio Support. Using RStudio Projects https://support.rstudio.com/hc/en-us/articles/200526207-Using-RStudio-Projects

Martin Chan (2020). R Bloggers, RStudio Projects and Working Directories: A Beginner’s Guide https://www.r-bloggers.com/2020/01/rstudio-projects-and-working-directories-a-beginners-guide/

Amy Reichert. What is an API key? https://whatis.techtarget.com/definition/API-key

United States Census Bureau, Census Data API User Guide https://www.census.gov/data/developers/guidance/api-user-guide.Overview.html

Urban Institute Mapping Guide https://urbaninstitute.github.io/r-at-urban/mapping.html#Helpful_Learning_Resources

Kyle Walker. Analyzing US Census Data: Methods, Maps, and Models in R, Chapter 2 https://walker-data.com/census-r/an-introduction-to-tidycensus.html#searching-for-variables-in-tidycensus

Kyle Walker and Matt Herman. Basic usage of tidycensus https://walker-data.com/tidycensus/articles/basic-usage.html#searching-for-variables

Tim Sutton and Otto Dassau. A Basic Introduction to GIS, Chapter 8 https://docs.qgis.org/3.16/en/docs/gentle_gis_introduction/coordinate_reference_systems.html

United States Geological Survey. What is the State Plane Coordinate System? Can GPS provide coordinates in these values? https://www.usgs.gov/faqs/what-state-plane-coordinate-system-can-gps-provide-coordinates-these-values#:~:text=The%20State%20Plane%20Coordinate%20System%20(SPCS)%2C%20which%20is%20only,the%20state’s%20size%20and%20shape.

Kyle Walker (2021), crsuggest, GitHub repository, https://github.com/walkerke/crsuggest

Dixon & Moe (2021) https://htmlcolorcodes.com/

Version: 2022-04-29 17:00:02