This guide is a brief introduction to mapping sub-county geographies with R. Counties can use this guide to understand the distribution of useful metrics at the census tract, zip code tabulation area (ZCTA), and public use microdata area (PUMA) levels. First we walk you through setting up the necessary tools, including R and RStudio, to create maps. Then, we explain how to work with geospatial data and how it differs from other types of data. Finally, we present examples of maps at the census tract, ZCTA, and PUMA levels.
After completing all of the Action items in this step, you will have all of the tools you need to begin making maps.
template.Rproj
to
get started.The command for setting working directories in R is
setwd()
, and the command for checking what the current
working directory is getwd()
. However, setwd()
relies on absolute file paths. An absolute file path is a string that
contains the root element and the complete directory list to locate a
file. This can cause problems with loading and saving your files if that
path ever changes, such as when reorganizing directories on your machine
or when collaborating with people using different machines. RStudio
projects make all file paths relative to the root directory, specified
by a file with the extension .Rproj.
Read more about RStudio Projects and .Rproj files below:
RStudio has four main panels: code editor (top left), R console (bottom left), environment and history (top right), and files, plots, packages, help, and viewer pane (bottom right).
There are two main ways to run R code:
Type code to the right of the >
(called the
“prompt”) in the R console and hit enter. Running code this way does not
create a long-term document of your code.
Type code into the R script in the code editor panel. Highlight
the desired code and click Run
in the top right of the code
editor panel, or type Ctrl/command+enter. Scripts can be saved, so they
are the best way to document your code for future use and
reference.
No extended version.
install.packages("package_name")
and then
loaded into R using library(package_name)
. The packages we
use in this guide are library(tidyverse)
,
library(here)
, library(tidycensus)
,
library(crsuggest)
, library(sf)
, and
library(tigris)
.
example.R
that start with
install.packages
and running those six lines of code.
#
. When executing
code, R will ignore anything that starts with #
. You can
add or remove #
s to multiple lines of code by selecting the
lines you want to comment using the cursor and using the key combination
command + shift + C
on Mac or
control + shift + C
on Windows.# install.packages("tidyverse")
# install.packages("here")
# install.packages("tidycensus")
# install.packages("crsuggest")
# install.packages("sf")
# install.packages("tigris")
library(tidyverse)
library(tidycensus)
library(crsuggest)
library(sf)
library(tigris)
More information about these packages is included in later sections
of this guide. * library(ggplot2)
- part of the
tidyverse
ecosystem, ggplot2
is a package for
creating data visualizations including maps.
library(sf)
- plots simple features (sf), which is a
standardized way to encode spatial vector data. This function
automatically references the geometry column and makes it simple to
combine point, line, and polygon data.
library(tigris)
- downloads and provides TIGER/Line
shapefiles from the US Census Bureau. TIGER stands for Topologically
Integrated Geographic Encoding and Referencing.
library(tidycensus)
- can return simple feature
geometry for geographic units along with variables from the decennial US
Census or American Community survey. By setting
geometry = TRUE
in a tidycensus
function call,
tidycensus
will use the tigris
package to
retrieve the corresponding geographic dataset from the US Census Bureau
and pre-merge it with the tabular data obtained from the Census
API.
library(crsuggest)
- attempts to match an input
spatial dataset with corresponding coordinate reference systems that
will work well for mapping and/or spatial analysis.
An application programming interface (API) is a connection between computers or between computer programs that allows two applications to talk to each other. An API allows applications to access data and interact with external software components.
An application programming interface (API) key is a unique code used to identify and authenticate an application or user. API keys are often used to track and control how the API is being used.
The Census Data API gives the public access to raw statistical
data from various Census Bureau data programs. It is an efficient way to
query data directly from Census Bureau servers. A Census API Key is
required to use tidycensus
.
<paste key here>
in each
census_api_key.R
located in the
tract_template
, zcta_template
, and
puma_template
folders (pasting three times total). Be sure
to delete the < >
, but keep the " "
that
are already included.# RUN THIS IN census_api_key.R
census_api_key("<paste key here>")
# RUN THIS IN example.R
source(here::here("census_api_key.R"))
R can have a steeper learning curve than point-and-click tools, such as QGIS or ArcGIS, for geospatial analysis and mapping. However, creating maps in R has many advantages including reproducibility, iteration, easy updates, an expansive ecosystem, and no cost
Reproducibility: By creating maps with R code, you can easily share the outputs and the code that generated the output with collaborators, allowing them to replicate your work and catch errors easily.
Iteration: With point and click software like ArcGIS, making 50 maps would be 50 times the work/time. But using R, we can easily make make many iterations of the same map with a few changes to the code.
Easy Updates: Writing code provides a roadmap for others (and future you!) to quickly update parts of the map as needed. Say for example a collaborator wanted to change the legend colors of 50 state maps. With R, this is possible in just a few seconds!
An Expansive ecosystem: There are several R packages that make it very easy to get spatial data, create static and interactive maps, and perform spatial analyses. This feature rich package ecosystem which all play nice together is frankly unmatched by other programming languages and even point and click tools like QGIS and ArcGIS. Some of these R packages include:
Cost: Most point-and-click tools for geospatial analysis are proprietary and expensive. R is free open-source software. The software and most of its packages can be used for free by anyone for almost any use case.
After this section, you will have a general understanding of basic geospatial data principles and how geospatial data differs from other types of data you may have worked with.
Census tracts are small, relatively permanent statistical subdivisions of a county that average about 4,000 inhabitants. The minimum census tract population is 1,200 inhabitants and the maximum population is about 8,000 inhabitants. Census tracts are uniquely numbered in each county with a numeric code. There were 84,414 census tracts in the United States (50 states and DC) in 2020.
Public Use Microdata Areas (PUMAs) partition each state into geographic areas containing no fewer than 100,000 people each and must maintain that population throughout the decade. PUMAs nest within states and may not cross state boundaries, but may cross county boundaries. For the 2010 census, there were 2,378 PUMAs in the United States (including islands and territories)
Zip Code Tabulation Areas (ZCTAs) are generalized areal representations of the United States Postal Service Zip Code service areas. ZCTAs may cross county boundaries. There were 33,642 ZCTAs in the United States (50 states and DC) in 2020.
No extended version.
We load cartographic boundary files using two R packages called
library(tidycensus)
and library(tigris)
.
library(tidycensus)
sometimes requires the use of a
Census API Key. See the APIs and Census API Key of
Step 1 for information on how to obtain and install
your API Key.
Below is an example of using the get_acs()
function from
library(tidycensus)
to get median household income from the
Census in Fairfax County, VA, along with cartographic boundary
files.
# Retrieve data from tidycensus
tc_tracts <- get_acs(
geography = "tract",
variables = c(medincome = "B19013_001"),
state = 51,
county = 059,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
)
<-
is the “assignment operator” — an object
created on the right side of an assignment operator is assigned to a
name on the left side of the assignment operator. We use the
get_acs()
function to pull our data, and assign that data
to an object called tc_tracts
.
"B19013_001"
is the Census variable ID for median
household income. Variable IDs are required to fetch data from the
Census APIs. There are thousands of variables available across the
different Census datasets. The tidycensus()
package offers
the load_variables()
function for obtaining a dataset of
these variables to search through. load_variables()
works
as follows:
v19 <- load_variables(2019, "acs1", cache = TRUE)
library(tigris)
downloads and provides TIGER/Line
shapefiles from the US Census Bureau. TIGER stands for Topologically
Integrated Geographic Encoding and Referencing.
library(tigris)
includes all standard census geographies
including but not limited to states, counties, census tracts, PUMAs,
ZCTAs, school districts, and congressional districts.
By default, library(tigris)
will download very large
high-resolution TIGER line boundary files. Their larger size slows down
processing time, and because they follow legal boundaries, some objects
like bodies of water will not be visible. Cartographic boundary files
are quicker to download and follow the US coastline, which better aligns
with maps that we’re used to looking at. The argument
cb = TRUE
pulls the cartographic boundary files, but
setting the argument to cb = FALSE
will pull the TIGER line
files.
library(tidycensus)
was created by the creator of
library(tigris)
but has two major differences: first,
library(tidycensus)
allows you to import Census data,
including demographic variables, simultaneously with the geographic
data. simply include geometry = TRUE
in the
get_acs()
function to pull the shapes data as
sf
; second, it only provides cartographic boundary files
that are smaller, quicker to load, more familiar than TIGER/Line
shapefiles as explained above.
Both library(tigris)
and
library(tidycensus)
have a year parameter that determines
the year of the data obtained in functions like
tidycensus::get_acs()
or tigris::counties()
.
This parameter currently defaults to 2019 for get_acs()
and
2020 for tigris
functions.
tidycensus::get_acs()
also notably defaults to pulling
5-year ACS data rather than single year files. We recommend reading the
documentation for these functions to understand the parameter options
and their default values.
st_read()
function reads shapefiles into
R in the sf
format. Point the function at a file ending in
.shp
to read the data.st_read()
function also reads
GeoJSON data. Point the function at a file ending in
.geojson
to read the data..csv
file into R, first read the file using the
read_csv
function. Then, use the st_as_sf()
function to convert it into an sf
object by specifying the
columns with longitude and latitude using the coords
argument: e.g., `st_as_sf(data, coords = c(“lon”, “lat”))Federal Information Processing Series (FIPS) are numeric codes that
identify US states and counties. US states are identified by a 2-digit
number and US counties are identified by a 3-digit number. For example,
the FIPS code for Virginia is 51 and the FIPS code for Fairfax County is
059. The FIPS code for Fairfax County, VA is thus 059. FIPS codes are
used to identify states and counties of interest in packages like
library(tidycensus)
and library(tigris)
.
No extended version.
library(tidycensus)
. If you only want
to map a specific county though, you can join your ZCTA or PUMA level
data to a county, keeping only those ZCTAs or PUMAs within that
county.# Pull county shapefiles to filter ZCTA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
select(COUNTYFP, GEOID, geometry) %>%
filter(COUNTYFP == "059")
# Now we can join the ZCTAs to counties, and then filter to the county we're interested in
my_data <- sf::st_join(tc_zcta, fairfax_county, join = st_intersects) %>%
filter(COUNTYFP == "059")
%>%
is the “pipe operator” and comes from
library(tidyverse)
. It passes the output of one function as
the input of another function.
select()
keeps only the specified columns.
filter()
keeps only the rows that satisfy the
condition inside the parentheses.
No extended version.
All spatial data has a coordinate reference system (CRS), which specifies how to identify a location on Earth.
It is important for all spatial datasets that you’re working with
to be in the same CRS so that all maps and spatial operations are
accurate. We recommend using crsuggest::suggest_crs()
to
find an appropriate CRS for your dataset and
coord_sf(crs = )
to change the CRS.
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(tc_tracts, limit = 1)
Creating a map requires transforming the earth from its spherical shape (3D) to a planar shape (2D). A coordinate reference system (CRS) defines how the two-dimensional projected map relates to real places on the three-dimensional earth. When using multiple different geospatial datasets for mapping, they should have the same CRS prior to mapping.
We recommend using the State Plane Coordinate System (SPCS), which is only used in the United States, for mapping at the county-level.
You can also use the st_crs()
function to find the
CRS. The CRS code is located at the end in
ID[authority, SRID]
. You can change the CRS with
st_transform()
.
Read more about coordinate reference systems here.
Read more about the State Plane Coordinate System here.
Read more about the crsuggest
package here.
The package library(ggplot2)
(also contained in
library(tidyverse)
, so when you load
library(tidyverse)
you are also loading
library(ggplot2)
) creates graphics including maps.
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by ZCTA",
fill = "Median Household Income") +
theme_void()
# Save plot as a .png file
ggsave("zcta_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
ggsave()
is a function from
library(ggplot2)
for saving a plot. It defaults to saving
the last plot that was displayed using the size of the current graphics
device. The defaults can be changed with arguments such as
plot =
, width =
, and
height =
.No extended version.
Geospatial data may consist of points (a single longitude/latitude), lines (a sequence of points), or polygons (a sequence of lines that outline a shape).
library(sf)
stores geospatial data in a
geometry
column within R dataframes.
The geom_sf
function from the library_sf()
package plots sf
data and automatically references the
geometry
column which stores the geospatial data.
`
After working through these examples, you will have an understanding of several methods for making maps at different geography levels.
In this example, we show you how to create a census tract-level map
with an example of using data from tidycensus()
. The
tidycensus()
package was designed to facilitate the process
of acquiring and working with US Census Bureau population data in the R
environment. We can use the power of R to quickly pull this data, along
with attributed to help facilitate mapping and spatial analysis.
# Load packages
library(tidyverse)
library(tidycensus)
library(crsuggest)
library(sf)
library(tigris)
# Tell R where to find your census API key so that you can use tidycensus
source(here::here("census_api_key.R"))
# Retrieve census tract level data from tidycensus
tc_tracts <- get_acs(
geography = "tract",
variables = c(medincome = "B19013_001"),
state = 51,
county = 059,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
)
# Pull county shapefile to include county border in map
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "059")
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(tc_tracts, limit = 1)
# Create map
ggplot() +
geom_sf(data = tc_tracts,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by census tract",
fill = "Median Household Income") +
theme_void()
# Save plot
ggsave("tract_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
# Load packages
library(tidyverse)
library(tidycensus)
library(crsuggest)
library(sf)
library(tigris)
# Tell R where to find your census API key so that you can use tidycensus
source(here::here("census_api_key.R"))
# Retrieve census tract level data from tidycensus
tc_tracts <- get_acs(
geography = "tract",
variables = c(medincome = "B19013_001"),
state = 51,
county = 059,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
)
<-
is the “assignment operator” — an object
created on the right side of an assignment operator is assigned to a
name on the left side of the assignment operator. We use the
get_acs()
function to pull our data, and assign that data
to an object called tc_tracts
.
geography
— specifies the geography of your data.
See here
for all geographies available.
variables
— specifies which variable or variables
you want to pull from the Census. medincome
is the name we
give the variable, and "B19013_001"
is the variable ID that
we’re pulling.
state
and county
— optional arguments
for the state and/or county for which you are requesting data.
State/county names and FIPS codes are both accepted, but we recommend
FIPS codes to avoid spelling errors.
year
— the year of the ACS sample (Census data) and
defaults to 2019.
geometry
— if TRUE, returns simple feature geometry
in the geometry
column. The simple feature geometry data is
what allows us to create maps.
progress_bar
— if FALSE, prevents the progress of
the data pull from being displayed. This creates a cleaner looking
output.
# Pull county shapefile to include county border in map
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "059")
cb
— By default, library(tigris)
will
download very large high-resolution TIGER line boundary files. Their
larger size slows down processing time, and because they follow legal
boundaries, some objects like bodies of water will not be visible.
Cartographic boundary files are quicker to download and follow the US
coastline, which better aligns with maps that we’re used to looking at.
The argument cb = TRUE
pulls the cartographic boundary
files, but setting the argument to cb = FALSE
will pull the
TIGER line files.
%>%
is the “pipe operator” and comes from
library(tidyverse)
. It passes the output of one function as
the input of another function.
filter()
— keeps only the rows that satisfy the
condition inside the parentheses. We set the name of the county FIPS
variable (COUNTYFP
) equal to "059"
, the county
FIPS code for Fairfax County, VA, to keep only that county from a data
set that originally included all counties in Virginia.
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(tc_tracts, limit = 1)
suggest_crs()
- a function that takes a spatial
dataset (tc_tracts
in this example) and suggests suitable
coordinate reference systems that could be used for CRS
transformations.
limit
— an argument to the above function that
specified how many results to return. We set limit = 1
to
only return the first and best suggested CRS.
# Create map
ggplot() +
geom_sf(data = tc_tracts,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by census tract",
fill = "Median Household Income") +
theme_void()
geom_sf()
— a function used to visualize simple
feature (sf) objects and thus create maps.
data
— specifies the data to be displayed.
mapping
— set of aesthetic mappings created by the
function aes()
. Aesthetic mappings describe how variables
in the data are mapped to visual properties (aesthetics) of
geoms.
fill
— tells aes()
which variable to
“fill” in the map geographies with. We want to map median household
income, which is called estimate
in our data, so we set
fill = estimate
.
color
— specifies the color of the boundary lines on
the map. We set the census tract lines equal to "white"
and
the county line for Fairfax County to "magenta"
.
size
— specifies the size of the boundary lines on
the map. We set the census tract lines equal to 0.1
and the
county line for Fairfax County to 0.4
so that the county
boundary displays larger than the census tract boundaries.
coord_sf()
— a function that ensures that all layers
use a common CRS. We set the crs
argument equal to the
crs_gcs
column of the recommended_crs
dataset,
specified by recommended_crs$crs_gcs
.
scale_fill_gradientn()
— creates a gradient with “n”
number of colors. We set the colors
argument of this
function equal to two colors using their hex code values
c("#132B43", "#56B1F7")
, so it creates a 2-color gradient
with these colors. Hex codes are a 6-character (hexadecimal) format for
identifying and displaying colors that are used in other programming
languages and applications including HTML.The colors
argument also accepts some color names such as “blue” or “red” but hex
codes can be more precise and allow for a wider variety of colors to be
used. See here for more
information on hex color codes. To use a different set of colors, simply
replace the hex codes in the vector.
labels
— specifies the labels for the fill
values used in the legend. We use the scales
package to
specify the unit as dollar
with
scales::dollar
.labs()
— a function to create labels for the
plot.
title
— creates a title for the plot.
fill
— creates a title for the legend.
theme_void()
— Removes grid lines and gray
background from the plot.
# Save plot
ggsave("tract_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
ggsave()
— a function for saving a plot. It defaults to
saving the last plot that was displayed using the size of the current
graphics device. The defaults can be changed with arguments such as
plot =
, width =
, and
height =
.In this example, we show you how to create a ZCTA-level map with an
example of pulling data from tidycensus
. We use
tigris
to get shapefiles for a single county and join them
to our tidycensus
data.
# Retrieve data from tidycensus
tc_zcta <- get_acs(
geography = "zcta",
variables = c(medincome = "B19013_001"),
state = 51,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
) %>%
rename(zcta = GEOID)
# Pull county shapefiles to filter ZCTA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "059")
# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(tc_zcta, fairfax_county, join = st_intersects) %>%
filter(COUNTYFP == "059")
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by ZCTA",
fill = "Median Household Income") +
theme_void()
# Save plot
ggsave("zcta_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
# Retrieve data from tidycensus
tc_zcta <- get_acs(
geography = "zcta",
variables = c(medincome = "B19013_001"),
state = 51,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
)
# Pull county shapefiles to filter ZCTA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "059")
# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(x = tc_zcta, y = fairfax_county, join = st_intersects) %>%
filter(COUNTYFP == "059")
st_join()
- a function for spatial joins, or joining
spatial data.
x
and y
are both objects of class
sf
, and are the two data sets that we want to
join.
join = st_intersects
— identifies if x
and y
share any space and keeps the observations of
x
that share space with y
. Here, it keeps the
ZCTAs that are in or that border Fairfax County.
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by ZCTA",
fill = "Median Household Income") +
theme_void()
# Save plot
ggsave("zcta_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
In this example, we show you how to create a PUMA-level map with an
example of using data from tidycensus
. We use
tigris
to get shapefiles for a single county and join them
to our tidycensus
data.
# Retrieve PUMA data
tc_puma <- get_acs(
geography = "public use microdata area",
variables = c(medincome = "B19013_001"),
state = 51,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
)
# Pull county shapefiles to filter PUMA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "059")
# Now we can join the pumas to counties, and then filter to the county we're interested in
pumas_of_interest <- sf::st_join(fairfax_county, tc_puma, join = st_intersects) %>%
pull(GEOID.y)
my_data <- tc_puma %>%
filter(GEOID %in% pumas_of_interest)
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by PUMA",
fill = "Median Household Income") +
theme_void()
# Save plot
ggsave("puma_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
# Retrieve PUMA data
tc_puma <- get_acs(
geography = "public use microdata area",
variables = c(medincome = "B19013_001"),
state = 51,
year = 2019,
geometry = TRUE,
progress_bar = FALSE
)
# Pull county shapefiles to filter PUMA observations to a single county of interest
fairfax_county <- tigris::counties(state = 51, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "059")
# Now we can join the pumas to counties, and then filter to the county we're interested in
pumas_of_interest <- sf::st_join(fairfax_county, tc_puma, join = st_intersects) %>%
pull(GEOID.y)
my_data <- tc_puma %>%
filter(GEOID %in% pumas_of_interest)
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = estimate),
color = "white",
size = 0.1) +
geom_sf(data = fairfax_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Median household income in Fairfax County, VA by PUMA",
fill = "Median Household Income") +
theme_void()
# Save plot
ggsave("puma_tidycensus.png", plot = last_plot(), width = 8, height = 5, units = "in")
In this example, we show you how to create a PUMA-level map with an
example of loading in your own original data and joining shapefiles to
that data using tigris
.
# Load in original data
ss_inc <- read_csv(here::here("mean_ssinc_oh_2019.csv"))
# Pull the spatial data from tigris
pumas <- tigris::pumas(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
rename(puma = PUMACE10) %>%
transmute(puma = as.numeric(puma))
# Join the puma shapefiles to our original puma-level data so we can map it
ss_inc_puma <- left_join(pumas, ss_inc, by = "puma")
# Pull county shapefiles to filter PUMA observations to a single county of interest
franklin_county <- tigris::counties(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "049")
# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(ss_inc_puma, franklin_county, join = st_intersects) %>%
filter(COUNTYFP == "049")
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = mean_ssinc),
color = "white",
size = 0.1) +
geom_sf(data = franklin_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Mean Supplemental Security Income in Franklin County, OH, by PUMA",
fill = "Mean Supplemental Security Income") +
theme_void()
# Save plot
ggsave("puma_own_data.png", plot = last_plot(), width = 8, height = 8, units = "in")
# Load in original data
ss_inc <- read_csv(here::here("mean_ssinc_oh_2019.csv"))
# Pull the spatial data from tigris
pumas <- tigris::pumas(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
rename(puma = PUMACE10) %>%
transmute(puma = as.numeric(puma))
to join data, which we do in the next step, the variable that we
want to join by must have the same name and be of the same type. The
variable containing PUMA information from the data that we read in the
first step is called puma
and is of type numeric. After
pulling PUMA data from library(tigris)
, we use
rename
and transmute
to change its name and
type so that we can join it to our original data.
transmute()
— adds new variables and drops existing
ones. as.numeric()
coerces its argument to numeric type. We
are creating a new version of the puma
variable that is
numeric.
# Join the puma shapefiles to our original puma-level data so we can map it
ss_inc_puma <- left_join(x = pumas, y = ss_inc, by = "puma")
left_join()
returns all rows from the first argument
x
(in this case pumas_proj
), and all columns
from the first two arguments x
and y
(in this
case, pumas_proj
and ss_inc
respectively).# Pull county shapefiles to filter PUMA observations to a single county of interest
franklin_county <- tigris::counties(state = 39, year = 2019, cb = TRUE, progress_bar = FALSE) %>%
filter(COUNTYFP == "049")
# Now we can join the pumas to counties, and then filter to the county we're interested in
my_data <- sf::st_join(ss_inc_puma, franklin_county, join = st_intersects) %>%
filter(COUNTYFP == "049")
# Figure out best projection for data
recommended_crs <- crsuggest::suggest_crs(my_data, limit = 1)
# Create map
ggplot() +
geom_sf(data = my_data,
mapping = aes(fill = mean_ssinc),
color = "white",
size = 0.1) +
geom_sf(data = franklin_county,
fill = NA,
color = "magenta",
size = 0.4) +
coord_sf(crs = recommended_crs$crs_gcs) +
scale_fill_gradientn(colors = c("#CFE8F3","#A2D4EC","#73BFE2","#46ABDB","#1696D2","#12719E","#0A4C6A","#062635"),
labels = scales::dollar) +
labs(title = "Mean Supplemental Security Income in Franklin County, OH, by PUMA",
fill = "Mean Supplemental Security Income") +
theme_void()
# Save plot
ggsave("puma_own_data.png", plot = last_plot(), width = 8, height = 8, units = "in")
The Comprehensive R Archive Network. https://cran.r-project.org/
RStudio. https://www.rstudio.com/products/rstudio/download/
United States Census Bureau. https://api.census.gov/data/key_signup.html
Hadley Wickham and Garrett Grolemund (2017). R For Data Science https://r4ds.had.co.nz/
Aaron Williams and Kyle Ueyama. Urban Institute Intro to R https://urbaninstitute.github.io/r-at-urban/intro-to-r.html#Introduction
Hadley Wickham and Garrett Grolemund (2017). R For Data Science, Chapter 8 https://r4ds.had.co.nz/workflow-projects.html
RStudio Support. Using RStudio Projects https://support.rstudio.com/hc/en-us/articles/200526207-Using-RStudio-Projects
Martin Chan (2020). R Bloggers, RStudio Projects and Working Directories: A Beginner’s Guide https://www.r-bloggers.com/2020/01/rstudio-projects-and-working-directories-a-beginners-guide/
Amy Reichert. What is an API key? https://whatis.techtarget.com/definition/API-key
United States Census Bureau, Census Data API User Guide https://www.census.gov/data/developers/guidance/api-user-guide.Overview.html
Urban Institute Mapping Guide https://urbaninstitute.github.io/r-at-urban/mapping.html#Helpful_Learning_Resources
Kyle Walker. Analyzing US Census Data: Methods, Maps, and Models in R, Chapter 2 https://walker-data.com/census-r/an-introduction-to-tidycensus.html#searching-for-variables-in-tidycensus
Kyle Walker and Matt Herman. Basic usage of tidycensus https://walker-data.com/tidycensus/articles/basic-usage.html#searching-for-variables
Tim Sutton and Otto Dassau. A Basic Introduction to GIS, Chapter 8 https://docs.qgis.org/3.16/en/docs/gentle_gis_introduction/coordinate_reference_systems.html
United States Geological Survey. What is the State Plane Coordinate System? Can GPS provide coordinates in these values? https://www.usgs.gov/faqs/what-state-plane-coordinate-system-can-gps-provide-coordinates-these-values#:~:text=The%20State%20Plane%20Coordinate%20System%20(SPCS)%2C%20which%20is%20only,the%20state’s%20size%20and%20shape.
Kyle Walker (2021), crsuggest, GitHub repository, https://github.com/walkerke/crsuggest
Dixon & Moe (2021) https://htmlcolorcodes.com/
Version: 2022-04-29 17:00:02