Using tidycensus and leaflet to map Census data
By Julia Silge
June 24, 2017
Recently, I have been following the development and release of Kyle Walker’s tidycensus package. I have been filled with amazement, delight, and well, perhaps another feeling…
There should be a word for “the regret felt when an R 📦, which would have saved untold hours of your life, is released”… #rstats 🤔 https://t.co/2THN4MwedO
— Mara Averick (@dataandme) May 31, 2017
But seriously, I have worked with US Census data a lot in the past and this package
- is such a valuable addition to the R ecosystem and
- would have saved me SO MUCH ENERGY, HEADACHE, and TIME.
I was working this weekend on a side project with an old friend about opioid usage in Texas and needed to download some Census data again. A perfect opportunity to give this new package a little run-through!
Exercising my joygret
Before running code like the following from tidycensus, you need to obtain an API key from the Census and then use the function census_api_key()
to set it in R.
library(tidyverse)
library(tidycensus)
texas_pop <- get_acs(geography = "county",
variables = "B01003_001",
state = "TX",
geometry = TRUE)
texas_pop
## Simple feature collection with 254 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
## # A tibble: 254 x 6
## GEOID NAME variable estimate moe geometry
## <chr> <chr> <chr> <dbl> <dbl> <S3: sfc_MULTIPOLYGON>
## 1 48007 Aransas County, Texas B01003_001 24292 0 <S3: sfc_MULTIPOLYGON>
## 2 48025 Bee County, Texas B01003_001 32659 0 <S3: sfc_MULTIPOLYGON>
## 3 48035 Bosque County, Texas B01003_001 17971 0 <S3: sfc_MULTIPOLYGON>
## 4 48067 Cass County, Texas B01003_001 30328 0 <S3: sfc_MULTIPOLYGON>
## 5 48083 Coleman County, Texas B01003_001 8536 0 <S3: sfc_MULTIPOLYGON>
## 6 48091 Comal County, Texas B01003_001 119632 0 <S3: sfc_MULTIPOLYGON>
## 7 48103 Crane County, Texas B01003_001 4730 0 <S3: sfc_MULTIPOLYGON>
## 8 48139 Ellis County, Texas B01003_001 157058 0 <S3: sfc_MULTIPOLYGON>
## 9 48151 Fisher County, Texas B01003_001 3858 0 <S3: sfc_MULTIPOLYGON>
## 10 48167 Galveston County, Texas B01003_001 308163 0 <S3: sfc_MULTIPOLYGON>
## # ... with 244 more rows
There we go! The total population in each county in Texas, in a tidyverse-ready data frame. If you want to get information for multiple states, just use purrr. The US Census tabulates lots of important kinds of information here in the United States, although there has been troubling uncertainty about leadership and funding there in recent months.
So we have this data in a form that will be easy to manipulate; what if we want to map it? Kyle Walker again has this taken care of, with his tigris package (a dependency of tidycensus); if you set geometry = TRUE
the way that I did when I downloaded the Census data above, tigris handles downloading the shapefiles from the Census, with support for sf
simple features. Kyle has a vignette for mapping using ggplot2, but you can also pipe straight into leaflet.
library(leaflet)
library(stringr)
library(sf)
pal <- colorQuantile(palette = "viridis", domain = texas_pop$estimate, n = 10)
texas_pop %>%
st_transform(crs = "+init=epsg:4326") %>%
leaflet(width = "100%") %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
stroke = FALSE,
smoothFactor = 0,
fillOpacity = 0.7,
color = ~ pal(estimate)) %>%
addLegend("bottomright",
pal = pal,
values = ~ estimate,
title = "Population percentiles",
opacity = 1)
What is that st_transform
doing? Well, I am no cartographer and I am still fuzzy on these issues, but it is doing a projection onto a certain reference system of the spatial information contained in the sf
column. The specific choice of an EPSG code of 4326 is for a given projection.
A couple more examples
Let’s look at the counties in Utah (where I live) while we’re at it. Let’s map color to population here, instead of quantiles, just for something different.
utah_pop <- get_acs(geography = "county",
variables = "B01003_001",
state = "UT",
geometry = TRUE)
pal <- colorNumeric(palette = "plasma",
domain = utah_pop$estimate)
utah_pop %>%
st_transform(crs = "+init=epsg:4326") %>%
leaflet(width = "100%") %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
stroke = FALSE,
smoothFactor = 0,
fillOpacity = 0.7,
color = ~ pal(estimate)) %>%
addLegend("bottomright",
pal = pal,
values = ~ estimate,
title = "County Populations",
opacity = 1)
Yep, that is right, although still remarkable to me. Utah is largely an extremely rural state, with lots of people here in Salt Lake City where I live and then in the corridor to the north and south.
There is so much other information available from the Census. For example, what if I want to look at the median home value in Salt Lake County, at the census tract level?
slc_value <- get_acs(geography = "tract",
variables = "B25077_001",
state = "UT",
county = "Salt Lake County",
geometry = TRUE)
pal <- colorNumeric(palette = "viridis",
domain = slc_value$estimate)
slc_value %>%
st_transform(crs = "+init=epsg:4326") %>%
leaflet(width = "100%") %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
stroke = FALSE,
smoothFactor = 0,
fillOpacity = 0.7,
color = ~ pal(estimate)) %>%
addLegend("bottomright",
pal = pal,
values = ~ estimate,
title = "Median Home Value",
labFormat = labelFormat(prefix = "$"),
opacity = 1)
The two census tracts with NA
values are the airport on the west side and the University of Utah on the east side. You can very obviously see the east-to-west gradient that comes as no surprise to us locals, and that priciest tract is up against one of the canyons with beautiful views. But mainly please notice with what ease I made this interactive map!
The End
Maybe the main reason I wrote up this blog post is to say how streamlined and easy it now is to get Census data into R and plot it, but another reason is to demonstrate, at least to myself, how little effort it takes to make a blog post with, say, interactive leaflet components with my new blogging workflow. I recently changed my blog from a Jekyll blog hosted on GitHub Pages to a blog built with blogdown and Hugo, deployed using Netlify. I am finding this workflow so great, and this post with its leaflet maps went off without a hitch! I would like to say a big THANK YOU to Yihui Xie for his work on R Markdown, knitr, and blogdown. Let me know if you have any questions!