Using tidycensus and leaflet to map Census data

By Julia Silge

June 24, 2017

Recently, I have been following the development and release of Kyle Walker’s tidycensus package. I have been filled with amazement, delight, and well, perhaps another feeling…

But seriously, I have worked with US Census data a lot in the past and this package

  • is such a valuable addition to the R ecosystem and
  • would have saved me SO MUCH ENERGY, HEADACHE, and TIME.

I was working this weekend on a side project with an old friend about opioid usage in Texas and needed to download some Census data again. A perfect opportunity to give this new package a little run-through!

Exercising my joygret

Before running code like the following, you need to obtain an API key from the Census Bureau and then set it in R with the census_api_key() function from tidycensus.
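As a minimal sketch of that setup step, setting the key might look something like this. The string "YOUR_KEY_HERE" is a placeholder I made up for illustration; substitute the key the Census Bureau sends you.

```r
library(tidycensus)

# "YOUR_KEY_HERE" is a placeholder, not a real key.
# This makes the key available to tidycensus for the current R session.
census_api_key("YOUR_KEY_HERE")
```

Recent versions of tidycensus also appear to accept install = TRUE to store the key in your .Renviron file for future sessions; check the documentation for the version you have installed.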

library(tidyverse)
library(tidycensus)

texas_pop <- get_acs(geography = "county", 
                     variables = "B01003_001", 
                     state = "TX",
                     geometry = TRUE) 

texas_pop
## Simple feature collection with 254 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
## # A tibble: 254 x 6
##    GEOID                    NAME   variable estimate   moe               geometry
##    <chr>                   <chr>      <chr>    <dbl> <dbl> <S3: sfc_MULTIPOLYGON>
##  1 48007   Aransas County, Texas B01003_001    24292     0 <S3: sfc_MULTIPOLYGON>
##  2 48025       Bee County, Texas B01003_001    32659     0 <S3: sfc_MULTIPOLYGON>
##  3 48035    Bosque County, Texas B01003_001    17971     0 <S3: sfc_MULTIPOLYGON>
##  4 48067      Cass County, Texas B01003_001    30328     0 <S3: sfc_MULTIPOLYGON>
##  5 48083   Coleman County, Texas B01003_001     8536     0 <S3: sfc_MULTIPOLYGON>
##  6 48091     Comal County, Texas B01003_001   119632     0 <S3: sfc_MULTIPOLYGON>
##  7 48103     Crane County, Texas B01003_001     4730     0 <S3: sfc_MULTIPOLYGON>
##  8 48139     Ellis County, Texas B01003_001   157058     0 <S3: sfc_MULTIPOLYGON>
##  9 48151    Fisher County, Texas B01003_001     3858     0 <S3: sfc_MULTIPOLYGON>
## 10 48167 Galveston County, Texas B01003_001   308163     0 <S3: sfc_MULTIPOLYGON>
## # ... with 244 more rows

There we go! The total population of each county in Texas, in a tidyverse-ready data frame. If you want information for multiple states, just use purrr to iterate over them. The US Census Bureau tabulates many important kinds of information about the United States, although there has been troubling uncertainty about its leadership and funding in recent months.
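For the multi-state case, one possible approach with purrr is sketched below. The choice of states is hypothetical and only for illustration, and the sketch leaves out geometry so that the pieces are plain tibbles that can be row-bound; it assumes an API key has already been set.

```r
library(tidyverse)
library(tidycensus)

# Hypothetical set of states, chosen only for illustration
states <- c("TX", "OK", "NM")

# Call get_acs() once per state, then row-bind the results into one tibble
multi_state_pop <- map_df(states, function(st) {
  get_acs(geography = "county",
          variables = "B01003_001",
          state = st)
})
```

If you ask for geometry = TRUE, each piece is an sf object, and combining them with rbind() (for example via purrr::reduce()) is probably the safer route than a plain row-bind.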

So we have this data in a form that will be easy to manipulate; what if we want to map it? Kyle Walker again has this taken care of, with his tigris package (a dependency of tidycensus). If you set geometry = TRUE the way that I did when I downloaded the Census data above, tigris handles downloading the shapefiles from the Census, with support for sf simple features. Kyle has a vignette for mapping using ggplot2, but you can also pipe straight into leaflet.

library(leaflet)
library(stringr)
library(sf)

# Color palette that bins county populations into 10 quantile groups (deciles)
pal <- colorQuantile(palette = "viridis", domain = texas_pop$estimate, n = 10)

texas_pop %>%
    st_transform(crs = 4326) %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(estimate)) %>%
    addLegend("bottomright", 
              pal = pal, 
              values = ~ estimate,
              title = "Population percentiles",
              opacity = 1)
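
Two pieces of the code above are easy to try out on their own, without any API calls: the regular expression that builds the popup text, and the quantile palette. Here is a small sketch using ten of the county estimates from the printed output earlier; the object names county, pop, and cols are mine, just for illustration.

```r
library(stringr)
library(leaflet)

# "^([^,]*)" matches everything from the start of the string up to
# (but not including) the first comma, i.e. just the county name
county <- str_extract("Aransas County, Texas", "^([^,]*)")
county
# "Aransas County"

# Ten county population estimates copied from the output above
pop <- c(3858, 4730, 8536, 17971, 24292, 30328, 32659, 119632, 157058, 308163)

# colorQuantile() returns a function that maps each value to the color
# of the quantile bin it falls in (here, 10 bins across the viridis palette)
pal <- colorQuantile(palette = "viridis", domain = pop, n = 10)
cols <- pal(pop)
cols
```

Binning by quantile rather than by raw value is a reasonable choice for data this skewed: a handful of very large counties would otherwise push almost every other county into the same color.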