Recently, I have been following the development and release of Kyle Walker’s tidycensus package. I have been filled with amazement, delight, and well, perhaps another feeling…
@dataandme) May 31, 2017
But seriously, I have worked with US Census data a lot in the past and this package
- is such a valuable addition to the R ecosystem and
- would have saved me SO MUCH ENERGY, HEADACHE, and TIME.
I was working this weekend on a side project with an old friend about opioid usage in Texas and needed to download some Census data again. A perfect opportunity to give this new package a little run-through!
Exercising my joygret
Before running code like the following from tidycensus, you need to obtain an API key from the Census and then use the function
census_api_key() to set it in R.
library(tidyverse) library(tidycensus) texas_pop <- get_acs(geography = "county", variables = "B01003_001", state = "TX", geometry = TRUE) texas_pop
## Simple feature collection with 254 features and 5 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007 ## epsg (SRID): 4269 ## proj4string: +proj=longlat +datum=NAD83 +no_defs ## # A tibble: 254 x 6 ## GEOID NAME variable estimate moe geometry ## <chr> <chr> <chr> <dbl> <dbl> <S3: sfc_MULTIPOLYGON> ## 1 48007 Aransas County, Texas B01003_001 24292 0 <S3: sfc_MULTIPOLYGON> ## 2 48025 Bee County, Texas B01003_001 32659 0 <S3: sfc_MULTIPOLYGON> ## 3 48035 Bosque County, Texas B01003_001 17971 0 <S3: sfc_MULTIPOLYGON> ## 4 48067 Cass County, Texas B01003_001 30328 0 <S3: sfc_MULTIPOLYGON> ## 5 48083 Coleman County, Texas B01003_001 8536 0 <S3: sfc_MULTIPOLYGON> ## 6 48091 Comal County, Texas B01003_001 119632 0 <S3: sfc_MULTIPOLYGON> ## 7 48103 Crane County, Texas B01003_001 4730 0 <S3: sfc_MULTIPOLYGON> ## 8 48139 Ellis County, Texas B01003_001 157058 0 <S3: sfc_MULTIPOLYGON> ## 9 48151 Fisher County, Texas B01003_001 3858 0 <S3: sfc_MULTIPOLYGON> ## 10 48167 Galveston County, Texas B01003_001 308163 0 <S3: sfc_MULTIPOLYGON> ## # ... with 244 more rows
There we go! The total population in each county in Texas, in a tidyverse-ready data frame. If you want to get information for multiple states, just use purrr. The US Census tabulates lots of important kinds of information here in the United States, although there has been troubling uncertainty about leadership and funding there in recent months.
So we have this data in a form that will be easy to manipulate; what if we want to map it? Kyle Walker again has this taken care of, with his tigris package (a dependency of tidycensus); if you set
geometry = TRUE the way that I did when I downloaded the Census data above, tigris handles downloading the shapefiles from the Census, with support for
sf simple features. Kyle has a vignette for mapping using ggplot2, but you can also pipe straight into leaflet.
library(leaflet) library(stringr) library(sf) pal <- colorQuantile(palette = "viridis", domain = texas_pop$estimate, n = 10) texas_pop %>% st_transform(crs = "+init=epsg:4326") %>% leaflet(width = "100%") %>% addProviderTiles(provider = "CartoDB.Positron") %>% addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"), stroke = FALSE, smoothFactor = 0, fillOpacity = 0.7, color = ~ pal(estimate)) %>% addLegend("bottomright", pal = pal, values = ~ estimate, title = "Population percentiles", opacity = 1)