Something Strange in the Neighborhood
By Julia Silge
August 5, 2016
Today I was so pleased to see a new data package hit CRAN, and how wonderful to see such accomplished women writing R packages.
What a great new data package on CRAN! And always great to see more women authors in #rstats https://t.co/nROMibqPxX pic.twitter.com/UEayWgx9bz
— Julia Silge (@juliasilge) August 5, 2016
The ghostr package includes a dataset of over 800 ghost sightings in Kentucky, with information on city, latitude, and longitude, along with URLs for finding more information about the ghost sightings.
library(ghostr)
library(acs)
library(dplyr)
library(reshape2)
library(stringr)
library(readr)
data(ghost_sightings)
names(ghost_sightings)
## [1] "url" "city" "state" "sightings" "lat" "lon"
ghost_sightings %>% summarise(total = sum(sightings))
## total
## 1 846
Getting Started with Leaflet
I’ve been wanting to get familiar with Leaflet, the popular library for interactive maps, and this seems like a perfect opportunity.
How are ghost sightings distributed across Kentucky?
library(leaflet)
m <- leaflet(ghost_sightings, width = "100%") %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircles(lng = ~lon, lat = ~lat, weight = 2.5,
             radius = ~sqrt(sightings) * 4e3, popup = ~city,
             color = "limegreen")
m
I’ve used a nice slimy green color here for the sightings, and the area of each circle is proportional to the number of sightings there.
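To see why the square root in the radius makes area track the count, here is a minimal sketch of the arithmetic. The three sighting counts are hypothetical, chosen only so the ratios are easy to check; the 4e3 multiplier is the one used in the map, and addCircles() treats the radius as meters.

```r
# Hypothetical sighting counts, chosen only to make the arithmetic easy to check
sightings <- c(1, 4, 9)

# Same scaling as the map: radius in meters grows with the square root of the count
radius <- sqrt(sightings) * 4e3

# Circle area is pi * r^2, so the square root cancels and area scales with the count
area <- pi * radius^2

scaling <- data.frame(sightings, radius, area_ratio = area / area[1])
print(scaling)
```

Scaling the radius directly by the count would instead make area grow with the square of the count, which visually exaggerates the places with many sightings.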
Ain’t Afraid of No Ghost
That is very nice, but perhaps we would like to compare this to the populations in Kentucky cities and towns. Let’s find the population in towns and cities in Kentucky from the U.S. Census, using ACS table B01003. (If you haven’t used the acs package before, you will need to get an API key and run api.key.install() one time to install your key on your system.) I’ll use msa in the call to the ACS tables, which gets metropolitan/micropolitan statistical areas; this is about the best match to cities and towns you can get in the Census.
kentucky <- geo.make(state = "KY", msa = "*")
popfetch <- acs.fetch(geography = kentucky,
                      endyear = 2014,
                      span = 5,
                      table.number = "B01003",
                      col.names = "pretty")
popDF <- melt(estimate(popfetch)) %>%
  mutate(city = str_extract(str_sub(as.character(Var1), 1, -11), ".+?(?= \\(part)|.+"),
         population = value) %>%
  select(city, population)
popDF
## city population
## 1 Bardstown, KY Micro Area 44254
## 2 Bowling Green, KY Metro Area 162322
## 3 Campbellsville, KY Micro Area 25059
## 4 Cincinnati, OH-KY-IN Metro Area 432535
## 5 Clarksville, TN-KY Metro Area 88736
## 6 Danville, KY Micro Area 53696
## 7 Elizabethtown-Fort Knox, KY Metro Area 150917
## 8 Evansville, IN-KY Metro Area 46394
## 9 Frankfort, KY Micro Area 71173
## 10 Glasgow, KY Micro Area 52716
## 11 Huntington-Ashland, WV-KY-OH Metro Area 85898
## 12 Lexington-Fayette, KY Metro Area 483997
## 13 London, KY Micro Area 126949
## 14 Louisville/Jefferson County, KY-IN Metro Area 974532
## 15 Madisonville, KY Micro Area 46684
## 16 Mayfield, KY Micro Area 37451
## 17 Maysville, KY Micro Area 17398
## 18 Middlesborough, KY Micro Area 28234
## 19 Mount Sterling, KY Micro Area 45190
## 20 Murray, KY Micro Area 37981
## 21 Owensboro, KY Metro Area 115795
## 22 Paducah, KY-IL Micro Area 83262
## 23 Richmond-Berea, KY Micro Area 102450
## 24 Somerset, KY Micro Area 63505
## 25 Union City, TN-KY Micro Area 6550
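As a quick aside on the name cleanup in the mutate() call above, here is a small sketch of what the two string steps do. The raw names below are my assumption about what the ACS row names look like before cleaning (an area name, an optional " (part)", then "; Kentucky"); they are not copied from the actual acs output.

```r
library(stringr)

# Assumed shape of the raw ACS row names; illustrative, not copied from acs output
raw <- c("Bardstown, KY Micro Area; Kentucky",
         "Cincinnati, OH-KY-IN Metro Area (part); Kentucky")

# Step 1: drop the last 10 characters, which removes the "; Kentucky" suffix
trimmed <- str_sub(raw, 1, -11)

# Step 2: keep the text before " (part" when it is present, else keep everything
cleaned <- str_extract(trimmed, ".+?(?= \\(part)|.+")

print(cleaned)
```

The "(part)" marker shows up for areas that cross state lines, which is why the Cincinnati population in the table covers only the Kentucky portion of that metro area.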
The ACS table above lists fewer cities and towns than the ghost sightings data does; there are ghost sighting records in some very small towns. Also, the acs package is great, but working with it always involves a) lots of regex and b) lots of tidying. Anyway, now we need the latitude and longitude for these metropolitan and micropolitan areas; these are available from the Census in its Gazetteer files.
gazetteer <- read_tsv("./2015_Gaz_cbsa_national.txt")
popDF <- left_join(popDF, gazetteer, by = c("city" = "NAME"))
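Because this join matches on the full area name, a name that fails to match exactly would end up with no coordinates to plot. Here is a minimal sketch of how one might check for that, using toy stand-ins for popDF and the Gazetteer table. The names pop_toy and gaz_toy are hypothetical, and the coordinates are placeholders rather than real Gazetteer values.

```r
library(dplyr)

# Toy stand-ins for popDF and the Gazetteer file. The populations come from the
# ACS table in this post; the coordinates are placeholders, not real values.
pop_toy <- tibble(
  city = c("Bardstown, KY Micro Area", "Danville, KY Micro Area"),
  population = c(44254, 53696)
)
gaz_toy <- tibble(
  NAME = "Bardstown, KY Micro Area",
  INTPTLAT = 37.8,
  INTPTLONG = -85.5
)

joined <- left_join(pop_toy, gaz_toy, by = c("city" = "NAME"))

# Rows with no Gazetteer match come back with NA coordinates
unmatched <- filter(joined, is.na(INTPTLAT) | is.na(INTPTLONG))
print(unmatched$city)
```

Running the same filter on the real popDF after the join is a cheap way to confirm that every area picked up a latitude and longitude.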
Now let’s make a Leaflet map for the population of these areas in Kentucky.
m <- leaflet(popDF, width = "100%") %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircles(lng = ~INTPTLONG, lat = ~INTPTLAT, weight = 1,
             radius = ~sqrt(population) * 50, popup = ~city)
m
Actually, let’s bind these data frames together and map them at the same time to compare.
mapDF <- bind_rows(popDF %>%
                     mutate(lat = INTPTLAT, long = INTPTLONG,
                            weight = 1, radius = sqrt(population) * 50,
                            type = "Population") %>%
                     select(lat, long, city, weight, radius, type),
                   ghost_sightings %>%
                     mutate(long = lon,
                            weight = 2.5, radius = sqrt(sightings) * 4e3,
                            type = "Ghost Sighting") %>%
                     select(lat, long, city, weight, radius, type))
typepal <- colorFactor(c("limegreen", "blue"), mapDF$type)
m <- leaflet(mapDF, width = "100%") %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircles(lng = ~long, lat = ~lat, weight = ~weight,
             radius = ~radius, popup = ~city, color = ~typepal(type)) %>%
  addLegend(pal = typepal, values = ~type, title = NULL)
m
Pretty nice! It looks to me like there are more ghost sightings in areas of higher population, but basically there are ghosts everywhere in Kentucky. The eastern part of Kentucky seems particularly full of ghosts relative to people.
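For anyone curious how the two colors on that map get assigned, here is a small sketch of the colorFactor() palette on its own. As far as I understand it, colorFactor() sorts the unique values of a character domain, so "Ghost Sighting" takes the first color and "Population" the second; treat that ordering as my reading of the function rather than a guarantee.

```r
library(leaflet)

# The same two labels used in the type column of mapDF
types <- c("Population", "Ghost Sighting")

# Build the palette function from the two colors used in the combined map
typepal <- colorFactor(c("limegreen", "blue"), types)

# Calling the palette on the labels returns one color string per label
colors <- typepal(types)
print(setNames(colors, types))
```

If you wanted the assignment to be explicit instead of relying on sort order, passing the levels argument to colorFactor() is one way to pin it down.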
The End
I am glad to have figured out a few things about Leaflet; it is very nice to use. Thanks to Kyle Walker and Kent Russell who helped me figure out how to get the maps to display at the right width both on desktop and mobile! The R Markdown file used to make this blog post is available here. I am very happy to hear feedback or questions!