Blog

Machine learning, text analysis, and more

The game is afoot! Topic modeling of Sherlock Holmes stories

In a recent release of tidytext, we added tidiers and support for building Structural Topic Models from the stm package. This is my current favorite implementation of topic modeling in R, so let’s walk through an example of how to get started with this kind of modeling, using The Adventures of Sherlock Holmes. You can watch along as I demonstrate how to start with the raw text of these short stories, prepare the data, and then implement topic modeling in this video tutorial!

January 25, 2018

tidytext 0.1.6

I am pleased to announce that tidytext 0.1.6 is now on CRAN! Most of this release, as well as the 0.1.5 release which I did not blog about, was for maintenance, updates to align with API changes from tidytext’s dependencies, and bug fixes. I just spent a good chunk of effort getting tidytext to pass R CMD check on older versions of R despite the fact that some of the packages in tidytext’s Suggests require recent versions of R.

January 10, 2018

One year as a data scientist at Stack Overflow

I recently passed my one-year anniversary of working at Stack Overflow as a data scientist. “I have some very exciting news! I am joining the data team at @StackOverflow. ✨📊✨📊✨” (Julia Silge, @juliasilge, December 13, 2016) Coming to Stack Overflow has been an adventure for me. This is my first time working at an actual tech company. I have been what I like to think of as “tech adjacent” my whole career, writing code and working on technical questions but never before working at a straight-up web company.

December 27, 2017

Tidy word vectors, take 2!

A few weeks ago, I wrote a post about finding word vectors using tidy data principles, based on an approach outlined by Chris Moody on the StitchFix tech blog. I’ve been pondering how to improve this approach, and whether it would be nice to wrap up some of these functions in a package, so here is an update! Like in my previous post, let’s download half a million posts from the Hacker News corpus using the bigrquery package.

November 27, 2017

New sports from random emoji

I love emoji ❤️ and I love xkcd, so this recent comic from Randall Munroe was quite a delight for me. I sat there, enjoying the thought of these new sports like horse hole and multiplayer avocado and I thought, “I can make more of these in just the barest handful of lines of code”. This is largely thanks to the emo package by Hadley Wickham, which if you haven’t installed and started using yet, WHY NOT?

November 25, 2017