Learn tidytext with my new learnr course

By Julia Silge in rstats

February 2, 2021

Today I am happy to announce that a new free, online, open source, interactive tutorial, Text Mining with Tidy Data Principles, has been published! 🎉

I previously developed an interactive course on text mining for an online learning company, but that course is no longer available. For a long time, I’ve wanted to revisit the ideas behind that course, update them, and make a new tutorial freely available, much like I did for my supervised machine learning course. I recently sat down and got the chance to do just that.

Learn tidytext!

Text data sets are diverse and ubiquitous, and tidy data principles provide an approach to make text mining easier, more effective, and consistent with tools already in wide use. In this tutorial, you will develop your text mining skills using the tidytext package in R, along with other tidyverse tools. You will apply these skills in four case studies, which will allow you to:

practice important data handling skills,
learn about the ways text analysis can be applied, and
extract relevant insights from real-world data.
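
To give a flavor of what "tidy" text looks like, here is a minimal sketch. It is not taken from the tutorial itself: the two sentences and the names `text_df`, `tidy_words`, and `word_counts` are made up for illustration. The idea is to go from one row per line of text to one row per word, and then count words with ordinary dplyr verbs.

```r
library(dplyr)
library(tidytext)

# Two made-up sentences standing in for a real data set (an assumption,
# not one of the tutorial's case studies)
text_df <- tibble(
  line = 1:2,
  text = c(
    "Tidy data principles make text mining easier.",
    "Text mining is fun."
  )
)

# unnest_tokens() splits the text column into one lowercase word per row
# and strips punctuation by default
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Once the text is tidy, counting words is a regular dplyr operation
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

word_counts
```

With these two sentences, "text" and "mining" each appear twice and come out on top; every other word appears once.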

This tutorial is organized into four case studies, each with its own data set:

transcripts of TED talks
a collection of comedies and tragedies by Shakespeare
one month of newspaper headlines (HEADS UP, the particular month is November 2020 😳)
song lyrics spanning five decades

These case studies demonstrate how you can use text analysis techniques with diverse kinds of text. Much of the data in this tutorial is new, the code is refreshed (a lot of it entirely new), and overall I am really proud of how it has turned out.

Learning about learnr

For the last interactive course I built, I used the amazing framework created by Ines Montani based on Binder. I reflected a bit last year about the process of building that course and using that framework. This time around, I used learnr instead to put together this tutorial.

I think that learnr let me develop the tutorial material much faster than other options I’ve used because I could write in R Markdown, only needing to add a tiny bit more to what I already use in so much of my daily work. Putting the tutorial together felt fluent and natural. You may have seen Allison Horst’s amazing tutorial on missing data that uses learnr, and you can tell from seeing my tutorial that I learned a lot from seeing her work. I’d like to also thank my coworker Alison Hill for her helpful feedback on the tutorial. 🙌

Deployment and publishing also need to be taken into account. I published my text mining tutorial to shinyapps.io, which is a paid service. (I’m an RStudio employee, so my situation is not the same as that of folks outside of RStudio.) I can certainly say that it is very easy to publish learnr tutorials to this platform, arguably much easier than learning a little JavaScript, a little Docker, a little Binder, etc.! There were a couple of resources I found helpful in checking whether I was going to use a ridiculous amount of computing resources in publishing this tutorial.

Try it out

If you would like to learn more about text analysis with tidy data principles, go ahead and give it a whirl! 📄 Contributions and comments on how to improve this tutorial are welcome. Please file an issue or submit a pull request if you find something that could be fixed or improved.
