I am heading home from my third year of attending rstudio::conf! If you weren’t there, watch for the videos to be released so you can check out the talks; I know I will do the same so I can see the talks I was forced to miss by scheduling constraints. I love this conference, and once again this year, the organizers have succeeded in building an impactful, valuable, inclusive conference. The welcoming values of the conference were made explicit from the very first moments of the first session.
First few things covered at #rstudioconf:
-Code of Conduct 🙌
-How to find friendly @rstudio staff to help you out
-Pac-Man rule: stand in a way so that someone can join your conversation (a circle with a hole, not a closed circle!)
Angela Li 🐘 (@CivicAngela), January 17, 2019
This was my first year not speaking at rstudio::conf, and I so enjoyed attending more talks, focusing on learning, and meeting so many people. The only “official” thing I did was facilitate a Birds of a Feather session focused on text mining and natural language processing. Thanks to everyone who came to chat there!
As I reflect more broadly on the conference, it’s interesting to see themes emerge that multiple folks addressed across the days. For example, one theme that I remember from the 2018 conference was how broadly useful and impactful simulations can be. This year, there were three main themes I connected with.
- Using R as a first-class programming language, often in production
- Best practices for workflows
- Professional competence beyond writing code
R is a real programming language
On Thursday, we heard from RStudio president Tareef Kawaf about code, reproducibility, and RStudio’s business model. He also talked about the two things that have made the most noticeable improvements in the quality of my own work life recently (both of which have been RStudio investments/efforts):
- maturing the R/database interfaces
- API capabilities for R
These are examples of this first theme, treating R like a first-class programming language and thinking about how it can be used in production. CTO Joe Cheng gave a great talk specifically about Shiny in production.
Jacqueline Nolis and Heather Nolis gave another compelling and useful talk about R in production, telling the story of how they took a deep learning model trained in R and put it into production at T-Mobile, still using R.
Some other standout talks for me around this theme were James Blair talking about Plumber APIs and Jim Hester’s talk about package dependencies. (Check out that new package Jim introduces in that talk for measuring how/how much your own package depends on others! 😮)
How do I do this again?!
Another theme I saw was thinking explicitly about processes and how we work as data scientists and analysts. Probably my favorite in this category was Kara Woo’s talk about her experience working as an intern on ggplot2 and how she solved a specific problem with how box plots rendered.
I am also looking forward to applying what I learned from Amelia McNamara’s talk about moving from brittle, fragile code to robust, safe code for categorical data, as well as from the whole afternoon session on machine learning and modeling with talks by Max Kuhn, Alex Hayes, and more.
A different aspect of this theme of process and workflow is how we can make work public and share knowledge. My coauthor Dave’s keynote about sharing work publicly made the case for increasing the impact of work we do by broadening its audience.
Code is only part of my job
Friday morning started off with a keynote by Felienne (who apparently is a one-name person like Prince?!? After the talk, I feel like it’s fair), and RStudio is so lucky to have brought her to their conference. Her main point was about how we teach programming to people, but the talk was excellent at a higher level than even that. I have watched some videos of her before so I was expecting this to be good, but it was even better than I was hoping.
The third main theme I connected with from this conference was about how much of what we do as data scientists or analysts is not code, but teaching (as Felienne addressed), growing teams, and more. Hilary Parker gave a talk about using data more effectively by broadening how we think about our focus, beyond strictly technical strengths to include more collaborative and design skills.
Other excellent talks in this category that I saw were JD Long’s talk about spreadsheets and bullshit (that’s what his shirt said, and he should probably start selling them), along with empathy and broadening the tent. Others are Caitlin Hudon’s talk about different kinds of mistakes she’s made and Angela Bassa’s talk about building data science teams.
The final session of the conference was a panel discussion focusing on data in organizations. I have a tweet thread where I jotted down perspectives the participants shared, including thoughts on junior data scientists, management vs. individual contributor work, team values, and data ethics.
Final #Rstudioconf panel making points we haven’t talked about enough yet: slow down, invest in leadership as skill development, and HIRE DIVERSE TEAMS
Feat. @AngeBassa, @hspter, @_inundata, & @tracykteal, moderated by Eduardo Ariño de la Rubia pic.twitter.com/cxCh8gSo7u
Brooke Watson (@brookLYNevery1), January 18, 2019
On Saturday, I participated in the first ever Tidyverse Developer Day. The goal of this event was to encourage regular contributions to tidyverse packages, especially from people who have not contributed before. I maintain a tidyverse-adjacent R package myself and have managed issues and PRs as a maintainer, but I had never written code for a core tidyverse package before or submitted a PR to a package with that kind of enormous user base. This developer day was a really valuable experience. First off, it was so, so fun to sit in a room with engaged, delightful people excited about what they are working on, chatting together while looking at issues, thinking about what improvements could help users like us. Second, I saw multiple people around me submit their first ever PRs, which is no small feat. I am (mostly) competent in package development and git, so I got to chat through some problems people had, but naturally the real experts were around as well.
I submitted three PRs during the day. Two of them were extremely tiny; the one I am happiest about improves the error message in tidyr when you use spread() and end up with duplicate identifiers. That’s something that I have experienced lots of times in my real world data life and lost time puzzling over; the new error message is clearer, aligns with the tidyverse error message style guide, and gives guidance on what to try next.
I so enjoyed rstudio::conf this year. One thing I noticed is that both the speakers and the attendees exhibited some of the best representation by women I have ever experienced in a technical community. I hope to continue to see improving representation from other groups who are often underrepresented in data science and tech. I look forward to putting what I learned into practice in both my day job and open source work, and hopefully to being back next year!
- Posted on: January 20, 2019