Educational attainment in #TidyTuesday UK towns
Let’s walk through the ML lifecycle from EDA to model development to deployment, using tidymodels, vetiver, and Posit Team.
He’s here, he’s there, he’s every f*cking where, and we’re finding bootstrap confidence intervals.
Use workflowsets to evaluate multiple possible models to predict whether email is spam.
Learn about different kinds of metrics for evaluating classification models, and how to compute, compare, and visualize them.
Let’s use byte pair encoding tokenization along with Poisson regression to understand which tokens are used more often (or less often) in US place names.
How well can we predict the magnitude of tornadoes in the US? Let’s use xgboost along with effect encoding to fit our model.
Can we predict childcare costs in the US using an xgboost model? In this blog post, learn how to use early stopping for hyperparameter tuning.
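A minimal sketch of what early stopping looks like in a parsnip xgboost specification, assuming a data frame `childcare_costs` with a numeric outcome `mcsa` (both names are placeholders, not necessarily those from the post):

```r
library(tidymodels)

## early stopping: stop adding trees once performance on a held-out slice
## of the training data stops improving
xgb_spec <-
  boost_tree(
    trees = 500,
    learn_rate = tune(),
    stop_iter = tune()          ## iterations without improvement before stopping
  ) %>%
  set_engine("xgboost", validation = 0.2) %>%   ## hold out 20% of training data to monitor
  set_mode("regression")

xgb_wf <- workflow(mcsa ~ ., xgb_spec)
## xgb_wf can then be tuned with tune_grid() over resamples as usual
```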
Learn how to train and deploy a model with R and vetiver on AWS SageMaker infrastructure.
Artists who are women are underrepresented in art history textbooks, and we can use resampling to robustly understand more about this imbalance.
Will squirrels come eat from your bird feeder? Let’s fit a model both with and without downsampling to find out.
Learn how to handle predictors with high cardinality using tidymodels for accreditation data on UK museums.
Learn how to use vetiver to set up different types of prediction endpoints for your deployed model.
After you train a model, you can use vetiver to prepare a Dockerfile and deploy your model in a flexible way.
Use summarization, a single linear model, and bootstrapping to understand what economic activities involve a larger pay gap for women.
The spatialsample package is gaining many new methods this summer, and we can use spatially aware resampling to understand how drought is related to other quantities across Texas.
Will a book be on the NYT bestseller list for a long time, or a short time? We walk through how to use wordpiece tokenization for the author names, and how to deploy your model as a REST API.
Understand how much money colleges spend on sports using linear modeling and bootstrap intervals.
The tidymodels framework provides extension packages for specialized tasks such as Poisson regression. Learn how to fit a zero-inflated model for understanding how R package releases are related to the number of vignettes.
The infer package is part of tidymodels and provides an expressive statistical grammar. Understand how to use infer, and celebrate Black History Month by learning more about the Tuskegee airmen.
Use custom feature engineering for board game categories, tune an xgboost model with racing methods, and use explainability methods for deeper understanding.
Get started with feature engineering for text data, transforming text to be used in machine learning algorithms.
Using a tidymodels workflow can make many modeling tasks more convenient, but sometimes you want more flexibility and control over how you handle your modeling objects. Learn how to handle resampled workflow results and extract the quantities you are interested in.
Use spatial resampling to more accurately estimate model performance for geographic data.
Get started with tidymodels workflowsets to handle and evaluate multiple preprocessing and modeling approaches simultaneously, using pumpkin competitions.
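As a rough sketch of the workflowsets approach, assuming a data frame `pumpkins` with a numeric outcome `weight_lbs` and existing resamples `pumpkin_folds` (all placeholder names):

```r
library(tidymodels)

## two preprocessing recipes crossed with two model specifications = four workflows
rec_basic   <- recipe(weight_lbs ~ ., data = pumpkins)
rec_dummies <- rec_basic %>% step_dummy(all_nominal_predictors())

wf_set <- workflow_set(
  preproc = list(basic = rec_basic, dummies = rec_dummies),
  models  = list(lm = linear_reg(), tree = decision_tree() %>% set_mode("regression"))
)

## fit every workflow to the same resamples and rank them on performance
wf_results <- workflow_map(wf_set, "fit_resamples", resamples = pumpkin_folds)
rank_results(wf_results)
```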
Tune and evaluate a multiclass model with lasso regularization for economics working papers.
Songs on the Billboard Top 100 have many audio features. We can use data preprocessing recipes to implement dimensionality reduction and understand how these features are related.
In this screencast, focus on some tidymodels basics such as how to put together feature engineering and a model algorithm, and how to fit and predict.
Learn how to evaluate multiple feature engineering and modeling approaches with workflowsets, predicting whether a person or the computer spoke a line on Star Trek.
More xgboost with tidymodels! Learn about feature engineering to incorporate text information as indicator variables for boosted trees.
Early stopping can keep an xgboost model from overfitting.
Models like xgboost have many tuning hyperparameters, but racing methods can quickly eliminate parameter combinations that are not performing well.
Which Scooby Doo monsters are REAL?! Walk through how to tune and then choose a decision tree model, as well as how to visualize and evaluate the results.
Predict prices for Airbnb listings in NYC with a dataset from a recent episode of SLICED, focusing on two aspects of the analysis: creating a custom metric to evaluate the model and combining tabular and unstructured text data in one model.
Handling class imbalance in modeling affects classification metrics in different ways. Learn how to use tidymodels to subsample for class imbalance, and how to estimate model performance using resampling.
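Here is a minimal sketch of subsampling inside a recipe, assuming a data frame `df` with a factor outcome `class` (placeholder names) and the themis package for the subsampling step:

```r
library(tidymodels)
library(themis)    ## recipe steps for class imbalance

imbal_rec <- recipe(class ~ ., data = df) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_downsample(class)            ## applied when fitting, skipped when predicting

imbal_wf <- workflow(imbal_rec, logistic_reg())

## resampling repeats the subsampling within each analysis set, so the
## assessment sets keep their original class balance
set.seed(123)
folds <- vfold_cv(df, strata = class)
fit_resamples(imbal_wf, folds, metrics = metric_set(roc_auc, sens, spec))
```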
Tune a decision tree model to predict whether a Mario Kart world record used a shortcut, and explore partial dependence profiles for the world record times.
Walk through a tidymodels analysis from beginning to end to predict whether water is available at a water source in Sierra Leone.
Are more CEO departures involuntary now than in the past? We can use tidymodels' bootstrap resampling and generalized linear models to understand change over time.
Use tidymodels to build features for modeling from Netflix description text, then fit and evaluate a support vector machine model.
Use tidymodels to predict post office location with subword features and a support vector machine model.
Explore country-level UN voting with a tidymodels approach to unsupervised machine learning.
Estimate how commercial characteristics like humor and patriotic themes change with time using tidymodels functions for bootstrap confidence intervals.
Use tidy data principles to understand which kinds of occupations are most similar in terms of demographic characteristics.
Explore results of models with convenient tidymodels functions.
Check residuals and other model diagnostics for regression models trained on text features, all with tidymodels functions.
Download up-to-date city data from Chicago’s open data portal and predict whether a traffic crash involved an injury with a bagged tree model.
Use tidymodels scaffolding functions for getting started quickly with commonly used models like random forests.
Use tidymodels to predict capacity for Canadian wind turbines with decision trees.
Which of the Datasaurus Dozen are easier or harder for a random forest model to identify? Learn how to use multiclass evaluation metrics to find out.
Tune a hyperparameter and then understand how to choose the best value afterward, using tidymodels for modeling the relationship between expected wins and tournament seed.
Use tidymodels for feature engineering steps like imputing missing data and subsampling for class imbalance, and build predictive models to predict the probability of survival for Himalayan climbers.
An initial version of the first eleven chapters is available today! Look for more chapters to be released in the near future.
Learn how to use tidyverse and tidymodels functions to fit and analyze many models at once.
Use text features and tidymodels to predict the speaker of individual lines from the show, and learn how to compute model-agnostic variable importance for any kind of model.
Build two kinds of classification models and evaluate them using resampling.
Learn how to use bootstrap aggregating to predict the duration of astronaut missions.
Explore data from the Claremont Run Project on Uncanny X-Men with bootstrap resampling.
Understand more about the forced transport of African people using the Slave Voyages database.
Use tidymodels for unsupervised dimensionality reduction.
Learn how to tune hyperparameters for an xgboost classification model to predict wins and losses.
I am happy to announce that a new version of my free, online, interactive course has been published!
Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to evaluate complex models. Today’s screencast demonstrates how to implement multiclass or multinomial classification using this week’s #TidyTuesday dataset on volcanoes. 🌋 Here is the code I used in the video, for those who prefer reading instead of or in addition to video. Our modeling goal is to predict the type of volcano from this week’s #TidyTuesday dataset based on other volcano characteristics like latitude, longitude, tectonic setting, etc.
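For readers skimming, a minimal sketch of a multiclass setup in tidymodels, assuming a data frame `volcano_df` with a factor outcome `volcano_type` (more than two levels) and predictors such as latitude and longitude; the exact recipe steps differ from the full post:

```r
library(tidymodels)

set.seed(123)
volcano_folds <- vfold_cv(volcano_df, strata = volcano_type)

volcano_rec <- recipe(volcano_type ~ ., data = volcano_df) %>%
  step_other(all_nominal_predictors())     ## collapse rare categorical levels

rf_spec <- rand_forest(trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("classification")               ## handles multiclass outcomes directly

volcano_wf <- workflow(volcano_rec, rf_spec)

## multiclass ROC AUC uses the Hand-Till generalization by default
fit_resamples(volcano_wf, volcano_folds, metrics = metric_set(accuracy, roc_auc))
```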
A lot has been happening in the tidymodels ecosystem lately! There are many possible projects we on the tidymodels team could focus on next; we are interested in gathering community feedback to inform our priorities. If you are interested in sharing your opinion on next steps in tidymodels development, please take this short survey. Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models.
This is an exciting week for us on the tidymodels team; we launched tidymodels.org, a new central location with resources and documentation for tidymodels packages. There is a TON to explore and learn there! 🚀 You can check out the official blog post for more details. Today, I’m publishing here on my blog another screencast demonstrating how to use tidymodels. This is a good video for folks getting started with tidymodels, using this week’s #TidyTuesday dataset on GDPR violations.
Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’s #TidyTuesday dataset on the best hip hop songs of all time as determined by a BBC poll of music critics. Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
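A compact sketch of PCA with a recipe, assuming a data frame `songs` made up of numeric audio features (placeholder name):

```r
library(tidymodels)

pca_rec <- recipe(~ ., data = songs) %>%
  step_normalize(all_numeric_predictors()) %>%   ## PCA needs centered, scaled inputs
  step_pca(all_numeric_predictors(), num_comp = 5)

pca_prep <- prep(pca_rec)

bake(pca_prep, new_data = NULL)   ## component scores for each song
tidy(pca_prep, 2)                 ## loadings: which features drive each component
```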
I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m using this week’s #TidyTuesday dataset on beer production to show how to use bootstrap resampling to estimate model parameters. Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
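The general pattern looks something like this minimal sketch, assuming a data frame `brewing_df` with columns `barrels` and `sugar` (placeholder names, not the exact variables from the post):

```r
library(tidymodels)

set.seed(123)
beer_boot <- bootstraps(brewing_df, times = 1000, apparent = TRUE)

## fit the same model to every bootstrap resample and tidy the coefficients
beer_models <- beer_boot %>%
  mutate(
    model = map(splits, ~ lm(barrels ~ sugar, data = analysis(.x))),
    coefs = map(model, tidy)
  )

## percentile confidence intervals for the fitted coefficients
int_pctl(beer_models, coefs)
```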
I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m using a #TidyTuesday dataset from earlier this year on trees around San Francisco to show how to tune the hyperparameters of a random forest model and then use the final best model. Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
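As a hedged sketch of that workflow, assuming a data frame `trees_df` with a factor outcome `legal_status` and resamples `trees_folds` (names assumed, not guaranteed to match the post):

```r
library(tidymodels)

rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("classification")

tree_wf <- workflow(legal_status ~ ., rf_spec)

set.seed(234)
rf_res <- tune_grid(tree_wf, resamples = trees_folds, grid = 20)

## finalize the workflow with the numerically best parameters, then fit once
best_params <- select_best(rf_res, metric = "roc_auc")
final_rf <- finalize_workflow(tree_wf, best_params)
```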
I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m using this week’s #TidyTuesday dataset on The Office to show how to build a lasso regression model and choose regularization parameters! Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
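A minimal sketch of lasso regression with a tuned penalty, assuming a data frame `office_df` with a numeric outcome `imdb_rating` and resamples `office_folds` (placeholder names):

```r
library(tidymodels)

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%   ## mixture = 1 is the lasso
  set_engine("glmnet")

lasso_rec <- recipe(imdb_rating ~ ., data = office_df) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())   ## glmnet works best with scaled predictors

lasso_wf <- workflow(lasso_rec, lasso_spec)

## try penalties on a log scale, then pick a value by resampled RMSE
lambda_grid <- grid_regular(penalty(), levels = 30)
lasso_res <- tune_grid(lasso_wf, resamples = office_folds, grid = lambda_grid)
select_best(lasso_res, metric = "rmse")
```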
I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first getting started to how to tune machine learning models. Today, I’m using this week’s #TidyTuesday dataset on college tuition and diversity at US colleges to show some data preprocessing steps and how to use resampling! Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
Last week I published a screencast demonstrating how to use the tidymodels framework and specifically the recipes package. Today, I’m using this week’s #TidyTuesday dataset on food consumption around the world to show hyperparameter tuning! Here is the code I used in the video, for those who prefer reading instead of or in addition to video. Our modeling goal here is to predict which countries are Asian and which are not, based on their patterns of food consumption in the eleven categories from the #TidyTuesday dataset.
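A small sketch of hyperparameter tuning in that spirit, assuming a data frame `food` with a factor outcome `asia` (Asia vs. other) and one numeric column per food category; all names are placeholders:

```r
library(tidymodels)

set.seed(123)
food_boot <- bootstraps(food, times = 30)

knn_spec <- nearest_neighbor(neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")

knn_res <- tune_grid(
  workflow(asia ~ ., knn_spec),
  resamples = food_boot,
  grid = grid_regular(neighbors(), levels = 10)
)

show_best(knn_res, metric = "roc_auc")
```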
Last week I published my first screencast showing how to use the tidymodels framework for machine learning and modeling in R. Today, I’m using this week’s #TidyTuesday dataset on hotel bookings to show how to use one of the tidymodels packages recipes with some simple models! Here is the code I used in the video, for those who prefer reading instead of or in addition to video.
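As a rough sketch of pairing a recipe with a simple model, assuming a data frame `hotels` with a factor outcome `children` and a date column `arrival_date` (names assumed, not verified against the post):

```r
library(tidymodels)

hotel_rec <- recipe(children ~ ., data = hotels) %>%
  step_date(arrival_date) %>%                 ## derive month, day of week, etc.
  step_rm(arrival_date) %>%                   ## drop the raw date after feature creation
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors())                   ## drop zero-variance columns

hotel_wf  <- workflow(hotel_rec, logistic_reg())
hotel_fit <- fit(hotel_wf, data = hotels)

predict(hotel_fit, new_data = hotels, type = "prob")
```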
This week I started my new job as a software engineer at RStudio, working with Max Kuhn and other folks on tidymodels. I am really excited about tidymodels because my own experience as a practicing data scientist has shown me some of the areas for growth that still exist in open source software when it comes to modeling and machine learning. Almost nothing has had the kind of dramatic impact on my productivity that the tidyverse and other RStudio investments have had; I am enthusiastic about contributing to that kind of user-focused transformation for modeling and machine learning.