What tokens are used more vs. less in #TidyTuesday place names?
Let’s use byte pair encoding tokenization along with Poisson regression to understand which tokens are used more often (or less often) in US place names.
Machine learning, text analysis, and more
How well can we predict the magnitude of tornadoes in the US? Let’s use xgboost along with effect encoding to fit our model.
Can we predict childcare costs in the US using an xgboost model? In this blog post, learn how to use early stopping for hyperparameter tuning.
Learn how to train and deploy a model with R and vetiver on AWS SageMaker infrastructure.
High-quality text embeddings are becoming more available from companies like OpenAI. Learn how to obtain them and then use them for text analysis.