Data Cleaning with R

As of last week, my course “Data Cleaning with R” is live at R for the Rest of Us.

R for the Rest of Us is a welcoming community meant to bring R to folks who do not yet use it. The site, run by David Keyes, brings together courses, learning resources, custom training, and consulting in all things R.

David contacted me roughly a year ago to chat about my work with R, and eventually we discussed developing a course for his site. If you’ve read other posts on my blog, used the unheadr package, or seen any of my recent conference talks, you might be aware that I’ve put a lot of thought into the process of data cleaning/wrangling/munging using R.

Over the past 9 months, I developed the overall outline for a course that would cover everything from high-level concepts and best practices in data organization to general, field-tested approaches for cleaning messy data. I tried my best to make sure that the course covers the why and the how of data cleaning.

I may not have any fancy Data Science or programming credentials, but I’d like to consider this tweet below as my certificate of competence in data Rectangling.


The course is self-paced, and over 31 lessons you will learn how to:

  • Make your data tidy
  • Fix bad variable names that make data unusable
  • Use regular expressions to deal with character strings
  • Work with missing data
  • Identify and deal with duplicates
  • Share usable, analysis-ready data

I put a lot of thought into explaining regular expressions clearly, and I hope others find this section particularly helpful.

Here’s a brief video overview.


Check out the course details, and feel free to contact me if you have any questions.