tidytext – text mining using dplyr, ggplot2, and other tidy tools

tidytext brings text data into the “tidyverse”.

It supports text mining tasks such as word processing and sentiment analysis using ‘dplyr’, ‘ggplot2’, and other tidy tools. To do so, it provides simple tools that convert unstructured text into a form those packages can analyze.

The tidytext package structures text data according to the principle of tidy data. As documented in a chapter of Hadley Wickham’s R for Data Science, three rules make a data set tidy:

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.
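
Applied to text, these rules are usually read as "one token per row". The following is a minimal sketch of that idea in plain Python, not tidytext's own R API: the helper name to_one_token_per_row, the sample sentences, and the simple regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def to_one_token_per_row(rows, text_key="text", token_key="word"):
    """Hypothetical helper: expand each row of raw text into one row per token.

    Every other column (for example a line number) is copied onto each
    token row, so each variable keeps its own column, each token is its own
    observation (row), and each cell holds a single value.
    """
    tidy = []
    for row in rows:
        other_columns = {k: v for k, v in row.items() if k != text_key}
        # Assumed tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
        for token in re.findall(r"[a-z0-9']+", row[text_key].lower()):
            tidy.append({**other_columns, token_key: token})
    return tidy


# Made-up sample input: one row per line of unstructured text.
documents = [
    {"line": 1, "text": "Tidy data makes text mining simple."},
    {"line": 2, "text": "Text mining with tidy tools."},
]

tidy_rows = to_one_token_per_row(documents)

# Once the data is one token per row, counting words is an ordinary
# group-and-count over the "word" column.
word_counts = Counter(row["word"] for row in tidy_rows)

for row in tidy_rows[:3]:
    print(row)
print(word_counts.most_common(3))
```

The point of the layout is that ordinary table operations (filtering, grouping, counting, plotting) then work on text without any text-specific data structure, which is what lets tools such as dplyr and ggplot2 be applied to it.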

Website: juliasilge.github.io/tidytext
Support: GitHub Code Repository
Developer: Julia Silge, David Robinson
License: MIT License
