quanteda (Quantitative Analysis of Textual Data) is an R package for managing and analyzing text.

The package is designed for R users who need to apply natural language processing to texts, from raw documents through to final analysis. Its capabilities match or exceed those of many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources.

While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. Its emphasis on consistent design also lowers the barriers to learning and using NLP and quantitative text analysis, even for proficient R programmers.

The R package is written in C++ and Fortran.

Features include:

  • Fast, flexible, and comprehensive framework for quantitative text analysis in R.
  • Provides functionality for corpus management, creating and manipulating tokens and ngrams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
  • Simple and powerful companion package for loading texts: readtext. The main function in this package, readtext(), takes a file or set of files from disk or a URL and returns a type of data.frame that can be passed directly to the corpus() constructor function to create a quanteda corpus object.
  • Includes the Lexicoder Sentiment Dictionary.
  • Cross-platform support – runs under Linux, macOS, and Windows.
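
The workflow described in the feature list can be sketched in a few lines of R. This is a minimal illustration rather than a recommended pipeline: it assumes quanteda version 3 or later is installed, and the two example texts and all object names (txt, corp, toks, dfmat, sent) are invented for the example. It builds the corpus from a character vector instead of calling readtext(), so that it runs without any files on disk.

```r
# Assumes the quanteda package (version 3 or later) is installed.
library(quanteda)

# Hypothetical example texts, invented for illustration only.
txt <- c(
  doc1 = "The new policy is a great success and a clear benefit.",
  doc2 = "The new policy is a terrible failure and a serious problem."
)

# Corpus management: build a corpus from a named character vector.
# (The data.frame returned by readtext::readtext() can be passed to
# corpus() in the same way.)
corp <- corpus(txt)

# Tokens: tokenize, drop punctuation, then remove English stopwords.
toks <- tokens(corp, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))

# Keywords in context: two tokens either side of each match.
print(kwic(toks, pattern = "policy", window = 2))

# Sparse document-feature matrix and its most frequent features.
dfmat <- dfm(toks)
print(topfeatures(dfmat, 5))

# Content dictionary: count matches against the positive and negative
# keys of the bundled Lexicoder Sentiment Dictionary.
sent <- dfm_lookup(dfmat, dictionary = data_dictionary_LSD2015[1:2])
print(sent)
```

Each step returns an object that the next function accepts directly (corpus, then tokens, then dfm), which is the consistent design the description refers to.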

Website: quanteda.io
Support: Quick Start Guide, GitHub Code Repository
Developer: Kenneth Benoit
License: GNU General Public License v3.0

