R is the de facto standard among statisticians for developing statistical software, and is widely used for data analysis. R is a modern dialect of S, one of several statistical programming languages designed at Bell Laboratories.
R is much more than a programming language. It’s an interactive suite of software facilities for data manipulation, calculation, and graphical display. R offers a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The ability to download and install R packages is a key factor which makes R an excellent language to learn. What else makes R awesome? Here’s a taster.
- It’s free, open source, and available for every major platform, so anyone can repeat your work whatever platform they run.
- A huge set of high quality packages for statistical modelling, machine learning, visualisation, and importing and manipulating data.
- Cutting-edge tools.
- A suite of operators for calculations on arrays, in particular matrices.
- Deep-seated language support for data analysis. This includes features like missing values, data frames, and subsetting.
- Powerful tools for communicating your results.
- Produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages. R packages make it easy to produce HTML or PDF, and create interactive websites with Shiny, a sublime R package.
- A strong foundation in functional programming. The ideas of functional programming are well suited to solving many of the challenges of data analysis. R provides a powerful and flexible toolkit which allows you to write concise yet descriptive code.
- RStudio, a powerful integrated development environment.
- Powerful metaprogramming facilities; a fantastic environment for interactive data analysis.
- Connects to high-performance programming languages like C, Fortran, and C++.
- An amazingly vibrant and helpful community.
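To make a few of these points concrete, here is a minimal sketch in base R touching on missing values, data frames, subsetting, and matrix operators. The data frame, its column names, and its values are invented purely for illustration; no add-on packages are assumed.

```r
# A small data frame with one missing value (NA) in the score column.
# The names and numbers are hypothetical, made up for this example.
results <- data.frame(
  student = c("Ana", "Ben", "Cy", "Dee"),
  score   = c(82, NA, 91, 67)
)

# Missing values propagate through calculations unless you opt out.
mean_all   <- mean(results$score)                # NA, because one score is missing
mean_known <- mean(results$score, na.rm = TRUE)  # 80, the mean of 82, 91 and 67

# Subsetting: keep only the rows whose score is known and above 70.
high <- results[!is.na(results$score) & results$score > 70, ]

# Array operators: %*% is matrix multiplication and t() transposes.
m    <- matrix(1:4, nrow = 2)  # filled column by column
gram <- t(m) %*% m
```

Here `high` holds the rows for Ana and Cy, and `gram` is a symmetric 2 x 2 matrix. The same ideas carry over to much larger data sets.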
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. The CRAN package repository hosts over 14,000 packages, and Bioconductor is home to over 1,600 packages.
This article recommends 29 free books which will teach you the basics of R, how to produce amazing plots, how to apply R to lots of disciplines, and how to efficiently program in R. Many of the books are open source.
If you’re new to R, we strongly recommend reading our interactive tutorial: Introduction to R and RStudio for Data Science. It focuses on a common task in data science: import a data set, manipulate its structure, and then visualise the data. We use R and RStudio to accomplish this task.
1. R for Data Science by Hadley Wickham & Garrett Grolemund
R for Data Science is the ideal introductory text for learning about what R can do. In fact, we’d go as far as to say it’s the best introductory book for budding R data scientists. It teaches you the basics, good practices for writing and organizing your R code, and RStudio, a powerful IDE. The focus of this book is on exploration, not confirmation or formal inference.
If you’re looking to grasp how to make simple and elegant plots in R, learn how to transform data, and embark on some data analysis, this is definitely your starting text.
There’s particularly good coverage of data wrangling, and you’ll master the basics of data frames, data importing, and tidy data.
Hadley Wickham has graciously made this book available online. It’s released under an open source license. The book is so good that you’ll probably want to purchase the paperback version.
2. Introduction to Data Science by Rafael A Irizarry
This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It’s an exceptionally good read covering concepts from probability, statistical inference, linear regression and machine learning.
It also helps you develop skills such as R programming, data wrangling with dplyr, data visualization with ggplot2, algorithm building with caret, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with knitr and R Markdown.
The book includes dozens of exercises to test whether you have understood the material.
Its suggested price is $49.99, but the book can be downloaded without charge, and it’s released under an open source license.
3. Hands-On Programming with R by Garrett Grolemund
As the title suggests, Hands-On Programming with R teaches you how to program in R. It’s expertly crafted, and the book includes hands-on examples.
The book teaches you how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools.
The book is released under an open source license.
4. ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
ggplot2 is a widely acclaimed data visualization package for the statistical programming language R. The package lets you create new beautiful plots. We use ggplot2 extensively for our Group Tests charts.
ggplot2 was created by Hadley Wickham, so it’s not surprising that we recommend his ggplot2: Elegant Graphics for Data Analysis book. It expertly teaches you the elements of ggplot2’s grammar and how they fit together. This book helps you understand the theory that underpins ggplot2, and will help you create new types of graphic specifically tailored to your needs.
You can grab the code and text behind the ggplot2 book. ggplot2’s reference website is a welcome resource once you’ve mastered the basics.
5. Data Visualization: A practical introduction by Kieran Healy
Data Visualization: A practical introduction offers students and researchers a hands-on introduction to the principles and practice of data visualization. No knowledge of R is assumed.
Data Visualization builds the reader’s expertise in ggplot2, an excellent visualization library for the R programming language. Through a series of worked examples, this accessible primer demonstrates how to create plots piece by piece, beginning with summaries of single variables and moving on to more complex graphics. You’ll learn how to produce and refine plots. The worked examples are a real godsend.
Topics include plotting continuous and categorical variables; layering information on graphics; producing effective “small multiple” plots; grouping, summarizing, and transforming data for plotting; creating maps; working with the output of statistical models; and refining plots to make them more comprehensible.
Kieran Healy is associate professor of sociology at Duke University.