R Cookbook, 2nd Edition by JD Long and Paul Teetor

Book Summary

Perform data analysis with R quickly and efficiently with more than 275 practical recipes in this expanded second edition. The R language provides everything you need to do statistical work, but its structure can be difficult to master. These task-oriented recipes make you productive with R immediately. Solutions range from basic tasks to input and output, general statistics, graphics, and linear regression.

Each recipe addresses a specific problem and includes a discussion that explains the solution and provides insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an intermediate user, this book will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process.

From the Introduction

1. Getting Started and Getting Help

Welcome to the R Cookbook, 2nd Edition

R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 10,000 available add-on packages, and R is a serious rival to all commercial statistical packages.

But R can be frustrating. It’s not obvious how to accomplish many tasks, even simple ones. The simple tasks are easy once you know how, yet figuring out that “how” can be maddening.

This book is full of how-to recipes, each of which solves a specific problem. Each recipe includes a quick introduction to the solution followed by a discussion that aims to unpack the solution and give you some insight into how it works. We know these recipes are useful and we know they work, because we use them ourselves.

The range of recipes is broad. It starts with basic tasks before moving on to input and output, general statistics, graphics, and linear regression. Any significant work with R will involve most or all of these areas.

If you are a beginner, then this book will get you started faster. If you are an intermediate user, this book will be useful for expanding your horizons and jogging your memory (“How do I do that Kolmogorov–Smirnov test again?”).

The book is not a tutorial on R, although you will learn something by studying the recipes. It is not a reference manual, but it does contain a lot of useful information. It is not a book on programming in R, although many recipes are useful inside R scripts.

Finally, this book is not an introduction to statistics. Many recipes assume that you are familiar with the underlying statistical procedure, if any, and just want to know how it’s done in R.

The Recipes

Most recipes use one or two R functions to solve a specific problem. It’s important to remember that we do not describe the functions in detail; rather, we describe just enough to solve the immediate problem. Nearly every such function has additional capabilities beyond those described here, and some have amazing capabilities. We strongly urge you to read the functions’ help pages. You will likely learn something valuable.

Each recipe presents one way to solve a particular problem. Of course, there are likely several reasonable solutions to each problem. When we knew of multiple solutions, we generally selected the simplest one. For any given task, you can probably discover several alternative solutions yourself. This is a cookbook, not a bible.

In particular, R has literally thousands of downloadable add-on packages, many of which implement alternative algorithms and statistical methods. This book concentrates on the core functionality available through the basic distribution combined with several important packages known collectively as the tidyverse.

The most concise definition of the tidyverse comes from Hadley Wickham, its originator and one of its core maintainers:

“The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command. The best place to learn about all the packages in the tidyverse and how they fit together is R for Data Science.”

A Note on Terminology

The goal of every recipe is to solve a problem and solve it quickly. Rather than laboring in tedious prose, we occasionally streamline the description with terminology that is correct but not precise. A good example is the term generic function. We refer to print(x) and plot(x) as generic functions because they work for many kinds of x, handling each kind appropriately. A computer scientist would wince at our terminology because, strictly speaking, these are not simply “functions”; they are polymorphic methods with dynamic dispatching. But if we carefully unpacked every such technical detail, the essential solutions would be buried in the technicalities. So we just call them functions, which we think is more readable.

Another example, taken from statistics, is the complexity surrounding the semantics of statistical hypothesis testing. Using the strict language of probability theory would obscure the practical application of some tests, so we use more colloquial language when describing each statistical test. See the introduction to Chapter 9 for more about how hypothesis tests are presented in the recipes.

Our goal is to make the power of R available to a wide audience by writing readably, not formally. We hope that experts in their respective fields will understand if our terminology is occasionally informal.
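To make the generic function point above concrete, here is a minimal sketch of how print(x) can handle different kinds of x. The class name temperature and its fields are hypothetical, invented only for this illustration; the sketch assumes R's S3 mechanism, in which print() selects a method based on class(x).

```r
# A hypothetical S3 object: a list tagged with the made-up class "temperature".
temp <- structure(list(degrees = 21.5, unit = "C"), class = "temperature")

# A print method for that class. The generic print() dispatches here
# whenever its argument has class "temperature".
print.temperature <- function(x, ...) {
  cat("Temperature:", x$degrees, x$unit, "\n")
  invisible(x)
}

print(temp)   # handled by print.temperature
print(1:3)    # no class-specific method, so the default method is used
```

The caller writes print(x) in both cases; which code runs depends on the kind of x. That is the behavior we loosely describe when we call print a generic function.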

Software and Platform Notes

The base distribution of R has frequent and planned releases, but the language definition and core implementation are stable. The recipes in this book should work with any recent release of the base distribution.

Some recipes have platform-specific considerations, and we have carefully noted them. Those recipes mostly deal with software issues, such as installation and configuration. As far as we know, all other recipes will work on all three major platforms for R: Windows, macOS, and Linux/Unix.

Other Resources

Here are a few suggestions for further reading, if you’d like to dig a little deeper:

On the web

The mother ship for all things R is the R project site. From there you can download R for your platform, add-on packages, documentation, and source code, as well as many other resources.

Beyond the R project site, we recommend using an R-specific search engine, such as RSeek, created by Sasha Goodman. You can use a generic search engine, such as Google, but the “R” search term brings up too much extraneous stuff. See Recipe 1.11 for more about searching the web.

Reading R blogs is a great way to learn about R and stay abreast of leading-edge developments. There are surprisingly many such blogs, so we recommend following two blogs-of-blogs: R-bloggers, created by Tal Galili, and PlanetR. By subscribing to their RSS feeds, you will be notified of interesting and useful articles from dozens of websites.

R books

There are many, many books about learning and using R. Listed here are a few that we have found useful. Note that the R project site contains an extensive bibliography of books related to R.

R for Data Science by Hadley Wickham and Garrett Grolemund (O’Reilly) is an excellent introduction to the tidyverse packages, especially for using them in data analysis and statistics. It is also available online.

We find the R Graphics Cookbook, 2nd edition, by Winston Chang (O’Reilly) indispensable for creating graphics. The book ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (Springer) is the definitive reference for the graphics package ggplot2, which we use in this book.

Anyone doing serious graphics work in R will want R Graphics by Paul Murrell (Chapman & Hall/CRC).

R in a Nutshell by Joseph Adler (O’Reilly) is the quick tutorial and reference you’ll keep by your side. It covers many more topics than this cookbook.

New books on programming in R appear regularly. We suggest Hands-On Programming with R by Garrett Grolemund (O’Reilly) for an introduction, or The Art of R Programming by Norman Matloff (No Starch Press). Hadley Wickham’s Advanced R (Chapman & Hall/CRC) is available either as a printed book or free online and is a great deeper dive into advanced R topics. Efficient R Programming by Colin Gillespie and Robin Lovelace (O’Reilly) is another good guide to learning the deeper concepts of R programming.

Modern Applied Statistics with S, 4th ed., by William Venables and Brian Ripley (Springer), uses R to illustrate many advanced statistical techniques. The book’s functions and datasets are available in the MASS package, which is included in the standard distribution of R.

Serious geeks can download the R Language Definition from the R Core Team. The Definition is a work in progress, but it can answer many of your detailed questions regarding R as a programming language.

Statistics books

For learning statistics, a great choice is Using R for Introductory Statistics by John Verzani (Chapman & Hall/CRC). It teaches statistics and R together, giving you the necessary computer skills to apply the statistical methods.

You will need a good statistics textbook or reference book to accurately interpret the statistical tests performed in R. There are many such fine books—far too many for us to recommend any one above the others.

Increasingly, statistics authors are using R to illustrate their methods. If you work in a specialized field, then you will likely find a useful and relevant book in the R project bibliography.

Using Code Examples

Supplemental material (code examples, source code for the book, exercises, etc.) is available for download at http://rc2e.com. The Twitter account for content associated with this book is @R_cookbook.

Acknowledgments

With gratitude we thank the R community in general and the R Core Team in particular. Their selfless contributions are enormous. The world of statistics is benefiting tremendously from their work. The RStudio Community Discussion participants were very helpful in workshopping ideas around how to explain many things. And the staff and leadership of RStudio were supportive in so many little and big ways. We owe them a debt of gratitude for all they have given back to the R community.

We wish to thank the book’s technical reviewers: David Curran, Justin Shea, and MAJ Dusty Turner. Their feedback was critical for improving the quality, accuracy, and usefulness of this book. Our editors, Melissa Potter and Rachel Monaghan, were helpful beyond imagination and they frequently prevented us from publicly demonstrating our ignorance. Our production editor, Kristen Brown, is the envy of all technical authors because of her speed and her proficiency with both Markdown and Git.

Paul would like to thank his family for their support and patience during the creation of this book.

J.D. would like to thank his wife Mary Beth and daughter Ada for their patience with all the early mornings and weekends that he spent with his face in the laptop working on this book.

About the Authors

J.D. Long is a misplaced southern agricultural economist currently working for Renaissance Re in New York City. J.D. is an avid user of Python, R, AWS and colorful metaphors, and is a frequent presenter at R conferences as well as the founder of the Chicago R User Group. He lives in Jersey City, NJ with his wife, a recovering trial lawyer, and his 11-year-old circuit bending daughter.

Paul Teetor is a quantitative developer with master’s degrees in statistics and computer science. He specializes in analytics and software engineering for investment management, securities trading, and risk management. He works with hedge funds, market makers, and portfolio managers in the greater Chicago area.

r_cookbook_2nd_edition_by_jd_long_and_paul_teetor.txt · Last modified: 2024/04/28 03:36 (external edit)