• YARP, Yet Another R Primer
  • Preface
  • I R
  • 1 R Basics
    • 1.1 What is R?
    • 1.2 Pros and Cons of R
    • 1.3 RStudio
    • 1.4 Getting Started in R
      • 1.4.1 Calculator
      • 1.4.2 Atomic Classes
      • 1.4.3 Assigning Values to Variables
      • 1.4.4 More Ways to Assign Values
      • 1.4.5 Evaluation
      • 1.4.6 Functions
      • 1.4.7 Accessing Help in R
      • 1.4.8 Variable Names
      • 1.4.9 Vectors
      • 1.4.10 Vectors
      • 1.4.11 Matrices
      • 1.4.12 Factors
      • 1.4.13 Lists
      • 1.4.14 Lists with Names
      • 1.4.15 Missing Values
      • 1.4.16 NULL
      • 1.4.17 Coercion
      • 1.4.18 Data Frames
      • 1.4.19 Data Frames
      • 1.4.20 Data Frames
      • 1.4.21 Attributes
      • 1.4.22 Names
      • 1.4.23 Accessing Names
  • 2 Reproducible Data Analysis
    • 2.1 Definition and Motivation
    • 2.2 Reproducible vs. Replicable
    • 2.3 Steps to a Reproducible Analysis
    • 2.4 Organizing Your Data Analysis
    • 2.5 Common Mistakes
    • 2.6 R Markdown
      • 2.6.1 R + Markdown + knitr
      • 2.6.2 R Markdown Files
      • 2.6.3 Markdown
      • 2.6.4 LaTeX
      • 2.6.5 knitr
      • 2.6.6 knitr Chunks
      • 2.6.7 Chunk Option: echo
      • 2.6.8 Chunk Option: results
      • 2.6.9 Chunk Option: include
      • 2.6.10 Chunk Option: eval
      • 2.6.11 Chunk Names
      • 2.6.12 knitr Option: cache
      • 2.6.13 knitr Options: figures
      • 2.6.14 Changing Default Chunk Settings
      • 2.6.15 Documentation and Examples
  • 3 R Programming
    • 3.1 Control Structures
      • 3.1.1 Rationale
      • 3.1.2 Common Control Structures
      • 3.1.3 Some Boolean Logic
      • 3.1.4 if
      • 3.1.5 if-else
      • 3.1.6 for Loops
      • 3.1.7 Nested for Loops
      • 3.1.8 while
      • 3.1.9 repeat
      • 3.1.10 break and next
    • 3.2 Vectorized Operations
      • 3.2.1 Calculations on Vectors
      • 3.2.2 A Caveat
      • 3.2.3 Vectorized Matrix Operations
      • 3.2.4 Mixing Vectors and Matrices
      • 3.2.5 Mixing Vectors and Matrices
      • 3.2.6 Vectorized Boolean Logic
    • 3.3 Subsetting R Objects
      • 3.3.1 Subsetting Vectors
      • 3.3.2 Subsetting Vectors
      • 3.3.3 Subsetting Matrices
      • 3.3.4 Subsetting Matrices
      • 3.3.5 Subsetting Matrices
      • 3.3.6 Subsetting Lists
      • 3.3.7 Subsetting Data Frames
      • 3.3.8 Subsetting Data Frames
      • 3.3.9 Note on Data Frames
      • 3.3.10 Missing Values
      • 3.3.11 Subsetting by Matching
      • 3.3.12 Advanced Subsetting
    • 3.4 Functions
      • 3.4.1 Rationale
      • 3.4.2 Defining a New Function
      • 3.4.3 Example 1
      • 3.4.4 Example 2
      • 3.4.5 Example 3
      • 3.4.6 Default Function Argument Values
      • 3.4.7 The Ellipsis Argument
      • 3.4.8 Argument Matching
    • 3.5 Environment
      • 3.5.1 Loading .RData Files
      • 3.5.2 Listing Objects
      • 3.5.3 Removing Objects
      • 3.5.4 Advanced
    • 3.6 Packages
      • 3.6.1 Rationale
      • 3.6.2 Contents of a Package
      • 3.6.3 Installing Packages
      • 3.6.4 Loading Packages
      • 3.6.5 Getting Started with a Package
      • 3.6.6 Specifying a Function within a Package
      • 3.6.7 More on Packages
    • 3.7 Organizing Your Code
      • 3.7.1 Suggestions
      • 3.7.2 Where to Put Files
  • 4 Getting Data In and Out of R
    • 4.1 .RData Files
    • 4.2 readr Package
    • 4.3 Scraping from the Web
    • 4.4 APIs
  • II Data Wrangling
  • 5 Data Wrangling
    • 5.1 Definition
    • 5.2 Wrangling Challenges
  • 6 Tidy Data
    • 6.1 Motivation
    • 6.2 Definition
    • 6.3 Example: Titanic Data
      • 6.3.1 Intuitive Format
      • 6.3.2 Tidy Format
    • 6.4 Rules of Thumb
  • 7 Tidyverse
    • 7.1 Idea
    • 7.2 Packages
    • 7.3 Primary Packages
    • 7.4 Tidying Data
      • 7.4.1 tidyr Package
      • 7.4.2 Untidy Titanic Data
      • 7.4.3 gather()
      • 7.4.4 spread()
      • 7.4.5 Tidy with spread()
    • 7.5 Reshaping Data
      • 7.5.1 Wide vs. Long Format
      • 7.5.2 reshape2 Package
      • 7.5.3 Air Quality Data Set
      • 7.5.4 Melt
      • 7.5.5 Guided Melt
      • 7.5.6 Casting
      • 7.5.7 dcast()
  • 8 Transforming Data
    • 8.1 dplyr Package
    • 8.2 Grammar of dplyr
    • 8.3 Baby Names Data Set
    • 8.4 %>% Operator
    • 8.5 filter()
    • 8.6 arrange()
    • 8.7 rename()
    • 8.8 select()
    • 8.9 mutate()
    • 8.10 distinct()
    • 8.11 summarize()
    • 8.12 group_by()
    • 8.13 Chaining Verbs Together
  • 9 Relational Data
    • 9.1 Multiple Data Sets
    • 9.2 Toy Example
    • 9.3 Verbs
    • 9.4 inner_join()
    • 9.5 left_join()
    • 9.6 right_join()
    • 9.7 full_join()
    • 9.8 anti_join()
    • 9.9 semi_join()
    • 9.10 Repeated Key Values
    • 9.11 Set Operations
  • 10 Case Study in Data Wrangling
    • 10.1 Yeast Genomics
      • 10.1.1 Load Data
      • 10.1.2 Gene Expression Matrices
      • 10.1.3 Gene Position Matrix
      • 10.1.4 Row Names
      • 10.1.5 Unify Column Names
      • 10.1.6 Gene Positions
      • 10.1.7 Tidy Each Expression Matrix
      • 10.1.8 Combine Into Single Data Frame
      • 10.1.9 Join Gene Positions
      • 10.1.10 Apply dplyr Functions
  • 11 Further Reading
    • 11.1 Additional Examples
    • 11.2 Additional dplyr Features
  • III Exploratory Data Analysis
  • 12 Exploratory Data Analysis
    • 12.1 What is EDA?
    • 12.2 Descriptive Statistics Examples
    • 12.3 Components of EDA
    • 12.4 Data Sets
      • 12.4.1 Data mtcars
      • 12.4.2 Data mpg
      • 12.4.3 Data diamonds
      • 12.4.4 Data gapminder
  • 13 Numerical Summaries of Data
    • 13.1 Useful Summaries
    • 13.2 Measures of Center
    • 13.3 Mean, Median, and Mode in R
    • 13.4 Quantiles and Percentiles
    • 13.5 Five Number Summary
    • 13.6 Measures of Spread
    • 13.7 Variance, SD, and IQR in R
    • 13.8 Identifying Outliers
    • 13.9 Application to mtcars Data
    • 13.10 Measuring Symmetry
    • 13.11 skewness() Function
    • 13.12 Measuring Tails
    • 13.13 Excess Kurtosis
    • 13.14 kurtosis() Function
    • 13.15 Visualizing Skewness and Kurtosis
    • 13.16 Covariance and Correlation
      • 13.16.1 Covariance
      • 13.16.2 Pearson Correlation
      • 13.16.3 Spearman Correlation
  • 14 Data Visualization Basics
    • 14.1 Plots
    • 14.2 R Base Graphics
    • 14.3 Read the Documentation
    • 14.4 Barplot
    • 14.5 Boxplot
    • 14.6 Constructing Boxplots
    • 14.7 Boxplot with Outliers
    • 14.8 Histogram
    • 14.9 Histogram with More Breaks
    • 14.10 Density Plot
    • 14.11 Boxplot (Side-By-Side)
    • 14.12 Stacked Barplot
    • 14.13 Scatterplot
    • 14.14 Quantile-Quantile Plots
  • 15 A Grammar of Graphics
    • 15.1 Rationale
    • 15.2 Package ggplot2
    • 15.3 Pieces of the Grammar
    • 15.4 Geometries
    • 15.5 Call Format
    • 15.6 Layers
    • 15.7 Placement of the aes() Call
    • 15.8 Original Publications
    • 15.9 Documentation
    • 15.10 Barplots
    • 15.11 Boxplots and Violin Plots
    • 15.12 Histograms and Density Plots
    • 15.13 Line Plots
    • 15.14 Scatterplots
    • 15.15 Axis Scales
    • 15.16 Scatterplot Smoothers
    • 15.17 Overplotting
    • 15.18 Labels and Legends
    • 15.19 Facets
    • 15.20 Colors
      • 15.20.1 Finding Colors
      • 15.20.2 Some Useful Layers
    • 15.21 Saving Plots
      • 15.21.1 Saving Plots as Variables
      • 15.21.2 Saving Plots to Files
    • 15.22 Dynamic Visualization
    • 15.23 Themes
      • 15.23.1 Available Themes
      • 15.23.2 Setting a Theme
  • Appendix
  • References
  • Session Information

YARP, Yet Another R Primer

A short introduction to R, R Markdown, and elements of the tidyverse

John D. Storey

Created 2017-02-01; Last modified 2020-01-28