1 R Basics

1.1 What is R?

  • R is a programming language, a high-level “interpreted language”
  • R is an interactive environment
  • R is used for doing statistics and data science

1.2 Pros and Cons of R

  • R is free and open-source
  • R stays on the cutting edge because it can use independently developed “packages”
  • R has some peculiar features that experienced programmers should note (see The R Inferno)
  • R has an amazing community of passionate users and developers

1.3 RStudio

  • RStudio is an IDE (integrated development environment) for R

  • It contains many useful features for using R

  • We will use the free version of RStudio in this course

1.4 Getting Started in R

1.4.1 Calculator

Operations on numbers: + - * / ^

> 2+1
[1] 3
> 6+3*4-2^3
[1] 10
> 6+(3*4)-(2^3)
[1] 10

1.4.2 Atomic Classes

There are five atomic classes (or modes) of objects in R:

  1. character
  2. complex
  3. integer
  4. logical
  5. numeric (real number)

There is a sixth called “raw” that we will not discuss.

1.4.3 Assigning Values to Variables

> x <- "qcb508" # character
> x <- 2+1i     # complex
> x <- 4L       # integer
> x <- TRUE     # logical
> x <- 3.14159  # numeric

Note: Anything typed after the # sign is not evaluated. The # sign allows you to add comments to your code.

1.4.4 More Ways to Assign Values

> x <- 1
> 1 -> x
> x = 1

I recommend you only use x <- 1 for assignment, since = is also used to pass named arguments in function calls.
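
As a small illustration of the two roles, the sketch below uses = to name an argument and <- to assign inside a call. The variable names v, y, m1, and m2 are made up for this example.

```r
v <- c(4, 8, 15)

# Here `=` names the argument x of mean(); it is not a variable assignment.
m1 <- mean(x = v)

# Here `<-` assigns v to a new variable y as a side effect,
# and the value is then passed to mean() as its first argument.
m2 <- mean(y <- v)

print(m1)   # 9
print(m2)   # 9
print(y)    # y now holds the same values as v
```

Keeping <- for assignment and = for arguments makes it easier to tell these two cases apart when reading code.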

1.4.5 Evaluation

When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated expression is returned. The result may be auto-printed.

> x <- 1
> x+2
[1] 3
> print(x)
[1] 1
> print(x+2)
[1] 3

1.4.6 Functions

There are many useful functions included in R. “Packages” (covered later) can be loaded as libraries to provide additional functions. You can also write your own functions in any R session.

Here are some examples of built-in functions:

> x <- 2
> print(x)
[1] 2
> sqrt(x)
[1] 1.414214
> log(x)
[1] 0.6931472
> class(x)
[1] "numeric"
> is.vector(x)
[1] TRUE
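
As noted above, you can also write your own functions. Here is a minimal sketch; the function name hypotenuse and its arguments a and b are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: length of the hypotenuse of a right triangle
# with legs a and b, built from the built-in sqrt() function.
hypotenuse <- function(a, b) {
  sqrt(a^2 + b^2)
}

h1 <- hypotenuse(3, 4)                  # 5
h2 <- hypotenuse(c(3, 5), c(4, 12))     # 5 and 13: works element by element
print(h1)
print(h2)
```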

1.4.7 Accessing Help in R

You can open the help file for any function by typing ? followed by the function’s name. Here is an example:

> ?sqrt

There’s also a function help.search that can do general searches for help. You can learn about it by typing:

> ?help.search

It’s also useful to use Google: for example, “r help square root”. The R help files are also on the web.

1.4.8 Variable Names

In the previous examples, we used x as our variable name. Do not use the following variable names, as they have special meanings in R:

c, q, s, t, C, D, F, I, T

When combining two words for a given variable, I recommend one of these common styles:

> my_variable <- 1
> myVariable <- 1

Variable names such as my.variable are problematic because “.” has a special use in R (for example, in method names such as print.data.frame).
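
To see why a name such as T is risky, consider this short sketch. T and F are ordinary variables that start out equal to TRUE and FALSE, so they can be overwritten. The names after_overwrite and after_cleanup are made up for this example.

```r
# Overwrite T in the current environment.
T <- 0
after_overwrite <- isTRUE(T)   # FALSE: T no longer means TRUE

# Remove our copy; the original T from base R is visible again.
rm(T)
after_cleanup <- isTRUE(T)     # TRUE

print(after_overwrite)
print(after_cleanup)
```

Writing TRUE and FALSE in full avoids the problem, since those are reserved words and cannot be reassigned.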

1.4.9 Vectors

The vector is the most basic object in R. You can create vectors in a number of ways.

> x <- c(1, 2, 3, 4, 5)
> x
[1] 1 2 3 4 5
> 
> y <- 1:40
> y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
> 
> z <- seq(from=0, to=100, by=10)
> z
 [1]   0  10  20  30  40  50  60  70  80  90 100
> length(z)
[1] 11

1.4.10 Vectors

  • Programmers: vectors are indexed starting at 1, not 0
  • A vector can only contain elements of a single class:
> x <- "a"
> x[0]
character(0)
> x[1]
[1] "a"
> 
> y <- 1:3
> z <- c(x, y, TRUE, FALSE)
> z
[1] "a"     "1"     "2"     "3"     "TRUE"  "FALSE"
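
A few more indexing patterns are sketched below, assuming a small numeric vector v invented for this example.

```r
v <- c(10, 20, 30, 40, 50)

first         <- v[1]        # 10: the first element has index 1
middle        <- v[2:4]      # 20 30 40
picked        <- v[c(1, 5)]  # 10 50
without_first <- v[-1]       # 20 30 40 50: a negative index drops an element
nothing       <- v[0]        # an empty numeric vector

print(middle)
print(without_first)
print(length(nothing))       # 0
```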

1.4.11 Matrices

Like vectors, matrices are objects that can contain elements of only one class.

> m <- matrix(1:6, nrow=2, ncol=3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> 
> m <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
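
Matrix elements are accessed by row and column. The sketch below also shows the single-class rule in action; the object name mixed is made up for this example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

element   <- m[2, 3]   # 6: row 2, column 3
first_row <- m[1, ]    # 1 3 5: leaving the column index blank selects the whole row
second_col <- m[, 2]   # 3 4

# Mixing classes coerces every element to character, as with vectors.
mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)

print(dim(m))               # 2 3
print(is.character(mixed))  # TRUE
```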

1.4.12 Factors

In statistics, factors encode categorical data.

> paint <- factor(c("red", "white", "blue", "blue", "red", 
+                   "red"))
> paint
[1] red   white blue  blue  red   red  
Levels: blue red white
> 
> table(paint)
paint
 blue   red white 
    2     3     1 
> unclass(paint)
[1] 2 3 1 1 2 2
attr(,"levels")
[1] "blue"  "red"   "white"
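
In the output above, the levels are sorted alphabetically. If a different order is wanted, the levels argument of factor() can set it explicitly, as in this sketch (the object name paint_ordered is made up for this example):

```r
paint_ordered <- factor(c("red", "white", "blue", "blue", "red", "red"),
                        levels = c("red", "white", "blue"))

print(levels(paint_ordered))      # "red" "white" "blue"
print(as.integer(paint_ordered))  # 1 2 3 3 1 1: integer codes follow the level order
print(nlevels(paint_ordered))     # 3
```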

1.4.13 Lists

Lists allow you to hold different classes of objects in one variable.

> x <- list(1:3, "a", c(TRUE, FALSE))
> x
[[1]]
[1] 1 2 3

[[2]]
[1] "a"

[[3]]
[1]  TRUE FALSE
> 
> ## access any element of the list
> x[[2]]
[1] "a"
> x[[3]][2]
[1] FALSE

1.4.14 Lists with Names

The elements of a list can be given names.

> x <- list(counting=1:3, char="a", logic=c(TRUE, FALSE))
> x
$counting
[1] 1 2 3

$char
[1] "a"

$logic
[1]  TRUE FALSE
> 
> ## access any element of the list
> x$char
[1] "a"
> x$logic[2]
[1] FALSE
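
One point that often trips people up is the difference between single and double brackets on a list. A minimal sketch, with object names sub and el invented for this example:

```r
x <- list(counting = 1:3, char = "a", logic = c(TRUE, FALSE))

sub <- x["char"]     # single brackets: a list of length 1
el  <- x[["char"]]   # double brackets: the element itself

print(is.list(sub))       # TRUE
print(is.character(el))   # TRUE
print(names(x))           # "counting" "char" "logic"
```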

1.4.15 Missing Values

In data analysis and model fitting, we often have missing values. NA represents a missing value. NaN means “not a number”; it results from undefined operations such as 0/0 and is treated as a special type of missing value.

> m <- matrix(nrow=3, ncol=3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
> 0/1
[1] 0
> 1/0
[1] Inf
> 0/0
[1] NaN
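
Missing values can be detected with is.na() and is.nan(), and many functions can skip them on request. The sketch below assumes a small vector v invented for this example.

```r
v <- c(1, NA, 3, NaN)

na_flags  <- is.na(v)    # FALSE TRUE FALSE TRUE: NaN also counts as missing
nan_flags <- is.nan(v)   # FALSE FALSE FALSE TRUE: only the NaN

# Comparisons with NA are themselves NA, so use is.na() to test for it.
comparison <- NA > 1

# na.rm = TRUE drops missing values before computing the mean.
mean_clean <- mean(v, na.rm = TRUE)   # 2

print(na_flags)
print(nan_flags)
print(mean_clean)
```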

1.4.16 NULL

NULL is a reserved word in R that represents the null (empty) object. For example, the elements of a newly created list are NULL:

> x <- vector(mode="list", length=3)
> x
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL
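
A few properties of NULL are sketched below. The object names combined and lst are made up for this example.

```r
print(is.null(NULL))    # TRUE
print(length(NULL))     # 0
print(is.null(NA))      # FALSE: NA is a missing value, not NULL

# NULL contributes nothing when vectors are combined.
combined <- c(1, NULL, 2)
print(length(combined))   # 2

# Assigning NULL to a list element removes that element.
lst <- list(a = 1, b = 2)
lst$b <- NULL
print(names(lst))         # "a"
```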

1.4.17 Coercion

We saw earlier that when we mixed classes in a vector they were all coerced to be of type character:

> c("a", 1:3, TRUE, FALSE)
[1] "a"     "1"     "2"     "3"     "TRUE"  "FALSE"

You can directly apply coercion with functions as.numeric(), as.character(), as.logical(), etc.

This doesn’t always work out well:

> x <- 1:3
> as.character(x)
[1] "1" "2" "3"
> 
> y <- c("a", "b", "c")
> as.numeric(y)
Warning: NAs introduced by coercion
[1] NA NA NA
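
Some further coercion cases are sketched below. The object names are made up for this example.

```r
int_val  <- as.integer(3.9)                        # 3: the fractional part is dropped
num_val  <- as.numeric("3.14")                     # 3.14
log_vals <- as.logical(c("TRUE", "false", "yes"))  # TRUE FALSE NA: "yes" is not recognized
from_int <- as.logical(0:2)                        # FALSE TRUE TRUE: zero is FALSE, nonzero is TRUE

# Logical values are coerced to 0 and 1 in arithmetic.
total <- sum(c(TRUE, FALSE, TRUE))                 # 2

# suppressWarnings() hides the coercion warning; the NA is still produced.
partial <- suppressWarnings(as.numeric(c("1", "b")))

print(log_vals)
print(partial)
```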

1.4.18 Data Frames

The data frame is one of the most important objects in R. Data sets very often come in tabular form of mixed classes, and data frames are constructed exactly for this.

Data frames are lists where each element has the same length.
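
The list nature of a data frame can be checked directly. The sketch below uses a small data frame named scores, invented for this example, and sets stringsAsFactors = FALSE explicitly because the default differs across R versions.

```r
scores <- data.frame(id = 1:3, grade = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

print(is.list(scores))   # TRUE
print(length(scores))    # 2: the list elements are the columns

# Columns are accessed like named list elements.
grades <- scores$grade
ids    <- scores[["id"]]

# Each column keeps its own class.
col_classes <- sapply(scores, class)
print(col_classes)
```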

1.4.19 Data Frames

> df <- data.frame(counting=1:3, char=c("a", "b", "c"), 
+                  logic=c(TRUE, FALSE, TRUE))
> df
  counting char logic
1        1    a  TRUE
2        2    b FALSE
3        3    c  TRUE
> 
> nrow(df)
[1] 3
> ncol(df)
[1] 3

1.4.20 Data Frames

> dim(df)
[1] 3 3
> 
> names(df)
[1] "counting" "char"     "logic"   
> 
> attributes(df)
$names
[1] "counting" "char"     "logic"   

$class
[1] "data.frame"

$row.names
[1] 1 2 3

1.4.21 Attributes

Attributes give information (or meta-data) about R objects. The previous slide shows attributes(df), the attributes of the data frame df.

> x <- 1:3
> attributes(x) # no attributes for a standard vector
NULL
> 
> m <- matrix(1:6, nrow=2, ncol=3)
> attributes(m)
$dim
[1] 2 3
> paint <- factor(c("red", "white", "blue", "blue", "red", 
+                   "red"))
> attributes(paint)
$levels
[1] "blue"  "red"   "white"

$class
[1] "factor"

1.4.22 Names

Names can be assigned to the elements of vectors and to the columns and rows of matrices and data frames. This makes your code easier to write and read.

> names(x) <- c("Princeton", "Rutgers", "Penn")
> x
Princeton   Rutgers      Penn 
        1         2         3 
> 
> colnames(m) <- c("NJ", "NY", "PA")
> rownames(m) <- c("East", "West")
> m
     NJ NY PA
East  1  3  5
West  2  4  6
> colnames(m)
[1] "NJ" "NY" "PA"

1.4.23 Accessing Names

The syntax for displaying or assigning names is not consistent across these three types of objects.

Object       Column Names   Row Names
vector       names()        N/A
data frame   names()        row.names()
data frame   colnames()     rownames()
matrix       colnames()     rownames()
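
The accessors in the table can be compared side by side. The sketch below uses a small data frame d and matrix mat, both invented for this example.

```r
d <- data.frame(counting = 1:3, logic = c(TRUE, FALSE, TRUE))

# For a data frame, names() and colnames() return the same thing,
# as do row.names() and rownames().
same_cols <- identical(names(d), colnames(d))
rownames(d) <- c("r1", "r2", "r3")
same_rows <- identical(row.names(d), rownames(d))

# Names can be changed one at a time.
names(d)[2] <- "flag"

# For a matrix, names() does not give column names; use colnames().
mat <- matrix(1:6, nrow = 2, ncol = 3)
colnames(mat) <- c("NJ", "NY", "PA")
no_names <- is.null(names(mat))

print(same_cols)    # TRUE
print(same_rows)    # TRUE
print(names(d))     # "counting" "flag"
print(no_names)     # TRUE
```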