1 R Basics
1.1 What is R?
- R is a programming language, a high-level “interpreted language”
- R is an interactive environment
- R is used for doing statistics and data science
1.2 Pros and Cons of R
- R is free and open-source
- R stays on the cutting edge because it can draw on independently developed “packages”
- R has some peculiar features that experienced programmers should note (see The R Inferno)
- R has an amazing community of passionate users and developers
1.3 RStudio
RStudio is an IDE (integrated development environment) for R
It contains many useful features for using R
We will use the free version of RStudio in this course
1.4 Getting Started in R
1.4.1 Calculator
Operations on numbers: + - * / ^
> 2+1
[1] 3
> 6+3*4-2^3
[1] 10
> 6+(3*4)-(2^3)
[1] 10
1.4.2 Atomic Classes
There are five atomic classes (or modes) of objects in R:
- character
- complex
- integer
- logical
- numeric (real number)
There is a sixth called “raw” that we will not discuss.
1.4.3 Assigning Values to Variables
> x <- "qcb508" # character
> x <- 2+1i # complex
> x <- 4L # integer
> x <- TRUE # logical
> x <- 3.14159 # numeric
Note: Anything typed after the # sign is not evaluated. The # sign allows you to add comments to your code.
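To check which class a value actually has, class() can be called on it. The short sketch below uses illustrative variable names of my own choosing and also shows why the L suffix matters for integers.

```r
# One value per atomic class; class() reports the class of each.
ch  <- "qcb508"    # character
cx  <- 2+1i        # complex
int <- 4L          # integer (the L suffix requests an integer)
lg  <- TRUE        # logical
num <- 3.14159     # numeric

classes <- c(class(ch), class(cx), class(int), class(lg), class(num))
print(classes)

# Without the L suffix, a whole number is stored as numeric, not integer.
plain_four <- is.integer(4)     # FALSE
suffixed   <- is.integer(4L)    # TRUE
```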
1.4.4 More Ways to Assign Values
> x <- 1
> 1 -> x
> x = 1
I recommend you only use x <- 1, since = is also used to assign values to arguments in function calls.
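The difference between the two operators shows up inside a function call. This is a small sketch, with a and b and v as illustrative names: = names an argument of the function, while <- creates a variable and then passes its value.

```r
# `=` matches the argument named x of mean(); it does not create a variable.
a <- mean(x = c(2, 4, 6))

# `<-` first creates the variable v, then passes its value to mean().
b <- mean(v <- c(2, 4, 6))

print(a)   # [1] 4
print(b)   # [1] 4
print(v)   # [1] 2 4 6
```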
1.4.5 Evaluation
When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated expression is returned. The result may be auto-printed.
> x <- 1
> x+2
[1] 3
> print(x)
[1] 1
> print(x+2)
[1] 3
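Auto-printing only happens for some expressions. As a rough illustration: an assignment is evaluated silently, wrapping it in parentheses prints the value, and values computed inside a loop are not printed unless print() is called.

```r
x <- 5                     # assignment: evaluated, nothing is printed
(x <- 5)                   # parentheses around the assignment print [1] 5

for (i in 1:2) i           # inside a loop, values are not auto-printed
for (i in 1:2) print(i)    # print() makes the output explicit
```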
1.4.6 Functions
There are many useful functions included in R. “Packages” (covered later) can be loaded as libraries to provide additional functions. You can also write your own functions in any R session.
Here are some examples of built-in functions:
> x <- 2
> print(x)
[1] 2
> sqrt(x)
[1] 1.414214
> log(x)
[1] 0.6931472
> class(x)
[1] "numeric"
> is.vector(x)
[1] TRUE
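Writing your own function uses the function keyword. The example below is only a sketch: geo_mean is a hypothetical name, and it assumes all inputs are positive numbers.

```r
# A hypothetical user-defined function: the geometric mean of positive numbers.
geo_mean <- function(values) {
  exp(mean(log(values)))   # the last evaluated expression is the return value
}

result <- geo_mean(c(1, 10, 100))
print(result)   # [1] 10
```

Once defined, geo_mean can be called like any built-in function for the rest of the session.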
1.4.7 Accessing Help in R
You can open the help file for any function by typing ? followed by the function’s name. Here is an example:
> ?sqrt
There’s also a function help.search that can do general searches for help. You can learn about it by typing:
> ?help.search
It’s also useful to use Google: for example, “r help square root”. The R help files are also on the web.
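A few related tools may be handy when you do not remember a function's exact name. In this sketch the help calls are commented out because they open a help viewer; apropos() simply returns the matching names.

```r
# apropos() lists objects on the search path whose names match a pattern.
matches <- apropos("^sqrt")
print(matches)

# help() is the function form of `?`; both lines open the same page.
# help("sqrt")
# ?sqrt

# `??` is shorthand for help.search().
# ??"square root"
```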
1.4.8 Variable Names
In the previous examples, we used x as our variable name. Do not use the following variable names, as they have special meanings in R:
c, q, s, t, C, D, F, I, T
When combining two words for a given variable, I recommend one of these common styles:
> my_variable <- 1
> myVariable <- 1
Variable names such as my.variable are problematic because of the special use of “.” in R: it separates a generic function’s name from a class in method names such as print.data.frame.
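To see why these names are best avoided, consider T and c. This is a sketch of what can go wrong, not something to do in real code: T is an ordinary variable that can be overwritten, and a variable named c hides in plain sight next to the function c().

```r
# T and F are ordinary variables that default to TRUE and FALSE,
# so they can be overwritten; TRUE and FALSE themselves are reserved words.
T <- 0
overwritten <- isTRUE(T)   # FALSE: T no longer means TRUE
rm(T)                      # remove our copy; the built-in T is visible again
restored <- isTRUE(T)      # TRUE

# A variable named c does not break c(), because R looks for a function
# when c is called, but the code becomes confusing to read.
c <- 3
combined <- c(c, 4)        # 3 4
rm(c)
```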
1.4.9 Vectors
The vector is the most basic object in R. You can create vectors in a number of ways.
> x <- c(1, 2, 3, 4, 5)
> x
[1] 1 2 3 4 5
>
> y <- 1:40
> y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
>
> z <- seq(from=0, to=100, by=10)
> z
[1] 0 10 20 30 40 50 60 70 80 90 100
> length(z)
[1] 11
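A few other base R functions also create vectors. The sketch below uses illustrative variable names; rep(), seq() with length.out, numeric(), and seq_len() are all part of base R.

```r
zeros   <- rep(0, times = 4)                       # 0 0 0 0
grid    <- seq(from = 0, to = 1, length.out = 5)   # five evenly spaced values from 0 to 1
blank   <- numeric(3)                              # a numeric vector of three zeros
indices <- seq_len(4)                              # 1 2 3 4

print(zeros)
print(grid)
print(blank)
print(indices)
```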
1.4.10 Vectors
- Programmers: vectors are indexed starting at 1, not 0
- A vector can only contain elements of a single class:
> x <- "a"
> x[0]
character(0)
> x[1]
[1] "a"
>
> y <- 1:3
> z <- c(x, y, TRUE, FALSE)
> z
[1] "a" "1" "2" "3" "TRUE" "FALSE"
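Because indexing starts at 1, a few indexing patterns behave differently from what programmers coming from 0-based languages may expect. A minimal sketch, with v as an illustrative vector:

```r
v <- c(10, 20, 30, 40)

first         <- v[1]     # 10: the first element
middle        <- v[2:3]   # 20 30: a range of positions
without_first <- v[-1]    # 20 30 40: a negative index drops that position
empty         <- v[0]     # numeric(0): index 0 selects nothing
beyond        <- v[10]    # NA: indexing past the end gives a missing value
```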
1.4.11 Matrices
Like vectors, matrices are objects that can contain elements of only one class.
> m <- matrix(1:6, nrow=2, ncol=3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
>
> m <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
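Matrix elements are accessed with a row index and a column index. The sketch below rebuilds the column-filled matrix from the first example; the variable names on the left are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column

one_cell   <- m[1, 2]    # 3: row 1, column 2
second_row <- m[2, ]     # 2 4 6: leaving the column blank selects the whole row
third_col  <- m[, 3]     # 5 6: leaving the row blank selects the whole column
shape      <- dim(m)     # 2 3: number of rows, then number of columns
flipped    <- t(m)       # transpose: a 3 x 2 matrix
```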
1.4.12 Factors
In statistics, factors encode categorical data.
> paint <- factor(c("red", "white", "blue", "blue", "red",
+ "red"))
> paint
[1] red white blue blue red red
Levels: blue red white
>
> table(paint)
paint
 blue   red white 
    2     3     1 
> unclass(paint)
[1] 2 3 1 1 2 2
attr(,"levels")
[1] "blue" "red" "white"
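By default the levels are sorted alphabetically, which is why blue is coded as 1 above. If a different order is wanted, the levels argument of factor() sets it explicitly, as in this sketch:

```r
paint <- factor(c("red", "white", "blue", "blue", "red", "red"),
                levels = c("red", "white", "blue"))

lev    <- levels(paint)       # "red" "white" "blue"
codes  <- as.integer(paint)   # 1 2 3 3 1 1: the integer codes follow the level order
counts <- table(paint)        # counts are reported in level order: 3 1 2
```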
1.4.13 Lists
Lists allow you to hold different classes of objects in one variable.
> x <- list(1:3, "a", c(TRUE, FALSE))
> x
[[1]]
[1] 1 2 3
[[2]]
[1] "a"
[[3]]
[1] TRUE FALSE
>
> ## access any element of the list
> x[[2]]
[1] "a"
> x[[3]][2]
[1] FALSE
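Single and double brackets do different things on a list. In this sketch, sub and el are illustrative names: single brackets return a smaller list, while double brackets return the element itself.

```r
x <- list(1:3, "a", c(TRUE, FALSE))

sub <- x[2]      # a list of length 1 that contains "a"
el  <- x[[2]]    # the character vector "a" itself

class(sub)       # "list"
class(el)        # "character"
n <- length(x)   # 3: the number of elements in the list
```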
1.4.14 Lists with Names
The elements of a list can be given names.
> x <- list(counting=1:3, char="a", logic=c(TRUE, FALSE))
> x
$counting
[1] 1 2 3
$char
[1] "a"
$logic
[1] TRUE FALSE
>
> ## access any element of the list
> x$char
[1] "a"
> x$logic[2]
[1] FALSE
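Named elements can also be reached with double brackets and a character string, which is useful when the name is stored in a variable. A short sketch, where key and score are hypothetical names:

```r
x <- list(counting = 1:3, char = "a", logic = c(TRUE, FALSE))

nm      <- names(x)          # "counting" "char" "logic"
by_name <- x[["counting"]]   # same result as x$counting

key      <- "logic"
from_key <- x[[key]]         # [[ ]] accepts a name held in a variable; $ does not

x$score <- 9.5               # assigning to a new name appends an element
n <- length(x)               # 4
```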
1.4.15 Missing Values
In data analysis and model fitting, we often have missing values. NA represents missing values and NaN means “not a number”, which is a special type of missing value.
> m <- matrix(nrow=3, ncol=3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
> 0/1
[1] 0
> 1/0
[1] Inf
> 0/0
[1] NaN
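Missing values propagate through most calculations, so they usually have to be detected or removed explicitly. The sketch below, with w as an illustrative vector, uses is.na(), is.nan(), and the na.rm argument of mean().

```r
w <- c(1, NA, 3)

flags     <- is.na(w)               # FALSE TRUE FALSE
with_na   <- mean(w)                # NA: one missing value makes the mean missing
without   <- mean(w, na.rm = TRUE)  # 2: na.rm = TRUE drops missing values first

nan_is_na <- is.na(NaN)             # TRUE: NaN counts as missing
na_is_nan <- is.nan(NA)             # FALSE: but NA is not NaN
compared  <- NA == NA               # NA: comparing with NA gives NA, so use is.na()
```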
1.4.16 NULL
NULL is a reserved value in R that represents the null (empty) object.
> x <- vector(mode="list", length=3)
> x
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
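NULL behaves differently from NA: it has length zero and disappears when combined into a vector. A minimal sketch, where lst is an illustrative list:

```r
null_check <- is.null(NULL)       # TRUE
null_len   <- length(NULL)        # 0

dropped <- c(1, NULL, 2)          # 1 2: NULL contributes nothing
kept    <- c(1, NA, 2)            # 1 NA 2: NA still occupies a position

# Assigning NULL to a list element removes that element.
lst <- list(a = 1, b = 2)
lst$b <- NULL
remaining <- names(lst)           # "a"
```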
1.4.17 Coercion
We saw earlier that when we mixed classes in a vector they were all coerced to be of type character:
> c("a", 1:3, TRUE, FALSE)
[1] "a" "1" "2" "3" "TRUE" "FALSE"
You can directly apply coercion with functions as.numeric(), as.character(), as.logical(), etc.
This doesn’t always work out well:
> x <- 1:3
> as.character(x)
[1] "1" "2" "3"
>
> y <- c("a", "b", "c")
> as.numeric(y)
Warning: NAs introduced by coercion
[1] NA NA NA
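Implicit coercion follows a fixed order: logical gives way to integer, integer to numeric, and numeric to character. The sketch below checks that order with class() and then shows a few explicit conversions; the variable names are illustrative.

```r
# Implicit coercion when classes are mixed in c()
mix1 <- class(c(TRUE, 2L))    # "integer"
mix2 <- class(c(2L, 3.5))     # "numeric"
mix3 <- class(c(3.5, "a"))    # "character"

# Explicit coercion
num  <- as.numeric("3.5")                        # 3.5
int  <- as.integer(3.9)                          # 3: the fractional part is dropped
lgl  <- as.logical(c("TRUE", "false", "yes"))    # TRUE FALSE NA: "yes" is not recognized
ones <- sum(c(TRUE, FALSE, TRUE))                # 2: TRUE counts as 1, FALSE as 0
```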
1.4.18 Data Frames
The data frame is one of the most important objects in R. Data sets very often come in tabular form of mixed classes, and data frames are constructed exactly for this.
Data frames are lists where each element has the same length.
1.4.19 Data Frames
> df <- data.frame(counting=1:3, char=c("a", "b", "c"),
+ logic=c(TRUE, FALSE, TRUE))
> df
  counting char logic
1        1    a  TRUE
2        2    b FALSE
3        3    c  TRUE
>
> nrow(df)
[1] 3
> ncol(df)
[1] 3
1.4.20 Data Frames
> dim(df)
[1] 3 3
>
> names(df)
[1] "counting" "char" "logic"
>
> attributes(df)
$names
[1] "counting" "char" "logic"
$class
[1] "data.frame"
$row.names
[1] 1 2 3
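Since a data frame is a list of equal-length columns, list-style access works on it alongside matrix-style row and column indexing. The sketch below rebuilds df; stringsAsFactors = FALSE is set explicitly as an assumption so that the char column stays character in older versions of R as well.

```r
df <- data.frame(counting = 1:3, char = c("a", "b", "c"),
                 logic = c(TRUE, FALSE, TRUE), stringsAsFactors = FALSE)

is_list  <- is.list(df)          # TRUE: a data frame is a list of columns
n_cols   <- length(df)           # 3: the list length is the number of columns

col_a    <- df$char              # "a" "b" "c": list-style access by name
col_b    <- df[["counting"]]     # 1 2 3
one_cell <- df[2, "char"]        # "b": matrix-style access by row and column
subset_t <- df[df$logic, ]       # rows where logic is TRUE (rows 1 and 3)
```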
1.4.21 Attributes
Attributes give information (or meta-data) about R objects. The previous slide shows attributes(df), the attributes of the data frame df.
> x <- 1:3
> attributes(x) # no attributes for a standard vector
NULL
>
> m <- matrix(1:6, nrow=2, ncol=3)
> attributes(m)
$dim
[1] 2 3
> paint <- factor(c("red", "white", "blue", "blue", "red",
+ "red"))
> attributes(paint)
$levels
[1] "blue" "red" "white"
$class
[1] "factor"
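Individual attributes can be read or set with attr(). As a sketch of how much attributes matter, giving a plain vector a dim attribute is enough for R to treat it as a matrix; the "units" attribute here is a hypothetical custom attribute.

```r
v <- 1:6
before <- attributes(v)        # NULL: a plain vector has no attributes

attr(v, "dim") <- c(2, 3)      # setting dim turns the vector into a 2 x 3 matrix
is_mat <- is.matrix(v)         # TRUE
shape  <- attr(v, "dim")       # 2 3

attr(v, "units") <- "cm"       # hypothetical custom attribute
attr_names <- names(attributes(v))
print(attr_names)
```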
1.4.22 Names
Names can be assigned to the elements of vectors and to the columns and rows of matrices and data frames. This makes your code easier to write and read.
> names(x) <- c("Princeton", "Rutgers", "Penn")
> x
Princeton   Rutgers      Penn 
        1         2         3 
>
> colnames(m) <- c("NJ", "NY", "PA")
> rownames(m) <- c("East", "West")
> m
     NJ NY PA
East  1  3  5
West  2  4  6
> colnames(m)
[1] "NJ" "NY" "PA"
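Once names are assigned, they can be used in place of numeric positions when indexing. The sketch below recreates x and m with the same names as above; the variables on the left are illustrative.

```r
x <- 1:3
names(x) <- c("Princeton", "Rutgers", "Penn")
named_el <- x["Rutgers"]       # a named element with value 2
bare_el  <- x[["Rutgers"]]     # 2, without the name

m <- matrix(1:6, nrow = 2, ncol = 3)
colnames(m) <- c("NJ", "NY", "PA")
rownames(m) <- c("East", "West")
cell   <- m["East", "NY"]      # 3
pa_col <- m[, "PA"]            # East 5, West 6 (row names are kept)
```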
1.4.23 Accessing Names
Displaying or assigning names to these three types of objects does not have consistent syntax.
Object | Column Names | Row Names |
---|---|---|
vector | names() | N/A |
data frame | names() | row.names() |
data frame | colnames() | rownames() |
matrix | colnames() | rownames() |
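The inconsistency in the table can be checked directly. In this sketch, df and m are small illustrative objects: names() and colnames() agree on a data frame, but names() returns NULL on a matrix even when it has column names.

```r
df <- data.frame(a = 1:2, b = c("u", "v"))
m  <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))

same_cols <- identical(names(df), colnames(df))       # TRUE
same_rows <- identical(row.names(df), rownames(df))   # TRUE

matrix_names <- names(m)       # NULL: names() does not see matrix column names
matrix_cols  <- colnames(m)    # "a" "b"
matrix_rows  <- rownames(m)    # "r1" "r2"
```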