4 Getting Data In and Out of R
4.1 .RData Files

R objects can be saved to binary .RData files with the save (or save.image) function and loaded back into a session with the load function. This is the easiest way to get data into R.
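As a minimal sketch (the objects and the file name mydata.RData are illustrative, not from the original text), saving and restoring objects might look like:

> x <- 1:10
> y <- letters
> save(x, y, file = "mydata.RData")  # write both objects to a binary file
> rm(x, y)                           # remove them from the workspace
> load("mydata.RData")               # restores x and y under their original names
> ls()

Note that load restores objects under the names they were saved with, so it can silently overwrite existing objects in the workspace.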
4.2 readr Package

There are a number of R packages that provide more sophisticated tools for getting data in and out of R, especially as data sets have become larger. One of these packages is readr, which handles text files. It reads and writes data quickly, provides a useful progress bar for large files, and does a good job of determining data types.

readr is organized similarly to the base R functions. For example, there are functions read_table, read_csv, write_tsv, and write_csv.
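A minimal sketch of the readr workflow, assuming a file mydata.csv exists in the working directory (the file name is a hypothetical placeholder):

> library("readr")
> mydata <- read_csv("mydata.csv")  # column types are guessed and reported
> write_tsv(mydata, "mydata.tsv")   # write the same data as tab-separated values

read_csv returns a tibble rather than a base data frame, and it reports the type it guessed for each column so that surprises can be caught early.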
See also fread and fwrite from the data.table package.
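For comparison, a sketch of the data.table equivalents, again assuming a hypothetical mydata.csv:

> library("data.table")
> mydata <- fread("mydata.csv")       # fast, multi-threaded read; returns a data.table
> fwrite(mydata, "mydata_copy.csv")   # fast write back to CSV

fread is notable for auto-detecting the separator and header in many cases, which makes it convenient for quickly inspecting unfamiliar files.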
4.3 Scraping from the Web
There are several packages that facilitate “scraping” data from the web, including rvest, demonstrated here.
> library("rvest")
> schedule <- read_html("http://jdstorey.github.io/asdscourse/schedule/")
> first_table <- html_table(schedule)[[1]]  # first HTML table on the page
> names(first_table) <- c("week", "topics", "reading")
> first_table[2,"week"]
> first_table[2,"topics"] %>% strsplit(split=" ")
> first_table[2,"reading"] %>% strsplit(split=" ")
> grep("R4DS", first_table$reading) # which rows (weeks) have R4DS
The rvest documentation recommends SelectorGadget, which is “a javascript bookmarklet that allows you to interactively figure out what css selector you need to extract desired components from a page.”
> usg_url <- "https://princetonusg.com/senate/"
> usg <- read_html(usg_url)
> officers <- html_nodes(usg, ".team-member-name") %>%
+ html_text
> head(officers, n=20)
4.4 APIs
API stands for “application programming interface,” which is a set of routines, protocols, and tools for building software and applications.
A specific website may provide an API for scraping data from that website.
There are R packages that provide an interface with specific APIs, such as the twitteR package.
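Many APIs can also be queried directly over HTTP. As a minimal sketch (not part of the original text; the URL and query parameters are hypothetical placeholders), the httr and jsonlite packages can be combined as follows:

> library("httr")
> library("jsonlite")
> resp <- GET("https://api.example.com/v1/items", query = list(limit = 10))
> stop_for_status(resp)  # raise an error if the request failed
> items <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
> head(items)

Real APIs typically also require authentication (an API key or OAuth token), which packages such as twitteR handle on your behalf.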