Preface
Hello, current and future statistics enthusiasts. I’m a professor at Princeton University who works at the interface of statistics, genetics, and genomics. I have worked with undergraduate and graduate students who come from many backgrounds: biology, computer science, engineering, mathematics, physics, and statistics. My goal is to turn my students into skilled applied statisticians who are interested in tackling real-world problems. The content of this book is what I want them to learn as soon as possible after joining my lab. More generally, this book establishes a foundation in applied statistics and data science for those interested in pursuing data-driven research. I have included many different types of data sets here, but the book places a special emphasis on modern biological problems and data sets due to my particular interests.
This book also serves as the primary reference for my Princeton University course, Foundations of Statistical Genomics, which has a web site at http://jdstorey.org/fsg/. The content of this book started with slides I created using revealjs; these slides can be found at http://jdstorey.org/asdscourse2017/lectures/, and the source code at https://github.com/jdstorey/asdslectures. Although I no longer use slides to teach this course, I was nevertheless able to easily move the R Markdown used to make the slides into a book format using the bookdown package.
The current draft of this book contains most of the statistics and R code that I intend to include. However, because it is being built from slides, there are two things to keep in mind. First, I will be adding a substantial amount of exposition and data analyses to the book. It currently reads mostly as terse slides. I will also be modifying some of the formatting to better suit bookdown. Second, it is common to quote from sources verbatim when making slides (citing the source, of course). It is not so common to do this in a book. Therefore, you will see more material quoted verbatim from outside sources than is typical in a book. I intend to remedy this over time.
This book is organized into several parts:
- Introduction
- Exploratory data analysis
- Probability
- Frequentist inference
- Bayesian inference
- Numerical methods for likelihood
- Nonparametric inference
- Statistical models
- High-dimensional inference
- Latent variable models
Source Files
The source files for this book are maintained on GitHub at https://github.com/jdstorey/fas.
Feel free to visit this repository to help me make the book better.