# 54 Types of Models

## 54.1 Probabilistic Models

So far we have covered inference of parameters that quantify a population of interest.

This is called inference of probabilistic models.

## 54.2 Multivariate Models

Some of the probabilistic models we considered involve calculating conditional probabilities such as $$\Pr({\boldsymbol{Z}}| {\boldsymbol{X}}; {\boldsymbol{\theta}})$$ or $$\Pr({\boldsymbol{\theta}}| {\boldsymbol{X}})$$.

It is often the case that we would like to build a model that explains the variation of one variable in terms of other variables; this goal is typically what is meant by statistical modeling.

## 54.3 Variables

Let’s suppose our data come in the form $$({\boldsymbol{X}}_1, Y_1), ({\boldsymbol{X}}_2, Y_2), \ldots, ({\boldsymbol{X}}_n, Y_n) \sim F$$.

We will call $${\boldsymbol{X}}_i = (X_{i1}, X_{i2}, \ldots, X_{ip}) \in \mathbb{R}^{1 \times p}$$ the explanatory variables and $$Y_i \in \mathbb{R}$$ the dependent variable or response variable.

We can collect all variables as matrices

${\boldsymbol{Y}}_{n \times 1} \ \mbox{ and } \ {\boldsymbol{X}}_{n \times p}$

where each row is a unique observation.
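As a minimal sketch of this layout (hypothetical numbers, `numpy` assumed), each row of $${\boldsymbol{X}}$$ pairs with one entry of $${\boldsymbol{Y}}$$:

```python
import numpy as np

# Hypothetical data: n = 4 observations, p = 3 explanatory variables.
X = np.array([
    [1.0, 2.1, 0.5],   # row i is the observation (X_{i1}, X_{i2}, X_{i3})
    [1.0, 1.7, 1.2],
    [1.0, 3.0, 0.8],
    [1.0, 2.4, 0.1],
])
Y = np.array([3.2, 2.9, 4.1, 3.5])  # response vector, one entry per row of X

n, p = X.shape  # n observations (rows), p explanatory variables (columns)
```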

## 54.4 Statistical Model

Statistical models are concerned with how variables are dependent. The most general model would be to infer

$\Pr(Y | {\boldsymbol{X}}) = h({\boldsymbol{X}})$

where we would specifically study the form of $$h(\cdot)$$ to understand how $$Y$$ is dependent on $${\boldsymbol{X}}$$.

A more modest goal is to infer the transformed conditional expectation

$g\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = h({\boldsymbol{X}})$

which sometimes leads us back to an estimate of $$\Pr(Y | {\boldsymbol{X}})$$.

## 54.5 Parametric vs Nonparametric

A parametric model is a pre-specified form of $$h(X)$$ whose terms can be characterized by a formula and interpreted. This usually involves parameters on which inference can be performed, such as coefficients in a linear model.

A nonparametric model is a data-driven form of $$h(X)$$ that is often very flexible and is not easily expressed or interpreted. A nonparametric model often does not include parameters on which we can do inference.

## 54.6 Simple Linear Regression

For random variables $$(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$$, simple linear regression estimates the model

$Y_i = \beta_1 + \beta_2 X_i + E_i$

where $${\operatorname{E}}[E_i] = 0$$, $${\operatorname{Var}}(E_i) = \sigma^2$$, and $${\operatorname{Cov}}(E_i, E_j) = 0$$ for all $$1 \leq i, j \leq n$$ and $$i \not= j$$.

Note that in this model $${\operatorname{E}}[Y | X] = \beta_1 + \beta_2 X.$$
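A minimal simulation sketch (illustrative parameter values, `numpy` assumed) shows the closed-form least squares estimates, $$\hat{\beta}_2 = \widehat{{\operatorname{Cov}}}(X, Y) / \widehat{{\operatorname{Var}}}(X)$$ and $$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate from Y_i = beta1 + beta2 * X_i + E_i (illustrative values).
beta1, beta2 = 1.0, 2.0
x = rng.normal(size=200)
y = beta1 + beta2 * x + rng.normal(scale=0.5, size=200)

# Closed-form least squares estimates.
b2_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b1_hat = y.mean() - b2_hat * x.mean()
```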

## 54.7 Ordinary Least Squares

Ordinary least squares (OLS) estimates the model

\begin{aligned} Y_i & = \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + E_i \\ & = {\boldsymbol{X}}_i {\boldsymbol{\beta}}+ E_i \end{aligned}

where $${\operatorname{E}}[E_i] = 0$$, $${\operatorname{Var}}(E_i) = \sigma^2$$, and $${\operatorname{Cov}}(E_i, E_j) = 0$$ for all $$1 \leq i, j \leq n$$ and $$i \not= j$$.

Note that typically $$X_{i1} = 1$$ for all $$i$$ so that $$\beta_1 X_{i1} = \beta_1$$ serves as the intercept.
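In matrix form the OLS estimate solves the normal equations $$({\boldsymbol{X}}^T {\boldsymbol{X}}) \hat{{\boldsymbol{\beta}}} = {\boldsymbol{X}}^T {\boldsymbol{Y}}$$; a minimal sketch on simulated data (illustrative values, `numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 100, 3
# First column all ones so that beta_1 serves as the intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([0.5, -1.0, 2.0])
Y = X @ beta + rng.normal(size=n)

# OLS: solve the normal equations (X^T X) beta_hat = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```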

## 54.8 Generalized Least Squares

Generalized least squares (GLS) assumes the same model as OLS, except it allows for heteroskedasticity and covariance among the $$E_i$$. Specifically, it is assumed that $${\boldsymbol{E}}= (E_1, \ldots, E_n)^T$$ is distributed as

${\boldsymbol{E}}_{n \times 1} \sim (\boldsymbol{0}, {\boldsymbol{\Sigma}})$ where $$\boldsymbol{0}$$ is the expected value and $${\boldsymbol{\Sigma}} = (\sigma_{ij})$$ is the $$n \times n$$ symmetric covariance matrix.
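When $${\boldsymbol{\Sigma}}$$ is known, the GLS estimate weights the normal equations by $${\boldsymbol{\Sigma}}^{-1}$$: $$\hat{{\boldsymbol{\beta}}} = ({\boldsymbol{X}}^T {\boldsymbol{\Sigma}}^{-1} {\boldsymbol{X}})^{-1} {\boldsymbol{X}}^T {\boldsymbol{\Sigma}}^{-1} {\boldsymbol{Y}}$$. A minimal sketch with heteroskedastic but uncorrelated errors (illustrative values, `numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# Known diagonal covariance: heteroskedastic, uncorrelated errors.
sigma2 = np.linspace(0.5, 3.0, n)
E = rng.normal(scale=np.sqrt(sigma2))
Y = X @ beta + E

# GLS: beta_hat = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y.
Sinv = np.diag(1.0 / sigma2)
beta_hat = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ Y)
```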

## 54.9 Matrix Form of Linear Models

We can write the models as

${\boldsymbol{Y}}_{n \times 1} = {\boldsymbol{X}}_{n \times p} {\boldsymbol{\beta}}_{p \times 1} + {\boldsymbol{E}}_{n \times 1}$

where simple linear regression, OLS, and GLS differ in the value of $$p$$ or the distribution of the $$E_i$$. We can also write the conditional expecation and covariance as

${\operatorname{E}}[{\boldsymbol{Y}}| {\boldsymbol{X}}] = {\boldsymbol{X}}{\boldsymbol{\beta}}, \ {\operatorname{Cov}}({\boldsymbol{Y}}| {\boldsymbol{X}}) = {\boldsymbol{\Sigma}}.$

## 54.10 Least Squares Regression

In simple linear regression, OLS, and GLS, the $${\boldsymbol{\beta}}$$ parameters are fit by minimizing the sum of squares between $${\boldsymbol{Y}}$$ and $${\boldsymbol{X}}{\boldsymbol{\beta}}$$.

Fitting these models by “least squares” satisfies two types of optimality:

1. Gauss-Markov Theorem
2. Maximum likelihood estimate when in addition $${\boldsymbol{E}}\sim \mbox{MVN}_n(\boldsymbol{0}, {\boldsymbol{\Sigma}})$$

## 54.11 Generalized Linear Models

The generalized linear model (GLM) builds from OLS and GLS to allow the response variable to be distributed according to an exponential family distribution. Suppose that $$\eta(\theta)$$ is the function that maps the expected value to the natural parameter. The estimated model is

$\eta\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = {\boldsymbol{X}}{\boldsymbol{\beta}}$

which is fit by maximum likelihood estimation.
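For the Bernoulli family, $$\eta$$ is the logit function, and maximum likelihood can be computed by iteratively reweighted least squares (IRLS), the standard Fisher-scoring algorithm for GLMs. A minimal sketch on simulated data (illustrative values, `numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate logistic-regression data; eta is the logit link.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([-0.5, 1.5])
p_true = 1.0 / (1.0 + np.exp(-X @ beta))
Y = rng.binomial(1, p_true)

# IRLS: repeatedly solve a weighted least squares problem.
b = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ b))   # E[Y | X] under current estimate
    W = mu * (1.0 - mu)                 # Bernoulli variance weights
    z = X @ b + (Y - mu) / W            # working response
    b = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
```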

Next week, we will finally arrive at inferring semiparametric models where $$Y | {\boldsymbol{X}}$$ is distributed according to an exponential family distribution. The models, which are called generalized additive models (GAMs), will be of the form

$\eta\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = \sum_{j=1}^p \sum_{k=1}^d h_k(X_{j})$

where $$\eta$$ is the canonical link function and the $$h_k(\cdot)$$ functions are very flexible.

There are several important trade-offs encountered in statistical modeling:

• Bias vs variance
• Accuracy vs computational time
• Flexibility vs interpretability

These are not mutually exclusive phenomena.

## 54.14 Bias and Variance

Suppose we estimate $$Y = h({\boldsymbol{X}}) + E$$ by some $$\hat{Y} = \hat{h}({\boldsymbol{X}})$$. The following bias-variance trade-off exists:

\begin{aligned} {\operatorname{E}}\left[\left(Y - \hat{Y}\right)^2\right] & = {\operatorname{E}}\left[\left(h({\boldsymbol{X}}) + E - \hat{h}({\boldsymbol{X}})\right)^2\right] \\ \ & = {\operatorname{E}}\left[\left(h({\boldsymbol{X}}) - \hat{h}({\boldsymbol{X}})\right)^2\right] + {\operatorname{Var}}(E) \\ \ & = \left(h({\boldsymbol{X}}) - {\operatorname{E}}[\hat{h}({\boldsymbol{X}})]\right)^2 + {\operatorname{Var}}\left(\hat{h}({\boldsymbol{X}})\right) + {\operatorname{Var}}(E) \\ \ & = \mbox{bias}^2 + \mbox{variance} + {\operatorname{Var}}(E) \end{aligned}
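The decomposition can be seen by Monte Carlo. In this sketch (hypothetical setup, `numpy` assumed) the true $$h(x) = x^2$$ is deliberately underfit with a straight line, so $$\hat{h}$$ has low variance but substantial bias at the evaluation point:

```python
import numpy as np

rng = np.random.default_rng(4)

# True model: Y = h(X) + E with h(x) = x^2 and Var(E) = 1.
h = lambda x: x ** 2
x_train = np.linspace(-1, 1, 50)
x0 = 0.9            # fixed point at which to examine the trade-off
reps = 2000

# Underfit with a degree-1 polynomial: biased but low-variance.
preds = np.empty(reps)
for r in range(reps):
    y = h(x_train) + rng.normal(size=x_train.size)
    b1, b0 = np.polyfit(x_train, y, 1)  # slope, intercept
    preds[r] = b1 * x0 + b0

bias2 = (h(x0) - preds.mean()) ** 2      # squared bias of hhat(x0)
variance = preds.var()                   # variance of hhat(x0)
# Expected squared error at x0 is then bias2 + variance + Var(E).
```

Here the squared bias dominates the variance, which is the signature of an underfit (too inflexible) model.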