54 Types of Models

54.1 Probabilistic Models

So far we have covered inference of parameters that quantify a population of interest.

This is called inference of probabilistic models.

54.2 Multivariate Models

Some of the probabilistic models we considered involve calculating conditional probabilities such as \(\Pr({\boldsymbol{Z}}| {\boldsymbol{X}}; {\boldsymbol{\theta}})\) or \(\Pr({\boldsymbol{\theta}}| {\boldsymbol{X}})\).

It is often the case that we would like to build a model that explains the variation of one variable in terms of other variables. This goal is typically what is meant by statistical modeling.

54.3 Variables

Let’s suppose our data come in the form \(({\boldsymbol{X}}_1, Y_1), ({\boldsymbol{X}}_2, Y_2), \ldots, ({\boldsymbol{X}}_n, Y_n) \sim F\).

We will call \({\boldsymbol{X}}_i = (X_{i1}, X_{i2}, \ldots, X_{ip}) \in \mathbb{R}^{1 \times p}\) the explanatory variables and \(Y_i \in \mathbb{R}\) the dependent variable or response variable.

We can collect all variables as matrices

\[ {\boldsymbol{Y}}_{n \times 1} \ \mbox{ and } \ {\boldsymbol{X}}_{n \times p}\]

where each row is a unique observation.

54.4 Statistical Model

Statistical models are concerned with how variables are dependent. The most general model would be to infer

\[ \Pr(Y | {\boldsymbol{X}}) = h({\boldsymbol{X}}) \]

where we would specifically study the form of \(h(\cdot)\) to understand how \(Y\) is dependent on \({\boldsymbol{X}}\).

A more modest goal is to infer the transformed conditional expectation

\[ g\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = h({\boldsymbol{X}}) \]

which sometimes leads us back to an estimate of \(\Pr(Y | {\boldsymbol{X}})\).
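For example, when \(Y \in \{0, 1\}\) we have \({\operatorname{E}}[Y | {\boldsymbol{X}}] = \Pr(Y = 1 | {\boldsymbol{X}})\), so taking \(g\) to be the logit function gives

\[ g\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = \log \frac{\Pr(Y = 1 | {\boldsymbol{X}})}{1 - \Pr(Y = 1 | {\boldsymbol{X}})} = h({\boldsymbol{X}}), \]

and inverting \(g\) recovers \(\Pr(Y = 1 | {\boldsymbol{X}})\) from \(h({\boldsymbol{X}})\).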

54.5 Parametric vs Nonparametric

A parametric model is a pre-specified form of \(h(X)\) whose terms can be characterized by a formula and interpreted. This usually involves parameters on which inference can be performed, such as coefficients in a linear model.

A nonparametric model is a data-driven form of \(h(X)\) that is often very flexible and is not easily expressed or interpreted. A nonparametric model often does not include parameters on which we can do inference.
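As a small illustration in Python (the simulated data, the straight-line form, and the kernel bandwidth are arbitrary choices made purely for demonstration), a linear fit is parametric, with two interpretable coefficients, while a Nadaraya–Watson kernel smoother is nonparametric, with no simple formula to report:

```python
import numpy as np

# Simulated data: a nonlinear trend plus noise (arbitrary example).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Parametric: h(x) = beta1 + beta2 * x, two interpretable coefficients.
beta2, beta1 = np.polyfit(x, y, deg=1)

# Nonparametric: Nadaraya-Watson kernel smoother, a data-driven estimate of h(x).
def kernel_smoother(x0, x, y, bandwidth=0.5):
    """Weighted average of y with Gaussian weights centered at x0."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

grid = np.linspace(0, 10, 5)
print(beta1 + beta2 * grid)                                     # parametric fit
print(np.array([kernel_smoother(x0, x, y) for x0 in grid]))     # nonparametric fit
```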

54.6 Simple Linear Regression

For random variables \((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\), simple linear regression estimates the model

\[ Y_i = \beta_1 + \beta_2 X_i + E_i \]

where \({\operatorname{E}}[E_i] = 0\), \({\operatorname{Var}}(E_i) = \sigma^2\), and \({\operatorname{Cov}}(E_i, E_j) = 0\) for all \(1 \leq i, j \leq n\) and \(i \not= j\).

Note that in this model \({\operatorname{E}}[Y | X] = \beta_1 + \beta_2 X.\)
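A minimal sketch of fitting this model with NumPy, using the closed-form least squares estimates \(\hat{\beta}_2 = \widehat{{\operatorname{Cov}}}(X, Y) / \widehat{{\operatorname{Var}}}(X)\) and \(\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}\); the simulated data and true coefficient values below are arbitrary:

```python
import numpy as np

# Simulate data from Y = beta1 + beta2 * X + E with beta1 = 2, beta2 = 3.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)

# Closed-form least squares estimates for simple linear regression.
beta2_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta1_hat = y.mean() - beta2_hat * x.mean()

print(beta1_hat, beta2_hat)  # should be close to 2 and 3
```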

54.7 Ordinary Least Squares

Ordinary least squares (OLS) estimates the model

\[ \begin{aligned} Y_i & = \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + E_i \\ & = {\boldsymbol{X}}_i {\boldsymbol{\beta}}+ E_i \end{aligned} \]

where \({\operatorname{E}}[E_i] = 0\), \({\operatorname{Var}}(E_i) = \sigma^2\), and \({\operatorname{Cov}}(E_i, E_j) = 0\) for all \(1 \leq i, j \leq n\) and \(i \not= j\).

Note that typically \(X_{i1} = 1\) for all \(i\) so that \(\beta_1 X_{i1} = \beta_1\) serves as the intercept.
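A sketch of OLS in NumPy with the intercept handled as a column of ones, as noted above; the simulated design and true coefficients are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # X[:, 0] = 1 is the intercept
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS estimate: minimize the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to beta_true
```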

54.8 Generalized Least Squares

Generalized least squares (GLS) assumes the same model as OLS, except it allows for heteroskedasticity and covariance among the \(E_i\). Specifically, it is assumed that \({\boldsymbol{E}}= (E_1, \ldots, E_n)^T\) is distributed as

\[ {\boldsymbol{E}}_{n \times 1} \sim (\boldsymbol{0}, {\boldsymbol{\Sigma}}) \] where \(\boldsymbol{0}\) is the expected value and \({\boldsymbol{\Sigma}}= (\sigma_{ij})\) is the \(n \times n\) symmetric covariance matrix.
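When \({\boldsymbol{\Sigma}}\) is known, the GLS estimate has the closed form \(\hat{{\boldsymbol{\beta}}} = ({\boldsymbol{X}}^T {\boldsymbol{\Sigma}}^{-1} {\boldsymbol{X}})^{-1} {\boldsymbol{X}}^T {\boldsymbol{\Sigma}}^{-1} {\boldsymbol{Y}}\). Below is a NumPy sketch under the simplifying (and in practice unrealistic) assumption of a known, diagonal \({\boldsymbol{\Sigma}}\); the simulated data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

# Heteroskedastic, independent errors: Sigma is diagonal with varying variances.
variances = rng.uniform(0.5, 3.0, size=n)
Sigma = np.diag(variances)
y = X @ beta_true + rng.normal(scale=np.sqrt(variances))

# GLS estimate: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y.
Sigma_inv = np.linalg.inv(Sigma)
beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(beta_hat)  # should be close to beta_true
```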

54.9 Matrix Form of Linear Models

We can write the models as

\[ {\boldsymbol{Y}}_{n \times 1} = {\boldsymbol{X}}_{n \times p} {\boldsymbol{\beta}}_{p \times 1} + {\boldsymbol{E}}_{n \times 1} \]

where simple linear regression, OLS, and GLS differ in the value of \(p\) or the distribution of the \(E_i\). We can also write the conditional expectation and covariance as

\[ {\operatorname{E}}[{\boldsymbol{Y}}| {\boldsymbol{X}}] = {\boldsymbol{X}}{\boldsymbol{\beta}}, \ {\operatorname{Cov}}({\boldsymbol{Y}}| {\boldsymbol{X}}) = {\boldsymbol{\Sigma}}. \]

54.10 Least Squares Regression

In simple linear regression, OLS, and GLS, the \({\boldsymbol{\beta}}\) parameters are fit by minimizing the sum of squares between \({\boldsymbol{Y}}\) and \({\boldsymbol{X}}{\boldsymbol{\beta}}\).
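Stated explicitly for reference (the derivations are deferred), OLS minimizes the sum of squared residuals and has the closed-form solution

\[ \hat{{\boldsymbol{\beta}}} = \underset{{\boldsymbol{\beta}}}{\arg\min} \ \left({\boldsymbol{Y}}- {\boldsymbol{X}}{\boldsymbol{\beta}}\right)^T \left({\boldsymbol{Y}}- {\boldsymbol{X}}{\boldsymbol{\beta}}\right) = \left({\boldsymbol{X}}^T {\boldsymbol{X}}\right)^{-1} {\boldsymbol{X}}^T {\boldsymbol{Y}}, \]

while GLS minimizes the weighted criterion \(\left({\boldsymbol{Y}}- {\boldsymbol{X}}{\boldsymbol{\beta}}\right)^T {\boldsymbol{\Sigma}}^{-1} \left({\boldsymbol{Y}}- {\boldsymbol{X}}{\boldsymbol{\beta}}\right)\).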

Fitting these models by “least squares” satisfies two types of optimality:

  1. Gauss-Markov Theorem
  2. Maximum likelihood estimate when in addition \({\boldsymbol{E}}\sim \mbox{MVN}_n(\boldsymbol{0}, {\boldsymbol{\Sigma}})\)

Details will follow on these.

54.11 Generalized Linear Models

The generalized linear model (GLM) builds from OLS and GLS to allow the response variable to be distributed according to an exponential family distribution. Suppose that \(\eta(\theta)\) is the function that maps the expected value to the natural parameter. The estimated model is

\[ \eta\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = {\boldsymbol{X}}{\boldsymbol{\beta}}\]

which is fit by maximum likelihood estimation.
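As an illustration, a logistic regression (a binomial-family GLM with the canonical logit link) can be fit as follows; the simulated data, the coefficient values, and the use of the statsmodels package are illustrative choices:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))   # intercept plus two explanatory variables
eta = X @ np.array([-0.5, 1.0, 2.0])           # linear predictor X beta
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))    # binary response with success probability logit^{-1}(eta)

# Logistic regression: eta(E[Y | X]) = X beta with the logit link,
# fit by maximum likelihood.
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)  # should be close to (-0.5, 1.0, 2.0)
```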

54.12 Generalized Additive Models

Next week, we will finally arrive at inferring semiparametric models where \(Y | {\boldsymbol{X}}\) is distributed according to an exponential family distribution. The models, which are called generalized additive models (GAMs), will be of the form

\[ \eta\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = \sum_{j=1}^p \sum_{k=1}^d h_k(X_{j}) \]

where \(\eta\) is the canonical link function and the \(h_k(\cdot)\) functions are very flexible.
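As a rough preview (an unpenalized sketch, not the penalized spline machinery that GAM software actually uses), each flexible \(h_k\) can be represented by a basis expansion; below a small polynomial basis per predictor is fit by least squares with an identity link:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, d = 300, 2, 4                            # n observations, p predictors, d basis functions each
X = rng.uniform(-2, 2, size=(n, p))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)

# Additive design matrix: an intercept plus basis functions x, x^2, ..., x^d per predictor.
columns = [np.ones(n)]
for j in range(p):
    for k in range(1, d + 1):
        columns.append(X[:, j] ** k)
B = np.column_stack(columns)

# Unpenalized least squares fit of the additive expansion.
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef
print(np.mean((y - fitted) ** 2))              # residual mean squared error
```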

54.13 Some Trade-offs

There are several important trade-offs encountered in statistical modeling:

  • Bias vs variance
  • Accuracy vs computational time
  • Flexibility vs interpretability

These are not mutually exclusive phenomena.

54.14 Bias and Variance

Suppose we estimate \(Y = h({\boldsymbol{X}}) + E\) by some \(\hat{Y} = \hat{h}({\boldsymbol{X}})\). The following bias-variance trade-off exists:

\[ \begin{aligned} {\operatorname{E}}\left[\left(Y - \hat{Y}\right)^2\right] & = {\operatorname{E}}\left[\left(h({\boldsymbol{X}}) + E - \hat{h}({\boldsymbol{X}})\right)^2\right] \\ & = {\operatorname{E}}\left[\left(h({\boldsymbol{X}}) - \hat{h}({\boldsymbol{X}})\right)^2\right] + {\operatorname{Var}}(E) \\ & = \left(h({\boldsymbol{X}}) - {\operatorname{E}}[\hat{h}({\boldsymbol{X}})]\right)^2 + {\operatorname{Var}}\left(\hat{h}({\boldsymbol{X}})\right) + {\operatorname{Var}}(E) \\ & = \mbox{bias}^2 + \mbox{variance} + {\operatorname{Var}}(E) \end{aligned} \]
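A small simulation makes the decomposition concrete (the true function \(h\), the noise level, and the polynomial estimators below are arbitrary illustrative choices): more flexible estimates \(\hat{h}\) tend to have lower bias but higher variance at a fixed point \(x_0\).

```python
import numpy as np

rng = np.random.default_rng(6)

def h(x):
    return np.sin(3 * x)                       # true regression function (arbitrary choice)

x0, sigma, n, n_sims = 0.8, 0.5, 100, 2000     # evaluation point, noise sd, sample size, replications

for degree in (1, 3, 7):                       # increasingly flexible polynomial estimators
    preds = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.uniform(0, 1, n)
        y = h(x) + rng.normal(scale=sigma, size=n)
        coefs = np.polyfit(x, y, deg=degree)   # fit hat{h} by least squares
        preds[s] = np.polyval(coefs, x0)       # hat{h}(x0) for this simulated data set
    bias_sq = (h(x0) - preds.mean()) ** 2
    variance = preds.var()
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```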