64 Logistic Regression

64.1 Goal

Logistic regression models a Bernoulli-distributed response variable in terms of a linear combination of explanatory variables.

This extends least squares regression to the case where the response variable captures a “success” or “failure” type outcome.

64.2 Bernoulli as EFD

If \(Y \sim \mbox{Bernoulli}(p)\), then its pmf is:

\[ \begin{aligned} f(y; p) & = p^{y} (1-p)^{1-y} \\ & = \exp\left\{ \log\left(\frac{p}{1-p}\right)y + \log(1-p) \right\} \end{aligned} \]

In exponential family distribution (EFD) notation,

\[ \eta(p) = \log\left(\frac{p}{1-p}\right) \equiv {\operatorname{logit}}(p), \]

\(A(\eta(p)) = \log(1 + \exp(\eta)) = -\log(1-p)\), and \(y\) is the sufficient statistic.
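As a quick numerical check of this factorization, the following minimal Python sketch evaluates the pmf in both its standard form and its exponential family form \(\exp\{\eta y - A(\eta)\}\). The function names are illustrative and not part of the text.

```python
import math

def bernoulli_pmf(y, p):
    """Standard form: p^y (1-p)^(1-y) for y in {0, 1}."""
    return p**y * (1 - p)**(1 - y)

def bernoulli_pmf_efd(y, p):
    """Exponential family form: exp(eta * y - A(eta))."""
    eta = math.log(p / (1 - p))        # natural parameter, logit(p)
    A = math.log(1 + math.exp(eta))    # log-partition function, equal to -log(1 - p)
    return math.exp(eta * y - A)

# The two forms should agree for every p in (0, 1) and y in {0, 1}.
for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, p), bernoulli_pmf_efd(y, p))
```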

64.3 Model

\(({\boldsymbol{X}}_1, Y_1), ({\boldsymbol{X}}_2, Y_2), \ldots, ({\boldsymbol{X}}_n, Y_n)\) are distributed so that \(Y_i | {\boldsymbol{X}}_i \sim \mbox{Bernoulli}(p_i)\), where \(\{Y_i | {\boldsymbol{X}}_i\}_{i=1}^n\) are jointly independent and

\[ {\operatorname{logit}}\left({\operatorname{E}}[Y_i | {\boldsymbol{X}}_i]\right) = \log\left( \frac{\Pr(Y_i = 1 | {\boldsymbol{X}}_i)}{\Pr(Y_i = 0 | {\boldsymbol{X}}_i)} \right) = {\boldsymbol{X}}_i {\boldsymbol{\beta}}. \]

From this it follows that

\[ p_i = \frac{\exp\left({\boldsymbol{X}}_i {\boldsymbol{\beta}}\right)}{1 + \exp\left({\boldsymbol{X}}_i {\boldsymbol{\beta}}\right)}. \]
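The logit and its inverse can be sketched in a few lines of Python. The coefficient values and covariate row below are hypothetical, chosen only to illustrate the mapping from the linear predictor to \(p_i\).

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit, exp(eta) / (1 + exp(eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1 + e)

# Hypothetical coefficients and one covariate row (intercept, x1).
beta = [-1.0, 2.0]
x_i = [1.0, 0.5]
eta_i = sum(x * b for x, b in zip(x_i, beta))  # linear predictor x_i beta
p_i = inv_logit(eta_i)                         # eta_i is 0 here, so p_i = 0.5
```

Splitting `inv_logit` by the sign of `eta` is a common design choice: it keeps the argument of `exp` non-positive, so large linear predictors do not overflow.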

64.4 Maximum Likelihood Estimation

The coefficients \({\boldsymbol{\beta}}\) are estimated by maximizing the log-likelihood:

\[ \begin{aligned} \ell\left({\boldsymbol{\beta}}; {\boldsymbol{y}}, {\boldsymbol{X}}\right) & = \sum_{i=1}^n \left[ \log\left(\frac{p_i}{1-p_i}\right) y_i + \log(1-p_i) \right] \\ & = \sum_{i=1}^n \left[ ({\boldsymbol{x}}_i {\boldsymbol{\beta}}) y_i - \log\left(1 + \exp\left({\boldsymbol{x}}_i {\boldsymbol{\beta}}\right) \right) \right] \end{aligned} \]
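The second form of the log-likelihood translates directly into code. The sketch below assumes NumPy and uses a small made-up data set; the function name `log_likelihood` is illustrative.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood: sum_i [ (x_i beta) y_i - log(1 + exp(x_i beta)) ]."""
    eta = X @ beta
    # np.logaddexp(0, eta) evaluates log(1 + exp(eta)) without overflow.
    return np.sum(y * eta - np.logaddexp(0.0, eta))

# Hypothetical toy data: an intercept column plus one covariate.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# With beta = 0 every p_i is 0.5, so the value is 4 * log(0.5).
ll_null = log_likelihood(np.zeros(2), X, y)
```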

64.5 Iteratively Reweighted Least Squares

  1. Initialize \({\boldsymbol{\beta}}^{(1)}\).

  2. For each iteration \(t=1, 2, \ldots\), set \[ p_i^{(t)} = {\operatorname{logit}}^{-1}\left( {\boldsymbol{x}}_i {\boldsymbol{\beta}}^{(t)} \right), \ \ \ \ z_i^{(t)} = {\operatorname{logit}}\left(p_i^{(t)}\right) + \frac{y_i - p_i^{(t)}}{p_i^{(t)}(1-p_i^{(t)})} \] and let \({\boldsymbol{z}}^{(t)} = \left\{z_i^{(t)}\right\}_{i=1}^n\).

  3. Form \(n \times n\) diagonal matrix \(\boldsymbol{W}^{(t)}\) with \((i, i)\) entry equal to \(p_i^{(t)}(1-p_i^{(t)})\).

  4. Obtain \({\boldsymbol{\beta}}^{(t+1)}\) by performing the weighted least squares regression (see GLS from earlier):

\[ {\boldsymbol{\beta}}^{(t+1)} = \left({\boldsymbol{X}}^T \boldsymbol{W}^{(t)} {\boldsymbol{X}}\right)^{-1} {\boldsymbol{X}}^T \boldsymbol{W}^{(t)} {\boldsymbol{z}}^{(t)}. \]

  5. Iterate Steps 2-4 over \(t=1, 2, 3, \ldots\) until convergence, setting \(\hat{{\boldsymbol{\beta}}} = {\boldsymbol{\beta}}^{(\infty)}\).
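The steps above can be sketched in Python with NumPy as follows. This is a minimal illustration rather than a production implementation: the zero initialization, the stopping rule, the iteration cap, and the toy data are all assumptions, and there is no safeguard for the case where some \(p_i^{(t)}(1-p_i^{(t)})\) approaches zero (as happens with perfectly separable data).

```python
import numpy as np

def irls_logistic(X, y, max_iter=25, tol=1e-8):
    """Fit logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])              # Step 1: initialize beta^(1) (assumed zero)
    for _ in range(max_iter):
        eta = X @ beta                       # logit(p_i^(t)) = x_i beta^(t)
        p = 1.0 / (1.0 + np.exp(-eta))       # Step 2: p_i^(t)
        w = p * (1.0 - p)                    # Step 3: diagonal entries of W^(t)
        z = eta + (y - p) / w                # Step 2: working response z^(t)
        XtW = X.T * w                        # X^T W^(t) without forming the n x n matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # Step 4: weighted least squares
        if np.max(np.abs(beta_new - beta)) < tol:      # Step 5: stop at convergence
            return beta_new
        beta = beta_new
    return beta

# Hypothetical toy data that is not perfectly separable.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_hat = irls_logistic(X, y)
```

At convergence the score \({\boldsymbol{X}}^T({\boldsymbol{y}} - \hat{{\boldsymbol{p}}})\) should be numerically zero, which gives a simple check on the fit. Solving the normal equations with `np.linalg.solve` avoids computing the matrix inverse explicitly.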

64.6 GLMs

For exponential family distribution response variables, the generalized linear model is

\[ \eta\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = {\boldsymbol{X}}{\boldsymbol{\beta}}\]

where \(\eta(\theta)\) is the function that maps the expected value \(\theta\) to the natural parameter. This is called the canonical link function in the GLM setting.

The iteratively reweighted least squares algorithm presented above for calculating (local) maximum likelihood estimates of \({\boldsymbol{\beta}}\) generalizes to a large class of exponential family distribution response variables.
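As one illustration of this generalization, consider a Poisson response with its canonical log link. This example is not taken from the text above and is only a sketch under that assumption. The mean \(\mu_i = \exp({\boldsymbol{x}}_i {\boldsymbol{\beta}})\) replaces \(p_i\), and the variance \(\mu_i\) replaces \(p_i(1-p_i)\) in both the working response and the weights. The data and function name are hypothetical.

```python
import numpy as np

def irls_poisson(X, y, max_iter=50, tol=1e-8):
    """IRLS for Poisson regression with the canonical (log) link."""
    beta = np.zeros(X.shape[1])              # assumed zero initialization
    for _ in range(max_iter):
        eta = X @ beta                       # linear predictor, log(mu_i)
        mu = np.exp(eta)                     # inverse of the log link
        w = mu                               # Var(Y_i) = mu_i gives the weights
        z = eta + (y - mu) / mu              # working response
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Hypothetical count data with an intercept and one covariate.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.0, 3.0, 2.0, 4.0, 3.0])
beta_hat = irls_poisson(X, y)
```

Only the three lines computing `mu`, `w`, and `z` differ from the logistic case; the weighted least squares update is unchanged.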