64 Logistic Regression

64.1 Goal

Logistic regression models a Bernoulli distributed response variable in terms of linear combinations of explanatory variables.

This extends least squares regression to the case where the response variable captures a “success” or “failure” type outcome.

64.2 Bernoulli as EFD

If $$Y \sim \mbox{Bernoulli}(p)$$, then its pmf is:

\begin{aligned} f(y; p) & = p^{y} (1-p)^{1-y} \\ & = \exp\left\{ \log\left(\frac{p}{1-p}\right)y + \log(1-p) \right\} \end{aligned}

In exponential family distribution (EFD) notation,

$\eta(p) = \log\left(\frac{p}{1-p}\right) \equiv {\operatorname{logit}}(p),$

$$A(\eta(p)) = \log(1 + \exp(\eta)) = -\log(1-p)$$, and $$y$$ is the sufficient statistic.
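As a quick numerical check of this rewriting, the sketch below evaluates the pmf both ways, directly and as $$\exp\{\eta y - A(\eta)\}$$ with $$A(\eta) = \log(1 + \exp(\eta))$$. The function names are illustrative only and do not come from any library.

```python
import math

def bernoulli_pmf(y, p):
    """Direct form: p^y (1 - p)^(1 - y)."""
    return p**y * (1 - p)**(1 - y)

def bernoulli_efd(y, p):
    """Exponential family form: exp(eta * y - A(eta))."""
    eta = math.log(p / (1 - p))        # natural parameter, logit(p)
    A = math.log(1 + math.exp(eta))    # log-partition; equals -log(1 - p)
    return math.exp(eta * y - A)

# The two forms agree for every outcome y and several values of p.
for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, p), bernoulli_efd(y, p))
```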

64.3 Model

$$({\boldsymbol{X}}_1, Y_1), ({\boldsymbol{X}}_2, Y_2), \ldots, ({\boldsymbol{X}}_n, Y_n)$$ are distributed so that $$Y_i | {\boldsymbol{X}}_i \sim \mbox{Bernoulli}(p_i)$$, where $$\{Y_i | {\boldsymbol{X}}_i\}_{i=1}^n$$ are jointly independent and

${\operatorname{logit}}\left({\operatorname{E}}[Y_i | {\boldsymbol{X}}_i]\right) = \log\left( \frac{\Pr(Y_i = 1 | {\boldsymbol{X}}_i)}{\Pr(Y_i = 0 | {\boldsymbol{X}}_i)} \right) = {\boldsymbol{X}}_i {\boldsymbol{\beta}}.$

From this it follows that

$p_i = \frac{\exp\left({\boldsymbol{X}}_i {\boldsymbol{\beta}}\right)}{1 + \exp\left({\boldsymbol{X}}_i {\boldsymbol{\beta}}\right)}.$
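The map from the linear predictor to $$p_i$$ is the inverse logit. A minimal sketch follows, assuming plain Python lists for $${\boldsymbol{X}}_i$$ and $${\boldsymbol{\beta}}$$. The helper names `logit`, `inv_logit`, and `success_prob` are hypothetical, and the branch on the sign of `eta` is only a guard against overflow in `exp`.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """exp(eta) / (1 + exp(eta)), arranged so exp never sees a large positive argument."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1 + e)

def success_prob(x, beta):
    """p_i for one observation: inverse logit of the linear predictor x . beta."""
    eta = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    return inv_logit(eta)

# Illustrative values: the linear predictor is -1.0 + 2.0 * 0.5 = 0, so p = 1/2.
p = success_prob([1.0, 2.0], [-1.0, 0.5])
```

Here the first entry of `x` plays the role of an intercept column, which is an assumption of the example rather than a requirement of the model.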

64.4 Maximum Likelihood Estimation

The $${\boldsymbol{\beta}}$$ are estimated by maximum likelihood, that is, by maximizing the log-likelihood:

\begin{aligned} \ell\left({\boldsymbol{\beta}}; {\boldsymbol{y}}, {\boldsymbol{X}}\right) & = \sum_{i=1}^n \left[ \log\left(\frac{p_i}{1-p_i}\right) y_i + \log(1-p_i) \right] \\ & = \sum_{i=1}^n \left[ ({\boldsymbol{x}}_i {\boldsymbol{\beta}}) y_i - \log\left(1 + \exp\left({\boldsymbol{x}}_i {\boldsymbol{\beta}}\right) \right) \right] \end{aligned}
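The second form of the log-likelihood is straightforward to evaluate. The sketch below assumes the design matrix is a list of rows and the responses are 0/1 values. `log1pexp` and `log_likelihood` are hypothetical helper names, and the toy data are made up for illustration.

```python
import math

def log1pexp(eta):
    """log(1 + exp(eta)), written to avoid overflow for large |eta|."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))

def log_likelihood(beta, X, y):
    """Sum over i of (x_i . beta) * y_i - log(1 + exp(x_i . beta))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(x_j * b_j for x_j, b_j in zip(x_i, beta))
        total += eta * y_i - log1pexp(eta)
    return total

# Toy data: an intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

# At beta = 0 every p_i is 1/2, so each observation contributes -log(2).
ll_at_zero = log_likelihood([0.0, 0.0], X, y)
```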

64.5 Iteratively Reweighted Least Squares

1. Initialize $${\boldsymbol{\beta}}^{(1)}$$.

2. For each iteration $$t=1, 2, \ldots$$, set $p_i^{(t)} = {\operatorname{logit}}^{-1}\left( {\boldsymbol{x}}_i {\boldsymbol{\beta}}^{(t)} \right), \ \ \ \ z_i^{(t)} = {\operatorname{logit}}\left(p_i^{(t)}\right) + \frac{y_i - p_i^{(t)}}{p_i^{(t)}(1-p_i^{(t)})}$ and let $${\boldsymbol{z}}^{(t)} = \left\{z_i^{(t)}\right\}_{i=1}^n$$.

3. Form the $$n \times n$$ diagonal matrix $$\boldsymbol{W}^{(t)}$$ with $$(i, i)$$ entry equal to $$p_i^{(t)}(1-p_i^{(t)})$$.

4. Obtain $${\boldsymbol{\beta}}^{(t+1)}$$ by performing the weighted least squares regression (see GLS from earlier)

${\boldsymbol{\beta}}^{(t+1)} = \left({\boldsymbol{X}}^T \boldsymbol{W}^{(t)} {\boldsymbol{X}}\right)^{-1} {\boldsymbol{X}}^T \boldsymbol{W}^{(t)} {\boldsymbol{z}}^{(t)}.$

5. Iterate Steps 2-4 over $$t=1, 2, 3, \ldots$$ until convergence, setting $$\hat{{\boldsymbol{\beta}}} = {\boldsymbol{\beta}}^{(\infty)}$$.
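The steps above can be sketched in plain Python. This is a minimal illustration under several assumptions, not a production fitter: it starts from $${\boldsymbol{\beta}}^{(1)} = {\boldsymbol{0}}$$, declares convergence when the largest coefficient change falls below a tolerance, and solves the weighted normal equations with a small hand-written Gaussian elimination. The names `solve` and `irls_logistic`, the defaults `tol` and `max_iter`, and the toy data are all hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def irls_logistic(X, y, tol=1e-10, max_iter=50):
    d = len(X[0])
    beta = [0.0] * d                                 # Step 1 (assumed start)
    for _ in range(max_iter):
        # Step 2: fitted probabilities and working response z.
        eta = [sum(x_j * b_j for x_j, b_j in zip(x_i, beta)) for x_i in X]
        p = [1 / (1 + math.exp(-e)) for e in eta]
        # Step 3: diagonal of W (fails if some p_i rounds to exactly 0 or 1).
        w = [p_i * (1 - p_i) for p_i in p]
        z = [e + (y_i - p_i) / w_i for e, y_i, p_i, w_i in zip(eta, y, p, w)]
        # Step 4: solve (X^T W X) beta = X^T W z.
        XtWX = [[sum(w_i * x_i[j] * x_i[k] for w_i, x_i in zip(w, X))
                 for k in range(d)] for j in range(d)]
        XtWz = [sum(w_i * x_i[j] * z_i for w_i, x_i, z_i in zip(w, X, z))
                for j in range(d)]
        new_beta = solve(XtWX, XtWz)
        # Step 5: stop once the update is negligible.
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Toy data (intercept plus one covariate); the classes overlap, so the MLE is finite.
X = [[1.0, float(x)] for x in range(6)]
y = [0, 0, 1, 0, 1, 1]
beta_hat = irls_logistic(X, y)
```

One way to check the result is that the score $${\boldsymbol{X}}^T({\boldsymbol{y}} - {\boldsymbol{p}})$$ is numerically zero at `beta_hat`. If the data are perfectly separable the iterates diverge, and this sketch makes no attempt to detect that.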

64.6 GLMs

For exponential family distribution response variables, the generalized linear model is

$\eta\left({\operatorname{E}}[Y | {\boldsymbol{X}}]\right) = {\boldsymbol{X}}{\boldsymbol{\beta}}$

where $$\eta(\theta)$$ is the function that maps the expected value $$\theta$$ to the natural parameter. This is called the canonical link function in the GLM setting.

The iteratively reweighted least squares algorithm presented above for calculating (local) maximum likelihood estimates of $${\boldsymbol{\beta}}$$ has a generalization to a large class of exponential family distribution response variables.
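As a sketch of what that generalization looks like, stated here as the standard form for a canonical link with unit dispersion rather than derived in these notes: write $$\mu_i^{(t)}$$ for the fitted mean of $$Y_i$$ at iteration $$t$$ and $$V(\mu)$$ for the variance function, so that $${\operatorname{Var}}(Y_i | {\boldsymbol{X}}_i) = V(\mu_i)$$. The symbols $$\mu_i$$ and $$V$$ are notation introduced for this illustration. The working response and weights of the IRLS algorithm then become

$z_i^{(t)} = {\boldsymbol{x}}_i {\boldsymbol{\beta}}^{(t)} + \frac{y_i - \mu_i^{(t)}}{V\left(\mu_i^{(t)}\right)}, \ \ \ \ W_{ii}^{(t)} = V\left(\mu_i^{(t)}\right),$

and the weighted least squares update is unchanged. For the Bernoulli case, $$\mu_i = p_i$$ and $$V(p) = p(1-p)$$, which recovers the algorithm above. For a Poisson response with the log link, $$\mu_i = \exp({\boldsymbol{x}}_i {\boldsymbol{\beta}})$$ and $$V(\mu) = \mu$$.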