21 Exponential Family Distributions

21.1 Rationale

Exponential family distributions (EFDs) provide a generalized parameterization and form of a very large class of distributions used in inference. For example, Binomia, Poisson, Exponential, Normal, Multinomial, MVN, and Dirichlet are all EFDs.

The generalized form provides generally applicable formulas for moments, estimators, etc.

EFDs also facilitate developing general algorithms for model fitting.

21.2 Definition

If \(X\) follows an EFD then it has pdf of the form

\[f(x ; \boldsymbol{\theta}) = h(x) \exp \left\{ \sum_{k=1}^d \eta_k(\boldsymbol{\theta}) T_k(x) - A(\boldsymbol{\eta}) \right\} \]

where \(\boldsymbol{\theta}\) is a vector of parameters, \(\{T_k(x)\}\) are sufficient statistics, \(A(\boldsymbol{\eta})\) is the cumulant generating function.

The functions \(\eta_k(\boldsymbol{\theta})\) for \(k=1, \ldots, d\) map the usual parameters to the “natural parameters”.

\(\{T_k(x)\}\) are sufficient statistics for \(\{\eta_k\}\) due to the factorization theorem.

\(A(\boldsymbol{\eta})\) is sometimes called the “log normalizer” because

\[A(\boldsymbol{\eta}) = \log \int h(x) \exp \left\{ \sum_{k=1}^d \eta_k(\boldsymbol{\theta}) T_k(x) \right\}.\]

21.3 Example: Bernoulli

\[\begin{align*} f(x ; p) & = p^x (1-p)^{1-x} \\ & = \exp \left\{ x \log(p) + (1-x) \log(1-p) \right\} \\ & = \exp \left\{ x \log\left( \frac{p}{1-p} \right) + \log(1-p) \right\} \end{align*}\]

\(\eta(p) = \log\left( \frac{p}{1-p} \right)\)

\(T(x) = x\)

\(A(\eta) = \log\left(1 + e^\eta\right)\)

21.4 Example: Normal

\[\begin{align*} f(x ; \mu, \sigma^2) & = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2 \sigma^2}\right\} \\ & = \frac{1}{\sqrt{2 \pi}} \exp\left\{\frac{\mu}{\sigma^2} x - \frac{1}{2 \sigma^2} x^2 - \log(\sigma) - \frac{\mu^2}{2 \sigma^2}\right\} \end{align*}\]

\(\boldsymbol{\eta}(\mu, \sigma^2) = \left(\frac{\mu}{\sigma^2}, - \frac{1}{2 \sigma^2} \right)^T\)

\(\boldsymbol{T}(x) = \left(x, x^2\right)^T\)

\(A(\boldsymbol{\eta}) = \log(\sigma) + \frac{\mu^2}{2 \sigma^2} = -\frac{1}{2} \log(-2 \eta_2) - \frac{\eta_1^2}{4\eta_2}\)

21.5 Natural Single Parameter EFD

A natural single parameter EFD simplifies to the scenario where \(d=1\) and \(T(x) = x\):

\[f(x ; \eta) = h(x) \exp \left\{ \eta x - A(\eta) \right\} \]

21.6 Calculating Moments

\[ \frac{\partial}{\partial \eta_k} A(\boldsymbol{\eta}) = {\operatorname{E}}[T_k(X)] \]

\[ \frac{\partial^2}{\partial \eta_k^2} A(\boldsymbol{\eta}) = {\operatorname{Var}}[T_k(X)] \]

21.7 Example: Normal

For \(X \sim \mbox{Normal}(\mu, \sigma^2)\),

\[{\operatorname{E}}[X] = \frac{\partial}{\partial \eta_1} A(\boldsymbol{\eta}) = -\frac{\eta_1}{2 \eta_2} = \mu,\]

\[{\operatorname{Var}}(X) = \frac{\partial^2}{\partial \eta_1^2} A(\boldsymbol{\eta}) = -\frac{1}{2 \eta_2} = \sigma^2.\]

21.8 Maximum Likelihood

Suppose \(X_1, X_2, \ldots, X_n\) are iid from some EFD. Then,

\[ \ell(\boldsymbol{\eta} ; \boldsymbol{x}) = \sum_{i=1}^n \left[ \log h(x_i) + \sum_{k=1}^d \eta_k(\boldsymbol{\theta}) T_k(x_i) - A(\boldsymbol{\eta}) \right] \]

\[ \frac{\partial}{\partial \eta_k} \ell(\boldsymbol{\eta} ; \boldsymbol{x}) = \sum_{i=1}^n T_k(x_i) - n \frac{\partial}{\partial \eta_k} A(\boldsymbol{\eta}) \] Setting the second equation to 0, it follows that the MLE of \(\eta_k\) is the solution to

\[ \frac{1}{n} \sum_{i=1}^n T_k(x_i) = \frac{\partial}{\partial \eta_k} A(\boldsymbol{\eta}). \]

Foundations of Applied Statistics