21 Exponential Family Distributions
21.1 Rationale
Exponential family distributions (EFDs) provide a general parameterization and functional form for a very large class of distributions used in inference. For example, the Binomial, Poisson, Exponential, Normal, Multinomial, multivariate Normal (MVN), and Dirichlet distributions are all EFDs.
The general form yields formulas for moments, estimators, etc. that apply to every member of the family.
EFDs also facilitate developing general algorithms for model fitting.
21.2 Definition
If \(X\) follows an EFD, then it has a pdf (or pmf, if \(X\) is discrete) of the form
\[f(x ; \boldsymbol{\theta}) = h(x) \exp \left\{ \sum_{k=1}^d \eta_k(\boldsymbol{\theta}) T_k(x) - A(\boldsymbol{\eta}) \right\} \]
where \(\boldsymbol{\theta}\) is a vector of parameters, \(h(x)\) is a function of \(x\) alone, \(\{T_k(x)\}\) are sufficient statistics, and \(A(\boldsymbol{\eta})\) is the cumulant generating function.
The functions \(\eta_k(\boldsymbol{\theta})\) for \(k=1, \ldots, d\) map the usual parameters to the “natural parameters”.
\(\{T_k(x)\}\) are sufficient statistics for \(\{\eta_k\}\) due to the factorization theorem.
\(A(\boldsymbol{\eta})\) is sometimes called the “log normalizer” because
\[A(\boldsymbol{\eta}) = \log \int h(x) \exp \left\{ \sum_{k=1}^d \eta_k(\boldsymbol{\theta}) T_k(x) \right\} \, dx.\]
When \(X\) is discrete, the integral is replaced by a sum over the support of \(X\).
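The role of the log normalizer can be checked numerically. The following is a minimal sketch for a discrete, one-parameter case, where the integral becomes a sum. The Binomial with a fixed number of trials `n = 5` is chosen here purely for illustration, and the helper name `log_normalizer` is hypothetical.

```python
import math

def log_normalizer(eta, support, h, T):
    """A(eta) = log sum_x h(x) * exp(eta * T(x)) for a discrete, one-parameter EFD."""
    return math.log(sum(h(x) * math.exp(eta * T(x)) for x in support))

# Illustrative assumption: Binomial with n = 5 trials (n is treated as fixed).
n = 5
support = range(n + 1)
h = lambda x: math.comb(n, x)   # function of x alone
T = lambda x: x                 # sufficient statistic

p = 0.3
eta = math.log(p / (1 - p))     # natural parameter (log-odds)
A = log_normalizer(eta, support, h, T)

# Closed form for this case: A(eta) = n * log(1 + e^eta).
A_closed = n * math.log(1 + math.exp(eta))

# Subtracting A(eta) inside the exponent makes the probabilities sum to 1.
total = sum(h(x) * math.exp(eta * T(x) - A) for x in support)

print(A, A_closed, total)
```

The two values of \(A\) should agree up to floating-point error, and `total` should be numerically equal to 1.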
21.3 Example: Bernoulli
\[\begin{align*} f(x ; p) & = p^x (1-p)^{1-x} \\ & = \exp \left\{ x \log(p) + (1-x) \log(1-p) \right\} \\ & = \exp \left\{ x \log\left( \frac{p}{1-p} \right) + \log(1-p) \right\} \end{align*}\]
\(\eta(p) = \log\left( \frac{p}{1-p} \right)\)
\(T(x) = x\)
\(A(\eta) = \log\left(1 + e^\eta\right)\)
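As a quick sanity check of the Bernoulli example, the sketch below compares the usual pmf \(p^x (1-p)^{1-x}\) with the exponential family form \(\exp\{\eta x - A(\eta)\}\), taking \(h(x) = 1\). The function names are illustrative only.

```python
import math

def bernoulli_pmf(x, p):
    """Usual parameterization: p^x (1-p)^(1-x)."""
    return p**x * (1 - p)**(1 - x)

def bernoulli_efd(x, p):
    """Exponential family form with eta = logit(p), T(x) = x, h(x) = 1."""
    eta = math.log(p / (1 - p))
    A = math.log(1 + math.exp(eta))
    return math.exp(eta * x - A)

# Largest discrepancy between the two forms over a few values of p and x.
max_diff = max(
    abs(bernoulli_pmf(x, p) - bernoulli_efd(x, p))
    for p in (0.1, 0.5, 0.9)
    for x in (0, 1)
)
print(max_diff)
```

The printed discrepancy should be at the level of floating-point rounding, reflecting that \(A(\eta) = \log(1 + e^\eta) = -\log(1-p)\).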
21.4 Example: Normal
\[\begin{align*} f(x ; \mu, \sigma^2) & = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2 \sigma^2}\right\} \\ & = \frac{1}{\sqrt{2 \pi}} \exp\left\{\frac{\mu}{\sigma^2} x - \frac{1}{2 \sigma^2} x^2 - \log(\sigma) - \frac{\mu^2}{2 \sigma^2}\right\} \end{align*}\]
\(\boldsymbol{\eta}(\mu, \sigma^2) = \left(\frac{\mu}{\sigma^2}, - \frac{1}{2 \sigma^2} \right)^T\)
\(\boldsymbol{T}(x) = \left(x, x^2\right)^T\)
\(A(\boldsymbol{\eta}) = \log(\sigma) + \frac{\mu^2}{2 \sigma^2} = -\frac{1}{2} \log(-2 \eta_2) - \frac{\eta_1^2}{4\eta_2}\)
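The Normal decomposition can be verified the same way. This sketch evaluates the density both directly and through \(h(x) \exp\{\eta_1 x + \eta_2 x^2 - A(\boldsymbol{\eta})\}\) with \(h(x) = 1/\sqrt{2\pi}\). The parameter values \(\mu = 1.5\), \(\sigma^2 = 4\) and the function names are arbitrary choices for illustration.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Usual Normal density."""
    return math.exp(-(x - mu)**2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def natural_params(mu, sigma2):
    """eta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / sigma2, -1.0 / (2 * sigma2)

def log_normalizer(eta1, eta2):
    """A(eta) = -(1/2) log(-2 eta2) - eta1^2 / (4 eta2); requires eta2 < 0."""
    return -0.5 * math.log(-2 * eta2) - eta1**2 / (4 * eta2)

def normal_efd(x, mu, sigma2):
    """Exponential family form with T(x) = (x, x^2) and h(x) = 1/sqrt(2 pi)."""
    eta1, eta2 = natural_params(mu, sigma2)
    h = 1.0 / math.sqrt(2 * math.pi)
    return h * math.exp(eta1 * x + eta2 * x**2 - log_normalizer(eta1, eta2))

mu, sigma2 = 1.5, 4.0
eta1, eta2 = natural_params(mu, sigma2)

# A written in the usual parameters: log(sigma) + mu^2 / (2 sigma^2).
A_usual = math.log(math.sqrt(sigma2)) + mu**2 / (2 * sigma2)
A_natural = log_normalizer(eta1, eta2)

max_diff = max(abs(normal_pdf(x, mu, sigma2) - normal_efd(x, mu, sigma2))
               for x in (-3.0, 0.0, 1.5, 2.7, 6.0))
print(A_usual, A_natural, max_diff)
```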
21.5 Natural Single Parameter EFD
A natural single-parameter EFD is the special case where \(d=1\), \(T(x) = x\), and the distribution is parameterized directly by the natural parameter \(\eta\):
\[f(x ; \eta) = h(x) \exp \left\{ \eta x - A(\eta) \right\} \]
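One distribution that fits this form is the Poisson: writing \(\lambda^x e^{-\lambda} / x!\) with \(\eta = \log \lambda\) gives \(h(x) = 1/x!\) and \(A(\eta) = e^\eta\). The sketch below checks this numerically; the rate \(\lambda = 2.5\) and the function names are assumptions made for illustration.

```python
import math

def poisson_pmf(x, lam):
    """Usual Poisson pmf with rate lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def poisson_natural(x, eta):
    """Natural single-parameter form: h(x) * exp(eta * x - A(eta))."""
    h = 1.0 / math.factorial(x)
    A = math.exp(eta)
    return h * math.exp(eta * x - A)

lam = 2.5
eta = math.log(lam)  # natural parameter

max_diff = max(abs(poisson_pmf(x, lam) - poisson_natural(x, eta))
               for x in range(15))
print(max_diff)
```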
21.6 Calculating Moments
\[ \frac{\partial}{\partial \eta_k} A(\boldsymbol{\eta}) = {\operatorname{E}}[T_k(X)] \]
\[ \frac{\partial^2}{\partial \eta_k^2} A(\boldsymbol{\eta}) = {\operatorname{Var}}[T_k(X)] \]
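These identities can be checked numerically without any calculus by differentiating \(A\) with central finite differences. The sketch below does this for the Bernoulli log normalizer \(A(\eta) = \log(1 + e^\eta)\), where the expected results are \(p\) and \(p(1-p)\). The step sizes and helper names are arbitrary choices, and finite differences are only an approximation to the derivatives.

```python
import math

def A(eta):
    """Bernoulli log normalizer."""
    return math.log(1 + math.exp(eta))

def first_derivative(f, x, eps=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def second_derivative(f, x, eps=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps**2

p = 0.3
eta = math.log(p / (1 - p))

mean_fd = first_derivative(A, eta)   # should approximate E[X] = p
var_fd = second_derivative(A, eta)   # should approximate Var(X) = p(1 - p)

print(mean_fd, p)
print(var_fd, p * (1 - p))
```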
21.7 Example: Normal
For \(X \sim \mbox{Normal}(\mu, \sigma^2)\),
\[{\operatorname{E}}[X] = \frac{\partial}{\partial \eta_1} A(\boldsymbol{\eta}) = -\frac{\eta_1}{2 \eta_2} = \mu,\]
\[{\operatorname{Var}}(X) = \frac{\partial^2}{\partial \eta_1^2} A(\boldsymbol{\eta}) = -\frac{1}{2 \eta_2} = \sigma^2.\]
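The same finite-difference check applies to the Normal log normalizer. In this sketch, differentiating in \(\eta_1\) recovers \(\mu\) and \(\sigma^2\), and differentiating in \(\eta_2\) recovers \({\operatorname{E}}[X^2] = \mu^2 + \sigma^2\), since \(T_2(x) = x^2\). The values \(\mu = 1.5\), \(\sigma^2 = 4\) and the step sizes are illustrative assumptions.

```python
import math

def A(eta1, eta2):
    """Normal log normalizer in natural parameters (requires eta2 < 0)."""
    return -0.5 * math.log(-2 * eta2) - eta1**2 / (4 * eta2)

mu, sigma2 = 1.5, 4.0
eta1, eta2 = mu / sigma2, -1.0 / (2 * sigma2)

eps = 1e-4
# First and second central differences in eta1.
dA_eta1 = (A(eta1 + eps, eta2) - A(eta1 - eps, eta2)) / (2 * eps)
d2A_eta1 = (A(eta1 + eps, eta2) - 2 * A(eta1, eta2) + A(eta1 - eps, eta2)) / eps**2

# First central difference in eta2 gives E[T_2(X)] = E[X^2].
eps2 = 1e-5
dA_eta2 = (A(eta1, eta2 + eps2) - A(eta1, eta2 - eps2)) / (2 * eps2)

print(dA_eta1, mu)                  # E[X]
print(d2A_eta1, sigma2)             # Var(X)
print(dA_eta2, mu**2 + sigma2)      # E[X^2]
```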
21.8 Maximum Likelihood
Suppose \(X_1, X_2, \ldots, X_n\) are iid from some EFD. Then,
\[ \ell(\boldsymbol{\eta} ; \boldsymbol{x}) = \sum_{i=1}^n \left[ \log h(x_i) + \sum_{k=1}^d \eta_k(\boldsymbol{\theta}) T_k(x_i) - A(\boldsymbol{\eta}) \right] \]
\[ \frac{\partial}{\partial \eta_k} \ell(\boldsymbol{\eta} ; \boldsymbol{x}) = \sum_{i=1}^n T_k(x_i) - n \frac{\partial}{\partial \eta_k} A(\boldsymbol{\eta}) \]
Setting this derivative equal to 0 shows that the MLE of \(\eta_k\) is the solution to
\[ \frac{1}{n} \sum_{i=1}^n T_k(x_i) = \frac{\partial}{\partial \eta_k} A(\boldsymbol{\eta}). \]
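To make the estimating equation concrete, the sketch below solves \(\frac{1}{n}\sum_i T(x_i) = A'(\eta)\) for a Bernoulli sample using Newton's method, which is one possible root finder and not something prescribed by the derivation. For the Bernoulli the solution is also available in closed form, \(\hat{\eta} = \log\{\bar{x}/(1-\bar{x})\}\), which provides a check. The data are made up, the function names are hypothetical, and the sketch assumes \(0 < \bar{x} < 1\).

```python
import math

def A_prime(eta):
    """dA/d eta for the Bernoulli: E[X] = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def A_double_prime(eta):
    """d^2 A / d eta^2 for the Bernoulli: Var(X) = p (1 - p)."""
    s = A_prime(eta)
    return s * (1 - s)

def mle_natural_param(xs, eta0=0.0, tol=1e-12, max_iter=100):
    """Solve mean(T(x_i)) = A'(eta) by Newton's method, with T(x) = x."""
    t_bar = sum(xs) / len(xs)
    eta = eta0
    for _ in range(max_iter):
        step = (t_bar - A_prime(eta)) / A_double_prime(eta)
        eta += step
        if abs(step) < tol:
            break
    return eta

xs = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # made-up data with sample mean 0.6
eta_hat = mle_natural_param(xs)
p_hat = A_prime(eta_hat)             # map back to the usual parameter

x_bar = sum(xs) / len(xs)
print(eta_hat, math.log(x_bar / (1 - x_bar)))
print(p_hat, x_bar)
```

In this case the fitted mean \(A'(\hat{\eta})\) matches the sample mean of the sufficient statistic, which is what the estimating equation requires.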