# 41 EM Algorithm

## 41.1 Rationale

For any likelihood function, $$L({\boldsymbol{\theta}}; {\boldsymbol{x}}) = f({\boldsymbol{x}}; {\boldsymbol{\theta}})$$, there is an abundance of optimization methods that can be used to find the MLE or MAP. However:

- Optimization methods can be messy to implement
- There may be probabilistic structure that we can exploit to simplify the optimization and to provide theoretical guarantees on its convergence
- Optimizing the likelihood isn't necessarily the only goal: one may also be interested in point estimates of the latent variable values

## 41.2 Requirement

The expectation-maximization (EM) algorithm allows us to calculate MLEs and MAPs when certain geometric properties are satisfied in the probabilistic model.

For the EM algorithm to be a practical approach, we should have a latent variable model $$f({\boldsymbol{x}}, {\boldsymbol{z}}; {\boldsymbol{\theta}})$$ that is used to do inference on $$f({\boldsymbol{x}}; {\boldsymbol{\theta}})$$ or $$f({\boldsymbol{\theta}}| {\boldsymbol{x}})$$.

Note: Sometimes $$({\boldsymbol{x}}, {\boldsymbol{z}})$$ is called the complete data and $${\boldsymbol{x}}$$ is called the observed data when we are using the EM as a method for dealing with missing data.

## 41.3 The Algorithm

1. Choose initial value $${\boldsymbol{\theta}}^{(0)}$$

2. Calculate $$f({\boldsymbol{z}}| {\boldsymbol{x}}; {\boldsymbol{\theta}}^{(t)})$$

3. Calculate $Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = {\operatorname{E}}_{{\boldsymbol{Z}}|{\boldsymbol{X}}={\boldsymbol{x}}}\left[\log f({\boldsymbol{x}}, {\boldsymbol{Z}}; {\boldsymbol{\theta}}); {\boldsymbol{\theta}}^{(t)}\right]$

4. Set ${\boldsymbol{\theta}}^{(t+1)} = {\text{argmax}}_{{\boldsymbol{\theta}}} Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)})$

5. Iterate until convergence and set $$\widehat{{\boldsymbol{\theta}}} = {\boldsymbol{\theta}}^{(\infty)}$$
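The steps above can be sketched for a two-component 1-D Gaussian mixture, where both the E-step and the M-step have closed forms. This is a minimal illustration, not the general algorithm: the percentile-based initialization, fixed iteration count (in place of a convergence check), and the `em_gmm_1d` helper are all choices made for this sketch.

```python
import numpy as np

def norm_pdf(x, m, s):
    """Normal density N(x; m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    theta = (pi, mu, sigma); the latent z_i indicates the component of x_i.
    """
    # Step 1: choose an initial value theta^(0)
    pi = 0.5
    mu = np.percentile(x, [25, 75])
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # Step 2 (E-step): f(z | x; theta^(t)) reduces to per-point
        # responsibilities r_i = P(z_i = 1 | x_i; theta^(t))
        w0 = (1 - pi) * norm_pdf(x, mu[0], sigma[0])
        w1 = pi * norm_pdf(x, mu[1], sigma[1])
        r = w1 / (w0 + w1)
        # Steps 3-4 (M-step): argmax of Q(theta, theta^(t)) in closed form
        pi = r.mean()
        mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
        sigma = np.array([
            np.sqrt(np.average((x - mu[0]) ** 2, weights=1 - r)),
            np.sqrt(np.average((x - mu[1]) ** 2, weights=r)),
        ])
    # Step 5: after the iterations, theta^(t) stands in for theta^(infinity)
    return pi, mu, sigma
```

Note that steps 3 and 4 merge here: for exponential-family models like this one, the maximizer of $$Q$$ can be written down directly without ever forming $$Q$$ explicitly.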

## 41.4 $$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)})$$

Continuous $${\boldsymbol{Z}}$$:

$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = \int \log f({\boldsymbol{x}}, {\boldsymbol{z}}; {\boldsymbol{\theta}}) f({\boldsymbol{z}}| {\boldsymbol{x}}; {\boldsymbol{\theta}}^{(t)}) d{\boldsymbol{z}}$

Discrete $${\boldsymbol{Z}}$$:

$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = \sum_{{\boldsymbol{z}}} \log f({\boldsymbol{x}}, {\boldsymbol{z}}; {\boldsymbol{\theta}}) f({\boldsymbol{z}}| {\boldsymbol{x}}; {\boldsymbol{\theta}}^{(t)})$
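When $${\boldsymbol{Z}}$$ is discrete and low-dimensional, the sum over $${\boldsymbol{z}}$$ can be evaluated by brute-force enumeration. The sketch below does this for a hypothetical two-component Gaussian mixture with unit variances (the model and the parameter values are illustrative assumptions, not from the text above); because the observations are independent, the same $$Q$$ also collapses to a per-point responsibility form, which is what makes EM tractable when $$2^n$$ configurations would not be.

```python
import itertools
import numpy as np

def log_joint(x, z, pi, mu):
    """log f(x, z; theta) for a two-component unit-variance Gaussian
    mixture, with z_i in {0, 1} indicating the component of x_i."""
    mu_z = np.where(z == 1, mu[1], mu[0])
    p_z = np.where(z == 1, pi, 1 - pi)
    ll = -0.5 * np.log(2 * np.pi) - (x - mu_z) ** 2 / 2
    return np.sum(np.log(p_z) + ll)

def q_by_enumeration(x, theta, theta_t):
    """Q(theta, theta^(t)) = sum_z log f(x, z; theta) f(z | x; theta^(t)),
    computed by enumerating all 2^n latent configurations."""
    pi_t, mu_t = theta_t
    configs = [np.array(z) for z in itertools.product([0, 1], repeat=len(x))]
    # posterior f(z | x; theta^(t)) is proportional to f(x, z; theta^(t))
    logw = np.array([log_joint(x, z, pi_t, mu_t) for z in configs])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    pi, mu = theta
    return sum(wi * log_joint(x, z, pi, mu) for wi, z in zip(w, configs))
```

Enumeration is only viable for toy problems; its value here is as a check that the responsibility-based computation used in practice gives the same $$Q$$.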

## 41.5 EM for MAP

If we wish to calculate the MAP we replace $$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)})$$ with

$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = {\operatorname{E}}_{{\boldsymbol{Z}}|{\boldsymbol{X}}={\boldsymbol{x}}}\left[\log f({\boldsymbol{x}}, {\boldsymbol{Z}}; {\boldsymbol{\theta}}); {\boldsymbol{\theta}}^{(t)}\right] + \log f({\boldsymbol{\theta}})$

where $$f({\boldsymbol{\theta}})$$ is the prior distribution on $${\boldsymbol{\theta}}$$.
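To make the MAP variant concrete, suppose we put a $$\text{Beta}(a, b)$$ prior on the mixing weight $$\pi$$ of a two-component mixture (a hypothetical choice of prior for illustration). The added $$\log f({\boldsymbol{\theta}})$$ term keeps the M-step in closed form, shifting the usual MLE update by pseudo-counts:

```python
import numpy as np

def m_step_pi_map(r, a=2.0, b=2.0):
    """MAP M-step for the mixing weight pi under a Beta(a, b) prior.

    Maximizing Q(theta, theta^(t)) + log f(pi) over pi, where r holds the
    responsibilities r_i = P(z_i = 1 | x_i; theta^(t)), gives a
    pseudo-count-smoothed version of the MLE update r.mean().
    """
    n = len(r)
    return (r.sum() + a - 1) / (n + a + b - 2)
```

With $$a = b = 1$$ (a flat prior) the update reduces to the MLE $$\bar{r}$$, so MAP EM strictly generalizes the MLE version.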