# 41 EM Algorithm

## 41.1 Rationale

For any likelihood function, $$L({\boldsymbol{\theta}}; {\boldsymbol{x}}) = f({\boldsymbol{x}}; {\boldsymbol{\theta}})$$, there is an abundance of optimization methods that can be used to find the MLE or MAP. However:

- Optimization methods can be messy to implement
- There may be probabilistic structure that we can exploit to simplify the optimization and to provide theoretical guarantees on its convergence
- Optimizing the likelihood isn't necessarily the only goal: one may also be interested in point estimates of the latent variable values

## 41.2 Requirement

The expectation-maximization (EM) algorithm allows us to calculate MLEs and MAPs when certain geometric properties are satisfied in the probabilistic model.

For the EM algorithm to be a practical approach, we should have a latent variable model $$f({\boldsymbol{x}}, {\boldsymbol{z}}; {\boldsymbol{\theta}})$$ that is used to do inference on $$f({\boldsymbol{x}}; {\boldsymbol{\theta}})$$ or $$f({\boldsymbol{\theta}}| {\boldsymbol{x}})$$.

Note: Sometimes $$({\boldsymbol{x}}, {\boldsymbol{z}})$$ is called the complete data and $${\boldsymbol{x}}$$ is called the observed data when we are using the EM as a method for dealing with missing data.

## 41.3 The Algorithm

1. Choose initial value $${\boldsymbol{\theta}}^{(0)}$$

2. Calculate $$f({\boldsymbol{z}}| {\boldsymbol{x}}; {\boldsymbol{\theta}}^{(t)})$$

3. Calculate $Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = {\operatorname{E}}_{{\boldsymbol{Z}}|{\boldsymbol{X}}={\boldsymbol{x}}}\left[\log f({\boldsymbol{x}}, {\boldsymbol{Z}}; {\boldsymbol{\theta}}); {\boldsymbol{\theta}}^{(t)}\right]$

4. Set ${\boldsymbol{\theta}}^{(t+1)} = {\text{argmax}}_{{\boldsymbol{\theta}}} Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)})$

5. Iterate until convergence and set $$\widehat{{\boldsymbol{\theta}}} = {\boldsymbol{\theta}}^{(\infty)}$$
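The steps above can be sketched for a two-component 1-D Gaussian mixture, where both the E-step and the M-step have closed forms. This is a minimal illustration, not the general algorithm: the percentile-based initialization, fixed iteration count (in place of a convergence check), and the `em_gmm_1d` helper are all choices made for this sketch.

```python
import numpy as np

def norm_pdf(x, m, s):
    """Normal density N(x; m, s^2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    theta = (pi, mu, sigma); the latent z_i indicates the component of x_i.
    """
    # Step 1: choose an initial value theta^(0)
    pi = 0.5
    mu = np.percentile(x, [25, 75])
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # Step 2 (E-step): f(z | x; theta^(t)) reduces to per-point
        # responsibilities r_i = P(z_i = 1 | x_i; theta^(t))
        w0 = (1 - pi) * norm_pdf(x, mu[0], sigma[0])
        w1 = pi * norm_pdf(x, mu[1], sigma[1])
        r = w1 / (w0 + w1)
        # Steps 3-4 (M-step): argmax of Q(theta, theta^(t)) in closed form
        pi = r.mean()
        mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
        sigma = np.array([
            np.sqrt(np.average((x - mu[0]) ** 2, weights=1 - r)),
            np.sqrt(np.average((x - mu[1]) ** 2, weights=r)),
        ])
    # Step 5: after the iterations, theta^(t) stands in for theta^(infinity)
    return pi, mu, sigma
```

Note that steps 3 and 4 merge here: for exponential-family models like this one, the maximizer of $$Q$$ can be written down directly without ever forming $$Q$$ explicitly.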

## 41.4 $$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)})$$

Continuous $${\boldsymbol{Z}}$$:

$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = \int \log f({\boldsymbol{x}}, {\boldsymbol{z}}; {\boldsymbol{\theta}}) f({\boldsymbol{z}}| {\boldsymbol{x}}; {\boldsymbol{\theta}}^{(t)}) d{\boldsymbol{z}}$

Discrete $${\boldsymbol{Z}}$$:

$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = \sum_{{\boldsymbol{z}}} \log f({\boldsymbol{x}}, {\boldsymbol{z}}; {\boldsymbol{\theta}}) f({\boldsymbol{z}}| {\boldsymbol{x}}; {\boldsymbol{\theta}}^{(t)})$
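When $${\boldsymbol{Z}}$$ is discrete and low-dimensional, the sum over $${\boldsymbol{z}}$$ can be evaluated by brute-force enumeration. The sketch below does this for a hypothetical two-component Gaussian mixture with unit variances (the model and the parameter values are illustrative assumptions, not from the text above); because the observations are independent, the same $$Q$$ also collapses to a per-point responsibility form, which is what makes EM tractable when $$2^n$$ configurations would not be.

```python
import itertools
import numpy as np

def log_joint(x, z, pi, mu):
    """log f(x, z; theta) for a two-component unit-variance Gaussian
    mixture, with z_i in {0, 1} indicating the component of x_i."""
    mu_z = np.where(z == 1, mu[1], mu[0])
    p_z = np.where(z == 1, pi, 1 - pi)
    ll = -0.5 * np.log(2 * np.pi) - (x - mu_z) ** 2 / 2
    return np.sum(np.log(p_z) + ll)

def q_by_enumeration(x, theta, theta_t):
    """Q(theta, theta^(t)) = sum_z log f(x, z; theta) f(z | x; theta^(t)),
    computed by enumerating all 2^n latent configurations."""
    pi_t, mu_t = theta_t
    configs = [np.array(z) for z in itertools.product([0, 1], repeat=len(x))]
    # posterior f(z | x; theta^(t)) is proportional to f(x, z; theta^(t))
    logw = np.array([log_joint(x, z, pi_t, mu_t) for z in configs])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    pi, mu = theta
    return sum(wi * log_joint(x, z, pi, mu) for wi, z in zip(w, configs))
```

Enumeration is only viable for toy problems; its value here is as a check that the responsibility-based computation used in practice gives the same $$Q$$.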

## 41.5 EM for MAP

If we wish to calculate the MAP we replace $$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)})$$ with

$Q({\boldsymbol{\theta}}, {\boldsymbol{\theta}}^{(t)}) = {\operatorname{E}}_{{\boldsymbol{Z}}|{\boldsymbol{X}}={\boldsymbol{x}}}\left[\log f({\boldsymbol{x}}, {\boldsymbol{Z}}; {\boldsymbol{\theta}}); {\boldsymbol{\theta}}^{(t)}\right] + \log f({\boldsymbol{\theta}})$

where $$f({\boldsymbol{\theta}})$$ is the prior distribution on $${\boldsymbol{\theta}}$$.
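To make the MAP variant concrete, suppose we put a $$\text{Beta}(a, b)$$ prior on the mixing weight $$\pi$$ of a two-component mixture (a hypothetical choice of prior for illustration). The added $$\log f({\boldsymbol{\theta}})$$ term keeps the M-step in closed form, shifting the usual MLE update by pseudo-counts:

```python
import numpy as np

def m_step_pi_map(r, a=2.0, b=2.0):
    """MAP M-step for the mixing weight pi under a Beta(a, b) prior.

    Maximizing Q(theta, theta^(t)) + log f(pi) over pi, where r holds the
    responsibilities r_i = P(z_i = 1 | x_i; theta^(t)), gives a
    pseudo-count-smoothed version of the MLE update r.mean().
    """
    n = len(r)
    return (r.sum() + a - 1) / (n + a + b - 2)
```

With $$a = b = 1$$ (a flat prior) the update reduces to the MLE $$\bar{r}$$, so MAP EM strictly generalizes the MLE version.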