34 Estimation

34.1 Assumptions

We will assume that $$(X_1, X_2, \ldots, X_n) | \theta {\; \stackrel{\text{iid}}{\sim}\;}F_{\theta}$$ with prior distribution $$\theta \sim F_{\tau}$$ unless stated otherwise. Shorthand for the former is $$\boldsymbol{X} | \theta {\; \stackrel{\text{iid}}{\sim}\;}F_{\theta}$$.

We will write the pdf or pmf of $$X$$ as $$f(x | \theta)$$ as opposed to $$f(x ; \theta)$$ because, in the Bayesian framework, $$\theta$$ is a random variable and so this is a genuine conditional distribution.

We will write the pdf or pmf of $$\theta$$ as $$f(\theta)$$ or $$f(\theta ; \tau)$$ or $$f(\theta | \tau)$$. Always remember that prior distributions require parameter values, even if we don’t explicitly write them.

34.2 Posterior Distribution

The posterior distribution of $$\theta | \boldsymbol{X}$$ is obtained through Bayes’ theorem:

\begin{align*} f(\theta | \boldsymbol{x}) & = \frac{f(\boldsymbol{x} | \theta) f(\theta)}{f(\boldsymbol{x})} = \frac{f(\boldsymbol{x} | \theta) f(\theta)}{\int f(\boldsymbol{x} | \theta^*) f(\theta^*) d\theta^*} \\ & \propto L(\theta ; \boldsymbol{x}) f(\theta) \end{align*}
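As a concrete sketch, consider a Beta-Binomial model (an illustrative choice, not assumed elsewhere in these notes): $$X | \theta \sim \mbox{Binomial}(n, \theta)$$ with prior $$\theta \sim \mbox{Beta}(a, b)$$. The proportionality above lets us approximate the posterior on a grid, since the evidence $$f(\boldsymbol{x})$$ is just the integral of $$L(\theta ; \boldsymbol{x}) f(\theta)$$ over $$\theta$$:

```python
# Grid approximation of f(theta | x) ∝ L(theta; x) f(theta).
# Illustrative model: X | theta ~ Binomial(n, theta), theta ~ Beta(a, b),
# so the exact posterior is Beta(a + x, b + n - x).

def posterior_grid(x, n, a, b, m=10001):
    """Return grid points and normalized posterior weights (they sum to 1)."""
    thetas = [i / (m - 1) for i in range(m)]
    # Unnormalized posterior: theta^x (1-theta)^(n-x) * theta^(a-1) (1-theta)^(b-1).
    # The binomial coefficient and the Beta normalizing constant cancel on normalizing.
    unnorm = [t ** (x + a - 1) * (1 - t) ** (n - x + b - 1) for t in thetas]
    z = sum(unnorm)  # grid stand-in for the evidence f(x)
    return thetas, [u / z for u in unnorm]

# Hypothetical numbers: x = 7 successes in n = 10 trials, Beta(2, 2) prior.
thetas, post = posterior_grid(x=7, n=10, a=2, b=2)
```

Only the shape of $$L(\theta ; \boldsymbol{x}) f(\theta)$$ matters here; all constants cancel when we normalize, which is exactly why working up to proportionality suffices.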

34.3 Posterior Expectation

A very common point estimate of $$\theta$$ in Bayesian inference is the posterior expected value:

\begin{align*} \operatorname{E}[\theta | \boldsymbol{x}] & = \int \theta f(\theta | \boldsymbol{x}) d\theta \\ & = \frac{\int \theta L(\theta ; \boldsymbol{x}) f(\theta) d\theta}{\int L(\theta ; \boldsymbol{x}) f(\theta) d\theta} \end{align*}
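Continuing the hypothetical Beta-Binomial example, both integrals in this ratio can be approximated by Riemann sums on a grid, and the conjugate closed form $$\operatorname{E}[\theta | \boldsymbol{x}] = (a + x)/(a + b + n)$$ provides an exact check:

```python
# Posterior mean as a ratio of Riemann sums:
# E[theta | x] ≈ sum(theta * L * f) / sum(L * f) over a fine grid.
# Illustrative model: X | theta ~ Binomial(n, theta), theta ~ Beta(a, b).

def posterior_mean(x, n, a, b, m=100001):
    thetas = [i / (m - 1) for i in range(m)]
    lf = [t ** (x + a - 1) * (1 - t) ** (n - x + b - 1) for t in thetas]  # ∝ L * f
    return sum(t * w for t, w in zip(thetas, lf)) / sum(lf)

# Hypothetical data: x = 7 successes in n = 10 trials, Beta(2, 2) prior.
approx = posterior_mean(7, 10, 2, 2)
exact = (2 + 7) / (2 + 2 + 10)  # posterior Beta(9, 5) has mean 9/14
```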

34.4 Posterior Interval

The Bayesian analog of the frequentist confidence interval is the $$1-\alpha$$ posterior interval, where $$C_{\ell}$$ and $$C_{u}$$ are determined so that:

$1-\alpha = \Pr(C_\ell \leq \theta \leq C_u | \boldsymbol{x})$
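One common construction, sketched here for the hypothetical Beta-Binomial example, is the equal-tailed interval: $$C_\ell$$ and $$C_u$$ are taken to be the $$\alpha/2$$ and $$1 - \alpha/2$$ posterior quantiles, read off a grid approximation of the posterior cdf:

```python
# Equal-tailed 1 - alpha posterior interval from a grid approximation of
# the posterior CDF. Illustrative model: X | theta ~ Binomial(n, theta),
# theta ~ Beta(a, b).

def equal_tail_interval(x, n, a, b, alpha=0.05, m=100001):
    thetas = [i / (m - 1) for i in range(m)]
    w = [t ** (x + a - 1) * (1 - t) ** (n - x + b - 1) for t in thetas]
    z = sum(w)
    lower = upper = None
    acc = 0.0
    for t, wi in zip(thetas, w):
        acc += wi / z  # running posterior CDF
        if lower is None and acc >= alpha / 2:
            lower = t  # C_ell: alpha/2 quantile
        if upper is None and acc >= 1 - alpha / 2:
            upper = t  # C_u: 1 - alpha/2 quantile
    return lower, upper

# Hypothetical data: x = 7, n = 10, Beta(2, 2) prior; posterior is Beta(9, 5).
lo, hi = equal_tail_interval(7, 10, 2, 2)
```

An alternative construction is the highest posterior density (HPD) interval, which need not place equal probability in each tail.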

34.5 Maximum A Posteriori Probability

The maximum a posteriori probability (MAP) is the value (or values) of $$\theta$$ that maximize the posterior pdf or pmf:

\begin{align*} \hat{\theta}_{\text{MAP}} & = \operatorname{argmax}_\theta f(\theta | \boldsymbol{x}) \\ & = \operatorname{argmax}_\theta L(\theta ; \boldsymbol{x}) f(\theta) \end{align*}

This is a frequentist-esque use of the Bayesian framework: like the MLE, the MAP is a single maximizing value, and under a flat prior the two coincide.
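For the hypothetical Beta-Binomial example, the MAP can be found by maximizing $$L(\theta ; \boldsymbol{x}) f(\theta)$$ over a grid; the conjugate posterior $$\mbox{Beta}(a + x, b + n - x)$$ has closed-form mode $$(a + x - 1)/(a + b + n - 2)$$ as a check:

```python
# MAP estimate by grid search: argmax_theta L(theta; x) f(theta).
# Illustrative model: X | theta ~ Binomial(n, theta), theta ~ Beta(a, b).

def map_estimate(x, n, a, b, m=100001):
    thetas = [i / (m - 1) for i in range(m)]
    lf = [t ** (x + a - 1) * (1 - t) ** (n - x + b - 1) for t in thetas]  # ∝ posterior
    return max(zip(thetas, lf), key=lambda p: p[1])[0]

# Hypothetical data: x = 7, n = 10, Beta(2, 2) prior.
theta_map = map_estimate(7, 10, 2, 2)
# Closed-form check: Beta(9, 5) mode is (9 - 1) / (9 + 5 - 2) = 2/3.
```

Note that the normalizing constant $$f(\boldsymbol{x})$$ does not affect the argmax, so only $$L(\theta ; \boldsymbol{x}) f(\theta)$$ is needed.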

34.6 Loss Functions

Let $$\mathcal{L}\left(\theta, \tilde{\theta}\right)$$ be a loss function for a given estimator $$\tilde{\theta}$$. Examples are

$\mathcal{L}\left(\theta, \tilde{\theta}\right) = \left(\theta - \tilde{\theta}\right)^2 \mbox{ or } \mathcal{L}\left(\theta, \tilde{\theta}\right) = \left|\theta - \tilde{\theta}\right|.$

Note that when the expected value is taken with respect to $$f(\boldsymbol{x}; \theta)$$ (the frequentist sampling distribution, with $$\theta$$ fixed), the squared error loss decomposes as:

\begin{align*} \operatorname{E}\left[\left(\theta - \tilde{\theta}\right)^2\right] & = \left(\operatorname{E}\left[\tilde{\theta}\right] - \theta\right)^2 + \operatorname{Var}\left(\tilde{\theta}\right) \\ & = \mbox{bias}^2 + \mbox{variance} \end{align*}
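This decomposition follows by adding and subtracting $$\operatorname{E}\left[\tilde{\theta}\right]$$ inside the square; since $$\theta$$ is fixed under this expectation, the cross term vanishes because $$\operatorname{E}\left[\operatorname{E}\left[\tilde{\theta}\right] - \tilde{\theta}\right] = 0$$:

\begin{align*} \operatorname{E}\left[\left(\theta - \tilde{\theta}\right)^2\right] & = \operatorname{E}\left[\left(\left(\theta - \operatorname{E}\left[\tilde{\theta}\right]\right) + \left(\operatorname{E}\left[\tilde{\theta}\right] - \tilde{\theta}\right)\right)^2\right] \\ & = \left(\theta - \operatorname{E}\left[\tilde{\theta}\right]\right)^2 + 2 \left(\theta - \operatorname{E}\left[\tilde{\theta}\right]\right) \operatorname{E}\left[\operatorname{E}\left[\tilde{\theta}\right] - \tilde{\theta}\right] + \operatorname{E}\left[\left(\tilde{\theta} - \operatorname{E}\left[\tilde{\theta}\right]\right)^2\right] \\ & = \left(\operatorname{E}\left[\tilde{\theta}\right] - \theta\right)^2 + \operatorname{Var}\left(\tilde{\theta}\right) \end{align*}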

34.7 Bayes Risk

The Bayes risk, $$R\left(\theta, \tilde{\theta}\right)$$, is the expected loss with respect to the posterior:

${\operatorname{E}}\left[ \left. \mathcal{L}\left(\theta, \tilde{\theta}\right) \right| \boldsymbol{x} \right] = \int \mathcal{L}\left(\theta, \tilde{\theta}\right) f(\theta | \boldsymbol{x}) d\theta$
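For the hypothetical Beta-Binomial example, this integral can be approximated with the same grid weights as before. Under squared error loss, the Bayes risk evaluated at $$\tilde{\theta} = \operatorname{E}[\theta | \boldsymbol{x}]$$ is exactly the posterior variance:

```python
# Bayes risk of an estimate under squared error loss:
# E[(theta - est)^2 | x] ≈ sum over a grid of (theta - est)^2 * posterior weight.
# Illustrative model: X | theta ~ Binomial(n, theta), theta ~ Beta(a, b).

def bayes_risk_squared(est, x, n, a, b, m=100001):
    thetas = [i / (m - 1) for i in range(m)]
    w = [t ** (x + a - 1) * (1 - t) ** (n - x + b - 1) for t in thetas]
    z = sum(w)
    return sum((t - est) ** 2 * wi for t, wi in zip(thetas, w)) / z

# Hypothetical data: x = 7, n = 10, Beta(2, 2) prior; posterior Beta(9, 5).
# At est = posterior mean 9/14, the risk equals the Beta(9, 5) variance
# ab / ((a + b)^2 (a + b + 1)) = 45 / 2940.
risk_at_mean = bayes_risk_squared(9 / 14, 7, 10, 2, 2)
```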

34.8 Bayes Estimators

The Bayes estimator minimizes the Bayes risk.

The posterior expectation $${\operatorname{E}}\left[ \left. \theta \right| \boldsymbol{x} \right]$$ minimizes the Bayes risk of $$\mathcal{L}\left(\theta, \tilde{\theta}\right) = \left(\theta - \tilde{\theta}\right)^2$$.

The median of $$f(\theta | \boldsymbol{x})$$, given by $$F^{-1}_{\theta | \boldsymbol{x}}(1/2)$$, minimizes the Bayes risk of $$\mathcal{L}\left(\theta, \tilde{\theta}\right) = \left|\theta - \tilde{\theta}\right|$$.
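These two facts can be checked numerically in the hypothetical Beta-Binomial example by minimizing each Bayes risk over a grid of candidate estimates:

```python
# Numerically verify the Bayes estimators for the hypothetical Beta(9, 5)
# posterior (x = 7, n = 10, Beta(2, 2) prior): the posterior mean minimizes
# squared-error Bayes risk; the posterior median minimizes absolute-error risk.

m = 1001
thetas = [i / (m - 1) for i in range(m)]
w = [t ** 8 * (1 - t) ** 4 for t in thetas]  # ∝ Beta(9, 5) density
z = sum(w)
post = [wi / z for wi in w]

def risk(est, loss):
    """Posterior expected loss of the estimate `est` on the grid."""
    return sum(loss(t, est) * p for t, p in zip(thetas, post))

best_sq = min(thetas, key=lambda e: risk(e, lambda t, c: (t - c) ** 2))
best_abs = min(thetas, key=lambda e: risk(e, lambda t, c: abs(t - c)))

post_mean = sum(t * p for t, p in zip(thetas, post))
acc, post_median = 0.0, None
for t, p in zip(thetas, post):
    acc += p
    if acc >= 0.5:  # first grid point where the posterior CDF reaches 1/2
        post_median = t
        break
```

Up to the grid resolution, `best_sq` recovers the posterior mean and `best_abs` recovers the posterior median; note that for this left-skewed posterior the mean sits below the median.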