36 Priors
36.1 Conjugate Priors
A conjugate prior is a prior distribution that, when paired with a given data generating distribution, yields a posterior distribution in the same family as the prior.
Conjugate priors are useful because they make calculation of the posterior straightforward.
There is a systematic method for calculating conjugate priors for exponential family distributions.
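One way to sketch this method (the hyperparameters \(\tau\) and \(\nu\) below are generic placeholders introduced here only for illustration): if a single observation has exponential family density
\[ f(x | \theta) = h(x) \exp\left\{ \eta(\theta)^T T(x) - A(\theta) \right\}, \]
then any prior of the form
\[ f(\theta) \propto \exp\left\{ \eta(\theta)^T \tau - \nu A(\theta) \right\} \]
is conjugate, since multiplying by the likelihood of \(x_1, \ldots, x_n\) simply updates \(\tau \rightarrow \tau + \sum T(x_i)\) and \(\nu \rightarrow \nu + n\), leaving the posterior in the same family as the prior.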
36.2 Example: Beta-Bernoulli
Suppose \(\boldsymbol{X} | p {\; \stackrel{\text{iid}}{\sim}\;}\mbox{Bernoulli}(p)\) and suppose that \(p \sim \mbox{Beta}(\alpha, \beta)\).
\[\begin{align*} f(p | \boldsymbol{x}) & \propto L(p ; \boldsymbol{x}) f(p) \\ & = p^{\sum x_i} (1-p)^{\sum (1-x_i)} p^{\alpha - 1} (1-p)^{\beta-1} \\ & = p^{\alpha - 1 + \sum x_i} (1-p)^{\beta - 1 + \sum (1-x_i)} \\ & \propto \mbox{Beta}(\alpha + \sum x_i, \beta + \sum (1-x_i)) \end{align*}\]
Therefore, \[ {\operatorname{E}}[p | \boldsymbol{x}] = \frac{\alpha + \sum x_i}{\alpha + \beta + n}. \]
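As a quick numerical sketch of this conjugate update (the prior hyperparameters and simulated data below are illustrative choices, not from the notes):

```python
import numpy as np
from scipy import stats

# illustrative prior hyperparameters and simulated Bernoulli data
alpha0, beta0 = 2.0, 2.0
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=50)

# conjugate update: p | x ~ Beta(alpha + sum x_i, beta + sum (1 - x_i))
alpha_post = alpha0 + x.sum()
beta_post = beta0 + (1 - x).sum()
posterior = stats.beta(alpha_post, beta_post)

# posterior mean agrees with (alpha + sum x_i) / (alpha + beta + n)
print(posterior.mean(), (alpha0 + x.sum()) / (alpha0 + beta0 + len(x)))
```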
36.3 Example: Normal-Normal
Suppose \(\boldsymbol{X} | \mu {\; \stackrel{\text{iid}}{\sim}\;}\mbox{Normal}(\mu, \sigma^2)\), where \(\sigma^2\) is known, and suppose that \(\mu \sim \mbox{Normal}(a, b^2)\).
Then it can be shown that \(\mu | \boldsymbol{x} \sim \mbox{Normal}({\operatorname{E}}[\mu | \boldsymbol{x}], {\operatorname{Var}}(\mu | \boldsymbol{x}))\) where
\[ {\operatorname{E}}[\mu | \boldsymbol{x}] = \frac{b^2}{\frac{\sigma^2}{n} + b^2} \overline{x} + \frac{\frac{\sigma^2}{n}}{\frac{\sigma^2}{n} + b^2} a \]
\[ {\operatorname{Var}}(\mu | \boldsymbol{x}) = \frac{b^2 \frac{\sigma^2}{n}}{\frac{\sigma^2}{n} + b^2} \]
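The same kind of numerical sketch for the Normal-Normal model (the values of \(a\), \(b\), \(\sigma\), and the simulated data are again illustrative):

```python
import numpy as np

# illustrative known sd, prior mean, prior sd, and simulated data
sigma, a, b = 1.0, 0.0, 2.0
rng = np.random.default_rng(1)
x = rng.normal(loc=1.5, scale=sigma, size=25)
n, xbar = len(x), x.mean()

# posterior mean and variance from the formulas above
s2n = sigma**2 / n
post_mean = (b**2 / (s2n + b**2)) * xbar + (s2n / (s2n + b**2)) * a
post_var = (b**2 * s2n) / (s2n + b**2)
print(post_mean, post_var)
```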
36.4 Example: Dirichlet-Multinomial
This is a problem on Homework 3!
36.5 Example: Gamma-Poisson
This is a problem on Homework 3!
36.6 Jeffreys Prior
If we do inference based on a prior \(\theta \sim F_{\tau}\) to obtain \(f(\theta | \boldsymbol{x}) \propto L(\theta; \boldsymbol{x}) f(\theta)\), then this inference may not be invariant to transformations of \(\theta\), such as \(\eta = g(\theta)\).
If instead we utilize a Jeffreys prior, meaning a prior such that
\[f(\theta) \propto \sqrt{I(\theta)},\]
where \(I(\theta)\) is the Fisher information, then the prior (and hence the resulting inference) will be invariant to transformations of \(\theta\). To verify this, we would want to show that \(f(\theta) \propto \sqrt{I(\theta)}\) implies \(f(\eta) \propto \sqrt{I(\eta)}\).
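A sketch of that argument for a monotone, one-dimensional transformation \(\eta = g(\theta)\): by the change of variables formula and the transformation rule for Fisher information, \(I(\eta) = I(\theta) \left( \frac{d\theta}{d\eta} \right)^2\), so
\[ f(\eta) = f(\theta) \left| \frac{d\theta}{d\eta} \right| \propto \sqrt{I(\theta)} \left| \frac{d\theta}{d\eta} \right| = \sqrt{I(\theta) \left( \frac{d\theta}{d\eta} \right)^2} = \sqrt{I(\eta)}. \]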
36.7 Examples: Jeffreys Priors
Normal\((\mu, \sigma^2)\), \(\sigma^2\) known: \(f(\mu) \propto 1\)
Normal\((\mu, \sigma^2)\), \(\mu\) known: \(f(\sigma) \propto \frac{1}{\sigma}\)
Poisson\((\lambda)\): \(f(\lambda) \propto \frac{1}{\sqrt{\lambda}}\)
Bernoulli\((p)\): \(f(p) \propto \frac{1}{\sqrt{p(1-p)}}\)
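For example, the Bernoulli case follows from its Fisher information \(I(p) = \frac{1}{p(1-p)}\):
\[ f(p) \propto \sqrt{I(p)} = \frac{1}{\sqrt{p(1-p)}}, \]
which is the kernel of a Beta\((1/2, 1/2)\) density.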
36.8 Improper Prior
An improper prior is a prior such that \(\int f(\theta) d\theta = \infty\). Nevertheless, it may still be the case that \(f(\theta | \boldsymbol{x}) \propto L(\theta; \boldsymbol{x}) f(\theta)\) yields a proper probability distribution.
Take for example the case where \(\boldsymbol{X} | \mu {\; \stackrel{\text{iid}}{\sim}\;}\mbox{Normal}(\mu, \sigma^2)\), where \(\sigma^2\) is known, and suppose that \(f(\mu) \propto 1\). Then \(\int f(\mu) d\mu = \infty\), but
\[ f(\mu | \boldsymbol{x}) \propto L(\mu; \boldsymbol{x}) f(\mu) \implies \mu | \boldsymbol{x} \sim \mbox{Normal}\left(\overline{x}, \sigma^2/n\right)\]
which is a proper probability distribution.
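To see why, write out the posterior under the flat prior and complete the square in \(\mu\):
\[ f(\mu | \boldsymbol{x}) \propto \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right\} \propto \exp\left\{ -\frac{n (\mu - \overline{x})^2}{2\sigma^2} \right\}, \]
which is the kernel of a Normal\((\overline{x}, \sigma^2/n)\) density. This also agrees with the Normal-Normal posterior above in the limit \(b^2 \rightarrow \infty\).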