33 Bayesian Inference

33.1 Frequentist Probability

The inference framework we have covered so far uses a frequentist interpretation of probability.

We made statements such as, “If we repeat this study over and over, the long run frequency is such that…”

33.2 Bayesian Probability

Traditional Bayesian inference is based on a different interpretation of probability: that probability is a measure of subjective belief.

We will call this “subjective Bayesian statistics.”

33.3 The Framework

A prior probability distribution is introduced for the unknown parameter: a probability distribution on the parameter that captures one’s subjective belief about its possible values.

The posterior probability distribution of the parameter is then calculated using Bayes’ theorem once data are observed. Analogs of confidence intervals and hypothesis tests can then be obtained through the posterior distribution.

33.4 An Example

Prior: \(P \sim \mbox{Uniform}(0,1)\)

Data generating distribution: \(X|P=p \sim \mbox{Binomial}(n,p)\)

Posterior pdf (via Bayes’ theorem):

\[\begin{align*} f(p | X=x) & = \frac{\Pr(X=x | P=p) f(p)}{\Pr(X=x)} \\ & = \frac{\Pr(X=x | P=p) f(p)}{\int \Pr(X=x | P=p^*) f(p^*) dp^*} \end{align*}\]
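Since the Uniform(0,1) prior has \(f(p) = 1\) on \((0,1)\), the numerator is proportional to a Beta kernel and the posterior can be written down directly:

\[\begin{align*} f(p | X=x) & \propto \Pr(X=x | P=p) f(p) \\ & = \binom{n}{x} p^x (1-p)^{n-x} \cdot 1 \\ & \propto p^{(x+1)-1} (1-p)^{(n-x+1)-1}, \end{align*}\]

which is the kernel of a Beta density, so \(P | X=x \sim \mbox{Beta}(x+1, n-x+1)\).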

33.5 Calculations

In the previous example, it is possible to calculate the posterior distribution analytically; as derived above, it is a \(\mbox{Beta}(x+1, n-x+1)\) distribution. However, an analytical calculation is often impossible.

Bayesian inference often involves complicated and intensive calculations to numerically approximate the posterior probability distribution.
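As a first taste of such numerical approximation, a one-dimensional posterior can be approximated on a grid by normalizing the numerator of Bayes’ theorem numerically. Below is a minimal sketch in Python; the logit-normal prior and the counts \(n = 20\), \(x = 13\) are hypothetical choices for illustration, not part of the example above:

```python
import numpy as np
from scipy import stats

# Hypothetical setup: Binomial(n, p) likelihood with a logit-normal prior
# on p, for which the posterior has no simple closed form.
n, x = 20, 13
grid = np.linspace(1e-6, 1 - 1e-6, 10_000)
dp = grid[1] - grid[0]

# Logit-normal prior density: logit(p) ~ Normal(0, 1.5^2),
# with the change-of-variables factor 1 / (p (1 - p)).
prior = stats.norm.pdf(np.log(grid / (1 - grid)), scale=1.5) / (grid * (1 - grid))
likelihood = stats.binom.pmf(x, n, grid)

# Normalize the numerator of Bayes' theorem numerically.
numerator = likelihood * prior
posterior = numerator / (numerator.sum() * dp)

print(np.sum(grid * posterior) * dp)  # approximate posterior mean
```

Grid approximation works well in one or two dimensions but breaks down quickly as the dimension of the parameter grows, which is why the methods discussed in the Computation section below are needed in practice.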

33.6 In Practice

Although the Bayesian inference framework has its roots in the subjective view of probability, in modern times this philosophical aspect is often ignored or unimportant.

When subjectivism is ignored, is this really Bayesian statistics, or is it frequentist statistics that includes a probability model on the unknown parameter(s) and employs Bayes’ theorem?

Bayesian inference is often used because it provides a flexible and sometimes superior model for real-world problems. But the interpretation and evaluation are often tacitly frequentist.

There are very few pure subjective Bayesians working in the natural sciences or in technology industries.

33.7 Goal

Suppose we model \((X_1, X_2, \ldots, X_n) | \theta \ {\; \stackrel{\text{iid}}{\sim}\;}\ F_{\theta}\) with prior distribution \(\theta \sim F_{\tau}\), where it should be noted that the distribution of \(\theta\) depends on (possibly unknown or subjective) hyperparameter(s) \(\tau\).

The ultimate goal is to determine the posterior distribution of \(\theta | \boldsymbol{X}\) through Bayes’ theorem:

\[ f(\theta | \boldsymbol{X}) = \frac{f(\boldsymbol{X} | \theta) f(\theta)}{f(\boldsymbol{X})} = \frac{f(\boldsymbol{X} | \theta) f(\theta)}{\int f(\boldsymbol{X} | \theta^*) f(\theta^*) d\theta^*}. \]

If there is a true fixed value of \(\theta\), then a well-behaved model should be such that \(f(\theta | \boldsymbol{X})\) concentrates around this fixed value as \(n \rightarrow \infty\).
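A quick simulation illustrates this concentration for the earlier binomial example, where the posterior is \(\mbox{Beta}(x+1, n-x+1)\); the true value \(p = 0.3\) and the sample sizes are arbitrary choices for the sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_true = 0.3  # hypothetical true parameter value

# Under the earlier model, the posterior is Beta(x + 1, n - x + 1).
# Its standard deviation shrinks toward 0 as n grows, and its mean
# approaches the true value.
for n in [10, 100, 1_000, 10_000]:
    x = rng.binomial(n, p_true)
    post = stats.beta(x + 1, n - x + 1)
    print(n, round(post.mean(), 4), round(post.std(), 4))
```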

33.8 Advantages

  • Statements about uncertainty and inference are easier to make
  • Estimates often have superior numerical stability
  • Data across studies or multiple samples are easier to combine (e.g., how would one combine frequentist p-values?); see the sketch after this list
  • High-dimensional inference works especially well in a Bayesian framework
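For instance, combining two binomial studies under the earlier model amounts to using the first study’s posterior as the second study’s prior. A sketch with hypothetical counts:

```python
from scipy import stats

# Hypothetical counts from two independent studies of the same proportion p.
n1, x1 = 25, 9
n2, x2 = 40, 17

# Study 1: Uniform(0, 1) prior gives a Beta(x1 + 1, n1 - x1 + 1) posterior.
# Study 2: reuse study 1's posterior as the prior; with a Beta prior and
# a binomial likelihood, successes and failures simply accumulate.
combined = stats.beta(x1 + x2 + 1, (n1 - x1) + (n2 - x2) + 1)

print(combined.mean(), combined.interval(0.95))  # point estimate and 95% credible interval
```

Because the Beta prior is conjugate to the binomial likelihood, the successes and failures simply accumulate, and the order in which the studies are processed does not matter.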

33.9 Computation

Bayesian inference can be particularly computationally intensive. The challenge usually lies in calculating the denominator of the right-hand side of Bayes’ theorem, \(f(\boldsymbol{X})\):

\[ f(\theta | \boldsymbol{X}) = \frac{f(\boldsymbol{X} | \theta) f(\theta)}{f(\boldsymbol{X})} \]

Markov chain Monte Carlo (MCMC) methods and variational inference methods are particularly popular for dealing with the numerical challenges of obtaining good estimates of the posterior distribution.
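As a flavor of MCMC, here is a minimal random-walk Metropolis sampler for the binomial example above. It only ever evaluates the numerator \(f(x | p) f(p)\), since the intractable \(f(\boldsymbol{X})\) cancels in the acceptance ratio. The proposal scale, chain length, and burn-in are arbitrary choices for the sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, x = 20, 13  # hypothetical data, as before

def log_numerator(p):
    """Log of f(x | p) f(p); the denominator f(x) cancels in the
    Metropolis acceptance ratio and never needs to be computed."""
    if not 0 < p < 1:
        return -np.inf
    return stats.binom.logpmf(x, n, p)  # + log f(p), which is 0 for Uniform(0,1)

samples, p = [], 0.5
for _ in range(20_000):
    proposal = p + rng.normal(scale=0.1)       # symmetric random-walk proposal
    log_ratio = log_numerator(proposal) - log_numerator(p)
    if np.log(rng.uniform()) < log_ratio:      # accept/reject step
        p = proposal
    samples.append(p)

burned = np.array(samples[5_000:])             # discard burn-in
print(burned.mean(), np.quantile(burned, [0.025, 0.975]))
```

For this conjugate example the exact posterior is \(\mbox{Beta}(x+1, n-x+1)\), so the sampler’s mean and quantiles can be checked directly against the analytical answer.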