15 Joint Distributions

15.1 Bivariate Random Variables

For a pair of rv’s \(X\) and \(Y\) defined on the same probability space, we can define their joint pmf or pdf. For the discrete case,

\[\begin{align*} f(x, y) & = \Pr(\{\omega: X(\omega) = x\} \cap \{\omega: Y(\omega) = y\}) \\ \ & = \Pr(X=x, Y=y). \end{align*}\]

The joint pdf is defined analogously for continuous rv’s.
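As a concrete illustration, a discrete joint pmf on a small support can be stored as a table mapping \((x, y)\) pairs to probabilities. The numbers below are a made-up example, not taken from the text:

```python
# A hypothetical joint pmf for two rv's X and Y, each supported on {0, 1},
# stored as a dict mapping (x, y) -> Pr(X = x, Y = y).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# A valid joint pmf is nonnegative and sums to 1 over the joint support.
total = sum(joint_pmf.values())
nonneg = all(p >= 0 for p in joint_pmf.values())
```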

15.2 Events for Bivariate RVs

Let \(A_x \times A_y \subseteq \mathbb{R} \times \mathbb{R}\) be an event. Then \(\Pr(X \in A_x, Y \in A_y)\) is calculated by:

\[\begin{align*} & \sum_{x \in A_x} \sum_{y \in A_y} f(x, y) & \mbox{(discrete)} \\ & \int_{x \in A_x} \int_{y \in A_y} f(x, y) \, dy \, dx & \mbox{(continuous)} \\ & \int_{x \in A_x} \int_{y \in A_y} dF(x, y) & \mbox{(general)} \end{align*}\]

where \(F\) is the joint cdf of \((X, Y)\).
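In the discrete case, the double sum translates directly into code. A minimal sketch, using a hypothetical joint pmf:

```python
# Hypothetical joint pmf on {0, 1} x {0, 1} (made-up numbers).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def pr_event(pmf, A_x, A_y):
    """Pr(X in A_x, Y in A_y): double sum of f(x, y) over the event."""
    return sum(p for (x, y), p in pmf.items() if x in A_x and y in A_y)

# Taking A_y to be the full support of Y reduces this to Pr(X in A_x).
p = pr_event(joint_pmf, A_x={1}, A_y={0, 1})
```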

15.3 Marginal Distributions

We can calculate the marginal distribution of \(X\) (or \(Y\)) from their joint distribution:

\[\begin{align*} f(x) & = \sum_{y \in \mathcal{R}_y} f(x, y) & \mbox{(discrete)} \\ f(x) & = \int_{-\infty}^{\infty} f(x, y) \, dy & \mbox{(continuous)} \end{align*}\]
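In the discrete case, marginalizing means summing the joint pmf over the other variable's support. A sketch with a hypothetical joint pmf:

```python
from collections import defaultdict

# Hypothetical joint pmf (made-up numbers).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def marginal_x(pmf):
    """f(x) = sum over y of f(x, y)."""
    fx = defaultdict(float)
    for (x, y), p in pmf.items():
        fx[x] += p
    return dict(fx)

fx = marginal_x(joint_pmf)  # marginal pmf of X
```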

15.4 Independent Random Variables

Two rv’s are independent when their joint pmf or pdf factors into the product of the marginals:

\[f(x,y) = f(x) f(y)\]

This means, for example, in the continuous case,

\[\begin{align*} \Pr(X \in A_x, Y \in A_y) & = \int_{x \in A_x} \int_{y \in A_y} f(x, y) dy dx \\ \ & = \int_{x \in A_x} \int_{y \in A_y} f(x) f(y) dy dx \\ \ & = \Pr(X \in A_x) \Pr(Y \in A_y) \end{align*}\]
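In the discrete case, independence can be checked numerically by comparing the joint pmf against the product of its marginals at every point of the support. Both pmfs below are made-up examples:

```python
from collections import defaultdict
from itertools import product

def marginals(pmf):
    """Marginal pmfs of X and Y from the joint pmf."""
    fx, fy = defaultdict(float), defaultdict(float)
    for (x, y), p in pmf.items():
        fx[x] += p
        fy[y] += p
    return fx, fy

def is_independent(pmf, tol=1e-9):
    """True when f(x, y) = f(x) f(y) for every (x, y)."""
    fx, fy = marginals(pmf)
    return all(abs(pmf.get((x, y), 0.0) - fx[x] * fy[y]) < tol
               for x, y in product(fx, fy))

dependent = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
# Constructed so each entry is the product of its marginals.
independent = {(0, 0): 0.12, (0, 1): 0.18, (1, 0): 0.28, (1, 1): 0.42}
```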

15.5 Conditional Distributions

We can define the conditional distribution of \(X\) given \(Y\) as follows. The conditional rv \(X | Y \sim F_{X|Y}\) has conditional pmf or pdf for \(X | Y=y\) given by

\[ f(x | y) = \frac{f(x, y)}{f(y)}, \]

for \(y\) such that \(f(y) > 0\).
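Computing \(f(x | y)\) in the discrete case amounts to restricting the joint pmf to \(Y = y\) and renormalizing by the marginal \(f(y)\). A sketch with a hypothetical joint pmf:

```python
# Hypothetical joint pmf (made-up numbers).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def conditional_pmf(pmf, y):
    """f(x | y) = f(x, y) / f(y), defined when f(y) > 0."""
    fy = sum(p for (_, yy), p in pmf.items() if yy == y)
    return {x: p / fy for (x, yy), p in pmf.items() if yy == y}

f_x_given_y1 = conditional_pmf(joint_pmf, y=1)  # f(x | y = 1)
```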

15.6 Conditional Moments

The \(k\)th conditional moment (when it exists) is calculated by:

\[\begin{align*} {\operatorname{E}}\left[X^k | Y=y\right] & = \sum_{x \in \mathcal{R}_x} x^k f(x | y) & \mbox{(discrete)} \\ {\operatorname{E}}\left[X^k | Y=y\right] & = \int_{-\infty}^{\infty} x^k f(x | y) \, dx & \mbox{(continuous)} \end{align*}\]

Note that \({\operatorname{E}}\left[X^k | Y\right]\) is a random variable that is a function of \(Y\) whose distribution is determined by that of \(Y\).
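Tabulating \({\operatorname{E}}[X^k | Y=y]\) over the support of \(Y\) makes the last point concrete: \({\operatorname{E}}[X | Y]\) is itself a function of \(Y\). A sketch with a hypothetical joint pmf:

```python
# Hypothetical joint pmf (made-up numbers).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def cond_moment(pmf, y, k=1):
    """E[X^k | Y = y] via the conditional pmf f(x | y)."""
    fy = sum(p for (_, yy), p in pmf.items() if yy == y)
    return sum(x**k * p / fy for (x, yy), p in pmf.items() if yy == y)

# E[X | Y] as a function of Y: one conditional mean per value of y.
e_x_given_y = {y: cond_moment(joint_pmf, y) for y in {0, 1}}
```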

15.7 Law of Total Variance

We can partition the variance of \(X\) by conditioning on \(Y\):

\[{\operatorname{Var}}(X) = {\operatorname{Var}}({\operatorname{E}}[X | Y]) + {\operatorname{E}}[{\operatorname{Var}}(X | Y)].\]

This is a useful result for partitioning variation in model fitting.
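The identity can be verified numerically for a discrete joint pmf by computing both sides from first principles. The pmf below is a made-up example:

```python
# Hypothetical joint pmf (made-up numbers).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
ys = {y for _, y in joint_pmf}

# Marginal pmf of Y.
fy = {y: sum(p for (_, yy), p in joint_pmf.items() if yy == y) for y in ys}

def cond_mean_var(y):
    """E[X | Y = y] and Var(X | Y = y) from the conditional pmf."""
    px = {x: p / fy[y] for (x, yy), p in joint_pmf.items() if yy == y}
    m = sum(x * p for x, p in px.items())
    v = sum((x - m) ** 2 * p for x, p in px.items())
    return m, v

mv = {y: cond_mean_var(y) for y in ys}

# Var(E[X | Y]): variance of the conditional means over the distribution of Y.
e_m = sum(fy[y] * mv[y][0] for y in ys)
var_of_mean = sum(fy[y] * (mv[y][0] - e_m) ** 2 for y in ys)
# E[Var(X | Y)]: average conditional variance.
mean_of_var = sum(fy[y] * mv[y][1] for y in ys)

total = var_of_mean + mean_of_var  # should equal Var(X)

# Var(X) computed directly from the joint pmf.
ex = sum(x * p for (x, _), p in joint_pmf.items())
var_x = sum((x - ex) ** 2 * p for (x, _), p in joint_pmf.items())
```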

15.8 Covariance and Correlation

The covariance, also called the “population covariance”, measures how two rv’s covary. It is calculated in a fashion analogous to the sample covariance:

\[{\operatorname{Cov}}(X, Y) = \operatorname{E} \left[ (X - \operatorname{E}[X]) (Y - \operatorname{E}[Y]) \right]\]

Note that \({\operatorname{Cov}}(X, X) = {\operatorname{Var}}(X)\).

The population correlation is calculated analogously to the sample correlation:

\[\operatorname{Cor}(X, Y) = \frac{{\operatorname{Cov}}(X, Y)}{\operatorname{SD}(X)\operatorname{SD}(Y)}\]
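In the discrete case both quantities follow directly from the joint pmf, applying the defining formulas above. A sketch, again with a made-up pmf:

```python
import math

# Hypothetical joint pmf (made-up numbers).
joint_pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

ex = sum(x * p for (x, _), p in joint_pmf.items())
ey = sum(y * p for (_, y), p in joint_pmf.items())

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])].
cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint_pmf.items())

sd_x = math.sqrt(sum((x - ex) ** 2 * p for (x, _), p in joint_pmf.items()))
sd_y = math.sqrt(sum((y - ey) ** 2 * p for (_, y), p in joint_pmf.items()))
cor = cov / (sd_x * sd_y)
```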

15.9 Multivariate Distributions

Let \(\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T\) be a vector of \(n\) rv’s. We also let realized values be \(\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T\). The joint pmf or pdf is written as

\[f(\boldsymbol{x}) = f(x_1, x_2, \ldots, x_n)\]

and if the rv’s are independent then

\[f(\boldsymbol{x}) = \prod_{i=1}^{n} f(x_i).\]
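Under independence the joint pmf is just a product of marginal evaluations. A sketch with hypothetical independent Bernoulli rv's (the probabilities `ps` are made up):

```python
from math import prod

# Hypothetical success probabilities for n = 3 independent Bernoulli rv's.
ps = [0.2, 0.5, 0.7]

def marginal(x, p):
    """Bernoulli pmf: f(1) = p, f(0) = 1 - p."""
    return p if x == 1 else 1 - p

def joint(xs):
    """Under independence, f(x) = prod_i f(x_i)."""
    return prod(marginal(x, p) for x, p in zip(xs, ps))

p = joint([1, 0, 1])  # 0.2 * 0.5 * 0.7
```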

15.10 MV Expected Value

The expected value of \(\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T\) is an \(n\)-vector:

\[{\operatorname{E}}[\boldsymbol{X}] = \begin{bmatrix} {\operatorname{E}}[X_1] \\ {\operatorname{E}}[X_2] \\ \vdots \\ {\operatorname{E}}[X_n] \end{bmatrix} \]

15.11 MV Variance-Covariance Matrix

The variance-covariance matrix of \(\boldsymbol{X}\) is an \(n \times n\) matrix with \((i, j)\) entry equal to \({\operatorname{Cov}}(X_i, X_j)\).

\[{\operatorname{Var}}(\boldsymbol{X}) = \begin{bmatrix} {\operatorname{Var}}(X_1) & {\operatorname{Cov}}(X_1, X_2) & \cdots & {\operatorname{Cov}}(X_1, X_n) \\ {\operatorname{Cov}}(X_2, X_1) & {\operatorname{Var}}(X_2) & \cdots & {\operatorname{Cov}}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ {\operatorname{Cov}}(X_n, X_1) & {\operatorname{Cov}}(X_n, X_2) & \cdots & {\operatorname{Var}}(X_n) \end{bmatrix} \]

Oftentimes \(\boldsymbol{\Sigma}\) denotes this matrix of population variances and covariances, so that \(\boldsymbol{\Sigma}={\operatorname{Var}}(\boldsymbol{X})\).
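Given draws of \(\boldsymbol{X}\), the mean vector and \(\boldsymbol{\Sigma}\) can be estimated with numpy. The mean vector and covariance matrix below are made-up parameters chosen for illustration:

```python
import numpy as np

# Hypothetical parameters for n = 3 jointly normal rv's (made up).
true_mean = [0.0, 1.0, 2.0]
true_cov = [[1.0, 0.5, 0.0],
            [0.5, 2.0, 0.3],
            [0.0, 0.3, 1.5]]

# Each row of `draws` is one realization of the vector X.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(true_mean, true_cov, size=100_000)

mean_vec = draws.mean(axis=0)            # estimates E[X], an n-vector
sigma_hat = np.cov(draws, rowvar=False)  # estimates Sigma = Var(X), n x n
```

Note that \(\boldsymbol{\Sigma}\) is symmetric (since \({\operatorname{Cov}}(X_i, X_j) = {\operatorname{Cov}}(X_j, X_i)\)) with the variances on the diagonal, and the estimate reflects this.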