15 Joint Distributions

15.1 Bivariate Random Variables

For a pair of rv’s $$X$$ and $$Y$$ defined on the same probability space, we can define their joint pmf or pdf. For the discrete case,

\begin{align*} f(x, y) & = \Pr(\{\omega: X(\omega) = x\} \cap \{\omega: Y(\omega) = y\}) \\ & = \Pr(X=x, Y=y). \end{align*}

The joint pdf is defined analogously for continuous rv’s.
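As a concrete sketch, a discrete joint pmf can be stored as a table; the values below are a hypothetical example, not from the text.

```python
import numpy as np

# Hypothetical joint pmf f(x, y): rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])

# A valid joint pmf is nonnegative and sums to one over all (x, y) pairs.
assert np.all(f >= 0) and np.isclose(f.sum(), 1.0)

# Pr(X = 0, Y = 1) is the corresponding table entry.
p = f[0, 1]
```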

15.2 Events for Bivariate RVs

Let $$A_x \times A_y \subseteq \mathbb{R} \times \mathbb{R}$$ be an event. Then $$\Pr(X \in A_x, Y \in A_y)$$ is calculated by:

\begin{align*} & \sum_{x \in A_x} \sum_{y \in A_y} f(x, y) & \mbox{(discrete)} \\ & \int_{x \in A_x} \int_{y \in A_y} f(x, y) dy dx & \mbox{(continuous)} \\ & \int_{x \in A_x} \int_{y \in A_y} dF_Y(y) dF_{X}(x) & \mbox{(general)} \end{align*}
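In the discrete case, the double sum can be evaluated directly over the table; a minimal sketch with a hypothetical 2×2 joint pmf:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x_vals, y_vals = [0, 1], [0, 1]

# Pr(X in A_x, Y in A_y): sum f(x, y) over x in A_x and y in A_y.
A_x, A_y = {0}, {0, 1}
p = sum(f[i, j]
        for i, xv in enumerate(x_vals) if xv in A_x
        for j, yv in enumerate(y_vals) if yv in A_y)
```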

15.3 Marginal Distributions

We can calculate the marginal distribution of $$X$$ (or $$Y$$) from their joint distribution:

$f(x) = \sum_{y \in \mathcal{R}_y} f(x, y) \quad \mbox{(discrete)}$

$f(x) = \int_{-\infty}^{\infty} f(x, y) dy \quad \mbox{(continuous)}$
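With the joint pmf stored as a table, marginalizing is a row or column sum; a sketch using a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])

f_X = f.sum(axis=1)   # marginal pmf of X: sum f(x, y) over y
f_Y = f.sum(axis=0)   # marginal pmf of Y: sum f(x, y) over x
```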

15.4 Independent Random Variables

Two rv’s are independent when their joint pmf or pdf factors into the product of the marginal pmfs or pdfs:

$f(x,y) = f(x) f(y)$

This means, for example, in the continuous case,

\begin{align*} \Pr(X \in A_x, Y \in A_y) & = \int_{x \in A_x} \int_{y \in A_y} f(x, y) dy dx \\ & = \int_{x \in A_x} \int_{y \in A_y} f(x) f(y) dy dx \\ & = \Pr(X \in A_x) \Pr(Y \in A_y). \end{align*}
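For a finite joint pmf, the factorization can be checked by comparing the table to the outer product of its marginals; a sketch with hypothetical examples:

```python
import numpy as np

def is_independent(f):
    # X and Y are independent iff f(x, y) = f(x) f(y) for every (x, y),
    # i.e. the joint pmf equals the outer product of its marginals.
    return np.allclose(f, np.outer(f.sum(axis=1), f.sum(axis=0)))

# Hypothetical dependent joint pmf: 0.10 != 0.30 * 0.40 ...
f_dep = np.array([[0.10, 0.20],
                  [0.30, 0.40]])
# ... versus an independent one built as an outer product of marginals.
f_ind = np.outer([0.30, 0.70], [0.40, 0.60])
```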

15.5 Conditional Distributions

We can define the conditional distribution of $$X$$ given $$Y$$ as follows. The conditional rv $$X | Y \sim F_{X|Y}$$ has conditional pmf or pdf for $$X | Y=y$$ given by

$f(x | y) = \frac{f(x, y)}{f(y)}.$
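In table form, this amounts to dividing each column of the joint pmf by the marginal $$f(y)$$; a sketch with a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
f_Y = f.sum(axis=0)          # marginal pmf of Y

# f(x | y) = f(x, y) / f(y): broadcasting divides column j by f_Y[j].
f_X_given_Y = f / f_Y

# Each conditional pmf (each column) sums to one.
assert np.allclose(f_X_given_Y.sum(axis=0), 1.0)
```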

15.6 Conditional Moments

The $$k$$th conditional moment (when it exists) is calculated by:

${\operatorname{E}}\left[X^k | Y=y\right] = \sum_{x \in \mathcal{R}_x} x^k f(x | y) \quad \mbox{(discrete)}$

${\operatorname{E}}\left[X^k | Y=y\right] = \int_{-\infty}^{\infty} x^k f(x | y) dx \quad \mbox{(continuous)}$

Note that $${\operatorname{E}}\left[X^k | Y\right]$$ is a random variable that is a function of $$Y$$ whose distribution is determined by that of $$Y$$.
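The discrete sum is a dot product of $$x^k$$ against a column of the conditional pmf table; a sketch with a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x = np.array([0.0, 1.0])              # support of X
f_X_given_Y = f / f.sum(axis=0)       # f(x | y), one column per value of y

def cond_moment(k, y_index):
    # kth conditional moment: sum over x of x^k * f(x | y)
    return float(np.sum(x**k * f_X_given_Y[:, y_index]))
```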

15.7 Law of Total Variance

We can partition the variance of $$X$$ according to the following conditional calculations on $$Y$$:

${\operatorname{Var}}(X) = {\operatorname{Var}}({\operatorname{E}}[X | Y]) + {\operatorname{E}}[{\operatorname{Var}}(X | Y)].$

This is a useful result for partitioning variation in model fitting.
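The decomposition can be verified numerically; a sketch using a hypothetical 2×2 joint pmf, with all moments computed from the table:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x = np.array([0.0, 1.0])
f_X, f_Y = f.sum(axis=1), f.sum(axis=0)

cond = f / f_Y                          # f(x | y), one column per y
E_X_given_Y = x @ cond                  # E[X | Y=y] for each y
Var_X_given_Y = (x**2) @ cond - E_X_given_Y**2

var_of_cond_mean = f_Y @ E_X_given_Y**2 - (f_Y @ E_X_given_Y)**2
mean_of_cond_var = f_Y @ Var_X_given_Y

var_X = f_X @ x**2 - (f_X @ x)**2       # Var(X) from the marginal
assert np.isclose(var_X, var_of_cond_mean + mean_of_cond_var)
```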

15.8 Covariance and Correlation

The covariance, also called the “population covariance”, measures how two rv’s covary. It is calculated in a fashion analogous to the sample covariance:

${\operatorname{Cov}}(X, Y) = \operatorname{E} \left[ (X - \operatorname{E}[X]) (Y - \operatorname{E}[Y]) \right]$

Note that $${\operatorname{Cov}}(X, X) = {\operatorname{Var}}(X)$$.

The population correlation is calculated analogously to the sample correlation:

$\operatorname{Cor}(X, Y) = \frac{{\operatorname{Cov}}(X, Y)}{\operatorname{SD}(X)\operatorname{SD}(Y)}$
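For a finite joint pmf, both quantities follow from the table, using the standard expansion $${\operatorname{Cov}}(X, Y) = {\operatorname{E}}[XY] - {\operatorname{E}}[X]{\operatorname{E}}[Y]$$; a sketch with a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])

E_X = f.sum(axis=1) @ x
E_Y = f.sum(axis=0) @ y
E_XY = x @ f @ y                      # sum over x, y of x * y * f(x, y)

cov = E_XY - E_X * E_Y                # Cov(X, Y) = E[XY] - E[X]E[Y]
sd_X = np.sqrt(f.sum(axis=1) @ x**2 - E_X**2)
sd_Y = np.sqrt(f.sum(axis=0) @ y**2 - E_Y**2)
cor = cov / (sd_X * sd_Y)             # Cor(X, Y) in [-1, 1]
```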

15.9 Multivariate Distributions

Let $$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T$$ be a vector of $$n$$ rv’s. We also let realized values be $$\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T$$. The joint pmf or pdf is written as

$f(\boldsymbol{x}) = f(x_1, x_2, \ldots, x_n)$

and if the rv’s are independent then

$f(\boldsymbol{x}) = \prod_{i=1}^{n} f(x_i).$
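As a sketch of the product form, take the assumed example of $$n$$ independent standard normal rv’s, so each marginal is the standard normal density:

```python
import math

def norm_pdf(x):
    # standard normal density, used as each marginal f(x_i)
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def joint_pdf(xs):
    # under independence, f(x_1, ..., x_n) = prod_i f(x_i)
    p = 1.0
    for xi in xs:
        p *= norm_pdf(xi)
    return p

x = [0.5, -1.2, 0.0]
```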

15.10 MV Expected Value

The expected value of $$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T$$ is an $$n$$-vector:

${\operatorname{E}}[\boldsymbol{X}] = \begin{bmatrix} {\operatorname{E}}[X_1] \\ {\operatorname{E}}[X_2] \\ \vdots \\ {\operatorname{E}}[X_n] \end{bmatrix}$

15.11 MV Variance-Covariance Matrix

The variance-covariance matrix of $$\boldsymbol{X}$$ is an $$n \times n$$ matrix with $$(i, j)$$ entry equal to $${\operatorname{Cov}}(X_i, X_j)$$.

${\operatorname{Var}}(\boldsymbol{X}) = \begin{bmatrix} {\operatorname{Var}}(X_1) & {\operatorname{Cov}}(X_1, X_2) & \cdots & {\operatorname{Cov}}(X_1, X_n) \\ {\operatorname{Cov}}(X_2, X_1) & {\operatorname{Var}}(X_2) & \cdots & {\operatorname{Cov}}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ {\operatorname{Cov}}(X_n, X_1) & {\operatorname{Cov}}(X_n, X_2) & \cdots & {\operatorname{Var}}(X_n) \end{bmatrix}$

Often $$\boldsymbol{\Sigma}$$ denotes the matrix of population variances and covariances, so that $$\boldsymbol{\Sigma}={\operatorname{Var}}(\boldsymbol{X})$$.
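Both $${\operatorname{E}}[\boldsymbol{X}]$$ and $$\boldsymbol{\Sigma}$$ can be estimated from draws of $$\boldsymbol{X}$$; a sketch with an assumed bivariate normal example (the mean vector and covariance matrix below are hypothetical parameters):

```python
import numpy as np

# Assumed example: draw from a bivariate normal with known parameters.
rng = np.random.default_rng(0)
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -1.0], cov=Sigma_true, size=100_000)

mu_hat = X.mean(axis=0)               # estimate of the n-vector E[X]
Sigma_hat = np.cov(X, rowvar=False)   # n x n matrix with (i, j) entry Cov(X_i, X_j)
```

With many draws the estimates should be close to the parameters used, and $$\boldsymbol{\Sigma}$$ is symmetric since $${\operatorname{Cov}}(X_i, X_j) = {\operatorname{Cov}}(X_j, X_i)$$.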