15 Joint Distributions

15.1 Bivariate Random Variables

For a pair of rv’s $$X$$ and $$Y$$ defined on the same probability space, we can define their joint pmf or pdf. For the discrete case,

\begin{align*} f(x, y) & = \Pr(\{\omega: X(\omega) = x\} \cap \{\omega: Y(\omega) = y\}) \\ & = \Pr(X=x, Y=y). \end{align*}

The joint pdf is defined analogously for continuous rv’s.
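As a concrete sketch, a discrete joint pmf can be stored as a table; the values below are a hypothetical example, not from the text.

```python
import numpy as np

# Hypothetical joint pmf f(x, y): rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])

# A valid joint pmf is nonnegative and sums to one over all (x, y) pairs.
assert np.all(f >= 0) and np.isclose(f.sum(), 1.0)

# Pr(X = 0, Y = 1) is the corresponding table entry.
p = f[0, 1]
```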

15.2 Events for Bivariate RVs

Let $$A_x \times A_y \subseteq \mathbb{R} \times \mathbb{R}$$ be an event. Then $$\Pr(X \in A_x, Y \in A_y)$$ is calculated by:

\begin{align*} & \sum_{x \in A_x} \sum_{y \in A_y} f(x, y) & \mbox{(discrete)} \\ & \int_{x \in A_x} \int_{y \in A_y} f(x, y) dy dx & \mbox{(continuous)} \\ & \int_{x \in A_x} \int_{y \in A_y} dF_Y(y) dF_{X}(x) & \mbox{(general)} \end{align*}
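In the discrete case, the double sum can be evaluated directly over the table; a minimal sketch with a hypothetical 2×2 joint pmf:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x_vals, y_vals = [0, 1], [0, 1]

# Pr(X in A_x, Y in A_y): sum f(x, y) over x in A_x and y in A_y.
A_x, A_y = {0}, {0, 1}
p = sum(f[i, j]
        for i, xv in enumerate(x_vals) if xv in A_x
        for j, yv in enumerate(y_vals) if yv in A_y)
```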

15.3 Marginal Distributions

We can calculate the marginal distribution of $$X$$ (or $$Y$$) from their joint distribution:

$f(x) = \sum_{y \in \mathcal{R}_y} f(x, y) \quad \mbox{(discrete)}$

$f(x) = \int_{-\infty}^{\infty} f(x, y) dy \quad \mbox{(continuous)}$
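With the joint pmf stored as a table, marginalizing is a row or column sum; a sketch using a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])

f_X = f.sum(axis=1)   # marginal pmf of X: sum f(x, y) over y
f_Y = f.sum(axis=0)   # marginal pmf of Y: sum f(x, y) over x
```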

15.4 Independent Random Variables

Two rv’s are independent when their joint pmf or pdf factors into the product of the marginal pmfs or pdfs:

$f(x,y) = f(x) f(y)$

This means, for example, in the continuous case,

\begin{align*} \Pr(X \in A_x, Y \in A_y) & = \int_{x \in A_x} \int_{y \in A_y} f(x, y) dy dx \\ & = \int_{x \in A_x} \int_{y \in A_y} f(x) f(y) dy dx \\ & = \Pr(X \in A_x) \Pr(Y \in A_y). \end{align*}
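For a finite joint pmf, the factorization can be checked by comparing the table to the outer product of its marginals; a sketch with hypothetical examples:

```python
import numpy as np

def is_independent(f):
    # X and Y are independent iff f(x, y) = f(x) f(y) for every (x, y),
    # i.e. the joint pmf equals the outer product of its marginals.
    return np.allclose(f, np.outer(f.sum(axis=1), f.sum(axis=0)))

# Hypothetical dependent joint pmf: 0.10 != 0.30 * 0.40 ...
f_dep = np.array([[0.10, 0.20],
                  [0.30, 0.40]])
# ... versus an independent one built as an outer product of marginals.
f_ind = np.outer([0.30, 0.70], [0.40, 0.60])
```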

15.5 Conditional Distributions

We can define the conditional distribution of $$X$$ given $$Y$$ as follows. The conditional rv $$X | Y \sim F_{X|Y}$$ has conditional pmf or pdf for $$X | Y=y$$ given by

$f(x | y) = \frac{f(x, y)}{f(y)}.$
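In table form, this amounts to dividing each column of the joint pmf by the marginal $$f(y)$$; a sketch with a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
f_Y = f.sum(axis=0)          # marginal pmf of Y

# f(x | y) = f(x, y) / f(y): broadcasting divides column j by f_Y[j].
f_X_given_Y = f / f_Y

# Each conditional pmf (each column) sums to one.
assert np.allclose(f_X_given_Y.sum(axis=0), 1.0)
```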

15.6 Conditional Moments

The $$k$$th conditional moment (when it exists) is calculated by:

${\operatorname{E}}\left[X^k | Y=y\right] = \sum_{x \in \mathcal{R}_x} x^k f(x | y) \quad \mbox{(discrete)}$

${\operatorname{E}}\left[X^k | Y=y\right] = \int_{-\infty}^{\infty} x^k f(x | y) dx \quad \mbox{(continuous)}$

Note that $${\operatorname{E}}\left[X^k | Y\right]$$ is a random variable that is a function of $$Y$$ whose distribution is determined by that of $$Y$$.
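The discrete sum is a dot product of $$x^k$$ against a column of the conditional pmf table; a sketch with a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x = np.array([0.0, 1.0])              # support of X
f_X_given_Y = f / f.sum(axis=0)       # f(x | y), one column per value of y

def cond_moment(k, y_index):
    # kth conditional moment: sum over x of x^k * f(x | y)
    return float(np.sum(x**k * f_X_given_Y[:, y_index]))
```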

15.7 Law of Total Variance

We can partition the variance of $$X$$ according to the following conditional calculations on $$Y$$:

${\operatorname{Var}}(X) = {\operatorname{Var}}({\operatorname{E}}[X | Y]) + {\operatorname{E}}[{\operatorname{Var}}(X | Y)].$

This is a useful result for partitioning variation in model fitting.
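The decomposition can be verified numerically; a sketch using a hypothetical 2×2 joint pmf, with all moments computed from the table:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x = np.array([0.0, 1.0])
f_X, f_Y = f.sum(axis=1), f.sum(axis=0)

cond = f / f_Y                          # f(x | y), one column per y
E_X_given_Y = x @ cond                  # E[X | Y=y] for each y
Var_X_given_Y = (x**2) @ cond - E_X_given_Y**2

var_of_cond_mean = f_Y @ E_X_given_Y**2 - (f_Y @ E_X_given_Y)**2
mean_of_cond_var = f_Y @ Var_X_given_Y

var_X = f_X @ x**2 - (f_X @ x)**2       # Var(X) from the marginal
assert np.isclose(var_X, var_of_cond_mean + mean_of_cond_var)
```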

15.8 Covariance and Correlation

The covariance, also called the “population covariance”, measures how two rv’s covary. It is calculated in a fashion analogous to the sample covariance:

${\operatorname{Cov}}(X, Y) = \operatorname{E} \left[ (X - \operatorname{E}[X]) (Y - \operatorname{E}[Y]) \right]$

Note that $${\operatorname{Cov}}(X, X) = {\operatorname{Var}}(X)$$.

The population correlation is calculated analogously to the sample correlation:

$\operatorname{Cor}(X, Y) = \frac{{\operatorname{Cov}}(X, Y)}{\operatorname{SD}(X)\operatorname{SD}(Y)}$
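For a finite joint pmf, both quantities follow from the table, using the standard expansion $${\operatorname{Cov}}(X, Y) = {\operatorname{E}}[XY] - {\operatorname{E}}[X]{\operatorname{E}}[Y]$$; a sketch with a hypothetical 2×2 example:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.40]])
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])

E_X = f.sum(axis=1) @ x
E_Y = f.sum(axis=0) @ y
E_XY = x @ f @ y                      # sum over x, y of x * y * f(x, y)

cov = E_XY - E_X * E_Y                # Cov(X, Y) = E[XY] - E[X]E[Y]
sd_X = np.sqrt(f.sum(axis=1) @ x**2 - E_X**2)
sd_Y = np.sqrt(f.sum(axis=0) @ y**2 - E_Y**2)
cor = cov / (sd_X * sd_Y)             # Cor(X, Y) in [-1, 1]
```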

15.9 Multivariate Distributions

Let $$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T$$ be a vector of $$n$$ rv’s. We also let realized values be $$\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T$$. The joint pmf or pdf is written as

$f(\boldsymbol{x}) = f(x_1, x_2, \ldots, x_n)$

and if the rv’s are independent then

$f(\boldsymbol{x}) = \prod_{i=1}^{n} f(x_i).$
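As a sketch of the product form, take the assumed example of $$n$$ independent standard normal rv’s, so each marginal is the standard normal density:

```python
import math

def norm_pdf(x):
    # standard normal density, used as each marginal f(x_i)
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def joint_pdf(xs):
    # under independence, f(x_1, ..., x_n) = prod_i f(x_i)
    p = 1.0
    for xi in xs:
        p *= norm_pdf(xi)
    return p

x = [0.5, -1.2, 0.0]
```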

15.10 MV Expected Value

The expected value of $$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T$$ is an $$n$$-vector:

${\operatorname{E}}[\boldsymbol{X}] = \begin{bmatrix} {\operatorname{E}}[X_1] \\ {\operatorname{E}}[X_2] \\ \vdots \\ {\operatorname{E}}[X_n] \end{bmatrix}$

15.11 MV Variance-Covariance Matrix

The variance-covariance matrix of $$\boldsymbol{X}$$ is an $$n \times n$$ matrix with $$(i, j)$$ entry equal to $${\operatorname{Cov}}(X_i, X_j)$$.

${\operatorname{Var}}(\boldsymbol{X}) = \begin{bmatrix} {\operatorname{Var}}(X_1) & {\operatorname{Cov}}(X_1, X_2) & \cdots & {\operatorname{Cov}}(X_1, X_n) \\ {\operatorname{Cov}}(X_2, X_1) & {\operatorname{Var}}(X_2) & \cdots & {\operatorname{Cov}}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ {\operatorname{Cov}}(X_n, X_1) & {\operatorname{Cov}}(X_n, X_2) & \cdots & {\operatorname{Var}}(X_n) \end{bmatrix}$

Often $$\boldsymbol{\Sigma}$$ denotes the matrix of population variances and covariances, so that $$\boldsymbol{\Sigma}={\operatorname{Var}}(\boldsymbol{X})$$.
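Both $${\operatorname{E}}[\boldsymbol{X}]$$ and $$\boldsymbol{\Sigma}$$ can be estimated from draws of $$\boldsymbol{X}$$; a sketch with an assumed bivariate normal example (the mean vector and covariance matrix below are hypothetical parameters):

```python
import numpy as np

# Assumed example: draw from a bivariate normal with known parameters.
rng = np.random.default_rng(0)
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -1.0], cov=Sigma_true, size=100_000)

mu_hat = X.mean(axis=0)               # estimate of the n-vector E[X]
Sigma_hat = np.cov(X, rowvar=False)   # n x n matrix with (i, j) entry Cov(X_i, X_j)
```

With many draws the estimates should be close to the parameters used, and $$\boldsymbol{\Sigma}$$ is symmetric since $${\operatorname{Cov}}(X_i, X_j) = {\operatorname{Cov}}(X_j, X_i)$$.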