# 19 Population Principal Components Analysis

Suppose we have $$m$$ random variables $$X_1, X_2, \ldots, X_m$$. We wish to identify a set of weights $$w_1, w_2, \ldots, w_m$$ that maximizes

${\operatorname{Var}}\left(w_1 X_1 + w_2 X_2 + \cdots + w_m X_m \right).$

However, this is unbounded, so we need to constrain the weights. It turns out that constraining the weights so that

$\| {\boldsymbol{w}}\|_2^2 = \sum_{i=1}^m w_i^2 = 1$

is both interpretable and mathematically tractable.

Therefore we wish to maximize

${\operatorname{Var}}\left(w_1 X_1 + w_2 X_2 + \cdots + w_m X_m \right)$

subject to $$\| {\boldsymbol{w}}\|_2^2 = 1$$. Let $${\boldsymbol{\Sigma}}$$ be the $$m \times m$$ population covariance matrix of the random variables $$X_1, X_2, \ldots, X_m$$. It follows that

${\operatorname{Var}}\left(w_1 X_1 + w_2 X_2 + \cdots + w_m X_m \right) = {\boldsymbol{w}}^T {\boldsymbol{\Sigma}}{\boldsymbol{w}}.$
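The identity $$\operatorname{Var}({\boldsymbol{w}}^T {\boldsymbol{X}}) = {\boldsymbol{w}}^T {\boldsymbol{\Sigma}} {\boldsymbol{w}}$$ can be checked numerically. The sketch below uses a hypothetical $$3 \times 3$$ covariance matrix (the values are illustrative, not from the text) and compares the theoretical variance of a unit-norm linear combination to its Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 population covariance matrix (illustrative values).
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# A unit-norm weight vector w, so that ||w||_2^2 = 1.
w = np.ones(3) / np.sqrt(3)

# Theoretical variance of w^T X from the identity above.
theory = w @ Sigma @ w

# Monte Carlo check: draw X ~ N(0, Sigma) and take the sample variance of w^T X.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
empirical = np.var(X @ w)
```

With a large sample, `empirical` agrees with `theory` up to simulation error.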

Using a Lagrange multiplier, we wish to maximize

${\boldsymbol{w}}^T {\boldsymbol{\Sigma}}{\boldsymbol{w}}- \lambda({\boldsymbol{w}}^T {\boldsymbol{w}}- 1).$

Differentiating with respect to $${\boldsymbol{w}}$$ and setting the result to $${\boldsymbol{0}}$$, we get $${\boldsymbol{\Sigma}}{\boldsymbol{w}}- \lambda {\boldsymbol{w}}= {\boldsymbol{0}}$$, or

${\boldsymbol{\Sigma}}{\boldsymbol{w}}= \lambda {\boldsymbol{w}}.$

For any such $${\boldsymbol{w}}$$ and $$\lambda$$ where this holds, note that

${\operatorname{Var}}\left(w_1 X_1 + w_2 X_2 + \cdots + w_m X_m \right) = {\boldsymbol{w}}^T {\boldsymbol{\Sigma}}{\boldsymbol{w}}= \lambda {\boldsymbol{w}}^T {\boldsymbol{w}}= \lambda$

so the variance is $$\lambda$$, where the last equality uses the constraint $$\| {\boldsymbol{w}}\|_2^2 = 1$$.

The eigendecomposition of a matrix identifies all such solutions to $${\boldsymbol{\Sigma}}{\boldsymbol{w}}= \lambda {\boldsymbol{w}}$$. Specifically, it calculates the decomposition

${\boldsymbol{\Sigma}}= {\boldsymbol{W}}{\boldsymbol{\Lambda}}{\boldsymbol{W}}^T$

where $${\boldsymbol{W}}$$ is an $$m \times m$$ orthogonal matrix and $${\boldsymbol{\Lambda}}$$ is a diagonal matrix with entries $$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m \geq 0$$.

The fact that $${\boldsymbol{W}}$$ is orthogonal means $${\boldsymbol{W}}{\boldsymbol{W}}^T = {\boldsymbol{W}}^T {\boldsymbol{W}}= {\boldsymbol{I}}$$.
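As a concrete sketch, NumPy's `eigh` routine computes this decomposition for a symmetric matrix. The covariance matrix below is hypothetical (illustrative values only); note that `eigh` returns eigenvalues in ascending order, so they are reversed to match the convention $$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m$$:

```python
import numpy as np

# Hypothetical 3x3 population covariance matrix (illustrative values).
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# eigh is intended for symmetric matrices and returns eigenvalues in
# ascending order; reverse so that lambda_1 >= lambda_2 >= lambda_3.
lam, W = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, W = lam[order], W[:, order]

Lambda = np.diag(lam)

orthogonal = np.allclose(W @ W.T, np.eye(3))         # W W^T = I
reconstructs = np.allclose(W @ Lambda @ W.T, Sigma)  # Sigma = W Lambda W^T
eigen_eq = np.allclose(Sigma @ W[:, 0], lam[0] * W[:, 0])  # Sigma w_1 = lambda_1 w_1
```

All three checks hold, confirming the orthogonality of $${\boldsymbol{W}}$$, the decomposition $${\boldsymbol{\Sigma}}= {\boldsymbol{W}}{\boldsymbol{\Lambda}}{\boldsymbol{W}}^T$$, and the eigen-equation for each column.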

The following therefore hold:

• For each column $$j$$ of $${\boldsymbol{W}}$$, say $${\boldsymbol{w}}_j$$, it follows that $${\boldsymbol{\Sigma}}{\boldsymbol{w}}_j = \lambda_j {\boldsymbol{w}}_j$$
• $$\| {\boldsymbol{w}}_j \|^2_2 = 1$$ and $${\boldsymbol{w}}_j^T {\boldsymbol{w}}_k = 0$$ for $$j \not= k$$, since the columns of an orthogonal matrix are orthonormal
• $${\operatorname{Var}}({\boldsymbol{w}}_j^T {\boldsymbol{X}}) = \lambda_j$$
• $${\operatorname{Var}}({\boldsymbol{w}}_1^T {\boldsymbol{X}}) \geq {\operatorname{Var}}({\boldsymbol{w}}_2^T {\boldsymbol{X}}) \geq \cdots \geq {\operatorname{Var}}({\boldsymbol{w}}_m^T {\boldsymbol{X}})$$
• $${\boldsymbol{\Sigma}}= \sum_{j=1}^m \lambda_j {\boldsymbol{w}}_j {\boldsymbol{w}}_j^T$$
• For $$j \not= k$$, ${\operatorname{Cov}}({\boldsymbol{w}}_j^T {\boldsymbol{X}}, {\boldsymbol{w}}_k^T {\boldsymbol{X}}) = {\boldsymbol{w}}_j^T {\boldsymbol{\Sigma}}{\boldsymbol{w}}_k = \lambda_k {\boldsymbol{w}}_j^T {\boldsymbol{w}}_k = 0$
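The distributional properties above can be checked by simulation. The sketch below draws from $$N({\boldsymbol{0}}, {\boldsymbol{\Sigma}})$$ for a hypothetical covariance matrix (illustrative values only) and verifies that the sample variances of $${\boldsymbol{w}}_j^T {\boldsymbol{X}}$$ approximate the ordered $$\lambda_j$$ and that distinct components are nearly uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3x3 population covariance matrix (illustrative values).
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# Eigendecomposition with eigenvalues sorted in descending order.
lam, W = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, W = lam[order], W[:, order]

# Draw X ~ N(0, Sigma); column j of `scores` holds realizations of w_j^T X.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
scores = X @ W

emp_var = scores.var(axis=0)  # should approximate lam, in decreasing order
emp_cov = np.cov(scores[:, 0], scores[:, 1])[0, 1]  # should be near 0
```

Up to Monte Carlo error, `emp_var` matches $$(\lambda_1, \lambda_2, \lambda_3)$$ and `emp_cov` is close to zero.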

The $$j$$th population principal component (PC) of $$X_1, X_2, \ldots, X_m$$ is

${\boldsymbol{w}}_j^T {\boldsymbol{X}}= w_{1j} X_1 + w_{2j} X_2 + \cdots + w_{mj} X_m$

where $${\boldsymbol{w}}_j = (w_{1j}, w_{2j}, \ldots, w_{mj})^T$$ is column $$j$$ of $${\boldsymbol{W}}$$ from the eigendecomposition

${\boldsymbol{\Sigma}}= {\boldsymbol{W}}{\boldsymbol{\Lambda}}{\boldsymbol{W}}^T.$

The entries of column $${\boldsymbol{w}}_j$$ are called the loadings of the $$j$$th principal component. The variance explained by the $$j$$th PC is $$\lambda_j$$, which is the $$j$$th diagonal element of $${\boldsymbol{\Lambda}}$$.
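As a final sketch, the $$j$$th PC of a particular realization $${\boldsymbol{x}}$$ is just the inner product $${\boldsymbol{w}}_j^T {\boldsymbol{x}}$$, i.e. the explicit linear combination of the loadings with the variables. Both the covariance matrix and the vector `x` below are hypothetical illustrative values:

```python
import numpy as np

# Hypothetical 3x3 population covariance matrix (illustrative values).
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# Loadings: columns of W, with eigenvalues in descending order.
lam, W = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, W = lam[order], W[:, order]

x = np.array([1.0, -0.5, 2.0])  # one hypothetical realization of (X_1, X_2, X_3)
j = 0                           # first PC (0-indexed)

pc_j = W[:, j] @ x              # w_j^T x

# Same value written out as w_1j x_1 + w_2j x_2 + w_3j x_3.
pc_j_explicit = sum(W[i, j] * x[i] for i in range(3))
```

The two computations agree, which is just the definition of the inner product spelled out coordinate by coordinate.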