49 Empirical Distribution Functions
49.1 Definition
Suppose \(X_1, X_2, \ldots, X_n {\; \stackrel{\text{iid}}{\sim}\;}F\). The empirical distribution function (edf) – or empirical cumulative distribution function (ecdf) – is the distribution that puts probability \(1/n\) on each observed value \(X_i\).
Let \(1(X_i \leq y) = 1\) if \(X_i \leq y\) and \(1(X_i \leq y) = 0\) if \(X_i > y\).
\[ \mbox{Random variable: } \hat{F}_{{\boldsymbol{X}}}(y) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \leq y) \]
\[ \mbox{Observed variable: } \hat{F}_{{\boldsymbol{x}}}(y) = \frac{1}{n} \sum_{i=1}^{n} 1(x_i \leq y) \]
49.2 Example: Normal
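As an illustration with Normal data, here is a minimal pure-Python sketch that builds the edf from a sample and compares it to the true cdf at a few points. The helper name `edf`, the sample size, and the seed are illustrative choices, not part of the original example.

```python
import bisect
import math
import random

def edf(sample):
    """Return the edf as a callable: F_hat(y) = (1/n) * #{i : x_i <= y}."""
    xs = sorted(sample)
    n = len(xs)
    def F_hat(y):
        # binary search counts the observations <= y
        return bisect.bisect_right(xs, y) / n
    return F_hat

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
F_hat = edf(x)

# Compare to the true Normal(0, 1) cdf at a few points
Phi = lambda y: 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
for y in (-1.0, 0.0, 1.0):
    print(y, round(F_hat(y), 3), round(Phi(y), 3))
```

With \(n = 1000\), the printed edf values should track \(\Phi(y)\) closely, consistent with the convergence results below.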
49.3 Pointwise Convergence
Under our assumptions, by the strong law of large numbers, for each \(y \in \mathbb{R}\),
\[ \hat{F}_{{\boldsymbol{X}}}(y) \stackrel{\text{a.s.}}{\longrightarrow} F(y) \]
as \(n \rightarrow \infty\).
49.4 Glivenko-Cantelli Theorem
Under our assumptions, we can get a much stronger convergence result:
\[ \sup_{y \in \mathbb{R}} \left| \hat{F}_{{\boldsymbol{X}}}(y) - F(y) \right| \stackrel{\text{a.s.}}{\longrightarrow} 0 \]
as \(n \rightarrow \infty\). Here, “sup” is short for supremum (least upper bound), a mathematical generalization of the maximum.
This result says that even the worst difference between the edf and the true cdf converges with probability 1 to zero.
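This can be checked numerically. For a continuous \(F\), the supremum is attained at the jump points of the edf, so it can be computed exactly from the order statistics. A sketch (the function name `ks_distance` and the sample sizes are assumptions for illustration):

```python
import math
import random

def ks_distance(sample, cdf):
    """sup_y |F_hat(y) - F(y)| for continuous F, computed at the jumps of F_hat."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, xi in enumerate(xs, start=1):
        Fy = cdf(xi)
        # just after the jump, F_hat = i/n; just before it, F_hat = (i-1)/n
        d = max(d, abs(i / n - Fy), abs((i - 1) / n - Fy))
    return d

Phi = lambda y: 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
random.seed(2)
for n in (100, 1000, 10000):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(ks_distance(x, Phi), 4))
```

The printed sup distances shrink as \(n\) grows, in line with the theorem (roughly at rate \(1/\sqrt{n}\)).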
49.5 Dvoretzky-Kiefer-Wolfowitz (DKW) Inequality
This result gives us an upper bound on how far off the edf is from the true cdf, which allows us to construct confidence bands about the edf.
\[ \Pr\left( \sup_{y \in \mathbb{R}} \left| \hat{F}_{{\boldsymbol{X}}}(y) - F(y) \right| > \epsilon \right) \leq 2 \exp\left(-2 n \epsilon^2\right) \]
As outlined in All of Nonparametric Statistics, setting
\[\epsilon_n = \sqrt{\frac{1}{2n} \log\left(\frac{2}{\alpha}\right)}\]
\[L(y) = \max\{\hat{F}_{{\boldsymbol{X}}}(y) - \epsilon_n, 0 \}\]
\[U(y) = \min\{\hat{F}_{{\boldsymbol{X}}}(y) + \epsilon_n, 1 \}\]
guarantees that \(\Pr(L(y) \leq F(y) \leq U(y) \mbox{ for all } y) \geq 1-\alpha\).
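The band is straightforward to compute, since \(\hat{F}_{{\boldsymbol{X}}}\) only changes at the data points. A pure-Python sketch (the function name `dkw_band` is an assumption for illustration):

```python
import math
import random

def dkw_band(sample, alpha=0.05):
    """(1 - alpha) DKW confidence band for F, evaluated at the sorted data.

    eps_n = sqrt(log(2/alpha) / (2n)); L = max(F_hat - eps_n, 0), U = min(F_hat + eps_n, 1).
    """
    xs = sorted(sample)
    n = len(xs)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    L = [max(i / n - eps, 0.0) for i in range(1, n + 1)]  # F_hat(x_(i)) = i/n
    U = [min(i / n + eps, 1.0) for i in range(1, n + 1)]
    return xs, L, U

random.seed(3)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
ys, L, U = dkw_band(x, alpha=0.05)

# Check whether the true Normal(0, 1) cdf lies inside the band at every jump;
# by the DKW guarantee this happens with probability at least 0.95.
Phi = lambda y: 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
print(all(L[i] <= Phi(ys[i]) <= U[i] for i in range(len(ys))))
```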
49.6 Statistical Functionals
A statistical functional \(T(F)\) is any function of \(F\). Examples:
- \(\mu(F) = \int x dF(x)\)
- \(\sigma^2(F) = \int (x-\mu(F))^2 dF(x)\)
- \(\text{median}(F) = F^{-1}(1/2)\)
A linear statistical functional is one of the form \(T(F) = \int a(x) \, dF(x)\).
49.7 Plug-In Estimator
A plug-in estimator of \(T(F)\) based on the edf is \(T(\hat{F}_{{\boldsymbol{X}}})\). Examples:
\(\hat{\mu} = \mu(\hat{F}_{{\boldsymbol{X}}}) = \int x \, d\hat{F}_{{\boldsymbol{X}}}(x) = \frac{1}{n} \sum_{i=1}^n X_i\)
\(\hat{\sigma}^2 = \sigma^2(\hat{F}_{{\boldsymbol{X}}}) = \int (x-\hat{\mu})^2 \, d\hat{F}_{{\boldsymbol{X}}}(x) = \frac{1}{n} \sum_{i=1}^n (X_i - \hat{\mu})^2\)
\(\text{median}(\hat{F}_{{\boldsymbol{X}}}) = \hat{F}_{{\boldsymbol{X}}}^{-1}(1/2)\)
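Because \(\hat{F}_{{\boldsymbol{X}}}\) puts mass \(1/n\) on each observation, these plug-in estimators reduce to simple sums. A sketch in pure Python (the function names are illustrative; note the variance divides by \(n\), not \(n-1\), and the median is taken as \(\hat{F}_{{\boldsymbol{X}}}^{-1}(1/2)\), the smallest \(y\) with \(\hat{F}_{{\boldsymbol{X}}}(y) \geq 1/2\)):

```python
def plugin_mean(sample):
    """mu(F_hat) = (1/n) sum x_i."""
    return sum(sample) / len(sample)

def plugin_var(sample):
    """sigma^2(F_hat) = (1/n) sum (x_i - mu_hat)^2 -- divides by n, not n - 1."""
    m = plugin_mean(sample)
    return sum((xi - m) ** 2 for xi in sample) / len(sample)

def plugin_median(sample):
    """F_hat^{-1}(1/2): the smallest y with F_hat(y) >= 1/2."""
    xs = sorted(sample)
    return xs[(len(xs) - 1) // 2]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(plugin_mean(data), plugin_var(data), plugin_median(data))  # → 5.0 4.0 4.0
```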
49.8 EDF Standard Error
Suppose that \(T(F) = \int a(x) dF(x)\) is a linear functional. Then:
\[ \begin{aligned} {\operatorname{Var}}(T(\hat{F}_{{\boldsymbol{X}}})) & = \frac{1}{n^2} \sum_{i=1}^n {\operatorname{Var}}(a(X_i)) = \frac{{\operatorname{Var}}_F(a(X))}{n} \\ {\operatorname{se}}(T(\hat{F}_{{\boldsymbol{X}}})) & = \sqrt{\frac{{\operatorname{Var}}_F(a(X))}{n}} \\ \hat{{\operatorname{se}}}(T(\hat{F}_{{\boldsymbol{X}}})) & = \sqrt{\frac{{\operatorname{Var}}_{\hat{F}_{{\boldsymbol{X}}}}(a(X))}{n}} \end{aligned} \]
Note that
\[ {\operatorname{Var}}_F(a(X)) = \int (a(x) - T(F))^2 dF(x) \] because \(T(F) = \int a(x) dF(x) = {\operatorname{E}}_F[a(X)]\). Likewise,
\[ {\operatorname{Var}}_{\hat{F}_{{\boldsymbol{X}}}}(a(X)) = \frac{1}{n} \sum_{i=1}^n (a(X_i) - T(\hat{F}_{{\boldsymbol{X}}}))^2 \]
where \(T(\hat{F}_{{\boldsymbol{X}}}) = \frac{1}{n} \sum_{i=1}^n a(X_i)\).
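Putting the two displays together, the estimated standard error of a linear plug-in estimator can be computed directly from the data. A minimal sketch (the name `plugin_se` and the default \(a(x) = x\), which gives the mean, are assumptions for illustration):

```python
import math

def plugin_se(sample, a=lambda x: x):
    """Estimated se of T(F_hat) = (1/n) sum a(X_i) for a linear functional T."""
    n = len(sample)
    vals = [a(xi) for xi in sample]
    T_hat = sum(vals) / n                               # T(F_hat)
    var_hat = sum((v - T_hat) ** 2 for v in vals) / n   # Var_{F_hat}(a(X))
    return math.sqrt(var_hat / n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(plugin_se(data))  # sqrt(4.0 / 8) ≈ 0.707
```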
49.9 EDF CLT
Suppose that \({\operatorname{Var}}_F(a(X)) < \infty\). Then we have the following convergences as \(n \rightarrow \infty\):
\[ \frac{{\operatorname{Var}}_{\hat{F}_{{\boldsymbol{X}}}}(a(X))}{{\operatorname{Var}}_{F}(a(X))} \stackrel{P}{\longrightarrow} 1 \mbox{ , } \frac{\hat{{\operatorname{se}}}(T(\hat{F}_{{\boldsymbol{X}}}))}{{\operatorname{se}}(T(\hat{F}_{{\boldsymbol{X}}}))} \stackrel{P}{\longrightarrow} 1 \]
\[ \frac{T(F) - T(\hat{F}_{{\boldsymbol{X}}})}{\hat{{\operatorname{se}}}(T(\hat{F}_{{\boldsymbol{X}}}))} \stackrel{D}{\longrightarrow} \mbox{Normal}(0,1) \]
The estimators are very easy to calculate on real data, so this is a powerful set of results.
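For example, the CLT justifies an approximate \(1-\alpha\) confidence interval \(T(\hat{F}_{{\boldsymbol{X}}}) \pm z_{\alpha/2} \, \hat{{\operatorname{se}}}(T(\hat{F}_{{\boldsymbol{X}}}))\). A sketch under that assumption (the name `normal_ci`, the hard-coded \(z = 1.96\) for 95% coverage, and the simulated data are illustrative):

```python
import math
import random

def normal_ci(sample, a=lambda x: x, z=1.96):
    """Approximate 95% CI for T(F) = E_F[a(X)] via the edf CLT."""
    n = len(sample)
    vals = [a(xi) for xi in sample]
    T_hat = sum(vals) / n
    se_hat = math.sqrt(sum((v - T_hat) ** 2 for v in vals) / n / n)
    return T_hat - z * se_hat, T_hat + z * se_hat

# Simulated data with known mean 5: the interval should land near 5
random.seed(4)
x = [random.gauss(5.0, 2.0) for _ in range(2000)]
lo, hi = normal_ci(x)
print(round(lo, 2), round(hi, 2))
```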