Econometric Theory/Asymptotic Convergence

Asymptotic Convergence

Modes of Convergence

Convergence in Probability

Convergence in probability is a key tool for deriving asymptotic distributions later in this book. Alongside convergence in distribution, it is the mode of convergence we will encounter most often.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges in probability to X if:

\forall \epsilon, \delta >0,
 \exists N \; \operatorname{s.t.} \; \forall n \geq N,
 \Pr \{ |X_n - X| > \delta \}< \epsilon

An equivalent statement is:

\forall \delta >0,
 \lim_{n \to \infty} \Pr \{ |X_n - X| > \delta \}=0

This will be written as either X_n \xrightarrow{p} X or \operatorname{plim} X_n = X.


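As an example, consider the sequence of random variables X_n that takes the value \eta with probability 1 - \frac{1}{n} and the value \theta \ne \eta with probability \frac{1}{n}:

X_n = \begin{cases} \eta & \text{with probability } 1- \frac{1}{n} \\ \theta & \text{with probability } \frac{1}{n} \end{cases}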

We'll make an educated guess that this sequence converges in probability to the degenerate random variable \eta. Because X_n can differ from \eta only when X_n = \theta, we have:

\forall \delta >0,\; \Pr \{ |X_n - \eta| > \delta \} \leq \Pr \{ |X_n - \eta| > 0 \}= \Pr \{ X_n= \theta \}= \frac{1}{n}

Therefore the definition of convergence in probability requires, in this case:

\forall \epsilon , \delta >0,
 \exists N \; \operatorname{s.t.} \; \forall n \geq N,
 \Pr \{ |X_n - \eta | > \delta \} \leq \Pr \{ |X_n - \eta | > 0 \}=\Pr \{ X_n= \theta \}= \frac{1}{n} < \epsilon

So for any \epsilon > 0 we can always find an N \in \mathbb{N} large enough (any N > 1/\epsilon will do) that our definition is satisfied. Therefore we have proved that X_n \xrightarrow{p} \eta.
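The argument above can be checked numerically. The following is a minimal sketch rather than part of the derivation: the function names, the seed and the number of draws are illustrative choices, and it assumes \theta \ne \eta and 0 < \delta < |\theta - \eta|, so that the event |X_n - \eta| > \delta coincides with the event X_n = \theta.

```python
import random


def tail_probability(n):
    """Exact Pr{|X_n - eta| > delta} in the two-point example.

    Assumes theta != eta and 0 < delta < |theta - eta|, so the event is
    the same as {X_n = theta}, which has probability 1/n.
    """
    return 1.0 / n


def smallest_N(eps):
    """Smallest N such that 1/n < eps for every n >= N (1/n is decreasing)."""
    N = 1
    while tail_probability(N) >= eps:
        N += 1
    return N


def simulated_tail_probability(n, draws=100_000, seed=0):
    """Monte Carlo estimate of Pr{X_n = theta} from repeated draws of X_n."""
    rng = random.Random(seed)
    hits = sum(rng.random() < 1.0 / n for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        print(f"eps={eps}: N={smallest_N(eps)}")
    print("simulated Pr{X_50 = theta}:", simulated_tail_probability(50))
```

The strict inequality in the definition is why `smallest_N` keeps stepping while 1/N is still equal to \epsilon.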

Almost-Sure Convergence

Almost-sure convergence closely resembles convergence in probability, but its conditions are stronger; as we will see later, almost-sure convergence implies convergence in probability.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges almost surely to the random variable X if:

\forall \delta >0,
 \lim_{n \to \infty} \Pr \{ \bigcup_{m \geq n} \{ |X_m - X| > \delta \} \}=0

or, equivalently, if:

\Pr \{ \lim_{n \to \infty} X_n = X \}=1

Under these conditions we use the notation X_n \xrightarrow{a.s.} X or \lim_{n \to \infty} X_n = X \; \operatorname{a.s.}


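Let's see whether our example from the convergence in probability section also converges almost surely. Recall that:

X_n = \begin{cases} \eta & \text{with probability } 1- \frac{1}{n} \\ \theta & \text{with probability } \frac{1}{n} \end{cases}

We again guess that the limit is \eta. Unlike convergence in probability, almost-sure convergence depends on the joint distribution of the whole sequence, which the marginal probabilities above do not pin down. Suppose first that every X_n is driven by a single draw U \sim \operatorname{Uniform}(0,1), with X_n = \theta if U < \frac{1}{n} and X_n = \eta otherwise. For any outcome with U > 0 we have X_n = \eta for every n > 1/U, so:

\Pr \{ \lim_{n \to \infty} X_n = \eta \} \geq \Pr \{ U > 0 \} = 1

thereby satisfying our definition of almost-sure convergence. If instead the X_n are independent (and \theta \ne \eta), then because \sum_{n} \frac{1}{n} = \infty the second Borel-Cantelli lemma implies that X_n = \theta infinitely often with probability one. In that case the sequence converges in probability but not almost surely.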
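Almost-sure convergence concerns whole sample paths, so it depends on how the X_n are generated jointly, which the marginal probabilities alone do not determine. The sketch below evaluates the union probability from the first definition exactly under two hypothetical constructions, both of which are assumptions made for illustration: a coupled one in which a single uniform draw U sets X_m = \theta exactly when U < 1/m, and one in which the X_m are independent. The `horizon` argument truncates the infinite union and is likewise an illustrative device.

```python
from fractions import Fraction


def union_prob_coupled(n):
    """Pr{some m >= n has X_m = theta} when X_m = theta iff U < 1/m.

    The events {U < 1/m} shrink as m grows, so their union over m >= n
    is just {U < 1/n}, which has probability 1/n.
    """
    return Fraction(1, n)


def union_prob_independent(n, horizon):
    """Pr{some m in [n, horizon] has X_m = theta} for independent X_m."""
    no_theta = Fraction(1)
    for m in range(n, horizon + 1):
        no_theta *= 1 - Fraction(1, m)  # X_m = eta with probability 1 - 1/m
    return 1 - no_theta


if __name__ == "__main__":
    for n in (10, 100):
        print(n, float(union_prob_coupled(n)),
              float(union_prob_independent(n, 100_000)))
```

Under the coupled construction the union probability is 1/n and vanishes, in line with almost-sure convergence to \eta. Under independence the truncated probability equals 1 - (n-1)/horizon, which tends to one as the horizon grows, so the same marginal probabilities need not deliver almost-sure convergence.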

Convergence in Distribution

Convergence in distribution will appear very frequently in our econometric models through the use of the Central Limit Theorem. So let's define this type of convergence.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges in distribution to the random variable X if F_{X_n}(\zeta ) \rightarrow F_{X}(\zeta ) at every point \zeta where F_X is continuous. Here F_{X_n}(\zeta ) and F_{X}(\zeta ) are the cumulative distribution functions of X_n and X respectively.

It is the distribution of the random variable that we are concerned with here. Think of a Student's t distribution: as the degrees of freedom, n, increase, the distribution gets closer and closer to a Gaussian distribution. Therefore the random variable Y_n \sim t(n) converges in distribution to the random variable Y \sim N(0,1). (N.b. we write Y_n \xrightarrow{d} Y as a notational crutch; what really converges is the distribution function, F_{Y_n} (\zeta ) \rightarrow F_Y(\zeta ).)
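The Student's t example can be explored numerically. The sketch below is an illustration under stated assumptions: it compares densities rather than distribution functions (for continuous distributions such as these, pointwise convergence of the densities is sufficient for convergence in distribution), and the grid, the function names and the chosen degrees of freedom are arbitrary.

```python
import math


def student_t_pdf(x, n):
    """Density of the Student's t distribution with n degrees of freedom."""
    log_norm = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                - 0.5 * math.log(n * math.pi))
    return math.exp(log_norm - (n + 1) / 2 * math.log1p(x * x / n))


def standard_normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def max_pdf_gap(n, grid_step=0.01, limit=6.0):
    """Largest absolute gap between the two densities on a grid over [-limit, limit]."""
    steps = int(round(2 * limit / grid_step))
    points = (-limit + i * grid_step for i in range(steps + 1))
    return max(abs(student_t_pdf(x, n) - standard_normal_pdf(x)) for x in points)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"n={n}: max density gap = {max_pdf_gap(n):.6f}")
```

The printed gap shrinks as n grows, which is the numerical counterpart of Y_n converging in distribution to Y.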


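As a second example, consider the random variable X_n that takes the two values 1/n and 1 with equal probability (1/2). Let X be a Bernoulli random variable with p = 1/2 (that is, a binomial with a single trial), so that X takes the values 0 and 1 with equal probability. Then X_n converges in distribution to X.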

The proof is simple: we ignore 0 and 1 (where the distribution function of X is discontinuous) and prove that, for all other points a, \lim_{n \to \infty} F_{X_n}(a) = F_X(a). Since for a < 0 all the Fs are 0, and for a > 1 all the Fs are 1, it remains to prove the convergence for 0 < a < 1, where F_X(a) = \frac{1}{2}. But F_{X_n}(a) = \frac{1}{2} ([a \ge \frac{1}{n}] + [a \ge 1]) (using Iverson brackets), so for any such a choose N > 1/a, and for n > N we have:

n > 1/a \Rightarrow a > 1/n \Rightarrow [a \ge \frac{1}{n}] = 1 \land [a \ge 1] = 0 \Rightarrow F_{X_n}(a) = \frac{1}{2} = F_X(a)

So the sequence F_{X_n}(a) converges to F_X(a) at all points where F_X is continuous.
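The two distribution functions in this proof are simple enough to code directly. The following sketch (function names are illustrative) evaluates F_{X_n} and F_X using Python booleans as Iverson brackets, and also shows why the point a = 0 has to be excluded.

```python
def cdf_xn(a, n):
    """F_{X_n}(a) for X_n taking the values 1/n and 1 with probability 1/2 each."""
    return 0.5 * ((a >= 1.0 / n) + (a >= 1.0))


def cdf_x(a):
    """F_X(a) for X taking the values 0 and 1 with probability 1/2 each."""
    return 0.5 * ((a >= 0.0) + (a >= 1.0))


if __name__ == "__main__":
    a = 0.25
    for n in (2, 3, 4, 5, 100):
        print(f"n={n}: F_Xn({a}) = {cdf_xn(a, n)}, F_X({a}) = {cdf_x(a)}")
    # At the discontinuity a = 0 the sequence stays at 0 while F_X(0) = 0.5.
    print("at a=0:", cdf_xn(0.0, 10**6), "vs", cdf_x(0.0))
```

For a = 0.25 the two functions agree as soon as n \ge 4, matching the choice N > 1/a in the proof, while at a = 0 they never agree. That failure does not matter, because the definition only requires convergence at continuity points of F_X.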

Convergence in r-th Mean

Convergence in r-th mean will not be used in this book; the definition is provided below for completeness. The case r = 2 is known as convergence in mean square.


A sequence of random variables \{ X_n ; n=1,2, \cdots \} converges in r-th mean (or in the L^r norm) to the random variable X, for a real number r \geq 1, if E(|X_n|^r) < \infty for all n and

\lim_{n\to \infty }E\left( \left\vert X_n-X\right\vert ^r\right) =0.
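As a quick illustration of this definition, the sketch below evaluates E(|X_n - X|^r) in closed form for two two-point sequences. The first is the earlier example (X_n = \theta with probability 1/n, \eta otherwise). The second is a hypothetical variant, introduced here purely for contrast, in which the rare value grows with n (X_n = n with probability 1/n, 0 otherwise). The function names are illustrative.

```python
def rth_mean_error(n, r, eta, theta):
    """E|X_n - eta|^r when X_n = theta with probability 1/n and eta otherwise."""
    return abs(theta - eta) ** r / n


def rth_mean_error_growing(n, r):
    """E|X_n - 0|^r when X_n = n with probability 1/n and 0 otherwise (hypothetical variant)."""
    return n ** r / n


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, rth_mean_error(n, 2, eta=0.0, theta=3.0),
              rth_mean_error_growing(n, 2))
```

With a fixed \theta the r-th mean error is |\theta - \eta|^r / n and tends to zero, so that sequence converges in r-th mean. In the variant the error equals n^{r-1}, which does not tend to zero for any r \geq 1, even though \Pr \{ X_n \ne 0 \} = 1/n still vanishes.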

Cramér-Wold Device

The Cramér-Wold device allows us to extend our convergence results for scalar random variables to random vectors.


A sequence of random vectors \mathbf{X}_n \xrightarrow{d} \mathbf{X} \; \iff \; {\mathbf{\lambda}}^{\operatorname{T}}\mathbf{X}_n \xrightarrow{d} {\mathbf{\lambda}}^{\operatorname{T}}\mathbf{X} \quad \forall \mathbf{\lambda} \text{ with } \lVert \mathbf{\lambda} \rVert \ne 0.
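To make the one-dimensional projections concrete, the sketch below assumes, purely for illustration, that the limit \mathbf{X} is multivariate normal with mean zero and covariance matrix \Sigma. In that case each projection \lambda^T \mathbf{X} is normal with mean zero and variance \lambda^T \Sigma \lambda, so checking the scalar limits amounts to checking these variances. The function name and the example matrix are hypothetical.

```python
def projected_variance(lam, sigma):
    """Variance lam' Sigma lam of the scalar projection lam' X.

    lam is a list of k weights and sigma a k-by-k covariance matrix
    given as a list of rows.
    """
    k = len(lam)
    return sum(lam[i] * sigma[i][j] * lam[j]
               for i in range(k) for j in range(k))


if __name__ == "__main__":
    sigma = [[1.0, 0.5],
             [0.5, 2.0]]  # example covariance matrix (an assumption)
    for lam in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]):
        print(lam, projected_variance(lam, sigma))
```

Each choice of \lambda turns the vector problem into a scalar one, which is the sense in which the device lets scalar convergence results carry over to vectors.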

Relationships Between Modes of Convergence

Law of Large Numbers

Central Limit Theorem

Let X_1, X_2, X_3, \ldots be a sequence of random variables that are defined on the same probability space, are independent, and share the same probability distribution D. Assume that both the expected value \mu and the standard deviation \sigma of D exist and are finite.

Consider the sum S_n = X_1 + \cdots + X_n. Then the expected value of S_n is n\mu and its standard deviation is \sigma n^{1/2}. Furthermore, informally speaking, the distribution of S_n approaches the normal distribution N(n\mu, \sigma^2 n) as n approaches \infty. More precisely, provided \sigma > 0, the standardized sum (S_n - n\mu)/(\sigma n^{1/2}) converges in distribution to N(0,1).
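The statement can be illustrated by simulation. The sketch below is one possible check under stated assumptions: D is taken to be the Uniform(0,1) distribution (so \mu = 1/2 and \sigma^2 = 1/12), and the seed, the replication count and the 1.96 cutoff are illustrative choices. It estimates how often the standardized sum (S_n - n\mu)/(\sigma n^{1/2}) falls within \pm 1.96, which would be about 0.95 for a standard normal variable.

```python
import math
import random

MU = 0.5                        # mean of Uniform(0, 1)
SIGMA = math.sqrt(1.0 / 12.0)   # standard deviation of Uniform(0, 1)


def standardized_sum(n, rng):
    """Draw X_1..X_n from Uniform(0, 1) and return (S_n - n*mu) / (sigma * sqrt(n))."""
    s_n = sum(rng.random() for _ in range(n))
    return (s_n - n * MU) / (SIGMA * math.sqrt(n))


def coverage(n, reps=5000, seed=0, z=1.96):
    """Fraction of replications whose standardized sum lies in [-z, z]."""
    rng = random.Random(seed)
    inside = sum(abs(standardized_sum(n, rng)) <= z for _ in range(reps))
    return inside / reps


if __name__ == "__main__":
    for n in (1, 2, 5, 30):
        print(f"n={n}: share within +/-1.96 = {coverage(n):.3f}")
```

For n = 1 the standardized value can never exceed about 1.73 in absolute value, so the share is exactly one; by n = 30 it is already close to the normal benchmark.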

Continuous Mapping Theorem

Slutsky's Theorem