In probability theory, the central limit theorem (CLT) establishes that, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed.
This theorem is a key concept in probability theory, because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.
The central limit theorem has a number of variants. In its common form, the random variables must be identically distributed. In variants, convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations, given that they comply with certain conditions.
The earliest version of this theorem, that the normal distribution may be used as an approximation to the binomial distribution, is now known as the de Moivre-Laplace theorem.
We’ll now try to list different approaches to the CLT.
-
Classical CLT
Let {X1, …, Xn} be a random sample of size n — that is, a sequence of independent and identically distributed random variables drawn from a distribution with expected value µ and finite variance σ2. Suppose we are interested in the sample average Sn = (X1 + … + Xn)/n of these random variables.
By the law of large numbers, the sample averages converge in probability to the expected value µ as n → ∞.
The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number µ during this convergence.
More precisely, it states that as n gets larger, the distribution of the difference between the sample average Sn and its limit µ, when multiplied by the factor √n (that is, √n(Sn − µ)), approximates the normal distribution with mean 0 and variance σ2. For large enough n, the distribution of Sn itself is close to the normal distribution with mean µ and variance σ2/n.
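As a quick illustration, here is a minimal Monte Carlo sketch in Python (NumPy assumed); the exponential summands and the sample sizes are arbitrary choices for illustration, not part of the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential(1) summands: mean mu = 1 and variance sigma^2 = 1, clearly non-normal.
mu, sigma = 1.0, 1.0
n, trials = 500, 20_000

# Each row is one sample of size n; each row mean is one draw of the sample average Sn.
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

# sqrt(n) * (Sn - mu) / sigma should be approximately standard normal N(0, 1).
z = np.sqrt(n) * (sample_means - mu) / sigma
print("mean of z:", z.mean())                 # close to 0
print("std of z:", z.std())                   # close to 1
print("P(z <= 1.96):", (z <= 1.96).mean())    # close to 0.975
```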
-
CLT under weak dependence
A useful generalization of a sequence of independent, identically distributed random variables is a mixing random process in discrete time; this means that random variables that are temporally far apart from one another, are nearly independent. Several kinds of mixing are used in probability theory, especially strong mixing (also called α-mixing) defined by α(n) → 0 where α(n) is so-called strong mixing coefficient.
A simplified formulation of the central limit theorem under strong mixing is:
Theorem. Suppose that X1, X2, … is stationary and α-mixing with αn = O(n−5) and that E(Xn) = 0 and E(Xn^12) < ∞ (a finite twelfth moment).
Denote Sn = X1 + … + Xn; then the limit σ² = lim n→∞ E(Sn²)/n exists, and if σ ≠ 0 then Sn/(σ√n) converges in distribution to N(0,1).
In fact, we have
σ² = E(X1²) + 2 ∑k≥1 E(X1·X1+k),
where the sum runs over k = 1, 2, … and the series converges absolutely.
The assumption σ ≠ 0 cannot be omitted, since asymptotic normality fails for Xn = Yn − Yn−1, where {Yn} is another stationary sequence. There is also a stronger version of the theorem: the assumption E(Xn^12) < ∞ is replaced with E(|Xn|^(2+δ)) < ∞, and the assumption αn = O(n−5) is replaced with ∑n αn^(δ/(2+δ)) < ∞. Existence of such a δ > 0 ensures the conclusion.
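To see the long-run variance series in action, here is a sketch (NumPy assumed) with a Gaussian AR(1) process, Xi = ρXi−1 + εi; this particular process is my illustrative choice, not part of the theorem. It is stationary with geometrically decaying dependence, and for it the series sums to 1/(1 − ρ)²:

```python
import numpy as np

rng = np.random.default_rng(1)

rho, n, trials = 0.5, 2_000, 500

# For this AR(1) model the series gives the long-run variance
# sigma^2 = E(X1^2) + 2*sum_k E(X1*X_{1+k}) = 1 / (1 - rho)^2  (= 4 for rho = 0.5).
sigma2_theory = 1.0 / (1.0 - rho) ** 2

stats = np.empty(trials)
for t in range(trials):
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - rho**2)    # start in the stationary distribution
    for i in range(1, n):
        x[i] = rho * x[i - 1] + eps[i]       # X_i = rho*X_{i-1} + eps_i, so E(X_i) = 0
    stats[t] = x.sum() / np.sqrt(n)          # Sn / sqrt(n)

print("series value for sigma^2:", sigma2_theory)
print("Monte Carlo variance of Sn/sqrt(n):", stats.var())
```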
For a theorem of such fundamental importance to statistics and probability, the CLT has a remarkably simple proof using characteristic functions. It is similar to the proof of the (weak) law of large numbers.
We already talked about the law of large numbers (LLN). Soon, we’ll talk about the weak version of it too.
Meanwhile, as stated above, suppose {X1, …, Xn} are independent and identically distributed random variables, each with mean µ and finite variance σ2. The sum X1 + … + Xn has mean nµ and variance nσ2. Consider the random variable
Zn = (X1 + … + Xn − nµ)/(σ√n) = ∑i Yi/√n,
where in the last step we defined the new random variables Yi = (Xi − µ)/σ, each with zero mean and unit variance (var(Y) = 1). The characteristic function of Zn is given by
φZn(t) = φY1(t/√n)·φY2(t/√n)·⋯·φYn(t/√n) = [φY1(t/√n)]^n,
where in the last step we used the fact that all of the Yi are identically distributed. The characteristic function of Y1 is, by Taylor’s theorem,
φY1(t) = 1 − t²/2 + c·t³ + o(t³),   t → 0,
where c is a (complex) constant and o(t³) is “little o notation” for some function of t that goes to zero more rapidly than t³. By the limit of the exponential function (e^x = lim n→∞ (1 + x/n)^n), the characteristic function of Zn equals
φZn(t) = [φY1(t/√n)]^n = [1 − t²/(2n) + c·t³/n^(3/2) + o(t³/n^(3/2))]^n → e^(−t²/2)   as n → ∞.
Note that all of the higher order terms vanish in the limit n → ∞. The right hand side equals the characteristic function of a standard normal distribution N(0,1), which implies that the distribution of Zn will approach N(0,1) as n → ∞.
Therefore, the distribution of the sum X1 + … + Xn approaches that of the normal distribution N(nµ, nσ2), and the distribution of the sample average approaches the normal distribution N(µ, σ2/n), from which the central limit theorem follows.
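As a numerical sanity check of this characteristic-function argument (a sketch assuming NumPy; the Bernoulli summands and the parameters are illustrative choices), one can compare the empirical characteristic function of Zn with e^(−t²/2):

```python
import numpy as np

rng = np.random.default_rng(2)

# Bernoulli(1/2) summands: mu = 1/2, sigma = 1/2; Sn is then Binomial(n, 1/2).
mu, sigma = 0.5, 0.5
n, trials = 2_000, 100_000

s = rng.binomial(n, 0.5, size=trials)                # draws of Sn
z = (s - n * mu) / (sigma * np.sqrt(n))              # Zn = (Sn - n*mu) / (sigma*sqrt(n))

for t in (0.5, 1.0, 2.0):
    empirical = np.exp(1j * t * z).mean()            # empirical E[exp(i*t*Zn)]
    print(t, round(empirical.real, 3), round(np.exp(-t**2 / 2), 3))
```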
The law of large numbers as well as the central limit theorem are partial solutions to a general problem: “What is the limiting behaviour of Sn as n approaches infinity?”
Suppose we have an asymptotic expansion of f(n):
f(n) = a1·φ1(n) + a2·φ2(n) + O(φ3(n)).
Dividing both parts by φ1(n) and taking the limit will produce a1, the coefficient of the highest-order term in the expansion, which represents the rate at which f(n) changes in its leading term.
So f(n) grows approximately as a1φ1(n). Taking the difference between f(n) and its approximation and then dividing by the next term in the expansion, we arrive at a more refined statement about f(n):
(f(n) − a1·φ1(n))/φ2(n) → a2   as n → ∞.
Here one can say that the difference between the function and its approximation grows approximately as a2φ2(n). The idea is that dividing the function by appropriate normalizing functions, and looking at the limiting behavior of the result, can tell us much about the limiting behavior of the original function itself.
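A toy numerical example of this normalizing idea (the function below is made up purely for illustration): for f(n) = 2n² + 5n + 7, dividing by φ1(n) = n² recovers a1 = 2, and dividing the remainder by φ2(n) = n recovers a2 = 5:

```python
def f(n: int) -> int:
    # A made-up function whose expansion is f(n) = 2*n**2 + 5*n + 7.
    return 2 * n**2 + 5 * n + 7

for n in (10, 1_000, 100_000):
    a1 = f(n) / n**2               # -> 2, the leading coefficient a1
    a2 = (f(n) - 2 * n**2) / n     # -> 5, the next coefficient a2
    print(n, a1, a2)
```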
Informally, when the sum Sn of independent identically distributed random variables X1, …, Xn is studied in classical probability theory, if each Xi has finite mean µ, then by the law of large numbers, Sn/n → µ, while if in addition each Xi has finite variance σ2, then by the central limit theorem,
(Sn − nµ)/√n → ξ   (in distribution),
where ξ is distributed as N(0, σ2). This provides values of the first two constants in the informal expansion
Sn ≈ µn + ξ√n.
In the case where the Xi do not have finite mean or variance, convergence of the shifted and rescaled sum can also occur with different centering and scaling factors:
(Sn − an)/bn → Ξ   (in distribution),
or informally:
Sn ≈ an + Ξ·bn.
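For instance, standard Cauchy summands have no finite mean, and their average Sn/n is again standard Cauchy, so the appropriate scaling is bn = n (with an = 0) and the limit Ξ is not normal. A minimal sketch (NumPy assumed; the sample sizes are arbitrary) showing that the spread of Sn/n does not shrink as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)

trials = 2_000
for n in (100, 5_000):
    avg = rng.standard_cauchy(size=(trials, n)).mean(axis=1)   # draws of Sn / n
    # Quartiles of a standard Cauchy sit at -1 and +1, so its IQR is 2.
    q25, q75 = np.quantile(avg, [0.25, 0.75])
    print(n, "IQR of Sn/n:", round(q75 - q25, 2))              # stays near 2 for every n
```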
