In this analysis we’ll show how Markov’s inequality implies the weak law of large numbers.
For starters, Markov’s inequality states that for a nonnegative random variable X and any real number a > 0,

Pr(X ≥ a) ≤ E(X)/a.

We’ll show the proof as follows: since X is nonnegative, we have X ≥ a·I{X ≥ a}, where I{X ≥ a} is 1 when X ≥ a and 0 otherwise. Taking expectations gives E(X) ≥ a·E(I{X ≥ a}) = a·Pr(X ≥ a), and dividing by a yields the inequality.

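As a quick sanity check of Markov’s inequality, here is a simulation sketch; the exponential distribution (which is nonnegative) and the values of a are arbitrary choices for illustration:

```python
import numpy as np

# Empirical check of Markov's inequality, Pr(X >= a) <= E(X)/a, for a
# nonnegative random variable. The exponential distribution is an
# arbitrary nonnegative example.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)

for a in [0.5, 1.0, 2.0, 4.0]:
    empirical = np.mean(X >= a)   # estimated Pr(X >= a)
    bound = X.mean() / a          # Markov bound E(X)/a
    print(f"a={a}: Pr(X >= a) ~ {empirical:.4f} <= bound {bound:.4f}")
```

In every case the estimated tail probability sits below the Markov bound, though the bound is often quite loose.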
Note that the term Markov’s inequality may also refer to Chebyshev’s inequality, especially in the context of analysis. Chebyshev’s inequality is usually stated for random variables, but can be generalized to a statement about measure spaces.
Let X be an integrable random variable with finite expected value μ and finite non-zero variance σ². Then for any real number k > 0,

Pr(|X − μ| ≥ kσ) ≤ 1/k².
Only the case k > 1 is useful. When k ≤ 1, the right-hand side 1/k² ≥ 1 and the inequality is trivial as all probabilities are ≤ 1.

As an example, using k = √2 shows that the probability that values lie outside the interval (μ − √2σ, μ + √2σ) does not exceed 1/2.
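As an empirical illustration of the k = √2 case (probability outside μ ± √2σ at most 1/2), here is a simulation sketch; the skewed exponential distribution is an arbitrary choice to show the bound is not limited to symmetric distributions:

```python
import numpy as np

# Empirical check of Chebyshev's bound with k = sqrt(2):
# Pr(|X - mu| >= sqrt(2)*sigma) <= 1/2 for any finite-variance X.
rng = np.random.default_rng(1)
X = rng.exponential(scale=3.0, size=1_000_000)

mu, sigma = X.mean(), X.std()
outside = np.mean(np.abs(X - mu) >= np.sqrt(2) * sigma)
print(f"Pr(|X - mu| >= sqrt(2)*sigma) ~ {outside:.4f} (bound: 0.5)")
```

The observed tail probability is well under the 1/2 bound; Chebyshev trades tightness for complete generality.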
Markov’s inequality states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev’s inequality is to apply Markov’s inequality to the random variable Y = (X − μ)² with a = (kσ)².
It can also be proved directly. For any event A, let IA be the indicator random variable of A, i.e. IA equals 1 if A occurs and 0 otherwise. Then

Pr(|X − μ| ≥ kσ) = E(I{|X − μ| ≥ kσ}) ≤ E( ((X − μ)/(kσ))² ) = E((X − μ)²)/(k²σ²) = σ²/(k²σ²) = 1/k²,

where the middle inequality holds because ((X − μ)/(kσ))² ≥ 1 whenever the indicator equals 1, and is ≥ 0 otherwise.
The weak law of large numbers (also called Khintchine’s law) states that the sample average converges in probability towards the expected value:

X̄n → μ in probability as n → ∞, where X̄n = (X1 + … + Xn)/n.

That is to say that for any positive number ε, we have the following limit:

lim(n→∞) Pr(|X̄n − μ| > ε) = 0.
- Interpreting this result, the weak law essentially states that for any non-zero margin, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations lies within that margin of the expected value.
- Convergence in probability is also called weak convergence of random variables. This version is called the weak law because random variables may converge weakly (in probability) as above without (almost surely) converging strongly as below.
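The convergence in probability above can be illustrated numerically. This sketch estimates Pr(|X̄n − μ| > ε) for growing n using i.i.d. uniform(0, 1) draws (so μ = 0.5); ε and the trial count are arbitrary choices:

```python
import numpy as np

# Weak law of large numbers in action: the probability that the sample
# average deviates from mu = 0.5 by more than eps shrinks as n grows.
rng = np.random.default_rng(2)
eps, trials = 0.05, 1000
probs = []

for n in [10, 100, 1000, 10_000]:
    # `trials` independent samples of size n, averaged along each row.
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    probs.append(np.mean(np.abs(means - 0.5) > eps))
    print(f"n={n}: Pr(|average - 0.5| > {eps}) ~ {probs[-1]:.3f}")
```

The estimated deviation probability drops toward zero as n increases, which is exactly the limit stated above.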
As mentioned earlier, the weak law applies in the case of independent identically distributed random variables having an expected value. But it also applies in some other cases.
For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.
In fact, Chebyshev’s proof works so long as the variance of the average of the first n values goes to zero as n goes to infinity. As an example, assume that each random variable in the series follows a Gaussian distribution with mean zero, but with variance equal to 2n/log(n + 1), which is not bounded. At each stage, the average will be normally distributed (since it is the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to n²/log n. The variance of the average is therefore asymptotic to 1/log n and goes to zero.
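A numerical sketch of this, assuming each term Xk has variance 2k/log(k + 1) (one choice of unbounded variances for which the variance of the average still vanishes):

```python
import numpy as np

# With Var(X_k) = 2k/log(k+1), the variance of the average of the first n
# values is (sum of the variances)/n^2, which behaves like 1/log(n) -> 0,
# even though the individual variances are unbounded.
var_avgs = []
for n in [10**2, 10**4, 10**6]:
    k = np.arange(1, n + 1)
    var_sum = np.sum(2 * k / np.log(k + 1))  # variance of the sum
    var_avgs.append(var_sum / n**2)          # variance of the average
    print(f"n={n}: Var(average) ~ {var_avgs[-1]:.4f}, "
          f"1/log(n) = {1 / np.log(n):.4f}")
```

The computed variance of the average tracks 1/log n and decreases toward zero, so Chebyshev’s argument still applies despite the unbounded individual variances.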
This shows how Markov’s inequality leads to the weak law of large numbers.
