Flash 9

The standard deviation is a measure of how much a dataset differs from its mean; it tells us how dispersed the data are. A dataset that’s pretty much clumped around a single point would have a small standard deviation, while a dataset that’s all over the map would have a large standard deviation.

Given a sample (x1 … xN) the standard deviation is defined as the square root of the variance:stand

 

There is an accurate way to compute variance with which is guaranteed to always give positive results. This method computes a running variance, that means that the method computes the variance as the values arrive one at a time. The data do not need to be saved for a second pass.

This good way of computing variance it’s thanks to a 1962 paper by B. P. Welford.

welfordWelford’s method is a usable single-pass method for computing the variance. It can be derived by looking at the differences between the sums of squared differences for N and N-1 samples. 

This means we can compute the variance in a single pass using the following algorithm:

variance(samples):
  M := 0
  S := 0
  for k from 1 to N:
    x := samples[k]
    oldM := M
    M := M + (x-M)/k
    S := S + (x-M)*(x-oldM)
  return S/(N-1)

To know more about the algorithm click here, while if you want to know more about how to implement it, it’s possible to look at a Python welford algorithm by clicking here.

Lascia un commento