In the first article, we saw that a scale of measurement has four main levels. Now we'll look at the three most common statistical measures: the mode, the median, and the mean.
The mode of a set of data values is the value that appears most often, or in other words the value that is most likely to be sampled. Like the mean and the median, the mode is a way to express information about random variables and populations. Unlike the mean and the median, however, the mode can also be applied to non-numerical values, such as the brand of coffee most commonly purchased from a grocery store. The mode can be computed at all four levels of measurement.
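As a quick sketch (the purchase data below is invented for illustration), the mode of a nominal variable can be computed in Python with `collections.Counter`:

```python
from collections import Counter

# The mode works even for nominal (non-numerical) data,
# such as this hypothetical log of coffee purchases.
purchases = ["BrandA", "BrandB", "BrandA", "BrandC", "BrandA", "BrandB"]

# most_common(1) returns the single (value, count) pair with the highest count
mode = Counter(purchases).most_common(1)[0][0]
print(mode)  # BrandA
```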
The median is the value separating the higher half of a data sample from the lower half. For a data set, it may be thought of as the "middle" value. Finding the median means finding the value that sits in the middle position of the ordered data, so the order of the data samples is important.
The basic advantage of the median over the mean (often simply called the "average") in describing data is that it is not skewed by extremely large or small values, so it may give a better idea of a "typical" value. Even when computing the median involves taking a mean, which happens when the data set has an even number of values, only the two middle values are averaged, whereas the mean involves every value in the data sample. The median can be computed at all levels of measurement except the nominal.
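A small illustration of this robustness, using Python's `statistics` module (the numbers here are made up):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 9, 1000]  # one extreme value

# Even number of values: the median averages the two middle values (4 and 5)
print(statistics.median(data))  # 4.5

# The mean uses every value, so the outlier drags it far from "typical"
print(statistics.mean(data))    # 128.75
```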

The word mean is similarly ambiguous even within mathematics: depending on the context, mathematical or statistical, what is meant by the "mean" changes. In this context, the arithmetic mean (or simply the mean, or average) is the sum of a collection of numbers divided by the number of numbers in the collection.
As said before, the word mean has many different meanings. The mean can be computed at the interval and ratio levels of measurement.
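As a minimal sketch, the arithmetic mean defined above is one line of Python:

```python
def arithmetic_mean(values):
    # The sum of the collection divided by the number of elements in it.
    return sum(values) / len(values)

print(arithmetic_mean([2, 4, 6, 8]))  # 5.0
```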
An online (running) mean can be computed algorithmically. There are different kinds of algorithms that may help our study:
- online algorithm: an algorithm that can process its input piece by piece, in a serial fashion, such as in the order the input is given, without having the whole input available from the start. Since it doesn't know the entire input, an online algorithm is forced to make decisions each time it receives a piece of the input, and those decisions may later turn out not to be optimal. An example of an online algorithm is a greedy algorithm.

- offline algorithm: an algorithm to which the problem data is given as a whole from the beginning. It is required to output an answer which solves the problem at hand. An example of an offline algorithm is a sorting algorithm, an algorithm that puts elements of a list in a certain order.
- streaming algorithm: an algorithm for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes. These algorithms have limited memory available to them and limited processing time per item, so they focus on the amount of memory needed to accurately represent past inputs. They have much in common with online algorithms, since both must make decisions before all the data are available, but they differ in the amount of memory they are allowed to use.
- dynamic algorithm: dynamic problems are problems stated in terms of changing input data. In the most general form, a problem in this category asks, given a class of input objects, for an algorithm and data structures able to answer a certain query about a set of input objects each time the input data is modified (i.e., objects are inserted or deleted). The overall set of computations for a dynamic problem is called a dynamic algorithm, so it's an algorithm that focuses on the time complexity of maintaining solutions to problems with online inputs.
Among these kinds of algorithms there are at least three on which we'd like to focus: Knuth's iterative algorithm, Kahan's compensated summation algorithm, and Neumaier's modification of it.
This is how Knuth's algorithm works:
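A sketch in Python of the iterative update attributed to Knuth (The Art of Computer Programming, vol. 2); the function name is my own:

```python
def running_mean(stream):
    # After the k-th value, the mean is updated in place as
    #   mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
    # Only the current mean and the count are stored, so the
    # input can be processed online, piece by piece.
    mean = 0.0
    for k, x in enumerate(stream, start=1):
        mean += (x - mean) / k
    return mean

print(running_mean([2.0, 4.0, 6.0, 8.0]))  # 5.0
```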

each new value nudges the running mean by its difference from the current mean, divided by the number of values seen so far; this way the whole input never needs to be stored, and no large intermediate sum is accumulated.
Kahan's compensated summation significantly reduces the numerical error in the total obtained when adding a sequence of finite-precision floating-point numbers. This is done by keeping a separate "running compensation", a variable used to accumulate small errors. Because the worst-case error is independent of n, a large number of values can be summed with an error that depends only on the floating-point precision.
This is how Kahan's algorithm works:
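A sketch in Python (here `total` plays the role of the running sum, and `c` is the running compensation):

```python
def kahan_sum(values):
    total = 0.0
    c = 0.0                   # running compensation for lost low-order bits
    for x in values:
        y = x - c             # subtract the error compensated so far
        t = total + y         # low-order digits of y may be lost here
        c = (t - total) - y   # algebraically zero; recovers what was lost
        total = t
    return total

# Ten copies of 0.1 cannot be represented exactly in binary, so a naive
# sum drifts; the compensated sum stays at (or extremely close to) 1.0.
print(kahan_sum([0.1] * 10), sum([0.1] * 10))
```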

the sum is performed with two accumulators: sum holds the running total and c accumulates the parts not yet assimilated into sum, nudging the low-order part of sum the next time around.
Neumaier's modification of the algorithm covers the case in which the next term to be added is larger in absolute value than the running sum, effectively swapping the roles of what is large and what is small in the calculation.
This is how Neumaier's algorithm works:
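A sketch of Neumaier's variant in Python (again, `total` and `c` are the running sum and compensation):

```python
def neumaier_sum(values):
    total = 0.0
    c = 0.0                        # running compensation
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            c += (total - t) + x   # low-order digits of x were lost
        else:
            c += (x - t) + total   # low-order digits of total were lost
        total = t
    return total + c               # apply the compensation once, at the end

# A huge value enters and later cancels out; Kahan's version would
# return 0.0 here, while Neumaier's recovers the correct answer.
print(neumaier_sum([1.0, 1e100, 1.0, -1e100]))  # 2.0
```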

there's a specific situation in which this modified version gives a more accurate answer than the original: summing the values 1.0, 10^100, 1.0, -10^100 in double precision, Kahan's algorithm yields 0.0, whereas Neumaier's algorithm yields the correct value 2.0.
Glossary:
Algorithm:= an unambiguous specification of how to solve a class of problems.
synonyms: method, procedure, process, formula.
Mean:= the sum of a collection of numbers divided by the number of numbers in the collection.
synonyms: median, medium, middle, midpoint, standard.
Mode:= the value that is most likely to be sampled.
synonyms: technique, status, situation, method.
Median:= the value separating the higher half of a data sample from the lower half.
synonyms: average, center, central, intermediary, origin.
