The mean serving speed of John Isner is higher than most tennis players.
The mean velocity at which today’s satellite launch vehicle reaches space is much higher than those that were used a decade ago.
The median forecast of mean global temperature is higher for equators as compared to that for the Nordic countries.
We regularly encounter terms like mean and median, but then the necessity of getting into mathematics is felt when there is a need to chart a successful statistical learning journey.
The trio of mean, median, and mode is often referred to as “measures of central tendency” and are important metrics in exploratory data analysis. More insights, below:
Understanding measures of central tendency
In a real-life, variables assume numerous distinct values, and most often the aim is to get an estimate of location that concentrates most of the data. This estimate is referred to as a measure of central tendency.
While mean and average are often interchangeably used, multivariate statistical analysis (MSA) requires interpreting mean in the broader mathematical context.
Commonly, arithmetic mean which sums all elements and divides it by the element count is what we use.
However, this shouldn’t undermine the equal importance of other mean types.
Mean, however, doesn’t represent the best measurement of a central value, and hence we have median and mode as other metrics to get insight into the central tendency of a data variable.
Median in simple terms is the middle term in a dataset.
However, the data must be ranked in an ascending order to arrive at the right median value.
When there is an even number of values, the median is the mean of two middle values in the data set.
Mode is a French word that means a prevalent fashion, denoting an extremely popular style.
In statistics, the word translates into the value with the highest frequency. E.g. two products record the highest sales for the same number of months in a year.
Practical significance of measures of central tendency
Practically, mean is sensitive to subtle variations in the data. For instance, if the value set (10, 20, 30, 40, 100) mistakenly gets replaced with (10, 20, 30, 40, 1000), then see how the arithmetic mean suffers.
Mean, therefore, is not appropriate to understand centrality measure, when you have thousands and millions of values, common in today’s big data context.
Here, median proves an effective metric and helps overcome the outlier to which mean is highly vulnerable.
Since median is based on the middle values, data extremities don’t impact the metric. You can experiment with the above dataset.
The three basic measures of central tendency create pathways for next-level metrics in the context. These are
- Weighted mean: The sum of all values times a weight divided by the sum of the weights.
- Weighted median: The value such that one-half of the sum of the weights lies above and below the sorted data.
- Trimmed mean: The average of all values after dropping a fixed number of extreme values.
- Geometric mean: The nth root of the product of the n number of elements.
We are AI ML Editorial Team. We come up with informative quality articles on AI, Data Science, and Machine Learning. If you also want to contribute, kindly get in touch with us.