An agricultural product manufacturing company develops a chemical to improve crop level, irrespective of temperature increase. The efficacy of the chemical would be proved if the mean yield across different temperatures remains the same. Surrounding this are two hypotheses: the null hypothesis that states – yes the mean yields remain the same and the alternative hypothesis – the mean yields differ across different temperatures. How is this to be validated that the mean yield, indeed, remains unaffected by temperature fluctuations?
The answer lies in ANOVA i.e. analysis of variance, a statistical technique that deploys a series of mechanisms to deal with problems as the one discussed above.
When the problem is translated into a statistical problem to be addressed using ANOVA, it gets the following form.
First, two hypotheses are defined. These have been statistically represented as below:
H0: μ1 = μ2 = μ3 =…… = μn
Ha: Not all population means are equal
Using ANOVA, it is determined if the observed differences across the n samples are large enough to reject H0. Of course, this is a long process, requiring assessments of various metrics that form part of ANOVA.
What does it mean when H0 is rejected?
Rejection of H0 doesn’t mean that all population means are not same. However, it means that means of at least two different populations have different values.
What does “variance” indicate in Analysis of variance (ANOVA)?
One might wonder, why the term Analysis of “variance,” when in reality we are dealing with means. Well, let’s see how this works.
When ANOVA says means for five populations are equal, then the five sample means are supposed to be in close range. In fact, when the closer the five sample means, the more is the evidence of the population means being equal. As against this, the more the five sample means differ, the stronger the evidence that the population means are not equal. Cannot we say that when the variability amongst the sample means is negligible, H0 becomes valid? Yes, that is true. Similarly, when the variability amongst the sample means is high, Ha becomes valid.
What are the assumptions of ANOVA?
Like every other statistical technique, analysis of variance (ANOVA), too, is without its assumptions. These are:
- For each of the populations, the response variable needs to follow the normal distribution.
- For the response variable, the variance i.e. σ2 remains the same across all populations.
- An observation exhibits an independent relationship with every other observation.
Statistical significance in ANOVA
If any of the populations differ significantly from the overall group mean, then a statistically significant result is obtained. The metric that allows us to measure if significant differences exist amongst different populations is F statistic. Statistically, F statistic is the ratio of the mean of the sum of squares to the mean square error. When F statistic is higher than the critical value, then the difference amongst different populations is considered statistically significant.
What we discussed above is just a high-level discussion on ANOVA. Understanding the extensiveness of the concept, mentioned below are some resources which will give you a good insight into the technicalities in detail.
We are AI ML Editorial Team. We come up with informative quality articles on AI, Data Science, and Machine Learning. If you also want to contribute, kindly get in touch with us.