What are the probabilities of families in your residential area possessing zero, one, two, three, or more than three cars?
What are the probabilities associated with different values of speed of an aircraft?
What is the probability distribution of total sales ($) of your online stores?
Probability distributions are among the most frequently used and discussed topics in multivariate statistical analysis (MSA), because identifying the distribution of your data is a necessary step in pointing your analysis in the right direction.
In simple terms, the concept has two components: probability and distribution. It is incomplete, however, without a random variable, that is, a variable whose value depends on the outcome of a random process. When a random variable can take only distinct, countable values (such as 0, 1, 2, or 3), it is called a discrete random variable; when it can take any value within a given range, it is called a continuous random variable.
Suppose you do an analysis and find that the probabilities of families in your neighborhood possessing zero, one, two, three, or more than three cars are 0.15, 0.35, 0.25, 0.15, and 0.10 respectively.
When you list these probabilities against their respective values, you get a probability distribution.
Since the variable here can take five distinct values, each with its own probability, and those five probabilities add up to 1, this is a discrete probability distribution.
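To make the car example concrete, here is a minimal Python sketch that stores the five probabilities in a dictionary, checks that they sum to 1, and adds them up to answer simple questions. The name car_pmf and the string labels are only illustrative choices, not part of any library.

```python
# Discrete probability distribution from the car-ownership example.
# The dictionary name and the string labels are illustrative choices.
car_pmf = {
    "0": 0.15,
    "1": 0.35,
    "2": 0.25,
    "3": 0.15,
    "more than 3": 0.10,
}

# A valid discrete distribution has non-negative probabilities that sum to 1.
total = sum(car_pmf.values())
assert all(p >= 0 for p in car_pmf.values())
assert abs(total - 1.0) < 1e-9

# Probabilities of combined outcomes are obtained by adding the entries.
p_at_most_one = car_pmf["0"] + car_pmf["1"]
p_at_least_two = car_pmf["2"] + car_pmf["3"] + car_pmf["more than 3"]

print(f"P(at most one car)   = {p_at_most_one:.2f}")
print(f"P(at least two cars) = {p_at_least_two:.2f}")
```

Because the outcomes are mutually exclusive, the probability of any group of outcomes is simply the sum of the individual entries.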
Variables such as the speed of an aircraft and the sales ($) registered by an online store are treated as continuous variables because they can take any value within a range, so the probability distribution they form is a continuous probability distribution.
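With a continuous variable, probabilities attach to ranges of values rather than to single values. The sketch below illustrates this with Python's standard statistics.NormalDist. The choice of a normal model and the figures used (a mean speed of 800 km/h with a standard deviation of 25 km/h) are assumptions made purely for illustration, not measurements.

```python
from statistics import NormalDist

# Hypothetical model of aircraft speed in km/h (assumed values, not real data).
speed = NormalDist(mu=800, sigma=25)

# The probability of a range is the difference of the cumulative
# distribution function (CDF) at its two endpoints.
p_780_to_820 = speed.cdf(820) - speed.cdf(780)

# The probability of one exact value is zero for a continuous variable.
p_exactly_800 = speed.cdf(800) - speed.cdf(800)

print(f"P(780 <= speed <= 820) = {p_780_to_820:.4f}")
print(f"P(speed == 800)        = {p_exactly_800:.4f}")
```

This is the key practical difference from the discrete case: you ask about intervals, and the answer comes from the area under the curve between the two endpoints.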
Practical implications of probability distributions
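Depending on whether your variable is discrete or continuous, you will work with different distributions: the binomial, Poisson, and hypergeometric distributions describe discrete variables, while the normal distribution describes continuous ones.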
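For the three discrete distributions named above, the probability of each value can be computed directly from the textbook formulas. The following is a small sketch using only Python's math module; the function names and the example parameters are illustrative choices.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events in an interval when events occur at an average rate lam)."""
    return exp(-lam) * lam**k / factorial(k)

def hypergeometric_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative parameter values, chosen only for demonstration.
print(f"Binomial       P(X=2 | n=4, p=0.5)     = {binomial_pmf(2, 4, 0.5):.4f}")
print(f"Poisson        P(X=0 | lam=2)          = {poisson_pmf(0, 2):.4f}")
print(f"Hypergeometric P(X=1 | N=10, K=3, n=2) = {hypergeometric_pmf(1, 10, 3, 2):.4f}")
```

Roughly speaking, the binomial suits a fixed number of independent yes/no trials, the Poisson suits counts of events over an interval, and the hypergeometric suits sampling without replacement.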
Most commonly, when testing and validating hypotheses or confirming the results of a statistical analysis, you rely on the normal distribution. Its shape resembles a hill; in mathematical terms, it is a bell-shaped curve whose tails approach the horizontal axis but never touch it, which makes the curve asymptotic to that axis.
The curve is symmetric about the mean, meaning that the data is distributed equally on either side of the mean.
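The symmetry of the bell curve can be checked numerically. The sketch below uses the standard normal distribution (mean 0, standard deviation 1) from Python's statistics module to compare the two tails and to compute how much probability lies within one, two, and three standard deviations of the mean. The helper name within is an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Symmetry: the area below -1 equals the area above +1.
left_tail = z.cdf(-1)
right_tail = 1 - z.cdf(1)

def within(k):
    """Probability of falling within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(f"P(Z < -1) = {left_tail:.4f}, P(Z > 1) = {right_tail:.4f}")
for k in (1, 2, 3):
    print(f"P(|Z| <= {k}) = {within(k):.4f}")
```

The three values printed by the loop are the familiar 68-95-99.7 rule, which is why values far from the mean are rare in normally distributed data.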
With continuous variables, it is often convenient for your data to be approximately normally distributed, because many standard techniques in the analytics life cycle assume normality. In normally distributed data, extreme values are rare, so the results of those techniques are easier to trust.
In real-world applications, a model is more likely to be effective when the distribution it assumes matches the distribution of the data.
For instance, in risk analytics, probability distributions give businesses an idea of how likely it is that the revenue generated will fall short of the desired revenue goal.
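As a rough sketch of that kind of calculation, assume that revenue is approximately normally distributed. The mean, standard deviation, and goal below are made-up figures for illustration only; a real analysis would estimate them from data and first check whether a normal model is reasonable.

```python
from statistics import NormalDist

# Hypothetical figures in dollars (assumptions for illustration only).
expected_revenue = 1_200_000
revenue_std_dev = 150_000
revenue_goal = 1_000_000

revenue = NormalDist(mu=expected_revenue, sigma=revenue_std_dev)

# Chance that revenue comes in below the goal.
p_shortfall = revenue.cdf(revenue_goal)

# Revenue level that is undershot only 5% of the time under this model.
fifth_percentile = revenue.inv_cdf(0.05)

print(f"P(revenue < goal)       = {p_shortfall:.1%}")
print(f"5th percentile revenue  = ${fifth_percentile:,.0f}")
```

The same two calls, cdf for a probability and inv_cdf for a threshold, cover many everyday risk questions once a distribution has been chosen.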
Sample size often shapes the distribution you work with: as the central limit theorem states, the larger the sample, the more closely the distribution of the sample mean approximates a normal distribution, whatever the shape of the underlying data (provided its variance is finite).
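A small simulation can illustrate the central limit theorem. The sketch below draws values from an exponential distribution, which is strongly skewed, and compares the distribution of sample means for a sample size of 1 and a sample size of 50. The helper names, the number of trials, and the use of skewness as a rough measure of non-normality are illustrative choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, trials=2000):
    """Means of `trials` samples, each of size n, from a skewed exponential distribution."""
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def skewness(xs):
    """Simple skewness estimate; it is close to 0 for a symmetric, bell-shaped distribution."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

means_1 = sample_means(1)    # raw draws: heavily skewed
means_50 = sample_means(50)  # means of 50 draws: much closer to a bell shape

print(f"n=1 : skewness = {skewness(means_1):.2f}, spread = {stdev(means_1):.3f}")
print(f"n=50: skewness = {skewness(means_50):.2f}, spread = {stdev(means_50):.3f}")
```

As the sample size grows, the skewness of the sample means shrinks toward zero and their spread narrows around the true mean, even though the individual values remain skewed.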
Probability distributions are an area of multivariate statistical analysis (MSA) that rewards in-depth study. The topic stands on its own, yet it sits at the junction where the stages of the statistical analysis life cycle connect.
We are the AI ML Editorial Team. We publish informative, quality articles on AI, Data Science, and Machine Learning. If you would also like to contribute, kindly get in touch with us.