As someone who focuses on data analysis, I don’t often have to calculate probabilities, but I always love it when I do. It brings me back to the basics I learned in stats classes.
A client asked for the probability of drawing exactly 3 outliers in a sample of 30 when 3.5% of the population are outliers.
I got to pull out my old theoretical stats book to make sure I still remembered the probability mass function of a binomial distribution (I almost remembered it).
I calculated it out, but wanted to double-check that it was correct, and found this fabulous website for calculating binomial probabilities:
StatTrek
(I have to admit, the geek in me loves the name as well).
So if you ever need to calculate a binomial or any other probability, check it out.





6 comments
Looks like it has a bunch of other neat stuff too: online tutorials, tables, practice exams, etc. Thanks for the great resource!
Agreed, it looks fabulous.
And no problem. Whenever I come across great resources, I love to pass them along!
Karen
I think the probability would be 0. That’s because 3.5% of 30 is slightly over 1. So, if only 1 out of the 30 in the sample is an outlier, then it’s impossible to choose 3, right?
Hi Gabe,
Almost.
The 3.5% is not of the sample, it’s of the population. So if you have a population, say of 2000, and 70 are outliers (3.5%), and you randomly sample 30, more than one observation in the sample can be an outlier.
It turns out to be a binomial probability. Each time you randomly sample an individual, the probability of getting an outlier is .035; that’s the probability on one trial.
Because we’re doing it 30 times, we have 30 trials. So we’re looking for the probability that the number of outliers is exactly 3 out of 30 trials.
I started to write out the equation, but it looked pretty ugly because I can’t typeset equations very easily here. There is a tutorial on the binomial distribution on StatTrek here: http://stattrek.com/Lesson2/Binomial.aspx, which includes the binomial probability equation.
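For reference, here is the standard binomial probability formula written in LaTeX. This is the textbook form, not copied from the StatTrek page, with n as the number of trials, x as the number of outliers, and p as the probability of an outlier on each trial:

```latex
P(X = x) = \binom{n}{x} \, p^{x} \, (1 - p)^{n - x},
\qquad
\binom{n}{x} = \frac{n!}{x! \, (n - x)!}
```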
So for this example,
p = .035 (probability of an outlier on each trial)
x = 3 (number of outliers we’re looking for)
n = 30 (total number of trials).
Calculate it out, and you’ll find the probability is small, but not quite 0.
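If you’d rather check the arithmetic in code than by hand, here is a minimal Python sketch using only the standard library. The function name binomial_pmf is just a label chosen for this example; the values of p, x, and n are the ones listed above.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the ways to choose which x of the n trials are successes.
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p = 0.035  # probability of an outlier on each trial
x = 3      # number of outliers we're looking for
n = 30     # total number of trials

prob = binomial_pmf(x, n, p)
print(f"P(X = {x}) = {prob:.4f}")  # prints P(X = 3) = 0.0665
```

So the answer is roughly 0.067, or about a 1-in-15 chance.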
Karen
If it’s 3.5% of a large population, you can approximate it pretty well with a Poisson. The expected number of outliers, np, is about 1, so the probability of exactly 3 would be about 1/(6e), or about 1/16.
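A quick sketch of that approximation in Python, for anyone who wants to see the numbers. The function name poisson_pmf is made up for this example. It evaluates the Poisson probability twice: once with the rate rounded to 1 (which gives the 1/(6e) figure) and once with the unrounded rate np = 30 × 0.035 = 1.05.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p, k = 30, 0.035, 3

rounded = poisson_pmf(k, 1.0)      # rate rounded to 1, i.e. 1/(6e)
unrounded = poisson_pmf(k, n * p)  # rate np = 1.05

print(f"Poisson, rate 1.00: {rounded:.4f}")    # prints 0.0613
print(f"Poisson, rate 1.05: {unrounded:.4f}")  # prints 0.0675
```

Both values land close to the exact binomial answer of about 0.067.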
Hi Dennis,
True (although this population was pretty finite, actually). But the probability still comes out nearly identical.
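To see how little a finite population changes things, here is a rough Python sketch comparing the binomial answer with the hypergeometric one, which is the exact distribution when sampling without replacement. The real population size isn’t given here, so this borrows the hypothetical numbers from the earlier reply (a population of 2000 with 70 outliers) purely as an assumption; the function names are also just labels for this example.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeom_pmf(x: int, pop_size: int, pop_outliers: int, n: int) -> float:
    """Exactly x outliers in a sample of n drawn without replacement."""
    ways_outliers = comb(pop_outliers, x)
    ways_others = comb(pop_size - pop_outliers, n - x)
    return ways_outliers * ways_others / comb(pop_size, n)


# Assumed, illustrative population: 2000 individuals, 70 of them outliers (3.5%).
pop_size, pop_outliers = 2000, 70
n, x = 30, 3

binom = binomial_pmf(x, n, pop_outliers / pop_size)
hyper = hypergeom_pmf(x, pop_size, pop_outliers, n)

print(f"binomial:       {binom:.4f}")
print(f"hypergeometric: {hyper:.4f}")
```

With a sample of 30 out of 2000, the two results should differ only slightly, which is why the binomial works well here.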
Karen