Nearly all granting agencies require an estimate of the sample size needed to detect the effects hypothesized in the study. But every study is well served by a sample size estimate, because it can save a great deal of resources.

Why? Undersized studies can’t find real results, and oversized studies find even insubstantial ones. Both undersized and oversized studies waste time, energy, and money; the former by using resources without finding results, and the latter by using more resources than necessary. Both expose an unnecessary number of participants to experimental risks.

The trick is to size a study so that it is *just* large enough to detect an effect of scientific importance. If your effect turns out to be bigger, so much the better. But first you need to gather some information on which to base the estimates.

Once you’ve gathered that information, you can calculate by hand using a formula found in many textbooks, use one of many specialized software packages, or hand it over to a statistician, depending on the complexity of the analysis. But regardless of which way you or your statistician calculates it, you need to first do the following 5 steps:

**Step 1. Specify a hypothesis test.**

Most studies have many hypotheses, but for sample size calculations, choose one to three main hypotheses. Make them explicit in terms of a null and alternative hypothesis.

**Step 2. Specify the significance level of the test.**

It is usually alpha = .05, but it doesn’t have to be.

**Step 3. Specify the smallest effect size that is of scientific interest.**

This is often the hardest step. The point here is *not* to specify the effect size that you *expect* to find or that others have found, but the *smallest effect size of scientific interest*.

What does that mean? Any effect size can be statistically significant with a large enough sample. Your job is to figure out at what point your colleagues will say, “So what if it is significant? It doesn’t affect anything!”
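To see why any effect can reach significance, here is a minimal sketch. It assumes a two-sample z-test with a known, common standard deviation, and the numbers (a 0.5-point difference on a scale with SD 10) are hypothetical. The function name `two_sample_p_value` is for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_p_value(diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test with a known common SD."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: the same tiny observed difference at increasing sample sizes
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(two_sample_p_value(0.5, 10, n), 4))
```

The difference never changes, yet the p-value shrinks as n grows. That is why the effect size has to be chosen on scientific grounds rather than statistical ones.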

For some outcome variables, the right value is obvious; for others, not at all.

Some examples:

- If your therapy lowered anxiety by 3%, would it actually improve a patient’s life? How big would the drop have to be?
- If response times to the stimulus in the experimental condition were 40 ms faster than in the control condition, does that mean anything? Is a 40 ms difference meaningful? Is 20? 100?
- If 4 fewer beetles were found per plant with the treatment than with the control, would that really affect the plant? Can 4 more beetles destroy, or even stunt a plant, or does it require 10? 20?

**Step 4. Estimate the values of other parameters necessary to compute the power function.**

Most statistical tests have the form effect/standard error. We chose a value for the effect in Step 3. The standard error is generally the standard deviation divided by the square root of n. To solve for n, which is the point of all this, we need a value for the standard deviation. There are only two ways to get it.

1. The best way is to use data from a pilot study to compute standard deviation.

2. The other way is to use historical data: another study that used the same dependent variable. If you have more than one study, even better. Average their standard deviations for a more reliable estimate.

Sometimes both sources of information can be hard to come by, but if you want sample sizes that are even remotely accurate, you need one or the other.
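Both routes can be sketched in a few lines. The pilot values and the reported SDs below are made up for illustration. Note that instead of a simple average of standard deviations, this sketch pools the variances weighted by degrees of freedom, which is one common alternative when the studies differ in size. The helper `pooled_sd` is a hypothetical name.

```python
from math import sqrt
from statistics import stdev

# Hypothetical pilot measurements of the dependent variable
pilot = [12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9, 11.6]
pilot_sd = stdev(pilot)  # sample SD (n - 1 in the denominator)

def pooled_sd(sds, ns):
    """Combine SDs from several studies, weighting each variance by its degrees of freedom."""
    weighted_var = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    total_df = sum(n - 1 for n in ns)
    return sqrt(weighted_var / total_df)

# Hypothetical SDs and sample sizes reported by three earlier studies
historical_sd = pooled_sd([9.5, 11.0, 10.2], [40, 25, 60])

print(round(pilot_sd, 2), round(historical_sd, 2))
```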

**Step 5. Specify the intended power of the test.**

The power of a test is the probability of finding significance if the alternative hypothesis is true.

A power of .8 is the conventional minimum. If it will be difficult to rerun the study or add a few more participants, a power of .9 is better. If you are applying for a grant, a power of .9 is always better.
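As a rough illustration of how power depends on sample size, the sketch below uses a normal approximation for a two-sided, two-sample comparison of means. The function name `power_two_sample` and the inputs (smallest effect of 5, SD of 10) are assumptions for the example, not values from any particular study.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = diff / (sd * sqrt(2 / n_per_group))  # effect divided by its standard error
    # Probability of falling beyond either critical value when the true difference is diff
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Hypothetical inputs: smallest effect of interest 5, SD 10
for n in (20, 40, 64, 86):
    print(n, round(power_two_sample(5, 10, n), 3))
```

With a true difference of zero the function returns alpha, which is a quick sanity check that it matches the definition of power above.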

**Now Calculate.**
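For the simplest case, the five inputs plug into a textbook normal-approximation formula. The sketch below assumes a two-sided comparison of two group means with equal group sizes and a common SD; other designs need other formulas or dedicated software. The name `n_per_group` and the example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(min_effect, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # Step 2: significance level
    z_power = z.inv_cdf(power)          # Step 5: intended power
    # Step 3 (min_effect) and Step 4 (sd) enter as the ratio sd / min_effect
    return ceil(2 * ((z_alpha + z_power) * sd / min_effect) ** 2)

# Hypothetical inputs: smallest effect of interest 5, SD 10
print(n_per_group(5, 10, power=0.80))
print(n_per_group(5, 10, power=0.90))
```

Because this uses the normal rather than the t distribution, it can come out slightly lower than what dedicated power software reports for a t-test, so treat it as a starting point.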

Azi says

Thank you for the explanation. I have two questions. First, how many responses do I need in the pilot study in order to be able to calculate its standard deviation? Second, what formula do you use to finally calculate sample size? Is this correct: (1.96*1.96)(SD*SD)/(error*error)?

Lyla says

This is an excellent article; the assumptions and explanations are really good. All your contributions are very useful for professionals and non-professionals. Thanks a lot for sharing an awesome article. Keep on posting.

Michael says

Thanks for your post. Please I wanna ask, is there no specific formula for calculating sample size? I’m working on a project that had to do with parasites of fish in the wild but I seem not to know the sample size I should collect. Please help me out.

Thank you.

wycliff says

great

Farman Khan says

I need guidance how to calculate the sample size of 330. I need the best formula which suit this number.

thank you

Erik says

What formula do I use? I have the information needed but cannot find the formula that I should use to calculate my sample size.

Silas Kabhele says

What does step number three really mean?

Silas

Karen says

Hi Silas,

That’s the hardest step for researchers to wrap their heads around. Think of it this way. You could find statistical significance for a tiny, tiny effect with a large enough sample size. But is that really meaningful? How small is the smallest *meaningful* effect?

nurilign says

Dear Karen, Thank you for your suggestion. I will read more based on your indication.

thank you,

nurilign

nurilign says

I found it very valuable, but I have one question. When I calculate sample size using OpenEpi and other statistical software, how can I know the expected/desired difference for my hypothesis? Also, different software gave me different numbers even though I used the same inputs. Please, I need this answer and I hope you will tell me as usual.

thank you.

Karen says

Hi Nurilign,

Ah, that’s the hardest question to answer. You should not use the difference you expect OR the one you desire, but the smallest difference that is scientifically meaningful.

As for using different software, this often comes from the defaults they use, the way they define effect sizes, and what they control for. It can be hard to tell which answer to use. All I can suggest is to dig deeply into the manuals for the different software and make sure you really understand what assumptions they’re making. Choose the one that seems more valid.

Karen