Nearly all granting agencies require an estimate of an adequate sample size to detect the effects hypothesized in the study. But all studies are well served by estimates of sample size, as they can save a great deal of resources.

Why? Undersized studies can’t find real results, and oversized studies find even insubstantial ones. Both undersized and oversized studies waste time, energy, and money; the former by using resources without finding results, and the latter by using more resources than necessary. Both expose an unnecessary number of participants to experimental risks.

The trick is to size a study so that it is *just* large enough to detect an effect of scientific importance. If your effect turns out to be bigger, so much the better. But first you need to gather some information on which to base the estimates.

Once you’ve gathered that information, you can calculate the sample size by hand using a formula found in many textbooks, use one of many specialized software packages, or hand it over to a statistician, depending on the complexity of the analysis. But regardless of how you or your statistician calculate it, you first need to complete the following five steps:

**Step 1. Specify a hypothesis test.**

Most studies have many hypotheses, but for sample size calculations, choose one to three main hypotheses. Make them explicit in terms of a null and alternative hypothesis.

**Step 2. Specify the significance level of the test.**

It is usually alpha = .05, but it doesn’t have to be.
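As a minimal sketch (Python, standard library only), the significance level translates directly into the critical value the test statistic must exceed. The helper name `critical_z` is hypothetical, and the sketch assumes a z (normal) test rather than a t test:

```python
from statistics import NormalDist


def critical_z(alpha: float, two_sided: bool = True) -> float:
    """Return the standard normal critical value for significance level alpha."""
    # A two-sided test splits alpha between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


print(round(critical_z(0.05), 3))                   # 1.96  (two-sided, alpha = .05)
print(round(critical_z(0.01), 3))                   # 2.576 (two-sided, alpha = .01)
print(round(critical_z(0.05, two_sided=False), 3))  # 1.645 (one-sided, alpha = .05)
```

A stricter alpha raises the bar the test statistic has to clear, which in turn raises the sample size needed.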

**Step 3. Specify the smallest effect size that is of scientific interest.**

This is often the hardest step. The point here is *not* to specify the effect size that you *expect* to find or that others have found, but the *smallest effect size of scientific interest*.

What does that mean? Any effect size can be statistically significant with a large enough sample. Your job is to figure out at what point your colleagues will say, “So what if it is significant? It doesn’t affect anything!”
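A quick illustration of why any effect becomes significant with a large enough sample. It assumes a simple one-sample z test and made-up numbers (a 1 ms difference with an SD of 50 ms); the function name `z_statistic` is hypothetical:

```python
import math


def z_statistic(effect: float, sd: float, n: int) -> float:
    """One-sample z statistic: the effect divided by its standard error, SD / sqrt(n)."""
    return effect / (sd / math.sqrt(n))


# Hypothetical numbers: a 1 ms difference in response time, SD of 50 ms.
for n in (100, 1_000, 10_000, 100_000):
    z = z_statistic(effect=1.0, sd=50.0, n=n)
    label = "significant" if abs(z) > 1.96 else "not significant"
    print(n, round(z, 2), label)
```

With these assumed numbers, the trivial 1 ms difference crosses the 1.96 cutoff at n = 10,000, even though it likely affects nothing.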

For some outcome variables, the right value is obvious; for others, not at all.

Some examples:

- If your therapy lowered anxiety by 3%, would it actually improve a patient’s life? How big would the drop have to be?
- If response times to the stimulus in the experimental condition were 40 ms faster than in the control condition, does that mean anything? Is a 40 ms difference meaningful? Is 20? 100?
- If 4 fewer beetles were found per plant with the treatment than with the control, would that really affect the plant? Can 4 more beetles destroy, or even stunt, a plant, or does it require 10? 20?

**Step 4. Estimate the values of other parameters necessary to compute the power function.**

Most test statistics have the form effect/standard error. We chose a value for the effect in Step 3. The standard error is generally the standard deviation divided by the square root of n. To solve for n, which is the point of all this, we need a value for the standard deviation. There are only two ways to get it.

1. The best way is to use data from a pilot study to compute standard deviation.

2. The other way is to use historical data: another study that used the same dependent variable. If you have more than one study, even better. Average their standard deviations for a more reliable estimate.
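A sketch of both routes, using hypothetical pilot values and hypothetical historical SDs. The simple average follows the advice above; the `pooled_sd` helper is a common refinement not described in this post, which weights each study's variance by its degrees of freedom so that larger studies count more:

```python
from statistics import mean, stdev

# Hypothetical pilot-study measurements of the dependent variable.
pilot = [12.1, 9.8, 14.3, 11.0, 10.7, 13.5, 12.9, 8.6]
pilot_sd = stdev(pilot)  # sample SD (n - 1 denominator)

# Hypothetical SDs reported by earlier studies using the same dependent variable.
historical_sds = [2.4, 1.9, 2.2]
simple_average = mean(historical_sds)


def pooled_sd(sds, ns):
    """Pool SDs by weighting each variance by its degrees of freedom (n - 1)."""
    numerator = sum((n - 1) * s**2 for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return (numerator / denominator) ** 0.5


# Hypothetical study sizes of 30, 45, and 60 for the pooled version.
print(round(pilot_sd, 2), round(simple_average, 2),
      round(pooled_sd(historical_sds, [30, 45, 60]), 2))
```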

Sometimes both sources of information can be hard to come by, but if you want sample sizes that are even remotely accurate, you need one or the other.
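For the simplest case, a one-sample (or paired) z test, the effect/standard error structure can be solved for n once Steps 2 through 5 are filled in. Here δ is the smallest effect of scientific interest (Step 3), σ the estimated standard deviation (Step 4), α the significance level (Step 2), and 1 − β the intended power (Step 5). The sketch ignores the negligible chance of rejecting in the wrong direction:

```latex
\[
z = \frac{\delta}{\sigma/\sqrt{n}}, \qquad
\frac{\delta}{\sigma/\sqrt{n}} \ge z_{1-\alpha/2} + z_{1-\beta}
\;\Longrightarrow\;
n \ge \left( \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\delta} \right)^{2}
\]
```

For two independent groups of n each, the standard error of the difference becomes σ√(2/n), which doubles the result and gives n per group.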

**Step 5. Specify the intended power of the test.**

The power of a test is the probability of finding significance if the alternative hypothesis is true.
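As a sketch, assuming a two-sided one-sample z test with a known SD, power can be computed directly from the normal distribution. `power_z_test` is a hypothetical helper and the example inputs are invented:

```python
import math
from statistics import NormalDist


def power_z_test(delta: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z test when the true effect is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sd / math.sqrt(n))  # expected value of the test statistic
    # Probability that the statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(round(power_z_test(delta=5, sd=10, n=32), 3))
```

With an effect of 5, an SD of 10, and n = 32, this gives power of roughly .81. Setting `delta=0` returns alpha itself, since with no true effect every rejection is a false positive.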

A power of .8 is the minimum. If it will be difficult to rerun the study or add a few more participants, a power of .9 is better. If you are applying for a grant, a power of .9 is always better.

**Now Calculate.**
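A minimal sketch that combines the five steps for one common case: a two-sided comparison of two independent group means, using the normal approximation. The function name is hypothetical, the 40 ms difference comes from the response-time example in Step 3, and the SD of 100 ms is an assumed value:

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    delta: smallest effect of scientific interest (Step 3), in raw units
    sd:    estimated common standard deviation (Step 4)
    alpha: significance level (Step 2); power: intended power (Step 5)
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)  # round up: you can't recruit a fraction of a participant


print(n_per_group(delta=40, sd=100, power=0.80))  # 99
print(n_per_group(delta=40, sd=100, power=0.90))  # 132
```

Raising power from .8 to .9 costs about a third more participants here. Software based on the t distribution will typically return slightly larger numbers than this normal approximation.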

Cait says

Hello, thank you for your post. If you have many hypotheses and choose 3 main hypotheses, how many power analyses do you perform? Do you report the sample size estimate for all three hypotheses?

Karen Grace-Martin says

If all three are vital to the study, then yes. Otherwise I recommend using only the #1 most important hypothesis.

Isaac says

I am in a similar situation. Sorry if my question is dumb, but by this, are you saying that I can do the analysis for each primary variable/hypothesis? I tried that, and afterwards I had 3 different sample sizes for the 3 different hypotheses. Is that right?

Karen Grace-Martin says

Yes, you’ll get three different sample sizes. The conservative thing to do is pick the largest.
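A sketch of that conservative rule, reusing a normal-approximation formula for a two-sample comparison of means. The three hypotheses and their inputs (smallest meaningful effect, SD) are entirely hypothetical:

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Normal-approximation n per group for a two-sided two-sample comparison of means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


# Hypothetical inputs for three main hypotheses: (smallest meaningful effect, SD).
hypotheses = {"H1": (40, 100), "H2": (5, 8), "H3": (0.5, 1.5)}
sizes = {name: n_per_group(d, s) for name, (d, s) in hypotheses.items()}
required = max(sizes.values())  # the largest n is adequate for every hypothesis
print(sizes, required)
```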

Paula J Caplan says

Hello, and thank you for your article. But I don’t see an equation to use. Here is my situation: We have an N of 2,660 and are doing research on a random sample of the 2,660. We have information on 16.5% of the 2,660 but need to know if that is a representative sample. If it is not, how many more do we have to include in our random sample? Regarding your five steps, we do not have a hypothesis. We are trying to gather crucial info (I won’t bore you by describing it). For step 2, p < .05 works fine. I know you said to choose an effect size, but that is not necessary or even really possible with this study. For the 4th step, I don’t really have a way to come up with an SEM but will try if it is part of the appropriate equation. For step 5, I could use .8 or .9. I will be very grateful for your help.

Karen Grace-Martin says

Hi Paula,

A couple points:

First, the size of the sample isn’t the main issue in how representative it is. That’s about the selection process.

And second, if you do not have a hypothesis, I’m guessing you’re creating a confidence interval for your descriptive statistics. You can use that instead. But you will need a standard error.

Azi says

Thank you for the explanation. I have two questions. First, how many responses do I need for the pilot study in order to be able to calculate its standard deviation? Second, what formula do you use to finally calculate sample size? Is this correct: (1.96*1.96)(SD*SD)/(error*error)?
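The expression in this question matches the standard normal-approximation formula for estimating a mean to within a margin of error E at 95% confidence (1.96 is the two-sided critical value for that level). A sketch follows with hypothetical inputs (SD = 10, margin = 1). The optional finite population correction, shown with the N of 2,660 from the earlier comment, is a standard adjustment not discussed in this post; as noted above, a bigger sample does not by itself make a sample representative. The function name is hypothetical:

```python
import math
from statistics import NormalDist


def n_for_margin(sd, margin, confidence=0.95, population=None):
    """Sample size to estimate a mean to within +/- margin at the given confidence.

    Computes n0 = z^2 * sd^2 / margin^2; if a finite population size N is given,
    applies the finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    n0 = (z * sd / margin) ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(n_for_margin(sd=10, margin=1))                   # 385 (very large population)
print(n_for_margin(sd=10, margin=1, population=2660))  # 336 (population of 2,660)
```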

Lyla says

This is an excellent article; the assumptions and explanation are really good. All your contributions are very useful for professionals and non-professionals. Thanks a lot for sharing an awesome article. Keep on posting.

Michael says

Thanks for your post. I want to ask: is there no specific formula for calculating sample size? I’m working on a project that has to do with parasites of fish in the wild, but I don’t know what sample size I should collect. Please help me out.

Thank you.

Karen Grace-Martin says

Hi Michael,

There are many equations. The specific one depends on the specific hypothesis test you are creating the sample for. You really want to use software here.
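To show how the equation changes with the test: if, hypothetically, the question were whether parasite prevalence differs between two sites, the outcome is a proportion rather than a mean, and a different formula applies. This sketch uses the simple unpooled-variance normal approximation; the prevalences of 30% and 45% and the function name are invented for illustration, and software may use a pooled variance or a continuity correction and return a somewhat different number:

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to detect p1 vs p2 with a two-sided z test.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z = NormalDist().inv_cdf
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z(1 - alpha / 2) + z(power)) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical: parasite prevalence of 30% at one site vs 45% at another.
print(n_per_group_proportions(0.30, 0.45))  # 160
```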

wycliff says

great

Farman Khan says

I need guidance on how to calculate the sample size for 330. I need the best formula to suit this number.

thank you

Erik says

What formula do I use? I have the information needed but cannot find the formula that I should use to calculate my sample size.

Silas Kabhele says

What does step number three really mean?

Silas

Karen says

Hi Silas,

That’s the hardest step for researchers to wrap their heads around. Think of it this way. You could find statistical significance for a tiny, tiny effect with a large enough sample size. But is that really meaningful? How small is the smallest *meaningful* effect?

nurilign says

Dear Karen, Thank you for your suggestion. I will read more based on your indication.

thank you,

nurilign

nurilign says

I found it very valuable, but I have one question. When I calculate sample size using Open Epi and other statistical software, how can I know the expected/desired difference for my hypothesis? Also, different software gave me different numbers even though I used the same inputs. Please, I need this answer, and I hope you will answer as usual.

thank you.

Karen says

Hi Nurilign,

Ah, that’s the hardest question to answer. You should not use the difference you expect OR the one you desire, but the smallest difference that is scientifically meaningful.

As for using different software, the differences often come from the defaults they use, the way they define effect sizes, and what they control for. It can be hard to tell which answer to use. All I can suggest is to dig deeply into the manuals for the different software and make sure you really understand what assumptions they’re making. Choose the one that seems most valid.
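One example of how a single default can change the answer: whether the test is one-sided or two-sided. The sketch below feeds identical inputs into a normal-approximation formula for a two-sample comparison of means and flips only that setting. The inputs and function name are hypothetical, and real packages may differ in other ways too (t vs. z distribution, raw vs. standardized effect sizes):

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation n per group for a two-sample comparison of means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return math.ceil(2 * ((z_alpha + z(power)) * sd / delta) ** 2)


same_inputs = dict(delta=40, sd=100, alpha=0.05, power=0.80)
print(n_per_group(**same_inputs, two_sided=True))   # 99
print(n_per_group(**same_inputs, two_sided=False))  # 78
```

The same effect, SD, alpha, and power yield 99 or 78 per group depending only on that one setting, which is why checking each package's defaults matters.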

Karen