The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

5 Steps for Calculating Sample Size

by Karen Grace-Martin 16 Comments

Nearly all granting agencies require an estimate of an adequate sample size to detect the effects hypothesized in the study. But all studies are well served by sample size estimates, as they can save a great deal of resources.

Why? Undersized studies can’t find real results, and oversized studies find even insubstantial ones. Both undersized and oversized studies waste time, energy, and money; the former by using resources without finding results, and the latter by using more resources than necessary. Both expose an unnecessary number of participants to experimental risks.

The trick is to size a study so that it is just large enough to detect an effect of scientific importance. If your effect turns out to be bigger, so much the better. But first you need to gather some information on which to base the estimates.

Once you’ve gathered that information, you can calculate by hand using a formula found in many textbooks, use one of many specialized software packages, or hand it over to a statistician, depending on the complexity of the analysis. But regardless of which way you or your statistician calculates it, you need to first do the following 5 steps:

Step 1. Specify a hypothesis test.

Most studies have many hypotheses, but for sample size calculations, choose one to three main hypotheses. Make them explicit in terms of a null and alternative hypothesis.

Step 2. Specify the significance level of the test.

It is usually alpha = .05, but it doesn’t have to be.

Step 3. Specify the smallest effect size that is of scientific interest.

This is often the hardest step. The point here is not to specify the effect size that you expect to find or that others have found, but the smallest effect size of scientific interest.

What does that mean? Any effect size can be statistically significant with a large enough sample. Your job is to figure out at what point your colleagues will say, “So what if it is significant? It doesn’t affect anything!”
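To see why any effect can reach significance, here is a minimal sketch using a one-sample z-test with a known standard deviation. The effect of 0.5 units and the standard deviation of 10 are made-up numbers chosen only for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value of a one-sample z-test for a mean difference `effect`."""
    z = effect / (sd / sqrt(n))  # test statistic = effect / standard error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values: a tiny effect relative to the spread of the data.
effect, sd = 0.5, 10.0
for n in (100, 1_000, 10_000, 100_000):
    print(n, two_sided_p_value(effect, sd, n))
```

The effect never changes, yet the p-value keeps shrinking as n grows, because the standard error in the denominator shrinks. Whether a 0.5-unit difference matters is a scientific question the test cannot answer.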

For some outcome variables, the right value is obvious; for others, not at all.

Some examples:

  • If your therapy lowered anxiety by 3%, would it actually improve a patient’s life? How big would the drop have to be?
  • If response times to the stimulus in the experimental condition were 40 ms faster than in the control condition, does that mean anything? Is a 40 ms difference meaningful? Is 20? 100?
  • If 4 fewer beetles were found per plant with the treatment than with the control, would that really affect the plant? Can 4 more beetles destroy, or even stunt a plant, or does it require 10? 20?

Step 4. Estimate the values of other parameters necessary to compute the power function.

Most statistical tests have the format of effect/standard error. We’ve chosen a value for the effect in step 3. Standard error is generally the standard deviation divided by the square root of n. To solve for n, which is the point of all this, we need a value for standard deviation. There are only two ways to get it.

1. The best way is to use data from a pilot study to compute standard deviation.

2. The other way is to use historical data–another study that used the same dependent variable. If you have more than one study, even better. Average their standard deviations for a more reliable estimate.

Sometimes both sources of information can be hard to come by, but if you want sample sizes that are even remotely accurate, you need one or the other.
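If you do have several earlier studies, one common way to combine their standard deviations is to pool the variances, weighting each by its degrees of freedom, rather than taking a simple average. This is a sketch of that approach, not necessarily the method intended above; the function name and the (standard deviation, sample size) pairs are hypothetical.

```python
from math import sqrt


def pooled_sd(studies):
    """Pool SDs from several studies, weighting each variance by n - 1.

    `studies` is a list of (sd, n) pairs, one per prior study.
    """
    total_df = sum(n - 1 for _, n in studies)
    weighted_var = sum((n - 1) * sd ** 2 for sd, n in studies)
    return sqrt(weighted_var / total_df)


# Hypothetical prior studies that used the same dependent variable.
prior_studies = [(11.0, 30), (9.0, 50), (10.5, 40)]
print(pooled_sd(prior_studies))
```

Larger studies get more weight, so one small, noisy pilot does not dominate the estimate.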

Step 5. Specify the intended power of the test.

The power of a test is the probability of finding significance if the alternative hypothesis is true.

A power of .8 is the minimum. If it will be difficult to rerun the study or add a few more participants, a power of .9 is better. If you are applying for a grant, a power of .9 is always better.

Now Calculate.
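As one illustration of the calculation, here is a sketch for the simplest common case: comparing the means of two equal-sized groups, using the normal approximation. The formula is n per group = 2 × ((z for alpha + z for power) × standard deviation / effect)². The function names are hypothetical, and the 40 ms effect with a 100 ms standard deviation is an assumed example, not a recommendation. Dedicated power software bases this on the t distribution and will usually return a slightly larger n.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the intended power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    z = NormalDist()
    se = sd * sqrt(2 / n)  # standard error of the difference between two means
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))


# Assumed inputs: smallest meaningful effect 40 ms, standard deviation 100 ms.
n80 = n_per_group(delta=40, sd=100, power=0.80)
n90 = n_per_group(delta=40, sd=100, power=0.90)
print(n80, n90, approx_power(n80, delta=40, sd=100))
```

Each input maps to one of the five steps: the test itself (step 1), alpha (step 2), delta (step 3), sd (step 4), and power (step 5). Raising the power from .8 to .9 increases the required sample by roughly a third here.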

 


Tagged With: Calculating Sample Size, power calculation, Sample Size Estimation


Comments

  1. Paula J Caplan says

    November 23, 2020 at 10:45 pm

    Hello, and thank you for your article. But I don’t see an equation to use. Here is my situation: We have an N of 2,660 and are doing research on a random sample of the 2,660. We have information on 16.5% of the 2,660 but need to know if that is a representative sample. If it is not, how many more do we have to include in our random sample? Re: your five steps — we do not have an hypothesis. We are trying to gather crucial info (I won’t bore you by describing it). For step 2, p<.05 works fine. I know you said to choose an effect size, but that is not necessary or even really possible with this study. For the 4th step, I don't really have a way to come up with a SEM but will try if it is part of the appropriate equation. For step 5, I could use .8 or .9 I will be very grateful for your help.

    • Karen Grace-Martin says

      November 24, 2020 at 12:06 pm

      Hi Paula,

      A couple points:
      First, the size of the sample isn’t the main issue in how representative it is. That’s about the selection process.
      And second, if you do not have a hypothesis, I’m guessing you’re creating a confidence interval for your descriptive statistics. You can use that instead. But you will need a standard error.

  2. Azi says

    March 22, 2020 at 4:23 pm

    Thank you for the explanation. I have two questions. First, how many responses do I need for the pilot study in order to be able to calculate the standard deviation of the pilot study? Second, what formula do you use to finally calculate sample size? Is this correct: (1.96*1.96)(SD*SD)/(error*error)?

  3. Lyla says

    January 8, 2020 at 7:39 am

    This is an excellent article. The assumptions and explanation are really good, and all your contributions are very useful for professionals and non-professionals. Thanks a lot for sharing an awesome article. Keep on posting.

  4. Michael says

    June 7, 2017 at 1:53 am

    Thanks for your post. Please I wanna ask, is there no specific formula for calculating sample size? I’m working on a project that had to do with parasites of fish in the wild but I seem not to know the sample size I should collect. Please help me out.
    Thank you.

    • Karen Grace-Martin says

      November 24, 2020 at 12:01 pm

      Hi Michael,

      There are many equations. The specific one depends on the specific hypothesis test you are creating the sample for. You really want to use software here.

  5. wycliff says

    March 17, 2017 at 9:55 am

    great

  6. Farman Khan says

    February 3, 2017 at 1:24 pm

    I need guidance how to calculate the sample size of 330. I need the best formula which suit this number.
    thank you

  7. Erik says

    July 26, 2016 at 1:53 pm

    What formula do I use? I have the information needed but cannot find the formula that I should use to calculate my sample size.

  8. Silas Kabhele says

    February 16, 2014 at 3:26 pm

    What does step number three really mean?
    Silas

    • Karen says

      March 10, 2014 at 5:35 pm

      Hi Silas,

      That’s the hardest step for researchers to wrap their heads around. Think of it this way. You could find statistical significance for a tiny, tiny effect with a large enough sample size. But is that really meaningful? How small is the smallest *meaningful* effect?

  9. nurilign says

    December 12, 2012 at 5:06 am

    Dear Karen, Thank you for your suggestion. I will read more based on your indication.

    thank you,
    nurilign

  10. nurilign says

    December 1, 2012 at 11:08 am

    I found it very valuable, but I have one question. When I calculate sample size using Open Epi and other statistical software, how can I know the expected/desired difference between my hypotheses? Also, I got different numbers from different software despite using the same inputs. Please, I need this answer, and I hope you will tell me as usual.

    thank you.

    • Karen says

      December 3, 2012 at 5:06 pm

      Hi Nurilign,

      Ah, that’s the hardest question to answer. You should not use the difference you expect OR the one you desire, but the smallest difference that is scientifically meaningful.

      As for using different software, this often comes from the defaults they use, the way they define effect sizes, and what they control for. It can be hard to tell which answer to use. All I can suggest is to dig deeply into the manuals for the different software and make sure you really understand what assumptions they’re making. Choose the one that seems more valid.

      Karen


Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.