The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Three Issues in Sample Size Estimates for Multilevel Models

by Karen Grace-Martin

If you’ve ever worked with multilevel models, you know that they are an extension of linear models. For a researcher learning them, this is both good and bad news.

The good side is that many of the concepts, calculations, and results are familiar. The downside of the extension is that everything is more complicated in multilevel models.

This includes power and sample size calculations.

If you’re not familiar with them, multilevel models are required when data are clustered. The basic idea is that each observation in the sample is not independent–those from the same cluster are associated, while observations from different clusters are not.

There are many designs with multiple observations in a cluster. Repeated measures data have multiple observations from the same subject. Randomized block studies have multiple plant measurements nested within a farm. An evaluation may have social workers clustered within an agency.

Because of the clustering, there are a few issues that come up when conducting sample size calculations for multilevel models that don’t usually come up when running calculations for simpler models.

Issue 1: Choosing an Effect

The first step in any sample size calculation is always to choose a hypothesis test. Any model tests many effects–each main effect and interaction in an ANOVA is a separate hypothesis test.

Although the point of some multilevel studies is to test random effects, usually in multilevel models the effect of interest is a fixed effect–the overall regression coefficients or mean differences.

Let’s use the example of testing the mean difference between an intervention group and a control group for our social workers.

Issue 2: Sample sizes at each level

Another issue is that there are multiple sample sizes. In planning this kind of study, you need to select a sample size at each level: how many social workers do you need per agency, and how many agencies?

An overall sample of 300 workers will have different implications for power if it is made up of 5 workers each at 60 agencies or 20 workers each at 15 agencies.

As a general rule, the sample size that matters most is the sample size at the level the effect is measured.

For example, if we can randomly assign each social worker to either the intervention or the control group, so the effect of interest is at the social worker level, the most important sample size is the overall number of social workers in the sample–the 300. It doesn’t matter much how many agencies they came from.

However, depending on the nature of the intervention, there are often design and practical issues with assigning people from the same agency to different conditions.

From a design perspective, it may be impossible to assign people from the same agency to different conditions if they will influence each other.

From a practical perspective, it may be necessary to apply the condition to the entire group at once (as in a training).

In either case, it may be necessary to assign groups at the agency level, making our effect of interest, group comparison, at the agency level.

This means that the number of agencies has more of an effect on the power of this test than the number of workers per agency. So having 60 agencies with only 5 people each will give you more power than 15 agencies with 20 people each, even though the total number of people in the sample is the same.
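
To make the comparison concrete, here is a rough sketch in Python. It is not from the original post: it uses the standard design effect, 1 + (m - 1) × ICC for clusters of size m, with a normal approximation to the test. The function name, the effect size of 0.4 standard deviations, and the ICC of 0.10 are all assumptions chosen for illustration, and the normal approximation is optimistic when there are only a few clusters.

```python
from math import sqrt
from statistics import NormalDist


def cluster_trial_power(clusters_treat, clusters_ctrl, cluster_size,
                        icc, effect_size, alpha=0.05):
    """Approximate power for a two-group comparison when whole clusters
    (agencies) are assigned to a condition.

    effect_size is the mean difference in standard deviation units.
    Assumes equal cluster sizes and a two-sided test, and ignores the
    negligible chance of rejecting in the wrong direction.
    """
    # Variance inflation caused by correlated observations within a cluster.
    design_effect = 1 + (cluster_size - 1) * icc
    n_treat = clusters_treat * cluster_size
    n_ctrl = clusters_ctrl * cluster_size
    # Standard error of the standardized mean difference.
    se = sqrt(design_effect * (1 / n_treat + 1 / n_ctrl))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size / se - z_crit)


# Both designs use 300 workers in total (assumed ICC = 0.10, effect = 0.4 SD).
many_small = cluster_trial_power(30, 30, 5, icc=0.10, effect_size=0.4)
few_large = cluster_trial_power(8, 7, 20, icc=0.10, effect_size=0.4)
print(f"60 agencies x 5 workers:  power ~ {many_small:.2f}")
print(f"15 agencies x 20 workers: power ~ {few_large:.2f}")
```

Under these assumed values the 60-agency design comes out at about 0.83 and the 15-agency design at about 0.53. The exact numbers depend entirely on the assumed ICC and effect size, so treat them as a pattern, not a result.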

The difference can have large time and cost implications. In many studies, adding more social workers per agency adds only a small marginal cost in time and budget. The big cost is recruiting and administering the training for each agency.

Issue 3: Estimate more parameters

Another step in any sample size calculation is to obtain reasonably accurate estimates of the other parameters that are used in the statistical test.

This always includes standard deviation, but can also include others, like the correlation among multiple predictors. These estimates need to come from previous research or a pilot study.

In multilevel models, you need to also estimate the Intra-Class Correlation, or ICC.

The ICC is a measure of how correlated observations are within a cluster. You can think of it as a measure of how much non-unique information there is in each observation.
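
As a minimal sketch of what that estimate looks like (not from the original post), the ICC can be written as the between-cluster variance divided by the total variance. With equal-sized clusters from a pilot study, the one-way ANOVA estimator below is one common way to get it. The function names and the pilot numbers are made up for illustration.

```python
from statistics import mean


def icc_from_variances(between_var, within_var):
    """ICC as the share of total variance that lies between clusters."""
    return between_var / (between_var + within_var)


def icc_oneway(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    clusters is a list of lists, one inner list of responses per cluster.
    """
    k = len(clusters)
    m = len(clusters[0])
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand_mean = mean(x for c in clusters for x in c)
    # Mean squares between and within clusters.
    ms_between = m * sum((mean(c) - grand_mean) ** 2 for c in clusters) / (k - 1)
    ms_within = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (k * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


# Hypothetical pilot data: 3 agencies with 3 workers each.
pilot = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f"ANOVA estimate of ICC: {icc_oneway(pilot):.3f}")
print(f"ICC from variances 1 and 9: {icc_from_variances(1, 9):.2f}")
```

In practice the variance components would usually come from a fitted mixed model or from published studies in the same setting. A pilot this small would give a very noisy estimate.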

If the social workers at each agency respond in similar ways (high ICC), adding another worker from an agency doesn’t add a lot of new information about the effect you’re testing.

On the other hand, if the clustering isn’t having a big effect on responses, so workers from the same agency aren’t very similar, then adding more workers to your sample from a single agency has a bigger impact on power.
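
One way to see this trade-off, for an effect tested at the agency level, is to divide the number of workers in an agency by the design effect, 1 + (m - 1) × ICC. The result is roughly how many independent observations those workers are worth. This is a sketch using that standard formula, not a calculation from the original post. The function name and the ICC values of 0.01 and 0.50 are assumptions for illustration.

```python
def effective_cluster_size(cluster_size, icc):
    """Approximate number of independent observations that one cluster
    of correlated observations is worth."""
    return cluster_size / (1 + (cluster_size - 1) * icc)


# Compare going from 5 to 20 workers per agency under a low and a high ICC.
for icc in (0.01, 0.50):
    for workers in (5, 20):
        worth = effective_cluster_size(workers, icc)
        print(f"ICC={icc:.2f}, {workers:2d} workers -> worth about {worth:.1f}")
```

With a high ICC the effective size barely moves when workers are added, because it can never exceed 1 / ICC. With a low ICC each added worker counts for nearly a full observation.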

So although there are more pieces of information to include, the steps and the ways of thinking about the issues are exactly the same as they are in any sample size estimate.

Want to get up to speed on the meaning and logic of power, sample size, and how to calculate estimates? Check out our on-demand workshop, Calculating Power and Sample Size.

Random Intercept and Random Slope Models
Get started with the two building blocks of mixed models and see how understanding them makes these tough models much clearer.

Tagged With: Intraclass Correlation Coefficient, multilevel model, Sample Size Calculations

Related Posts

  • Sample Size Estimates for Multilevel Randomized Trials
  • Multilevel, Hierarchical, and Mixed Models–Questions about Terminology
  • Covariance Matrices, Covariance Structures, and Bears, Oh My!
  • Concepts in Linear Regression you need to know before learning Multilevel Models

Reader Interactions

Comments

  1. Maxime says

    February 11, 2021 at 7:23 am

    Hi,

    Very interesting introduction!

    Do you have a reference for the following sentence: “As a general rule, the sample size that matters most is the sample size at the level the effect is measured.”

    Thank you in advance.
    Maxime

  2. Matt Jans says

    June 19, 2018 at 7:34 am

    Excellent introduction! Can you recommend a program (or Excel template) for calculating power scenarios for MLM? Thanks!

    • Karen Grace-Martin says

      October 26, 2018 at 5:14 pm

      Hi Matt,

      Unfortunately, there aren’t a lot of choices. For very specific MLMs, you can use GLIMMPSE or Optimal Design software (both free, just google them). Both are very limited though. I’ve found I always end up having to use simulations. We had a recent webinar on how to do this: https://www.theanalysisfactor.com/august-2018-power-analysis-and-sample-size-determination-using-simulation/

  3. El Samuels says

    January 16, 2018 at 10:45 pm

    Great post; thank you.

    It makes me think of another common concern with sample size–when they are unequal.

    I know that one of the advantages of using multilevel models is their tolerance to heterogeneity of variances between groups–and unequal sample sizes can cause heterogeneous variance.

    But _how_ tolerant are they? I’ve looked around, but can’t find good guidelines or suggestions for handling unequal sample sizes in multilevel models.

    For example, how unequal is too unequal? Does it affect other assumptions or tests, e.g., does having more of the variance–homo- or heterogeneous–come from one group and not the other affect interpreting results?

    Thanks


Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.