The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

The Intraclass Correlation Coefficient in Mixed Models

by Karen Grace-Martin

The ICC, or Intraclass Correlation Coefficient, can be very useful in many statistical situations, but especially so in Linear Mixed Models.

Linear Mixed Models are used when there is some sort of clustering in the data.

Two common examples of clustered data include:

  • individuals sampled within sites (hospitals, companies, community centers, schools, etc.).  The site is the cluster.
  • repeated measures or longitudinal data, where multiple observations are collected from the same individual.  The individual is the cluster in which those observations are grouped.

Observations from the same cluster are usually more similar to each other than observations from different clusters.  If they are, you can’t use statistical methods that assume independence on these data, because estimates of variance, and therefore p-values, will be incorrect.
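To put a rough number on how far off those variance estimates can be, here is a minimal sketch of the Kish design effect, a standard survey-sampling formula that is not part of the original discussion. It assumes every cluster has the same size m and a common ICC (the within-cluster correlation described later in this post), and it gives the factor by which treating clustered observations as independent understates the sampling variance of an overall mean. The function name `design_effect` is hypothetical.

```python
import math

def design_effect(icc, cluster_size):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC.

    Ratio of the true sampling variance of an overall mean to the variance
    you would compute if you wrongly treated all observations as independent.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1.0 + (cluster_size - 1) * icc

# Hypothetical example: 21 students per school and an ICC of 0.10.
deff = design_effect(icc=0.10, cluster_size=21)
se_inflation = math.sqrt(deff)  # naive standard errors are too small by this factor
print(deff, round(se_inflation, 3))
```

With these made-up numbers the design effect is 3, so standard errors that ignore the clustering would be too small by a factor of about 1.73. Because the design effect grows with cluster size, even a small ICC can noticeably inflate the true variance when clusters are large.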

Mixed models not only account for the correlations among observations in the same cluster, but also give you an estimate of that correlation.

Below is the equation of a very simple linear mixed model:

$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_i + \varepsilon_{ij}$$

This has a single fixed independent variable, X, a single random effect, u, and a residual, ε.  For simplicity, I’m going to assume that X is centered on its mean.  This is also known as a random intercept model.

The subscripts i and j on the Y indicate that each observation j is nested within cluster i.

The u represents the random intercept for each cluster.  It’s really a residual term that measures the distance of each cluster’s intercept from the overall intercept β0.  Rather than calculate an estimate for every one of those distances, the model estimates a single variance, σ0².
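To make the notation concrete, here is a minimal Python sketch that simulates data from the random intercept model Y_ij = β0 + β1X_ij + u_i + ε_ij. The parameter values, the normal distributions for u_i and ε_ij, and the function name `simulate_random_intercept` are all assumptions chosen for illustration. Note that `sigma0` and `sigma` are standard deviations, so the corresponding variances are their squares.

```python
import random

def simulate_random_intercept(n_clusters=200, n_per_cluster=5, beta0=10.0,
                              beta1=2.0, sigma0=2.0, sigma=1.0, seed=1):
    """Simulate Y_ij = beta0 + beta1*X_ij + u_i + e_ij.

    u_i ~ Normal(0, sigma0**2) is the random intercept shared by every
    observation in cluster i; e_ij ~ Normal(0, sigma**2) is the residual.
    X is drawn from Normal(0, 1), so it is centered on (roughly) zero.
    """
    rng = random.Random(seed)
    data = []  # rows of (cluster index i, x_ij, y_ij)
    for i in range(n_clusters):
        u_i = rng.gauss(0.0, sigma0)  # one draw per cluster
        for _ in range(n_per_cluster):
            x = rng.gauss(0.0, 1.0)
            e = rng.gauss(0.0, sigma)  # one draw per observation
            data.append((i, x, beta0 + beta1 * x + u_i + e))
    return data

rows = simulate_random_intercept()
print(len(rows))  # 1000 observations: 200 clusters x 5 each
```

The key design choice is that u_i is drawn once per cluster and reused for every observation in that cluster, while ε_ij is drawn fresh for each observation. That shared draw is what makes observations within a cluster correlated.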

That variance parameter estimate, σ0², is the between-cluster variance.  The variance of the residuals, σ², is the within-cluster variance.  Their sum, σ0² + σ², is the total variance in Y that is not explained by X.
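The claim that the two variances add up can be checked by simulation. This sketch uses made-up values σ0² = 4 and σ² = 1, draws u_i + ε_ij (the part of Y not explained by X) for many clusters, and compares its sample variance with σ0² + σ². It assumes u_i and ε_ij are independent and normally distributed.

```python
import random
import statistics

# Hypothetical variance components (not estimates from real data).
SIGMA0_SQ = 4.0  # between-cluster variance of u_i
SIGMA_SQ = 1.0   # within-cluster (residual) variance of e_ij

rng = random.Random(42)
unexplained = []  # u_i + e_ij: the part of Y_ij that X does not explain
for _ in range(20000):                      # clusters
    u = rng.gauss(0.0, SIGMA0_SQ ** 0.5)    # one random intercept per cluster
    for _ in range(2):                      # observations per cluster
        e = rng.gauss(0.0, SIGMA_SQ ** 0.5)
        unexplained.append(u + e)

total = statistics.variance(unexplained)
print(round(total, 2))  # should be close to SIGMA0_SQ + SIGMA_SQ = 5.0
```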

[Figure: random intercept graph]

If there is no real correlation among observations within a cluster, the cluster means won’t differ.  It’s only when some clusters have generally high values and others have relatively low values that the values within a cluster are correlated.

In the graph, each cluster has its own trajectory in a different color.  The thick black line represents the overall trajectory, averaged across all clusters.

Some clusters, like the magenta one, have all three values above the overall (black) mean.  Those values will be correlated, because they’re all relatively high.  Accordingly, the magenta cluster has a high mean.

Likewise, the turquoise cluster has all three values below the overall (black) mean.  Again, those values will be correlated, because they’re all relatively low.  And the turquoise mean is quite low.

And so it goes.  When some clusters have generally high values and others have generally low values (in other words, when there is consistency among a cluster’s responses), there is variation among the clusters’ means.  This is the between-cluster variance.
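Here is a sketch of that idea, assuming three points per cluster as in the graph and hypothetical variances σ0² = 4 and σ² = 1. It computes each cluster's mean and the spread of those means. One caveat: the variance of the cluster means is not exactly the between-cluster variance, because each mean still carries some residual noise; its expected value is σ0² + σ²/n for clusters of size n.

```python
import random
import statistics

def cluster_means(values_by_cluster):
    """Return the mean of each cluster's values."""
    return [statistics.mean(values) for values in values_by_cluster]

# Hypothetical settings: 3 points per cluster, as in the graph.
SIGMA0_SQ, SIGMA_SQ, N_PER_CLUSTER = 4.0, 1.0, 3

rng = random.Random(7)
clusters = []
for _ in range(5000):
    u = rng.gauss(0.0, SIGMA0_SQ ** 0.5)  # shared by the whole cluster
    clusters.append([u + rng.gauss(0.0, SIGMA_SQ ** 0.5)
                     for _ in range(N_PER_CLUSTER)])

spread_of_means = statistics.variance(cluster_means(clusters))
# Expected value is SIGMA0_SQ + SIGMA_SQ / N_PER_CLUSTER (about 4.33):
# each cluster mean still carries residual noise averaged over 3 points.
print(round(spread_of_means, 2))
```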

The within-cluster variance represents how far each point is from its cluster-specific mean.  In other words, what is the variation of the magenta points around the magenta trajectory?

In this graph, it’s pretty small.  Although the magenta points are all high relative to the overall mean, they sit quite close to their own trajectory, so there is not a lot of within-cluster variation.
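The within-cluster variance can be sketched by measuring each point's deviation from its own cluster mean and pooling across clusters. This hypothetical helper uses the classic pooled-variance formula (one degree of freedom lost per cluster mean); it is an illustration, not necessarily how mixed model software estimates σ². The toy data mimic one high cluster and one low cluster.

```python
import statistics

def pooled_within_variance(values_by_cluster):
    """Pooled variance of points around their own cluster mean.

    Squared deviations from each cluster's mean are summed across clusters,
    then divided by the within-cluster degrees of freedom
    (total points minus number of clusters).
    """
    ss = 0.0
    dof = 0
    for values in values_by_cluster:
        m = statistics.mean(values)
        ss += sum((y - m) ** 2 for y in values)
        dof += len(values) - 1
    return ss / dof

# Toy data: one cluster far above the other, but tight around its own mean.
high = [9.0, 10.0, 11.0]
low = [2.0, 3.0, 4.0]
print(pooled_within_variance([high, low]))  # 1.0
```

The two cluster means are 7 units apart, yet the within-cluster variance is only 1. Large differences between clusters combined with small scatter within them is exactly the pattern that produces a high ICC.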

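The ratio of the between-cluster variance to the total variance is called the Intraclass Correlation: ICC = σ0² / (σ0² + σ²), where σ0² is the between-cluster variance and σ² is the within-cluster (residual) variance.  It tells you the proportion of the total variance in Y not explained by X that is accounted for by the clustering.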
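In code, computing the ICC from estimated variance components (σ0² between clusters, σ² within) is one line. The sketch below adds, as a swapped-in alternative, the classic one-way ANOVA (method-of-moments) estimator for equal-sized clusters, which estimates the ICC directly from raw data. Mixed model software usually estimates σ0² and σ² by REML or ML instead, so the two approaches will not match exactly; note also that the ANOVA estimate can come out negative when the true ICC is near zero. Both function names are hypothetical.

```python
import statistics

def icc(between_var, within_var):
    """ICC = between-cluster variance / (between + within)."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total

def anova_icc(values_by_cluster):
    """One-way ANOVA estimator of the ICC; assumes equal cluster sizes."""
    k = len(values_by_cluster)          # number of clusters
    n = len(values_by_cluster[0])       # observations per cluster
    if k < 2 or n < 2 or any(len(v) != n for v in values_by_cluster):
        raise ValueError("need >= 2 equal-sized clusters with >= 2 values each")
    means = [statistics.mean(v) for v in values_by_cluster]
    grand = statistics.mean(means)      # equals the grand mean when sizes are equal
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)   # between mean square
    msw = sum((y - m) ** 2
              for v, m in zip(values_by_cluster, means)
              for y in v) / (k * (n - 1))                      # within mean square
    return (msb - msw) / (msb + (n - 1) * msw)

print(icc(4.0, 1.0))                                              # 0.8
print(round(anova_icc([[9.0, 10.0, 11.0], [2.0, 3.0, 4.0]]), 3))  # 0.96
```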

It can also be interpreted as the correlation among observations within the same cluster.
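Why is a ratio of variances also a correlation? Assume the random intercept model Y_ij = β0 + β1X_ij + u_i + ε_ij, holding X fixed, with u_i and the residuals mutually independent, Var(u_i) = σ0², and Var(ε_ij) = σ². For two different observations j ≠ j′ in the same cluster i, a short derivation runs:

```latex
\begin{aligned}
\operatorname{Var}(Y_{ij}) &= \operatorname{Var}(u_i + \varepsilon_{ij}) = \sigma_0^2 + \sigma^2 \\
\operatorname{Cov}(Y_{ij}, Y_{ij'}) &= \operatorname{Cov}(u_i + \varepsilon_{ij},\; u_i + \varepsilon_{ij'}) = \operatorname{Var}(u_i) = \sigma_0^2 \\
\operatorname{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma_0^2}{\sqrt{(\sigma_0^2 + \sigma^2)(\sigma_0^2 + \sigma^2)}} = \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} = \text{ICC}
\end{aligned}
```

Only the shared u_i contributes to the covariance; the independent residuals drop out. Observations in different clusters share no random intercept, so under this model their covariance is zero.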

 

Why ICC is useful

1. It can help you determine whether or not a linear mixed model is even necessary. If you find that the correlation is zero, that means the observations within clusters are no more similar than observations from different clusters.  Go ahead and use a simpler analysis technique.

2. It can be theoretically meaningful to understand how much of the overall variation in the response is explained simply by clustering.  For example, in a repeated measures psychological study you can tell to what extent mood is a trait (varies among people, but not within a person on different occasions) or a state (varies little on average among people, but varies a lot across occasions).  A high ICC points toward a trait; a low ICC points toward a state.
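A quick worked example with made-up variance components for a mood score shows how the ICC separates the two cases. The numbers are hypothetical, not from any study.

```python
def icc(between_var, within_var):
    """Share of unexplained variance that lies between people."""
    return between_var / (between_var + within_var)

# Hypothetical variance components for a mood score.
trait_like = icc(between_var=8.0, within_var=2.0)  # people differ; each person is stable
state_like = icc(between_var=1.0, within_var=9.0)  # people similar on average; occasions vary
print(trait_like, state_like)  # 0.8 0.1
```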

3. It can also be meaningful to see how the ICC (as well as the between- and within-cluster variances) changes as variables are added to the model.
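Here is a rough illustration of the ICC changing as a variable is added, assuming a hypothetical cluster-level covariate w that explains most of the differences among clusters. As a simplification, it regresses Y on w by ordinary least squares and then applies the one-way ANOVA ICC estimator to the residuals. A real mixed model would estimate the conditional variance components jointly, so treat this two-step version only as a sketch. All names and parameter values are made up.

```python
import random
import statistics

def anova_icc(groups):
    """One-way ANOVA ICC estimator; assumes equal cluster sizes."""
    k, n = len(groups), len(groups[0])
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

rng = random.Random(3)
k, n = 400, 4
w, clusters = [], []
for _ in range(k):
    w_i = rng.gauss(0.0, 1.0)   # hypothetical cluster-level covariate
    v_i = rng.gauss(0.0, 0.5)   # cluster effect that w does not explain
    w.append(w_i)
    clusters.append([1.5 * w_i + v_i + rng.gauss(0.0, 1.0) for _ in range(n)])

# OLS fit of y on w (w repeated for every observation in its cluster).
xs = [w_i for w_i in w for _ in range(n)]
ys = [y for g in clusters for y in g]
mx, my = statistics.mean(xs), statistics.mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
residuals = [[y - (intercept + slope * w_i) for y in g] for w_i, g in zip(w, clusters)]

icc_before = anova_icc(clusters)
icc_after = anova_icc(residuals)
print(round(icc_before, 2), round(icc_after, 2))
```

With these made-up settings the population values are about 0.71 before adjusting for w (2.5 / 3.5) and 0.20 after (0.25 / 1.25), so the printed estimates should land near those: adding a variable that explains between-cluster differences shrinks the between-cluster variance and, with it, the ICC.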

Comments

  1. Betregiorgis Zegeye says

    September 16, 2020 at 6:54 pm

    Thank you for the helpful notes. I am confused about when to use or not use multilevel analysis with the ICC as an indicator. What is the cut point for running a multilevel analysis, based on the ICC output?

  2. Christina says

    June 2, 2020 at 12:25 pm

    Thank you very much for this very helpful explanation.
    I have one question. I’ve seen that we can calculate the ICC using this formula:
    ICC = residual / (residual+intercept)
    and I have also seen this:
    ICC = variance of IV / (variance of IV) + (variance of error)
    I’m using SPSS and I fitted a model via: Analyse –> Mixed Models –> Generalized Linear. However, in the output, I’m not sure what table I’m supposed to look at to get the values for residual, intercept or variance, and variance of error that will help me calculate the ICC. Thank you.

  3. Belkis says

    March 2, 2020 at 1:15 am

    Such a great explanation! Generally, what would it mean if the ICC was higher/lower when more variables added to a model?

  4. Jong Na says

    August 27, 2018 at 2:42 am

    Hi Karen,
    Thanks for the post. Could you check whether the following expression in your post is correct?
    Cov(Y_ij , Y_ij’ ) = u_i =sigma_0^2
    I think the u_i should be replaced by Var(u_i). Thanks.

  5. Shyam says

    January 28, 2018 at 1:19 am

    Thank you for the information.
    I am still searching for how to calculate it in SPSS. Which tool is available, and what is the process to calculate the intraclass correlation?

    • belkis says

      March 2, 2020 at 1:16 am

      Hi, I believe the ICC cannot be calculated in SPSS. The equation is easy enough to use:
      residual / (residual+intercept) = ICC

  6. Alex says

    December 6, 2017 at 2:05 pm

    Thanks for this lucid explication. I’m seeking guidance about threshold values of ICC for switching from OLS to HLM when cases (in this case students) are clustered (in this case schools or colleges). As you note the choice is simple and obvious when ICC=0. But it’s rarely zero. I seem to recall seeing folks use .1 as a rule of thumb (ie ICC>.1 suggests the need to compensate for nesting). Does that sound right? Might you recommend a source for this? Many thanks…

    • Geoff says

      February 7, 2018 at 2:46 pm

      While blanket cutoff points are always a potential problem this article covers how ICC at different levels affects AIC and other measures of model fit. 0.1 in their work is where they saw models that accounted for it and ones that didn’t pull away from each other. http://www.cibtech.org/sp.ed/jls/2015/02/185-JLS-S2-188-KIANOUSH-APPLICATION%20%20-78.pdf

  7. Pascale says

    May 19, 2017 at 10:53 am

    Great explanation. Thanks.

  8. roms says

    October 13, 2016 at 2:46 pm

    Please,
    for my master’s thesis I am working on a linear mixed effects model. I have run into some problems; can you please help me?
    1. I would like to know what I should be looking for in my output.
    2. How do I interpret ML or REML?
    3. How do I interpret the ICC?
    4. How do I interpret random intercepts and random slopes?

  9. Victor Novack says

    October 2, 2016 at 5:23 am

    Wonderful explanation! So intuitive, my students really liked it.
    I have the following question – suppose you have a continuous signal with minute to minute observations per subject (e.g. continuous glucose monitoring). Will mixed models be useful to compare, say, between two treatments?
    Sincerely
    Victor

    • Reema says

      January 23, 2018 at 11:20 am

      I suppose what you are explaining is similar to what is called a daily diary design. I think HLM would be appropriate for such a design.

  10. Sam says

    July 29, 2016 at 9:27 pm

    Is there a recommended resource for how to compute ICCs in SPSS or Stata, specifically for determining whether to use a multilevel model? Everything I am finding is on how to compute an ICC for repeated measures designs. Thanks!

  11. Epifunky says

    December 6, 2015 at 4:51 am

    Hi Karen,

    Thanks for this accessible post. One comment: you say “If you find that the correlation is zero, that means the observations within clusters are no more similar than observations from different clusters.” But it could be that you haven’t enough power, right?

    Thanks

    • Andrew says

      May 30, 2016 at 2:57 pm

      Sort of. You could have an ICC estimate that is “statistically significant” but is so close to zero that it’s essentially null. Better to focus on the effect size estimate and go from there. Power is less about the effect size and more about uncertainty regarding it (i.e., SEs).

  12. saeed says

    April 24, 2014 at 8:05 am

    Thanks for the note. I’m in biostatistics at Shiraz University (Iran), and it was helpful for me.

  13. Eleanor Carson says

    September 4, 2013 at 7:31 am

    Karen, thank you for writing the article, The Intraclass Correlation Coefficient in Mixed Models. I’m enrolled in your upcoming “Analyzing Repeated Measures Data Workshop,” and I fit the criterion of “not for you” because of my lack of knowledge of the basic principles in the workshop. I have about 3 weeks to learn as many of those basic principles as possible, before my own lectures start, because I must be able to analyze my PhD research data within a longitudinal repeated-measures design, whether I know the concepts or not. After reading your article, I was able to understand the introduction of West et al’s book on Linear Mixed Models, and I feel that there is some hope! I couldn’t understand, before. Your article was not only well-written and easy to understand, but explained the principles in a way that I can remember. Thank you!

    • Karen says

      September 4, 2013 at 2:14 pm

      Hi Eleanor,

      Thanks for the kind note. I’m glad that you found this so helpful. We are definitely going to cover a LOT in that workshop, and much of it requires the noted background.

      West et al. is excellent, and it’s great that you have it. If you haven’t seen this article, please use it as a starting point to get caught up: https://www.theanalysisfactor.com/concepts-you-need-to-understand-to-run-a-mixed-or-multilevel-model/

      Also, the bonus videos that are available on the Repeated Measures workshop site should give you a bit of background, so be sure to watch those before we begin.

Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.