The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Linear Mixed Models for Missing Data in Pre-Post Studies

by Karen Grace-Martin

In the past few months, I’ve gotten the same question from a few clients about using linear mixed models for repeated measures data. They want to take advantage of the models’ ability to give unbiased results in the presence of missing data. In each case, the study has two groups complete a pre-test and a post-test measure, and both measures have a lot of missing data.

The research question is whether the groups have different improvements in the dependent variable from pre to post test.

As a typical example, say you have a study with 160 participants.

90 of them completed both the pre and the post test.

Another 48 completed only the pretest and 22 completed only the post-test.

Repeated Measures ANOVA will deal with the missing data through listwise deletion. That means keeping only the 90 people with complete data.  This causes problems with both power and bias, but bias is the bigger issue.

Another alternative is to use a Linear Mixed Model, which will use the full data set. This is an advantage, but it’s not as big an advantage in this design as it is in other designs.

The mixed model will retain the 70 people who have data for only one time point.  It will use the 48 people with pretest-only data along with the 90 people with full data to estimate the pretest mean.

Likewise, it will use the 22 people with posttest-only data along with the 90 people with full data to estimate the post-test mean.
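
To make the bookkeeping concrete, here is a minimal sketch of which subjects feed each analysis. It assumes the data start in wide format (one row per subject) and are reshaped to long format (one row per observed measurement), which is the layout mixed model software typically expects. The scores are placeholders and the variable names are hypothetical; only the missingness pattern from the example matters.

```python
# Hypothetical wide-format records: (subject_id, pre, post).
# None marks a missing score. Scores are placeholders.
wide = (
    [(i, 50.0, 55.0) for i in range(90)]           # completed both tests
    + [(i, 50.0, None) for i in range(90, 138)]    # pretest only
    + [(i, None, 55.0) for i in range(138, 160)]   # post-test only
)

# Listwise deletion keeps only subjects with both scores.
listwise = [r for r in wide if r[1] is not None and r[2] is not None]

# Long format: one row per observed score, so partial cases stay in.
long_rows = []
for subject_id, pre, post in wide:
    for time, score in (("pre", pre), ("post", post)):
        if score is not None:
            long_rows.append((subject_id, time, score))

n_subjects_long = len({r[0] for r in long_rows})
n_pre = sum(r[1] == "pre" for r in long_rows)
n_post = sum(r[1] == "post" for r in long_rows)

print(len(listwise), n_subjects_long, n_pre, n_post)  # 90 160 138 112
```

So 138 subjects inform the pretest mean and 112 inform the post-test mean, while listwise deletion keeps 90.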

If the data are missing at random, this will give you unbiased estimates of each of these means.
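
For readers who want to see how a likelihood-based estimate can draw on partial cases, here is a rough sketch using the EM algorithm for a bivariate normal (pre, post) model. This is a stand-in for illustration, not the algorithm any particular mixed model routine runs. The function name `em_bivariate_means` and the toy data are hypothetical, and the sketch assumes both scores vary and that normality is reasonable.

```python
def em_bivariate_means(pairs, n_iter=500):
    """ML estimates (mean_pre, mean_post) under a bivariate normal model.

    pairs: list of (pre, post) tuples; None marks a missing score.
    """
    # Subjects with no scores at all carry no information.
    pairs = [(p, q) for p, q in pairs if p is not None or q is not None]
    n = len(pairs)
    pre = [p for p, _ in pairs if p is not None]
    post = [q for _, q in pairs if q is not None]

    # Starting values: available-case means and variances, zero covariance.
    m1, m2 = sum(pre) / len(pre), sum(post) / len(post)
    v1 = sum((x - m1) ** 2 for x in pre) / len(pre)
    v2 = sum((x - m2) ** 2 for x in post) / len(post)
    c12 = 0.0

    for _ in range(n_iter):
        s1 = s2 = s11 = s22 = s12 = 0.0
        for p, q in pairs:
            if p is not None and q is not None:
                e1, e2, e11, e22 = p, q, p * p, q * q
            elif q is None:
                # E-step: expected post score (and its square) given pre.
                e1, e11 = p, p * p
                e2 = m2 + c12 / v1 * (p - m1)
                e22 = e2 * e2 + v2 - c12 ** 2 / v1
            else:
                # E-step: expected pre score (and its square) given post.
                e2, e22 = q, q * q
                e1 = m1 + c12 / v2 * (q - m2)
                e11 = e1 * e1 + v1 - c12 ** 2 / v2
            s1 += e1
            s2 += e2
            s11 += e11
            s22 += e22
            s12 += e1 * e2
        # M-step: update means, variances, and covariance.
        m1, m2 = s1 / n, s2 / n
        v1 = s11 / n - m1 ** 2
        v2 = s22 / n - m2 ** 2
        c12 = s12 / n - m1 * m2
    return m1, m2


# Toy data: four complete cases plus two pretest-only cases.
toy = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 4.0), (6.0, None), (8.0, None)]
mean_pre, mean_post = em_bivariate_means(toy)
print(mean_pre, mean_post)
```

In this toy data the pretest-only subjects have higher pretest scores than the complete cases, and pre and post are positively related among the complete cases. The estimated post-test mean therefore comes out above the complete-case mean of 3.75, which is the kind of adjustment that missing at random allows.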

But most of the time in Pre-Post studies, the interest is in the change from pre to post across groups.

The difference in means from pre to post will be calculated based on the estimates at each time point.  But the degrees of freedom for the difference will be based only on the number of subjects who have data at both time points.

So with only two time points, if the people with one time point are no different from those with full data (creating no bias), you’re not gaining anything by keeping those 70 people in the analysis.

Compare this to a study I also saw in consulting with 5 time points. Nearly all the participants had 4 out of the 5 observations. The missingness was pretty random: some participants missed time 1, others missed time 4, and so on. Only 6 people out of 150 had full data. Listwise deletion created a nightmare, leaving only 6 people in the data set.

In the mixed model, each person contributed data to 4 means, so each mean had a pretty reasonable sample size. Since the missingness was random, each mean was unbiased. Each subject fully contributed data and df to many of the mean comparisons.
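
A quick simulation shows how the counts work out in a design like this one. It is only a sketch: it assumes exactly 144 of the 150 participants miss one randomly chosen time point (the study itself had "nearly all" in that situation), and the seed and variable names are arbitrary.

```python
import random
from itertools import combinations

random.seed(1)
N_SUBJECTS, N_TIMES, N_COMPLETE = 150, 5, 6
time_points = range(1, N_TIMES + 1)

# observed[i] is the set of time points at which subject i has data.
observed = []
for i in range(N_SUBJECTS):
    times = set(time_points)
    if i >= N_COMPLETE:
        # Assumption: every incomplete subject misses exactly one time point.
        times.discard(random.randint(1, N_TIMES))
    observed.append(times)

# Listwise deletion keeps only subjects observed at every time point.
n_listwise = sum(len(obs) == N_TIMES for obs in observed)

# Subjects contributing to each time point's mean.
n_per_time = {t: sum(t in obs for obs in observed) for t in time_points}

# Subjects observed at both time points of each pairwise comparison.
n_per_pair = {
    pair: sum(set(pair) <= obs for obs in observed)
    for pair in combinations(time_points, 2)
}

print("listwise:", n_listwise)
print("per time point:", n_per_time)
print("smallest pairwise n:", min(n_per_pair.values()))
```

Under these assumptions each time point keeps roughly 120 subjects on average, and every pairwise comparison has far more than the 6 subjects that survive listwise deletion.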

With more than 2 time points and data that are missing at random, each subject can contribute to some change measurements.  Keep that in mind the next time you design a study.

 

Tagged With: ANOVA, linear mixed model, listwise deletion, Missing Data, pre-post, Repeated Measures

Related Posts

  • Six Differences Between Repeated Measures ANOVA and Linear Mixed Models
  • When Does Repeated Measures ANOVA not work for Repeated Measures Data?
  • Five Advantages of Running Repeated Measures ANOVA as a Mixed Model
  • Member Training: Missing Data

Comments

  1. Mariam says

    March 17, 2022 at 5:16 am

    Hi Karen,

    I would like to know if you know in R using lme or lmer how to specify to the software how to deal with missingness and predict the missingness.

    • Karen Grace-Martin says

      March 17, 2022 at 10:01 am

      Mariam,
      You don’t have to do anything for this to work, at least not for the outcome variable. It’s inherent in how the model is estimated.

  2. Daisy says

    December 24, 2021 at 6:21 pm

    This is very helpful, thanks! You mentioned, “So with only two time points, if the people with one time point are no different from those with full data(creating no bias), you’re not gaining anything by keeping those 72 people in the analysis.” May I please ask what analysis should I run to test if there is any difference between people with one-time point and those with full data?

    • Karen Grace-Martin says

      January 7, 2022 at 12:36 pm

      Daisy, you’re still better off with this analysis. If there is a difference, you want to account for it.

  3. Sandra says

    July 9, 2021 at 9:36 am

    Hi,

    Thank you for this clear explanation. I am still slightly unclear on one point – when you say “The difference in means from pre to post will be calculated based on the estimates at each time point. But the degrees of freedom for the difference will be based only on the number of subjects who have data at both time points.”

    Do you mean that the results of the model do take all the data into account (including maximum likelihood for missing data) – but when you look at the degrees of freedom this won’t be reflected, since this will only be based on the cases that have data at both timepoints?

    I’m trying to work out whether it makes more sense to impute missing values across a dataset before feeding this into a mixed model – or whether to just do the analysis using a mixed model with the missing data included. Do you have any guidance on this?

    Thanks for your help!

    • Karen Grace-Martin says

      July 16, 2021 at 11:37 am

      Yes, the results of the model take into account all the data.

      If I am missing data only on level one variables (including the outcome) I would not impute and instead would rely on the mixed model’s maximum likelihood.

      The missingness mechanism assumption is MAR for both, so there is no advantage there. The one exception might be if there are auxiliary variables that could help predict the missing values in the multiple imputation.

  4. Emma says

    May 5, 2021 at 9:16 am

    Hi there,

    Do you have any references that support linear mixed models can handle missing outcome data?

    • Karen Grace-Martin says

      November 30, 2021 at 4:32 pm

      Hi Emma,

      Just about every book on linear mixed models talks about missing values.

  5. Nathan says

    April 8, 2021 at 2:23 am

    Hi Karen,

    This explanation really helped me, so thanks!

    If I was interested in better understanding the justification for a mixed linear model in the case of missing data, would you recommend any sources?

    Thanks a lot,

    Nathan

  6. mary says

    March 22, 2021 at 9:52 am

    I want to compare a continuous variable between two different treatments at 5 time points. But I have data for 47 patients at the first time point, 15 patients at the third time point, and only 8 patients at the 5th time point. Missingness is not random. Can I use a linear mixed model? And what is the non-parametric alternative to a linear mixed model?
    Thanks

    • Karen Grace-Martin says

      December 6, 2021 at 1:06 pm

      Mary, you’ll want to check your missing data mechanism. Linear mixed models (and the maximum likelihood estimation they use) assume the data are missing at random; they do not require missing completely at random. You’ll get better estimates from LMM than from any other option. See: https://www.theanalysisfactor.com/missing-data-two-recommended-solutions/

  7. Igbiks Tamuno says

    January 8, 2021 at 6:26 pm

    I have data on over 200 patients followed up for 12 months, with creatinine measurements at six time points. Missing data is over 60%. Multiple imputation was used to deal with the missing data. Can I have guidance in analyzing this data using a linear mixed model?

    • Karen Grace-Martin says

      December 6, 2021 at 1:09 pm

      It’s definitely something we could help you with in our membership or in consulting. We’d have to dig into the details with you to give solid guidance.

  8. Kim says

    May 24, 2019 at 2:51 pm

    Hi Karen,

    This article is very helpful. Given the information you’ve provided above, do you recommend a different statistical approach for handling missing data in a study using a pre-post design where data are missing at random?

    Thanks,
    Kim

    • Karen Grace-Martin says

      May 31, 2019 at 11:24 am

      Hi Kim,
      Not necessarily. This is still going to give you the least biased results. The only other option is multiple imputation, and you only get limited information from that when you have to impute the outcome variable.

  9. John says

    March 23, 2018 at 3:23 pm

    In the above you state that:

    “The difference in means from pre to post will be calculated based on the estimates at each time point. But the degrees of freedom for the difference will be based only on the number of subjects who have data at both time points.”

    Are these the estimates of those people with posttest-only data along with the people with full data?

    Similarly is the “mixed model” you described above the same as a random effects logit regression?

    Very best,

    John

  10. Amy lin says

    December 22, 2017 at 6:15 am

    Is the second method handling the missing data called maximum likelihood method? or other name?

    • Karen says

      December 22, 2017 at 10:24 am

      Hi Amy,

      Yes. Mixed models use maximum likelihood, which handles the missing data.

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.