There are two ways to run a repeated measures analysis. The traditional way is to treat it as a multivariate test, in which each response is considered a separate variable. The other way is to run it as a mixed model. While the multivariate approach is easy to run and quite intuitive, there are a number of advantages to running a repeated measures analysis as a mixed model.

First I will explain the difference between the approaches, then briefly describe some of the advantages of using the mixed models approach.

Let’s use as an example a data set of students who are measured at four time points across a school year. The children are given reading tests at the beginning of first grade and at three other time points evenly spaced across the school year. So each child has four observations for reading tests. Let’s assume the children are assigned to different experimental groups, and other covariates are measured.

In the multivariate approach, each child would have a single row of data in the data spreadsheet and four columns for the four reading scores. This is called the wide data format, and the unit of observation is considered a child. Covariates that do not change across time, such as sex or age at time 1, would each appear in a column.

In the mixed model approach, each child would have four rows of data. One column would contain the time of measurement and another the reading score. This is called the long format, and the unit of observation is considered one time point per child. Covariates that don’t change would have repeated values across the four rows of data. A time-varying covariate would change values across the four rows, but only one column is needed for each one.
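To make the two layouts concrete, here is a minimal sketch in Python using only the standard library. The data values and the names `wide_rows`, `SCORE_COLUMNS`, and `wide_to_long` are hypothetical, made up for illustration. In practice a data-frame tool such as pandas `melt` does the same reshaping.

```python
# Hypothetical wide-format data: one row per child, four reading-score columns.
wide_rows = [
    {"child": 1, "sex": "F", "read_t1": 41, "read_t2": 45, "read_t3": 50, "read_t4": 54},
    {"child": 2, "sex": "M", "read_t1": 38, "read_t2": 40, "read_t3": 47, "read_t4": 49},
]

# Map each wide score column to the time point it represents (assumed naming).
SCORE_COLUMNS = {"read_t1": 1, "read_t2": 2, "read_t3": 3, "read_t4": 4}


def wide_to_long(rows):
    """Return one row per child per time point (the long format)."""
    long_rows = []
    for row in rows:
        for column, time in SCORE_COLUMNS.items():
            long_rows.append({
                "child": row["child"],
                "sex": row["sex"],       # time-invariant covariate repeats on every row
                "time": time,            # one column holds the time of measurement
                "reading": row[column],  # one column holds the reading score
            })
    return long_rows


long_rows = wide_to_long(wide_rows)
for row in long_rows:
    print(row)
```

Two children in wide format become eight rows in long format, with `sex` repeated on each of a child's four rows.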

Some advantages of using a Mixed Models Approach:

**Missing Data:** The default approach to missing data in nearly all statistical packages is Listwise Deletion, which drops any observation with any missing data on any variable involved in the analysis. If the percentage missing is small and the missing data are a random sample of the data set, this is a reasonable approach. In the multivariate approach, if a child is missing one time point, they will be dropped from the entire analysis. In the mixed approach, only that time point will be dropped. The remaining data will be retained.

**Post hoc tests:** Because of the way the Sums of Squares are calculated in the multivariate approach, post-hoc tests are not available for repeated measures factors. They are available, however, using the mixed approach.

**Flexibility in treating time as continuous:** Depending on the design of the study, rather than consider time as four categories, it can be more accurate to treat time as a continuous variable. This allows you to model a regression line for time, rather than estimate four means. (You need at least 3 time points to do this, but more are better.) This is not possible in the multivariate approach, but simple in the mixed approach.

**A single dependent variable can be used in other analyses:** For example, in a study I’m currently working on, a two-factor (2×4) repeated measures design was used to study whether the impact of these two factors on an outcome was mediated by a third variable. Each subject has eight values of the mediator (one for each of the conditions) and eight values on the final outcome. The mediator is both an outcome and a predictor variable in two different models in this path analysis. Therefore, it was necessary to have a single outcome variable, not eight, in order to have a single path coefficient between the mediator and the outcome.

**Easier to build into larger mixed models:** If our school design happened to also cluster children within teachers, we would need to include teacher as another level in the mixed model. Changing a two-level to a three-level model is simple to do (in practice, if not conceptually) if the model is already set up as a mixed model.
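The missing data point can be illustrated with a small sketch in plain Python. The scores and variable names below are hypothetical, and the code only counts which observations each data layout would keep under listwise deletion. It does not fit either model.

```python
# Hypothetical reading scores at four time points; None marks a missing score.
wide_rows = [
    {"child": 1, "scores": [41, 45, 50, 54]},
    {"child": 2, "scores": [38, None, 47, 49]},  # missing time point 2
    {"child": 3, "scores": [44, 48, 51, None]},  # missing time point 4
]

# Wide layout (multivariate approach): listwise deletion drops the whole child
# if any of the four score columns is missing.
complete_children = [row for row in wide_rows if None not in row["scores"]]

# Long layout (mixed approach): each time point is its own row, so listwise
# deletion drops only the rows whose score is missing.
long_rows = [
    {"child": row["child"], "time": time, "reading": score}
    for row in wide_rows
    for time, score in enumerate(row["scores"], start=1)
    if score is not None
]

scores_used_wide = 4 * len(complete_children)
scores_used_long = len(long_rows)

print("Children kept in wide layout:", [row["child"] for row in complete_children])
print("Scores used in wide layout:", scores_used_wide)
print("Scores used in long layout:", scores_used_long)
```

With these made-up data, the wide layout keeps only child 1 (4 scores), while the long layout keeps all 10 observed scores from all three children.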

And a bonus reason may be the most important one of all. You, the data analyst, become familiar with the terminology, concepts, and programming involved with mixed models in a simple repeated measures design. Then, when you encounter something more complicated (which you most likely will), your learning curve will be a single step, not a giant leap.

If you want to learn more about mixed models, check out the recording of my Random Intercept and Random Slope Models webinar. These two models are the basic building blocks of all mixed models. Get it all here. It’s free.

{ 21 comments… read them below or add one }

Hi Karen,

I am working with linear mixed models to look at seasonality of allergies. Our investigators have the problem of some subjects having two visits in the same season vs. most subjects having each visit in a different season. They would like to look at the difference in seasonality, not between visits. Do you know if a mixed model would be able to work with a time variable that contains two identical values? Does this question even make sense? We are using an AR(1) structure to allow for correlation varying between time points and I’m having trouble visualizing this if we’re acting as if some subjects had 2 measurements at the exact same time.

Thanks,

Melissa

Hi Melissa,

I would need more information about the details of the model and how the variables are being used. The fact that you’re using AR(1) indicates you’re using a repeated approach, not random effects. Having two observations per season should be fine, but it really depends on the design details.

Hi experts,

I am running an experiment involving 3 treatments and a control to look at seedling growth over the years (I measure height and diameter). The trial is 4 years old now. I came across a problem. A few of the seedlings I planted died after the first year, and a few more in the second and third years. Since it is a repeated measures design, the final fourth-year measurements have several records with absent or zero values. Should I treat the dead seedlings’ height and diameter as missing values, or should they be treated as zero measurements?

Many thanks for your help

Peter

Hi Peter,

That is a tricky one, in that if you treat it as missing, you’re assuming that it’s missing at random, which it probably isn’t. The only way it would be is if, for example, they died due to being eaten by pests or because your assistant forgot to water them, something that has nothing to do with their height. To really give you advice about what you *should* do, I’d have to ask you a lot of questions.

Hi, I am building a linear mixed model. I tested for correlation between the IV and my DVs. They seem to correlate significantly. Is this a problem? The r varies from .103 to -.20. To me those are very low correlations, but as I mentioned, they are significant….

Hello Karen:

I appreciate the way you explain these concepts as I am more epi vs. biostat, but I appreciate the interplay. I also have a question.

I am currently working with data from a cross-sectional study – it is establishing a baseline. However, I have determined that there is also a “longitudinal” component, some individuals will have repeated measures. I see the repeated measures accounting for less than 10% of the study population (hopefully). My first approach was thinking to just subset the data and present the repeaters’ analysis separately. However, I now need to determine a way to still pull them into the population’s overall analysis as well.

Do you think looking at the mixed model would hurt, or should I just stick with a general model and choose one measure to focus on per individual?

Hi Karen

Thanks for this resource. My question is about missing data. Most texts state that linear mixed models can handle missing data, i.e., there is no need to discount the case if one observation is missing. I have two questions.

1) Does this apply when there are only two observations and one is missing?

2) Does this ability to handle missing data also extend to missing values from predictor variables/covariates that were only measured once?

Many thanks

Amber, UK

Hi Amber,

I wrote something on just this question: https://www.theanalysisfactor.com/linear-mixed-models-for-missing-data-in-pre-post-studies/

This is a wonderful site, with a lot to learn from the FAQs. I would indeed like to join this site and would love to receive emails from here.

I am very much interested in statistics. I currently teach epidemiology to Master’s students while being enrolled in the PhD program at Karolinska Institutet, Stockholm, Sweden. Epidemiology and biostatistics go hand in hand. I have a little knowledge of the basics as well as a few advanced concepts of stats like MLR, ALR survival, etc. But my own interest lies in mixed models and LDA.

Hi Karen.

I’m confused to read in this interesting article that a categorical variable, e.g. sex, can be considered a covariate. I remember that in a previous webinar it was mentioned that a covariate is a continuous variable.

Thank you.

Hi Ibrahim, “predictor” would have been a better choice than “covariate.” Technically a covariate can be any control predictor, but it’s not very precise wording. See this: https://www.theanalysisfactor.com/confusing-statistical-terms-5-covariate/

What could be the advantage of the repeated measures MANOVA approach over the univariate repeated measures ANOVA?

It has less restrictive assumptions. It circumvents the circularity assumption usually violated in the univariate analysis.

Do you have a webinar/handouts for doing this in SAS?

Thanks!

Hi Kristin,

I don’t, but I should. I have a webinar with handouts for doing this in SPSS.

The LOGIC of what the random and repeated statements do in SAS proc mixed and SPSS mixed is the same. Here is a nice handout with the syntax for both. http://psych.unl.edu/psycrs/944/SAS_SPSS_Mixed.pdf

Hi Carsten,

Currently I am facing a problem with the mediation test in a two within-factor repeated measure model (with time-varying covariates). Could you please share more about how you deal with this problem?

Great thanks!^_^

Hi Luping,

I’m not sure exactly what the problem is, but if it’s just the general approach…

Just like in any mediation analysis, you’re going to run three models. The first tests the direct effect of the Independent Variables (your within-factors, controlling for covariates) you’re interested in on the Dependent Variable (DV). You will have to run it as a mixed model, rather than a multivariate repeated measures, to make sure that you have only one DV.

The second tests the effect of the same predictors on the Mediator.

The third is the same as the first model, but add in the Mediator as a covariate. See if the relevant effects in equation 1 go away. That’s mediation.

Is that helpful?

Karen

I have the same problem, I’m not sure if it’s possible to calculate mediation in within-subject designs (repeated measures). e.g.

Judd, Kenny, McClelland, 2001. pp 116.

[…] These analytic procedures (mediation), however, apply only to the situation in which the treatment variable varies between experimental units, with some units or participants receiving one treatment and others receiving another. However, treatment comparison often involves within-subject comparisons, examining every experimental unit or participant in each treatment condition […] A design in which the independent variable varies within participants rather than between them is particularly useful when the effects of the independent variable are relatively transitory and the independent variable is easily varied. Such a design can often result in a dramatic increase in statistical power relative to between-subjects designs, because variance due to stable individual differences contributes to the error variance in testing between-subjects effects but not to the error variance in within-subject analyses. As a result, within-subject designs often involve smaller samples than those typically used in between-subjects designs.

http://dionysus.psych.wisc.edu/lit/articles/juddc2001a.pdf

But !

Preacher & Hayes, 2007, pp 20-21.

[…]The causal steps approach is by far the most commonly used method for assessing mediation. The criteria have recently been extended for use in within-subject designs as well (Judd, Kenny, & McClelland, 2001). But despite its simplicity and intuitive appeal, the causal steps strategy suffers from serious limitations relative to other methods we discuss soon. First, it is possible to observe seemingly paradoxical effects using this approach. For example, a significant c and nonsignificant c′ may differ by a trivial amount in absolute terms, yet the causal steps criteria would indicate that mediation is occurring (a possible Type I error; Holmbeck, 2002) […].

http://www.sagepub.com/upm-data/23657_Chapter2.pdf

In my visual search experiment, the design is a 2 X 4 repeated measure ANOVA with 2 repeated factors of search type (Condition L / Condition X) and set-size (2, 4, 8, 12), with participants as random factor. Dependent variables: response time and an index of confidence.

All suggestions are welcome !

Thanks !

Regards

Gabriel

It’s a good point, Gabriel. The procedures for good mediation are definitely changing rapidly.

I don’t think there is a really great solution right now, unfortunately. The old Baron and Kenny approach, which I assume is what “causal steps” means, doesn’t cut it in many fields any more. But there aren’t good tests for the direct effects either in within subjects effects, like Preacher and Hayes’ bootstrapping.

This is a subject I intend to spend some time catching up on, but at this point, I don’t know of any great answers.

Karen

Thanks for the advice to shift to mixed models. I think it works like a charm. I am wondering, though. Say we split the students into two groups after the first measurement, and we give one group an intensive reading course during which measurement two may be taken. Now we know that students are forgetful, so we expect the students that had the intensive course to fall back and perform similarly to the control students in test 3 or 4. Now can we calculate the ‘post hoc’ on the interaction between time and student group?

Best Carsten

Hi Carsten,

Theoretically, you can. You might be limited by your software, though. I know that SAS proc mixed allows for a variety of post-hoc adjustments in the interactions in the lsmeans statement. I believe SPSS has fewer options. If you’re using another program, I would suggest checking the manual to see what your options are.
