In a recent post, I discussed the differences between repeated measures and longitudinal data, and some of the issues that come up in each one. I want to expand on that discussion, and discuss the three approaches you can take to analyze repeated measures data: repeated measures ANOVA, Mixed Models, and Marginal Models.
For a few, very specific designs, you can get the exact same results from all three approaches. This, I find, has always made it difficult to figure out what each one is doing, and how to apply them to OTHER designs.
For the purposes of discussion here, I’m going to define repeated measures data as repeated measurements of the same outcome variable on the same individual. The individual is often a person, but could just as easily be a plant, animal, colony, company, etc. For simplicity, I’ll use “individual.”
Beyond that, anything goes. Measurements can repeat over time or space; time can itself be an important factor in the experiment or not; each individual can have 2 or 20 measurements.
Approach 1: Repeated Measures ANOVA
When most researchers think of repeated measures, they think ANOVA. In my experience, repeated measures designs are usually taught in ANOVA classes, so ANOVA is the approach most researchers learn.
You set up the data with one row per individual, so the individual is the unit of analysis. This is the wide format.
The multiple measures of the outcome variable are in multiple columns of data. That means we consider each one a different variable. It’s a multivariate approach and is run as a MANOVA, so the model equation has multiple dependent variables and multiple residuals. (SPSS users: this is the approach taken by the Repeated Measures (RM) GLM procedure.)
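As a sketch of that wide layout, here is a small hypothetical data set (the individuals, values, and column names are all illustrative) with one row per individual and one column per repeated measurement:

```python
import pandas as pd

# Hypothetical wide-format data: one row per individual,
# one column per repeated measurement of the outcome
wide = pd.DataFrame({
    "individual": [1, 2, 3],
    "time1": [5.1, 4.8, 6.0],
    "time2": [5.9, 5.2, 6.4],
    "time3": [6.3, 5.5, 7.1],
})

# Three individuals, each measured on three occasions;
# the RM ANOVA treats time1, time2, time3 as separate variables
print(wide)
```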
The biggest advantage of this approach is its conceptual simplicity. It makes sense. But it has a lot of assumptions that can be very difficult to meet in all but very limited experimental situations.
These include balanced data (if even one observation is missing, the subject gets dropped entirely) and equal correlations among the repeated responses. It also cannot run post-hoc tests on the repeated measures factor, which I consider a huge limitation.
It tends to work well in many experimental situations, where each measurement is taken under a different experimental condition.
Approach 2: The Marginal Model
The second approach treats the repeated responses as multilevel data. The outcome is a single variable, and another variable indicates the condition or time of each measurement. This requires each subject to have multiple rows of data in the spreadsheet. This is called the long format, or stacked data. It changes the unit of analysis from the subject to each measurement occasion.
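Restructuring from wide to long is a mechanical step. Here is a minimal sketch with hypothetical data (the values and column names are illustrative), using pandas' `melt`:

```python
import pandas as pd

# Hypothetical wide-format data: one row per individual
wide = pd.DataFrame({
    "individual": [1, 2, 3],
    "time1": [5.1, 4.8, 6.0],
    "time2": [5.9, 5.2, 6.4],
    "time3": [6.3, 5.5, 7.1],
})

# Stack the outcome columns: each individual now contributes one row
# per measurement occasion, so the unit of analysis is the measurement
long = wide.melt(id_vars="individual", var_name="time",
                 value_name="outcome")
print(long.sort_values(["individual", "time"]))
```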
In a marginal model (a.k.a. the population-averaged model), the model equation is written just like any linear model. There is a single response and a single residual. The difference, though, is that in the marginal model we do not assume the residuals are independent with constant variance.
In a marginal model, we can directly estimate the correlations among each individual’s residuals. (We do assume the residuals across different individuals are independent of each other). We can specify that they are equally correlated, as in the RM ANOVA, but we’re not limited to that assumption.
Each correlation can be unique, or measurements closer in time can have higher correlations than those farther away. There are a number of common patterns that the residuals tend to take.
Likewise, the residual variances don’t have to be equal as they do in the RM ANOVA.
So in cases where the assumptions of equal variances and equal correlations are not met, we can get much better-fitting models by using a marginal model. The other big advantage is that by taking a univariate approach, we can do post-hoc tests on the repeated measures factor.
Approach 3: The Linear Mixed Model
Like the marginal model, the linear mixed model requires the data in the long format.
It too controls for non-independence among the repeated observations for each individual, but it does so in a conceptually different way. Rather than estimate the correlation among an individual’s repeated observations, it actually adds one or more random effects for individuals to the model.
The model equation therefore includes extra parameters for any random effects. They take the form of additional residual terms, and the model estimates the variance of each.
This literally means the model controls for the effects of individual. The simplest mixed model, the random intercept model, controls for the fact that some individuals always have higher values than others. By controlling for this variation, we’ve taken it out of the original residual.
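A random intercept model can be sketched with statsmodels' mixed model routine. The simulated data below are invented for illustration; the key line is the `groups` argument, which tells the model which rows belong to the same individual:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: 40 subjects, 5 occasions; some individuals sit
# systematically higher than others (true intercept SD = 2)
rng = np.random.default_rng(7)
n_subjects, n_times = 40, 5
subject = np.repeat(np.arange(n_subjects), n_times)
time = np.tile(np.arange(n_times), n_subjects)
intercepts = np.repeat(rng.normal(0, 2, n_subjects), n_times)
y = 3.0 + 0.4 * time + intercepts + rng.normal(0, 1, n_subjects * n_times)
df = pd.DataFrame({"subject": subject, "time": time, "y": y})

# A random intercept per subject absorbs these stable individual
# differences, removing that variation from the residual
model = smf.mixedlm("y ~ time", df, groups=df["subject"])
result = model.fit()
print(result.cov_re)  # estimated variance of the random intercepts
```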
Individual growth curve models are a specific type of mixed model that uniquely models each individual’s value of the outcome over time. They are particularly useful when the research question is about how covariates affect not only the value of the dependent variable, but its change over time.
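In statsmodels, extending the random intercept model to an individual growth curve amounts to adding a random slope for time via `re_formula`. Again, the data below are simulated purely for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data where individuals differ in both their starting
# level and their rate of change over time
rng = np.random.default_rng(3)
n_subjects, n_times = 40, 5
subject = np.repeat(np.arange(n_subjects), n_times)
time = np.tile(np.arange(n_times), n_subjects)
b0 = np.repeat(rng.normal(0, 1.5, n_subjects), n_times)  # level
b1 = np.repeat(rng.normal(0, 0.3, n_subjects), n_times)  # slope
y = 3.0 + (0.4 + b1) * time + b0 + rng.normal(0, 1, n_subjects * n_times)
df = pd.DataFrame({"subject": subject, "time": time, "y": y})

# re_formula="~time" gives every subject its own intercept AND slope,
# i.e. an individual growth curve
model = smf.mixedlm("y ~ time", df, groups=df["subject"],
                    re_formula="~time")
result = model.fit()
print(result.params["time"])  # average rate of change across subjects
```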
The biggest advantage of mixed models is their incredible flexibility. They handle clustered individuals as well as repeated measures (even in the same model). They handle crossed random factors as well as nested.
You can treat Time as continuous or categorical. You can measure covariates just once per individual or repeatedly at each observation. Unbalanced data are no problem, and even if some outcomes are missing for some individuals, they won’t be dropped from the model.
The biggest disadvantage of mixed models, at least for someone new to them, is their incredible flexibility. It’s easy to mis-specify a mixed model, and this is a place where a little knowledge is definitely dangerous.