In a recent post, I discussed the differences between repeated measures and longitudinal data, and some of the issues that come up in each one.
I want to expand on that discussion, and discuss the three approaches you can take to analyze repeated measures data.
For a few very specific designs, you can get the exact same results from all three approaches. I find this has always made it difficult to see what each one is actually doing, and how to apply them to OTHER designs.
For the purposes of discussion here, I’m going to define repeated measures data as repeated measurements of the same outcome variable on the same individual. The individual is often a person, but could just as easily be a plant, animal, colony, company, etc. For simplicity, I’ll use “individual.”
Beyond that, anything goes. Measurements can be repeated over time or space; time can itself be an important factor in the experiment or not; each individual can have 2 or 20 measurements.
Approach 1: Repeated Measures Multivariate ANOVA/GLM
When most researchers think of repeated measures, they think ANOVA. In my personal experience, repeated measures designs are usually taught in ANOVA classes, and that is how they are taught.
The data are set up with one row per individual, so the individual is the unit of analysis. This is called the wide format.
The multiple measures of the outcome variable sit in separate columns of data; each is treated as a different variable. It’s a multivariate approach, run as a MANOVA, so the model equation has multiple dependent variables and multiple residuals. (SPSS users: this is the approach taken by the Repeated Measures (RM) GLM procedure.)
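A minimal sketch of that wide layout, with hypothetical variable names (the post itself doesn’t specify any), might look like this:

```python
import pandas as pd

# Wide format: one row per individual; each measurement occasion
# is a separate column (column names here are made up).
wide = pd.DataFrame({
    "id":    [1, 2, 3],
    "group": ["treat", "treat", "control"],
    "y_t1":  [5.1, 4.8, 6.0],
    "y_t2":  [5.9, 5.2, 6.1],
    "y_t3":  [6.4, 5.7, 6.3],
})

print(wide.shape)  # (3, 5): three individuals, one row each
```

Each of y_t1 through y_t3 is treated as a separate dependent variable in the multivariate model.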
The biggest advantage of this approach is its conceptual simplicity. It makes sense. But it has a lot of assumptions that can be very difficult to meet in all but very limited experimental situations.
These include balanced data (if even one observation is missing, the whole subject gets dropped) and equal correlations among the response variables. It also cannot run post-hoc tests on the repeated measures factor, which I consider a huge limitation.
That said, it tends to work well in many designed experiments, where each measurement is taken under a different experimental condition.
Approach 2: The Marginal Multilevel Model
The second approach treats the repeated responses as multilevel data. The outcome is a single variable, and a second variable indicates the condition or time of measurement. This requires that each subject have multiple rows of data in the spreadsheet. This is called the long, or stacked, format, and it changes the unit of analysis from the subject to each measurement occasion.
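The reshape from wide to long can be sketched with pandas (variable names are again hypothetical):

```python
import pandas as pd

wide = pd.DataFrame({
    "id":   [1, 2],
    "y_t1": [5.1, 4.8],
    "y_t2": [5.9, 5.2],
    "y_t3": [6.4, 5.7],
})

# Long (stacked) format: one row per measurement occasion,
# with a new column indicating the occasion.
long = wide.melt(id_vars="id", var_name="time", value_name="y")
long = long.sort_values(["id", "time"]).reset_index(drop=True)

print(long.shape)  # (6, 3): each individual now contributes one row per occasion
```

The single outcome column y, plus the time indicator, is what makes the univariate marginal and mixed models below possible.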
In a marginal model (also known as the population-averaged model), the model equation is written just like any linear model: there is a single response and a single residual. The difference from an ordinary linear model is that the residuals are not assumed to be independent with constant variance.
In a marginal model, we can directly estimate the correlations among each individual’s residuals. (We do assume the residuals across different individuals are independent of each other). We can specify that they are equally correlated, as in the RM ANOVA, but we’re not limited to that assumption. Each correlation can be unique, or measurements closer in time can have higher correlations than those farther apart. There are a number of common patterns that the residuals tend to take.
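The idea of within-individual residual correlation can be sketched with simulated data (everything below is invented for illustration; it is not the post’s example):

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects, n_occasions = 500, 3

# Simulate residuals that are correlated within an individual:
# a shared per-individual component plus occasion-specific noise
# produces the equal-correlation ("compound symmetry") pattern
# that RM ANOVA assumes.
shared = rng.normal(size=(n_subjects, 1))
noise = rng.normal(size=(n_subjects, n_occasions))
resid = shared + noise

# Estimate the occasion-by-occasion correlation matrix.
R = np.corrcoef(resid, rowvar=False)
print(np.round(R, 2))
# Off-diagonal entries hover around 0.5, the value implied by
# equal shared and noise variances: 1 / (1 + 1).
```

A marginal model fitting routine estimates a matrix like R under a chosen pattern (equal correlations, autoregressive, unstructured, and so on) rather than assuming independence.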
Likewise, the residual variances don’t have to be equal as they do in the RM ANOVA.
So in cases where the assumptions of equal variances and equal correlations are not met, a marginal model can fit much better. The other big advantage is that the univariate setup lets us do post-hoc tests on the repeated measures factor.
Approach 3: The Linear Mixed Model
Like the marginal model, the linear mixed model requires that the data be set up in the long, or stacked, format.
It too controls for non-independence among the repeated observations for each individual, but it does so in a conceptually different way. Rather than just estimate the correlation among an individual’s repeated observations, it actually adds one or more random effects for individuals to the model.
The model equation therefore includes extra terms for the random effects. These take the form of additional residual terms, each with its own variance to be estimated.
This means the model is directly controlling for the effects of individual. The simplest mixed model, the random intercept model, accounts for the fact that some individuals always have higher values than others. By controlling for this variation, we’ve removed it from the original residual.
Individual growth curve models are a specific type of mixed model that uniquely models each individual’s value of the outcome over time. They are particularly useful when the research question is about how covariates affect not only the value of the dependent variable, but its change over time.
The biggest advantage of mixed models is their incredible flexibility. They can handle clustered individuals as well as repeated measures (even in the same model). They can handle crossed random effects, where there are repeated measures not only on an individual, but also on each stimulus.
Time can easily be considered continuous or categorical, and covariates can be measured just once per individual or repeatedly at each observation. Unbalanced data are no problem, and even if some outcomes are missing for some individuals, they won’t be dropped from the model.
The biggest disadvantage of mixed models, at least for someone new to them, is their incredible flexibility. It’s easy to mis-specify a mixed model, and this is a place where a little knowledge is definitely dangerous.