“Because mixed models are more complex and more flexible than the general linear model, the potential for confusion and errors is higher.”
- Hamer & Simpson (2005)
Linear mixed models, as implemented in SAS's Proc Mixed, SPSS Mixed, R's lmer (in the lme4 package), and Stata's xtmixed, are an extension of the general linear model. They use more sophisticated techniques to estimate parameters (means, variances, regression coefficients) and their standard errors, and, as the quotation says, are much more flexible.
Here’s one example of the flexibility of mixed models, and its resulting potential for confusion and error.
In repeated measures and longitudinal studies, the observations are clustered within a subject. That means the observations, and their residuals, are not independent. They’re correlated. There are two ways to deal with this correlation.
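To make that non-independence concrete, here is a minimal simulated sketch (hypothetical data, with numbers chosen purely for illustration): each subject gets their own baseline, so two residuals from the same subject, measured against the overall mean, end up correlated.

```python
import numpy as np

# Hypothetical repeated-measures data: 200 subjects, 4 observations each.
rng = np.random.default_rng(42)
n_subjects, n_obs = 200, 4

# Each subject carries their own baseline shift (between-subject spread)
# on top of independent within-subject noise.
subject_effect = rng.normal(0, 2, size=(n_subjects, 1))
noise = rng.normal(0, 1, size=(n_subjects, n_obs))
y = 10 + subject_effect + noise            # one row per subject

# Residuals if we ignore the clustering: distance to the grand mean.
resid = y - y.mean()

# Two observations from the same subject share that subject's baseline,
# so their residuals are strongly correlated rather than independent.
within_corr = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]
print(f"correlation of residuals within a subject: {within_corr:.2f}")
```

With these made-up variances (4 between subjects, 1 within), the within-subject residual correlation lands around 0.8 rather than the 0 that an ordinary linear model assumes.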
The Marginal Model
One is to alter the covariance structure of the residuals. What this means is that instead of assuming that all observations are independent, as you do in a linear model, you assume the residuals from a single subject are related. Their covariances are non-zero. So you have to estimate the covariances among all the residuals from a single subject.
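As a sketch of what "estimating the covariances among a subject's residuals" can look like, here is one common structure, compound symmetry, built by hand with made-up values (in practice the software estimates these from the data):

```python
import numpy as np

# Made-up values for illustration; a mixed procedure estimates these.
sigma2 = 5.0    # common residual variance (the diagonal)
rho = 0.6       # common within-subject correlation
n_obs = 4       # observations per subject

# Compound symmetry: every pair of residuals from the same subject
# gets the same non-zero covariance, sigma2 * rho.
R = sigma2 * ((1 - rho) * np.eye(n_obs) + rho * np.ones((n_obs, n_obs)))
print(R)
```

Other structures you can request, such as AR(1) or unstructured, differ only in how those off-diagonal covariances are patterned.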
This approach is called a marginal or population-averaged approach. It's not truly a mixed model, although you can use mixed procedures to run one. You get these models in SAS Proc Mixed and SPSS Mixed by using a repeated statement instead of a random statement.
The Mixed Model
The other way to deal with non-independence of a subject’s residuals is to leave the residuals alone, but actually alter the model by controlling for subject. When you control for subject as a factor in the model, you literally redefine what a residual is. Instead of being the distance between a data point and the average for everyone, it’s the distance between a data point and the mean for that subject.
You could, theoretically, include Subject as a fixed factor, but that usually uses up most of the degrees of freedom. If instead you treat Subject as a random factor, you are still controlling for Subject, and still redefining the residuals to deal with the non-independence, but you use up only a few degrees of freedom.
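Here is a small simulated sketch (hypothetical numbers) of how controlling for Subject redefines the residual, shrinking it from the distance to the overall mean down to the distance to that subject's own mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_obs = 100, 5

# Hypothetical data: observations share a subject-specific baseline.
subject_effect = rng.normal(0, 3, size=(n_subjects, 1))
y = 50 + subject_effect + rng.normal(0, 1, size=(n_subjects, n_obs))

# Residual as distance to the overall mean (no Subject in the model)...
resid_overall = y - y.mean()
# ...versus distance to each subject's own mean (controlling for Subject).
resid_subject = y - y.mean(axis=1, keepdims=True)

# The shared subject baseline inflates the first set of residuals
# and is absorbed by the second.
print(f"variance around the overall mean:  {resid_overall.var():.1f}")
print(f"variance around the subject means: {resid_subject.var():.1f}")
```

The subject-centered residuals are far less variable because the between-subject spread has been moved into the model instead of being left in the error term.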
Putting Them Together
Most of the time, controlling for Subject is enough to deal with all the non-independence of the residuals for each subject.
But every once in a while it’s not. If there is extra non-independence (or even non-constant variance) among the residuals, you can still estimate those non-zero covariances by adding a Repeated statement.
It’s fine to include a repeated statement right along with a random statement, and doing so is sometimes necessary to get a well-fitting model. The repeated statement still controls the covariance structure of the residuals for a single subject. It’s just that now those residuals have been redefined as the distance between each point and the subject’s mean. In the marginal model, they haven’t been redefined; they still represent the distance between each point and the overall mean.
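As a sketch of when you'd want both statements, here is a simulation (hypothetical values) where each subject's errors follow an AR(1) pattern over time. Even after controlling for subject, that is, after measuring residuals against each subject's own mean, adjacent residuals remain clearly correlated, and that leftover structure is the kind of thing a repeated statement models:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_obs, rho = 500, 10, 0.7

# AR(1) within-subject errors: each time point carries over part of
# the previous time point's error, a common pattern in longitudinal data.
e = np.empty((n_subjects, n_obs))
e[:, 0] = rng.normal(0, 1, n_subjects)
for t in range(1, n_obs):
    e[:, t] = rho * e[:, t - 1] + rng.normal(0, np.sqrt(1 - rho**2), n_subjects)

subject_effect = rng.normal(0, 2, size=(n_subjects, 1))
y = 10 + subject_effect + e

# Controlling for subject: residuals are distances to each subject's mean.
resid = y - y.mean(axis=1, keepdims=True)

# Adjacent residuals from the same subject are still positively correlated,
# which is the extra non-independence a repeated statement can absorb.
lag1 = np.corrcoef(resid[:, 4], resid[:, 5])[0, 1]
print(f"lag-1 residual correlation after controlling for subject: {lag1:.2f}")
```

Controlling for subject removed the shared baseline but not the serial correlation, so here a random statement alone would leave correlated residuals behind.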