Designing experiments would always be simple if we could just randomly assign subjects to different treatment conditions with no other restrictions. Unfortunately, that doesn’t always work.
For example, there are many experimental situations where the subjects aren’t independent of each other. Subjects that are related to each other are grouped into clusters called “blocks.” Blocking can arise from the practicalities of running an experiment efficiently, or you can plan it intentionally as a way to reduce random variance.
In either case, this is a randomized complete block design. It’s a great design to become familiar with because it will greatly expand your ability to create and analyze experiments.
How It Works
When you have subjects that share characteristics with one another, it can sometimes be difficult to isolate those characteristics directly. This makes it hard to record them as additional variables. By identifying the subjects that are similar, you can still capture how those characteristics affect the outcome. Subjects that are similar are grouped into “blocks.”
From there, you can make treatment assignments so that you put subjects from the same block into different treatment groups.
Why different treatment groups? Suppose subjects from the same block were assigned to the same treatment group.
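The assignment step described above can be sketched in Python. This is a minimal illustration rather than code from the article: the function name `assign_within_blocks`, the litter and animal labels, and the treatment names are all hypothetical, and the sketch assumes each block holds exactly one subject per treatment.

```python
import random


def assign_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments within each block.

    Every treatment appears exactly once per block, so subjects from the
    same block always end up in different treatment groups.

    blocks: dict mapping a block label to a list of subject ids. Each block
            must contain exactly len(treatments) subjects (an assumption of
            this sketch).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, subjects in blocks.items():
        if len(subjects) != len(treatments):
            raise ValueError(
                f"block {block!r} needs exactly {len(treatments)} subjects"
            )
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # a separate randomization for each block
        for subject, treatment in zip(subjects, shuffled):
            assignment[subject] = (block, treatment)
    return assignment


# Hypothetical data: three litters (blocks) of three animals each.
blocks = {
    "litter_1": ["a1", "a2", "a3"],
    "litter_2": ["b1", "b2", "b3"],
    "litter_3": ["c1", "c2", "c3"],
}
treatments = ["control", "low_dose", "high_dose"]

assignment = assign_within_blocks(blocks, treatments, seed=1)
for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```

Because the shuffle happens separately inside each block, block membership cannot line up with treatment, so block-to-block differences can be separated from the treatment effect in the analysis.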
What is the difference between Clustered, Longitudinal, and Repeated Measures Data? You can use mixed models to analyze all of them. But the issues involved and some of the specifications you choose will differ.
Just recently, I came across a nice discussion about these differences in West, Welch, and Galecki’s (2007) excellent book Linear Mixed Models.
It’s a common question, and there is a lot of overlap in both the study design and in how you will analyze the data from these designs.
West et al. give a very nice summary of the three types. Here’s a paraphrasing of the differences as they explain them:
In clustered data, the dependent variable is measured once for each subject, but the subjects themselves are somehow grouped (students grouped into classes, for example). There is no ordering to the subjects within the group, so their responses should be equally correlated.
In repeated measures data, the dependent variable is measured more than once for each subject. Usually, there is some independent variable (often called a within-subject factor) that changes with each measurement.
And in longitudinal data, the dependent variable is measured at several time points for each subject, often over a relatively long period of time.
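To make the three definitions concrete, here is a small sketch of how each structure might look as rows of a “long” data set. All variable names and values are made up for illustration; they do not come from West et al.

```python
# Clustered: one measurement per subject; subjects are grouped into classes.
clustered = [
    {"class": "A", "student": 1, "score": 71},
    {"class": "A", "student": 2, "score": 64},
    {"class": "B", "student": 3, "score": 80},
    {"class": "B", "student": 4, "score": 77},
]

# Repeated measures: each subject is measured once per condition of a
# within-subject factor (here a hypothetical cognitive load factor).
repeated = [
    {"subject": 1, "load": "low", "reaction_ms": 512},
    {"subject": 1, "load": "high", "reaction_ms": 640},
    {"subject": 2, "load": "low", "reaction_ms": 498},
    {"subject": 2, "load": "high", "reaction_ms": 605},
]

# Longitudinal: each subject is measured at several time points.
longitudinal = [
    {"subject": 1, "year": 0, "score": 50},
    {"subject": 1, "year": 1, "score": 55},
    {"subject": 1, "year": 2, "score": 61},
    {"subject": 2, "year": 0, "score": 47},
    {"subject": 2, "year": 1, "score": 49},
    {"subject": 2, "year": 2, "score": 58},
]


def rows_per_unit(rows, key):
    """Count how many rows each value of `key` contributes."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


print(rows_per_unit(clustered, "student"))     # one row per student
print(rows_per_unit(repeated, "subject"))      # one row per condition
print(rows_per_unit(longitudinal, "subject"))  # one row per time point
```

In all three layouts there is a grouping variable (class or subject) with several rows nested inside it, which is why the same family of mixed models can handle each of them.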
A Few Observations
They also make the following good observations:
1. Dropout is usually not a problem in repeated measures studies, in which all data collection occurs in one sitting. It is a huge issue in longitudinal studies, which usually require multiple contacts with participants for data collection.
2. Longitudinal data can also be clustered. If you follow those students for two years, you have both clustered and longitudinal data. You have to deal with both.
3. It can be hard to distinguish between repeated measures and longitudinal data if the repeated measures occur over time. [My two cents: A pre/post/followup design is a classic example].
4. From an analysis point of view, it doesn’t really matter which one you have. All three are types of hierarchical, nested, or multilevel data. You would analyze them all with some sort of mixed or multilevel analysis. You may of course have extra issues (like dropout) to deal with in some of these.
My Own Observations
I agree with their observations, and I’d like to add a few from my own experience.
1. Repeated measures don’t have to be repeated over time. They can be repeated over space (the right knee gets the control operation and the left knee gets the experimental operation). They can also be repeated over condition (each subject gets both the high and low cognitive load condition). Longitudinal studies are pretty much always over time.
This becomes an issue mainly when you are choosing a covariance structure for the within-subject residuals (as determined by the Repeated statement in SAS’s Proc Mixed or SPSS Mixed). An auto-regressive structure is often needed when some repeated measurements are closer to each other than others (over either time or space). This is not an issue with purely clustered data, since there is no order to the observations within a cluster.
2. Time itself is often an important independent variable in longitudinal studies, but in repeated measures studies, it is usually confounded with some independent variable.
When you’re deciding on an analysis, it’s important to think about the role of time. Time is not important in an experiment, where each measurement is a different condition (with order often randomized). But it’s very important in a study designed to measure changes in a dependent variable over the course of 3 decades.
3. Time may be measured with some proxy like Age or Order. But it’s still really about time.
4. A longitudinal study does not have to be over years. You could be measuring reaction time every second for a minute. In cases like this, dropout isn’t an issue, although time is an important predictor.
5. Consider whether it makes sense to think about time as continuous or categorical. If you have only two time points, even if you have numerical measurements for them, there isn’t a point in treating it as continuous. You need at least three time points to fit a line, but more is always better.
6. Longitudinal data can be analyzed with many statistical methods, including structural equation modeling and survival analysis. You only use multilevel modeling if the dependent variable is measured repeatedly and if the point of the model is to see how it changes (or differs).
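The covariance-structure point in observation 1 is easier to see with numbers. The sketch below builds two correlation patterns by hand: an equal-correlation pattern (often called compound symmetry), which matches the “equally correlated” description of clustered data, and a first-order auto-regressive pattern, where correlation fades as measurements get further apart. The function names and the value `rho = 0.5` are illustrative assumptions. In practice the software estimates these parameters from the data, and this sketch only shows the shape of each structure for equally spaced measurements.

```python
def compound_symmetry(n, rho):
    """Correlation matrix where every pair of measurements has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """First-order auto-regressive correlation matrix.

    The correlation between measurements i and j is rho ** |i - j|, so
    measurements that are closer together are more strongly correlated.
    """
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def show(matrix):
    """Print a matrix with three decimals per entry."""
    for row in matrix:
        print("  ".join(f"{value:.3f}" for value in row))


rho = 0.5  # illustrative value only
print("Compound symmetry:")
show(compound_symmetry(4, rho))
print("AR(1):")
show(ar1(4, rho))
```

Under the equal-correlation pattern the first and fourth measurements are as strongly related as the first and second. Under the auto-regressive pattern the correlation drops with each step of separation, which only makes sense when the measurements have a meaningful order.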
Naming a data structure, design, or analysis is most helpful if it is so specific that it defines yours exactly. Your repeated measures analysis may not be like the repeated measures example you’re trying to follow. Rather than trying to name the analysis or the data structure, think about the issues involved in your design, your hypotheses, and your data. Work with them accordingly.
Multinomial logistic regression is an important type of categorical data analysis. Specifically, it’s used when your response variable is nominal: more than two categories and not ordered.
Multilevel and Mixed models are essentially the same analysis. But they use different vocabulary, different notation, and approach the analysis considerations in different ways.
Some repeated measures designs make it quite challenging to specify within-subjects factors. Especially difficult is when the design contains two “levels” of repeat, but your interest is in testing just one.
Let’s look at a great example of what this looks like and how to deal with it in this question from a reader:
I want to do a GLM (repeated measures ANOVA) with the valence of some actions of my test-subjects (valence = desirability of actions) as a within-subject factor. My subjects have to rate a number of actions/behaviours in a pre-set list of 20 actions from ‘very likely to do’ to ‘will never do this’ on a scale from 1 to 7, and some of these actions are desirable (e.g. help a blind man crossing the street) and therefore have a positive valence (in psychology) and some others are non-desirable (e.g. play loud music at night) and therefore have negative valence in psychology.
My question is how I can use valence as a within-subjects factor in GLM. Is there a way to tell SPSS some actions have positive valence and others have negative valence? I assume assigning labels to the actions will not do it, as SPSS does not make analyses based on labels…
Please help. Thank you.
A well-fitting regression model results in predicted values close to the observed data values.
The mean model, which uses the mean for every predicted value, generally would be used if there were no useful predictor variables. The fit of a proposed regression model should therefore be better than the fit of the mean model.
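The comparison with the mean model can be sketched numerically. The example below fits a one-predictor least-squares line to a small made-up data set, then compares its sum of squared errors with that of the mean model. The data values and function names are assumptions for illustration. The proportional reduction in squared error computed at the end is the quantity usually reported as R-squared.

```python
def mean(values):
    return sum(values) / len(values)


def fit_simple_regression(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_regression(x, y)

# Mean model: predict the mean of y for every observation.
sse_mean = sum_squared_error(y, [mean(y)] * len(y))
# Regression model: predict from the fitted line.
sse_model = sum_squared_error(y, [intercept + slope * xi for xi in x])

# Proportional reduction in squared error relative to the mean model.
r_squared = 1 - sse_model / sse_mean

print(f"SSE, mean model:       {sse_mean:.3f}")
print(f"SSE, regression model: {sse_model:.3f}")
print(f"R-squared:             {r_squared:.3f}")
```

If the predictor carried no useful information, the two error sums would be nearly equal and the reduction would be close to zero.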