Specifying Variables as Within-Subjects Factors in Repeated Measures

by Karen Grace-Martin

I recently received this great question* from a reader:

Jan wrote:

I want to do a GLM (repeated measures ANOVA) with the valence of some actions of my test-subjects (valence = desirability of actions) as a within-subject factor. My subjects have to rate a number of actions/behaviours in a pre-set list of 20 actions from ‘very likely to do’ to ‘will never do this’ on a scale from 1 to 7, and some of these actions are desirable (e.g. help a blind man crossing the street) and therefore have a positive valence (in psychology) and some others are non-desirable (e.g. play loud music at night) and therefore have negative valence in psychology.

My question is how I can use valence as a within-subjects factor in GLM. Is there a way to tell SPSS some actions have positive valence and others have negative valence? I assume assigning labels to the actions will not do it, as SPSS does not make analyses based on labels …
Please help. Thank you.

My answer:

Hi Jan,

You’re correct that the value labels aren’t going to do it.  There are a couple different ways that you can do it and it’s going to depend on the analysis you use.

My assumption is that the 20 actions are equally divided between positive and negative valence.  So each subject rates 10 actions of each.

The big question is whether you actually have any interest in comparing the actions themselves.  So do you really want to know if someone is more likely to help a blind man across the street than they are to do some other positive action (say, donate to charity)?  Or are these 10 positive actions just examples of positive actions and you just really want to compare all the positives to all the negatives?

The reason I ask is that when you do repeated measures ANOVA, you have to set it up in the wide format.  This means that each response (every action) is a separate column.  RM ANOVA wants each column to be a condition and will compare them.  There is no way to specify “compare these 10 columns to these other 10 columns.”

Now if it turns out that these 20 actions actually represent 10 levels of another factor x 2 levels of valence, that you can do.  But it has to calculate a mean for each cell of the 10×2 and it assumes that each column represents one combination of conditions.   In other words, RM ANOVA has no way of dealing with pure within-subjects replicates of the exact same condition.

The ad-hoc approach that data analysts have used in the past was to just average the 10 responses that are all in the same condition.  While this basically works, it’s really not the best approach for many reasons, one being that it throws away the action-to-action variation in each subject’s responses.
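To make that averaging approach concrete, here is a minimal sketch in plain Python. The action names, the ratings, and the helper `average_by_valence` are all made up for illustration, and only 4 of the 20 actions are shown. It collapses each subject’s wide-format row into one mean per valence level, which is what you would then feed to a two-level RM ANOVA or a paired t-test.

```python
from statistics import mean

# Hypothetical lookup: which valence each action belongs to.
VALENCE = {
    "help_blind_man": "positive",
    "donate_to_charity": "positive",
    "play_loud_music": "negative",
    "kick_puppy": "negative",
}

# Hypothetical wide-format data: one row per subject, one column per action.
wide = [
    {"subject": 1, "help_blind_man": 6, "donate_to_charity": 4,
     "play_loud_music": 3, "kick_puppy": 1},
    {"subject": 2, "help_blind_man": 7, "donate_to_charity": 5,
     "play_loud_music": 2, "kick_puppy": 1},
]


def average_by_valence(row):
    """Collapse one subject's ratings into a mean per valence level."""
    result = {"subject": row["subject"]}
    for level in ("positive", "negative"):
        ratings = [row[action] for action, v in VALENCE.items() if v == level]
        result[level] = mean(ratings)
    return result


averaged = [average_by_valence(row) for row in wide]
for row in averaged:
    print(row)
```

Notice that after averaging, nothing in the data distinguishes one positive action from another, so any consistent differences among actions can no longer be modeled.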

However, there is a better way to do it, though it requires thinking about the data differently: run a linear mixed model instead of RM ANOVA.

The big advantage here is that Mixed uses the long data format.  This means that each person’s response to each of the 20 actions goes into a new row of data.  You have one variable (column) for the response, one that indicates the action, and another that indicates the valence of that action.
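Here is a rough sketch of that restructuring in plain Python, going from wide to long. The action names, the ratings, and the function `wide_to_long` are hypothetical, and only 4 of the 20 actions are shown. Valence is attached through a lookup table, so it ends up as a real variable in the data rather than a label.

```python
# Hypothetical lookup: which valence each action belongs to.
VALENCE = {
    "help_blind_man": "positive",
    "donate_to_charity": "positive",
    "play_loud_music": "negative",
    "kick_puppy": "negative",
}

# Hypothetical wide-format data: one row per subject, one column per action.
wide = [
    {"subject": 1, "help_blind_man": 6, "donate_to_charity": 4,
     "play_loud_music": 3, "kick_puppy": 1},
    {"subject": 2, "help_blind_man": 7, "donate_to_charity": 5,
     "play_loud_music": 2, "kick_puppy": 1},
]


def wide_to_long(wide_rows, valence_by_action):
    """Return one row per subject-by-action response."""
    long_rows = []
    for row in wide_rows:
        for action, valence in valence_by_action.items():
            long_rows.append({
                "subject": row["subject"],
                "action": action,       # what is being repeated
                "valence": valence,     # the factor of interest
                "rating": row[action],  # the response
            })
    return long_rows


long_rows = wide_to_long(wide, VALENCE)
for row in long_rows:
    print(row)
```

With the full design, each subject would contribute 20 rows, and the subject, action, and valence columns can each play a different role in the model.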

This allows you to separate out what’s being repeated (20 actions) from the variable you’re actually interested in (valence).

This is how I would recommend doing it and in fact, this kind of experiment is ideally run as a crossed random effects model.  This lets you control not only for the fact that multiple responses across actions for the same subject are likely to be correlated, but also for the fact that multiple responses across subjects for the same action are likely to be correlated.

For example, there may be some negative actions (kicking puppies) that people generally rate less likely than other negative actions (playing loud music).  That consistency across people’s responses leads those responses to be correlated.  If you can control for that, it comes out of the unexplained error in the denominator of the F test.  This helps you find real effects in the comparisons you want to make.
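One common way to write down a crossed random effects model for this design is sketched below. The notation is my own shorthand for illustration, not something taken from SPSS output: i indexes subjects, j indexes actions, and valence is a property of the action.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{valence}_j + u_i + v_j + \varepsilon_{ij},
\qquad
u_i \sim N(0, \sigma_u^2),\quad
v_j \sim N(0, \sigma_v^2),\quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Here u_i is the random intercept for subject i, which captures the correlation among one person’s responses, and v_j is the random intercept for action j, which captures the consistency across people for the same action. The fixed effect β1 is the valence comparison you actually care about.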

I’ve added links in a few places above that will elaborate on some of the things I mentioned.  You may find them helpful.  If you want a little more info on mixed models, I would also suggest our free webinar recording “Random Intercept and Random Slope Models.”  Despite the difference in names, it’s about linear mixed models for repeated measures.

*While I try to answer questions left in comments quickly, the sheer volume of them sometimes just kicks my butt.  And some questions, like this one, just require a more thorough answer than I can do in comments.  If you are ever stuck with a stat question that you need an immediate answer to, please join our Statistically Speaking membership program.  We have a private forum there where our team of consultants answers questions within a business day. We also have weekly live Q&A sessions–many stats questions are much easier to answer in a conversation as we often need to clarify the context and details.


