Interpreting Interactions: When the F test and the Simple Effects disagree.

by Karen Grace-Martin

The way to follow up on a significant two-way interaction between two categorical variables is to check the simple effects.  Most of the time the simple effects tests give a very clear picture about the interaction.  Every so often, however, you have a significant interaction, but no significant simple effects.  It is not a logical impossibility. They are testing two different, but related hypotheses.

Assume your two independent variables are A and B. Each has two values: 1 and 2. Write the four cell means as A1B1, A1B2, A2B1, and A2B2. The interaction is testing if A1B1 - A1B2 = A2B1 - A2B2 (the null hypothesis), that is, whether the effect of B is the same at both levels of A. The simple effects are testing whether A1B1 - A1B2 = 0 and whether A2B1 - A2B2 = 0 (the nulls) or not.

If you have a crossover interaction, you can have A1B1 - A1B2 slightly positive and A2B1 - A2B2 slightly negative. While neither is significantly different from 0, they are significantly different from each other.
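
To make the arithmetic concrete, here is a minimal sketch with made-up numbers. It assumes a balanced 2x2 between-subjects design with 20 cases per cell and a mean squared error of 1, and it uses a normal approximation in place of the t distribution. The cell means, the sample size, and the helper name `two_sided_p` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: estimate = 0 (normal approximation)."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical balanced 2x2 design: 20 cases per cell, MSE = 1.
n_per_cell = 20
mse = 1.0
cell_mean = {
    ("A1", "B1"): 10.5, ("A1", "B2"): 10.0,
    ("A2", "B1"): 10.0, ("A2", "B2"): 10.5,
}

# Simple effects of B at each level of A (a crossover: +0.5 and -0.5).
effect_at_a1 = cell_mean[("A1", "B1")] - cell_mean[("A1", "B2")]
effect_at_a2 = cell_mean[("A2", "B1")] - cell_mean[("A2", "B2")]
se_simple = sqrt(mse * 2 / n_per_cell)       # difference of two cell means

# Interaction contrast: the difference between the two simple effects.
interaction = effect_at_a1 - effect_at_a2
se_interaction = sqrt(mse * 4 / n_per_cell)  # involves all four cell means

p_a1 = two_sided_p(effect_at_a1, se_simple)
p_a2 = two_sided_p(effect_at_a2, se_simple)
p_interaction = two_sided_p(interaction, se_interaction)

print(f"simple effect at A1: {effect_at_a1:+.2f}, p = {p_a1:.3f}")
print(f"simple effect at A2: {effect_at_a2:+.2f}, p = {p_a2:.3f}")
print(f"interaction:         {interaction:+.2f}, p = {p_interaction:.3f}")
```

With these numbers each simple effect has a p-value of roughly .11, while the interaction contrast comes in at roughly .025. The contrast is twice as large as either simple effect, but its standard error is only about 1.4 times as large.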

And it is highly useful for answering many research questions to know if the differences in the means in one condition equal the differences in the means for the other. It might be true that it’s not testing a hypothesis you’re interested in, but in many studies, all the interesting effects are in the interactions.

91 comments

Hege Maasø

Hi Karen!
I really need some help interpreting my findings. I have a multilevel model; the dependent variable is political trust and the most important independent variable is corruption (CPI). The effect of corruption, included alone, is negative and significant. I also included a cross-level interaction between satisfaction with government and corruption. This makes the corruption coefficient positive and not significant; the interaction, however, is negative and significant (p = 0.000). I don't really understand this. How should this be interpreted? Is it right to assume that corruption has a significant effect solely at high values of satisfaction with government? How should I decide if the effect is significant? Really hope you can help me 🙂

Reply

Laura

Hi Karen,

Thank you very much for such a useful website. I have run a mixed design anova with time as the within subjects variable (3 time points) and group as the between subjects variable (2 groups). I have found a significant interaction effect but am unsure how to find out at what point the two groups differ. Can you advise on how to proceed? Many thanks.

Reply

Lily

Hi Karen,
Really happy to find this blog. I’m confused by the two-way mixed ANOVA results of my study and would like to ask for your help :).

I conducted a 3 x 3 two-way mixed-design ANOVA in SPSS. The within-subjects variable is the ‘time point pre- and post-treatment’ and it has 3 levels (pre-treatment, 15 min post- and 30 min post-). The between-subjects variable is the ‘disease condition’, which also has 3 levels (healthy control, mild asthma, severe asthma). The dependent variable is a parameter from a pulmonary function test (DV). The sample size of each group is different (HC=8, MA=10, SA=17). The assumptions are OK (normality, homogeneity of variance, and sphericity).

The results from this two-way mixed anova show that 1) ‘disease condition’ has significant main effect (SA lower than the other two on averaged DV across 3 time points); 2) ‘time point of treatment’ does not have significant main effect; 3) there is no interaction.

The effect of treatment over time in the different groups is what interests me most. I was a little disappointed by results 2) and 3), especially the lack of an interaction. Then I noticed that in the plots, the lung function parameter (DV) did show a clear decreasing trend after treatment in the severe asthma group (HC and MA also showed a decreasing trend, but the slope is much gentler). I don’t know whether the absence of a significant interaction and main effect of time point is due to the small sample size (which might not provide enough statistical power for the two-way mixed ANOVA to pick up the change). So, I split the data by disease condition and ran 3 separate one-way repeated measures ANOVAs, one in each disease group, to test the change in the lung function parameter across time points. These showed that the effect of time is significant in the severe asthma group but not in the mild asthma and healthy groups: the DV decreases significantly over the 3 time points in severe asthmatics but not in mild asthmatics and healthy controls.

It seems there is a conflict between the two-way mixed ANOVA and the separate one-way repeated measures ANOVAs in this study. The two-way mixed ANOVA tends to show that “DV did not differ between time points when ignoring disease condition (from 2)) and the effects of treatment over time on DV do not differ between disease groups (from 3))”, while the 3 separate one-way repeated measures ANOVAs seem to show that “treatment significantly alters the DV in the severe asthma group but not in MA and HC”, which looks like a reflection of a different treatment effect in different disease groups, i.e. an interaction?

I know the two-way mixed ANOVA should be the test of choice; however, I’m worried about low statistical power due to the small and unequal sample sizes. In addition, the separate one-way repeated measures ANOVAs do give me the results that I expect. Is it appropriate to perform three separate one-way repeated measures ANOVAs rather than the two-way mixed ANOVA in my study? If you suggest that I report the results of both, could you please give some suggestions on interpretation? Thank you

Looking forward to your reply
Lily

Reply

Megan

Hi Karen,

I am running a 2x2x2 between-subjects ANOVA involving factors A, B and C. I hypothesize that the value of the dependent variable in B1xC1 at the A1 level is significantly higher than in B1xC1 at the A2 level, whereas the value in B2xC2 at the A2 level is higher than in B2xC2 at the A1 level. The analyses show 2 significant two-way interactions, BxC and AxC, but the three-way interaction was non-significant. Is it OK to reject those hypotheses based on the non-significant three-way interaction? Does it mean that the pattern of the simple BxC interaction is the same at the A1 and A2 levels? Or do I need to conduct other tests to find out what happened with regard to my hypotheses?

Thank you.
Megan

Reply

Karen

Hi Megan,

Yes, the non-significant 3-way says there is no evidence that the pattern of the BxC interaction differs across the values of A. If you’re actually just comparing a few specific means to answer your RQ, then you may want to do contrasts instead.

Reply

emily warren

Hi Karen, would really appreciate your help.
I have used a 2 x 2 x 2 mixed anova (two between subjects, and one within subjects). I have found significant main effects for the two within subjects but not for the between subjects, and no significant interactions. Would it be appropriate to run paired samples t-tests on each group separately to see if the within subject factors are still significant within each group, rather than over all? And also, I am intrigued to see why the between subjects factor did not have a main effect so would it be appropriate to run an independent samples t-test to compare the two groups in each condition for a finer understanding, or would this yield significant results from running too many analyses.
Thank you in advance, emily.

Reply

Karen

Hi Emily,

You’re right to worry about running too many tests. Rather than running a bunch of tests, I would start with graphing means to get an idea of what is going on.

Reply

Emma Black

Hi Karen,

Thanks for your website which is really helpful. I ran a 2 x 3 mixed ANOVA, the within factor being time and the between factor age categories. I have found no significant main effects but I do have a significant interaction.

When I split the file by time points and run one-way ANOVAs, there are no significant differences between age groups at time point one, and there are significant differences between the age groups at time point two. This was the only way I could see to explore the results further, as the post hoc analysis from my first set of results did not indicate where the differences lay.

Was this an appropriate follow-up analysis and if not can you point me in the right direction?

Many thanks in advance,

Emma

Reply

Karen

Hi Emma,

Yes, that’s exactly what you want to do to explore the interaction. It’s called simple effects testing.

Reply

Maria

Hi Karen,
I have used a 2×2 mixed ANOVA; the within-subjects factor was time and the between-subjects factor was levels of processing. Both factors were significant; however, there was no interaction between them. Is there anything I can report about the interaction, other than to say there was no interaction?

Thank you

Reply

Karen

Generally, no. You can, and often should, report some sort of effect size, but that’s about it.

Reply

Cat

Hi Karen,
Thanks so much for taking the time to answer all these questions, it is extremely useful!
Here is my question. I have a significant three-way interaction, but no significant two-way interactions or main effects. Based on plots I could tell where the difference was most likely coming from. I split the file and ran a two-way ANOVA, but it only neared significance (p=.066). I ran simple effects just to see what was going on and found a significant difference between two groups, but I can’t report it, can I? At this point all I have is this significant three-way interaction. Is that where my analysis ends?
thanks!
Cat

Reply

Lee

Hi Karen,

I have read through all of the questions on here and found them very helpful – thanks. However, I am still having difficulty reporting a two-way crossover interaction between two between-groups factors, say Gender (M/F) and Employment (Employed/Unemployed).

When I graph the interaction I see that the change/slope from G1E1 (MaleEmployed) to G1E2 (MaleUnemployed) is steeper than the G2E1 (FemaleEmployed) to G2E2 (FemaleUnemployed) slope i.e that unemployment has more of an effect on my DV for males than it does on females.

No matter how I split this up I do not get significant differences between any of the four cells as seems to be recommended here. I have tried:
– Splitting the file to look at the effect of employment for males and females separately. For both gender groups, there is no significant difference in my DV by employment. And why should there be? The interaction doesn’t tell me that the E1-E2 slope is significant for either of the gender groups. What it tells me is that the E1-E2 slope is not the same BETWEEN the two gender groups.
– Creating four pairwise comparisons by recoding the two factors into a new variable i.e. one that labels the following: G1E1=1;G1E2=2;G2E1=3;G2E2=4. One-way ANOVA fails to show any difference (nor do the post-hoc).

Neither seems appropriate, because what I want to test is the slopes, i.e. whether (G1E1 minus G1E2) is different from (G2E1 minus G2E2). This would be easy to do if the design was repeated measures, because all of the levels would be in different variables that could be subtracted, but this can’t be done when they are all independent measures.

Any advice much appreciated!

Reply

Karen

Hi Lee,

This is very common in crossover interactions. Neither slope is different from zero, but they are different from each other.

The significant interaction IS saying that the two slopes are unequal–you’ve already tested it. If one slope is positive and the other negative, you don’t really need the simple effects tests.

Reply

Lee

Thank you, that’s such a big help!

Reply

Abigail

Hi Karen,

Do you have any references to cite that it is not necessary to test for simple effects when the interaction term is significant and one slope is positive and the other negative?

Thank you so much!

Reply

Rebecca

Hi Karen.

I am having difficulty understanding how to interpret some of my repeated measures outputs. I have a condition (2) x time (9) analysis. There was a significant main effect for condition but no significant effect for time and no significant interaction. I didn’t expect differences over time, but I am trying to figure out the interaction. Can I look at contrasts for the interactions or am I done once it’s not significant? Secondly, when writing the results (for publication), can I say there were differences between conditions or is this not true since there was no interaction?

Thank you!
Rebecca

Reply

Karen

Hi Rebecca,

1. You’re done. The general consensus is there is no reason to investigate the nature of a non-significant interaction.
2. Yes, there were differences between conditions. What you’ve found is this difference between conditions exists across all time points.

Reply

Rose

Hi,
I have a case in which the F term for the interaction and the simple effects disagree, but in the opposite direction.

I have a 2(difficulty) X 3(time) design. The interaction did not come out significant. However, the simple effects reflect exactly what I predicted: the one-way ANOVA for time was significant at one difficulty level, but not at the other.
I am afraid that the non-significant interaction will be an obstacle in publishing the results.
Since this specific direction of the interaction was planned (the time factor would influence the data in one difficulty level but not in the other one), I wondered if there is any alternative, planned analysis that would yield a statistic for an interaction term, with more power (so that the interaction term would hopefully be significant).

Alternatively, is it meaningful/correct to report the simple effects without reporting the interaction?

Many thanks!

Reply

Lily

Hi Rose,
I’ve got exactly the same problem as you. Have you figured out the answer?

Reply

Nicolas

Hi Rose and Lily,

I was looking through the replies to Karen’s post to see whether anyone had discussed the case she mentioned (a significant interaction with no significant simple effects). I found your questions unanswered and thought I could help, so here is my answer.

Imagine you had a 2(A) x 2(B) factorial design. Testing the interaction consists of comparing the difference between the simple effect of A for B1 and the simple effect of A for B2 to the predicted effect size under H0, which is usually 0. (Note that this is completely equivalent to comparing the difference between the simple effect of B for A1 and the simple effect of B for A2 to the predicted effect size under H0.) Testing the simple effect of A for B1 consists of comparing the difference between A1 and A2 for B1 to the predicted effect size under H0. The same logic holds for testing the other simple effects. Thus, testing the interaction and testing the simple effects address two different questions.

Concluding that an interaction is statistically significant at .05 requires that the 95% confidence interval (CI) of the difference between the simple effects excludes the predicted effect size under H0. Concluding that a simple effect is statistically significant at .05 requires that the 95% CI of the simple effect excludes the predicted effect size under H0. Having one statistically significant simple effect and one non-significant simple effect does not necessarily imply that the two simple effects are significantly different from each other. This seems to be the case in your situation. Conversely, you do not need to test each simple effect separately to conclude that you have an interaction effect.

I am not sure whether I am clear enough or whether it helps. Nevertheless, you can look at this paper from Gelman & Stern (2006):

Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
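
The confidence interval logic above can be sketched with a few lines of arithmetic. The two effect estimates and their standard errors below are purely illustrative, the two estimates are assumed to be independent, and 1.96 is used as an approximate 95% critical value; none of these numbers come from anyone's data in this thread.

```python
from math import sqrt

Z_95 = 1.96  # approximate two-sided 95% critical value (normal approximation)


def ci95(estimate, se):
    """Approximate 95% confidence interval for an estimate."""
    return (estimate - Z_95 * se, estimate + Z_95 * se)


def excludes_zero(interval):
    low, high = interval
    return low > 0 or high < 0


# Illustrative simple effects (estimate, standard error).
effect_1, se_1 = 25.0, 10.0
effect_2, se_2 = 10.0, 10.0

# Difference between the two simple effects (the interaction contrast).
difference = effect_1 - effect_2
se_difference = sqrt(se_1 ** 2 + se_2 ** 2)  # assumes independent estimates

ci_1 = ci95(effect_1, se_1)
ci_2 = ci95(effect_2, se_2)
ci_diff = ci95(difference, se_difference)

print("effect 1 significant:  ", excludes_zero(ci_1))
print("effect 2 significant:  ", excludes_zero(ci_2))
print("difference significant:", excludes_zero(ci_diff))
```

Here the first effect's interval excludes zero and the second's does not, yet the interval for their difference comfortably includes zero. One "significant" and one "not significant" result is therefore not, on its own, evidence of an interaction.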

Reply

Maria

Hello,

I have a significant interaction between a 2-level categorical variable (mood: 1 = positive, 0 = neutral) and a continuous variable (length of music played). I would like to do a simple effects test to see if the effect of mood is significant at +1 and -1 SD of my continuous variable.

I started by standardizing the continuous outcome variable (money spent) and the continuous variable (length of music played).

Then I created the following variables:

Zbelow = lmusic + 1;
xzbelow = mood*zbelow;

Zabove = lmusic - 1;
xzabove = mood*zabove;

Then I tried doing the following regressions

proc reg; model mspent = mood zbelow xzbelow;

proc reg; model mspent = mood zabove xzabove;

I am not sure what is going on but both regressions give exactly the same value even though one should test the effect of mood at +1SD of lmusic and the other should test the effect of mood at -1SD of lmusic. Any suggestions?

Reply

Karen

It looks like you didn’t divide by SD in creating your Z scores.
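
For readers following along, here is a rough sketch of the re-centering idea in Python rather than SAS. It uses made-up, noise-free data so the fitted coefficients are easy to check by hand. The variable names mirror the ones in the question; the data values, the true coefficients, and the small `ols` helper are all hypothetical. Note that re-centering changes only the mood coefficient and the intercept: the interaction coefficient, the lmusic slope, and the overall fit are identical in the two regressions, which is one way the two outputs can look the same at first glance.

```python
from statistics import mean, stdev


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: 4 music lengths under each mood condition.
raw_length = [2.0, 4.0, 6.0, 8.0] * 2
mood = [0] * 4 + [1] * 4                      # 0 = neutral, 1 = positive
m, s = mean(raw_length), stdev(raw_length)
lmusic = [(x - m) / s for x in raw_length]    # z-score: subtract mean, divide by SD
# Noise-free outcome with known (made-up) coefficients.
mspent = [2.0 + 0.5 * d + 0.3 * z + 0.4 * d * z for d, z in zip(mood, lmusic)]


def fit_centered_at(z0):
    """Fit mspent = b0 + b1*mood + b2*zc + b3*mood*zc, where zc = lmusic - z0."""
    rows = [[1.0, d, z - z0, d * (z - z0)] for d, z in zip(mood, lmusic)]
    return ols(rows, mspent)


b_below = fit_centered_at(-1.0)   # zbelow = lmusic + 1
b_above = fit_centered_at(+1.0)   # zabove = lmusic - 1

print("mood effect at -1 SD:", round(b_below[1], 3))
print("mood effect at +1 SD:", round(b_above[1], 3))
print("interaction term:    ", round(b_below[3], 3), round(b_above[3], 3))
```

The coefficient on mood in each fit is the simple effect of mood at the point where the re-centered variable equals zero, so that is the row to read in each regression output.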

Reply

D

I have a question that I may be able to ask in two ways; the first way might require the least effort on your part. My study compares a treatment and control group on one outcome variable over 3 periods (pre-, post-, and follow-up). I use IBM SPSS 20, and I have an interaction but no main effect. I found a resource that recommends a MANOVA syntax (see below) to explore “disordinal interactions”, but it seems to refer only to between-groups designs (like gender vs. group), not mixed-model designs. The short question: can I still use the resource, simply using the repeated factor (that is, time) in place of one of the factors (like gender in the above 2 X 2)? I’m guessing “no.” More in-depth question: I’ve set up my data in SPSS to accommodate a mixed model, so Tukey post-hocs can’t be done as there are only 2 groups (treatment and control). Could I run a different post-hoc with the SPSS menus? If not, what kind of syntax might you recommend? Or am I just stuck trying to visually interpret the graphs? Thanks.

Resource says this syntax would work for 2X2.

MANOVA score BY gender (1,2) treatment (1,2)
/PRINT=CELLINFO (MEANS)
/DESIGN = gender treatment gender BY treatment
/DESIGN = gender treatment WITHIN gender(1), treatment WITHIN gender(2)
/DESIGN = treatment gender WITHIN treatment(1), gender WITHIN treatment(2).

Reply

Karen

It’s hard for me to evaluate that code without seeing the full resource. I’m not sure what they’re trying to do in the MANOVA.

If it were me, I’d run it in MIXED, and include an EMMEANS statement for the interaction, with a COMPARE option. That will allow you to compare the groups at each time point.

You can never do a post-hoc for a 2-level main effect because you don’t need one. If the F test says there is a difference between the groups, you know which two groups are different.

Reply

Nahla Ibrahim

Hi,
I have two questions that I would highly appreciate if you could help with, please.

y = b0 + b1*D + b2*X + b3*D*X

First, the interaction between a dummy variable and a continuous variable is significant at the 10% level. I noticed that some papers do an F test of b2 + b3 = 0 (i.e. testing the sum of the two slope coefficients).
Is that test necessary when the interaction is significant?

In my case b2 was negative and b3 was positive, so the F test of the sum was not significant. In all the papers I read that did the F test, b2 and b3 were both positive. So is this test the right one when the coefficients have different signs?

Many thanks in advance

Nahla

Reply

Karen

Hi Nahla,

I suspect that those papers want to test if the slope of the comparison group (where D=1) is significantly different from zero. If both b2 and b3 are positive AND if b2 is significantly different from zero, this is overkill.

But in your case where b2 and b3 are of opposite signs, it would be useful.
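
As a rough illustration of what that test does: in y = b0 + b1*D + b2*X + b3*D*X, b2 is the slope of X when D=0 and b2 + b3 is the slope of X when D=1. The standard error of the sum needs the covariance between the two estimates, not just their individual standard errors. The coefficient values, variances, and covariance below are invented for the sketch, the function name is hypothetical, and a normal approximation stands in for the t (or equivalent 1-df F) test.

```python
from math import sqrt
from statistics import NormalDist


def slope_when_d_is_1(b2, b3, var_b2, var_b3, cov_b2_b3):
    """Estimate, SE and approximate two-sided p-value for b2 + b3."""
    estimate = b2 + b3
    se = sqrt(var_b2 + var_b3 + 2 * cov_b2_b3)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate, se, p


# Made-up values: negative b2, positive b3, negatively correlated estimates.
estimate, se, p = slope_when_d_is_1(
    b2=-0.40, b3=0.70, var_b2=0.04, var_b3=0.09, cov_b2_b3=-0.04
)
print(f"slope when D=1: {estimate:.2f} (SE {se:.3f}), p = {p:.3f}")
```

With opposite signs, the slope for the D=1 group can end up close to zero even when b3 itself is clearly different from zero, which is why the sum has to be tested directly rather than inferred from the two coefficients.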

Reply

Joe

Hi Karen,

I did an ANCOVA with one IV, grade level (3rd and 5th grade); one DV, word reading post-test; and word reading pre-test as the covariate. But then I realised that an assumption is not met, because the IV is not independent of the covariate. Am I right?

Thank you, Joe

Reply

Karen

Hi Joe,

Probably. But see this article: When Assumptions of ANCOVA are Irrelevant

Reply

Nuria

Hi Karen,

I did a repeated measures ANOVA with 1 within factor, Time (pre, post), and 2 between factors, Grade (2, 4) and Method (M1, M2 and M3). Should I report all the interactions? I ask because the Time x Grade interaction is not interesting.

Thanks!

Reply

Leila

Hi,

I had a quick question about interaction contrasts. I ran a 2(A) x 2(B) MANOVA and have 6 significant interactions. I want to follow them up with interaction contrast (A+ vs A-) vs (B+ vs B-). How do you run the interaction contrast in SPSS? Also, if the contrast is significant, would I follow it up with simple main effects, simple ANOVAs for A+ vs A- and B+ vs B- ?

Reply

Karen

Hi Leila, to answer the second question first, no, you don’t need to follow up with ANOVAs after a contrast.

To get SPSS to do all this, you have to use the EMMeans statement with a COMPARE option. This is one of those things you can only do in syntax, not the menus. See the Command Syntax Reference for details.

Reply

Tia

Hi Karen,

I’m doing a 2×1 factorial between-subjects ANOVA. Two independent levels were gender and age on attitude behavior. My results were ALL non-significant. I’ve tried searching in books and online but nothing much is written on ‘what to do when your main effects AND interaction are both non-significant’. Please help me out, do I still do a simple effects test? What steps do I need to take after having done descriptives and having found a non-significant result.
Thank you.

Reply

Karen

Hi Tia. The good news is you’re done. Just write it up as is. It’s possible you will still need to report the means, but no follow up tests are required.

Reply

Nuria

Hi! I made a qualitative fluency scale with 7 items, each item with 4 options. There were 4 raters. To do inter-rater reliability analyses, is it better to use Cronbach’s alpha or the kappa statistic?

Thanks in advance!

Reply

Karen

Hi Nuria,

There are so many details with the different inter-rater reliability stats that I have to look it up each time. I just wrote about this recently, including the links of where I look them up: https://www.theanalysisfactor.com/inter-rater-reliability-a-few-good-resources/

Karen

Reply

Nuria

Very helpful, thanks a lot!

Reply

Anja

Hi Karen,
this seems a great website to hopefully answer my question:
I have two studies testing a 2*2*2 mixed model design. in Study 1 the 3-way is not significant, in study 2 (more power etc), it is.
The question is whether I can still interpret the simple effects (pairwise comparisons using the Emmeans syntax), if the 3-way is not significant. Our research is really interested in the actual discrepancy between the different scores and not so much at the actual 3-way interaction.
– I saw the answer you gave to a similar question before (Jiang), but specifically again:
1) can we interpret the simple effects if the 3-way is not significant and we are actually not really interested in the 3-way, but only the actual simple effects?
2) Is the emmeans-syntax (compare… adj (bonferroni)) also the correct method for simple effects in mixed models? (rather than MANOVA… mwithin…)
–> Where is literature that specifies these kind of particular cases? (i.e. that I could use as references in the paper, and also to understand the details…)

I would be really grateful for an answer because I already searched multiple books / stats texts etc and consulted with colleagues, but nobody really knows the answer…

Reply

Karen

Hi Anja,

1) Just like I replied to Jiang, no, but why run a 3-way if you’re not interested in it? Every statistical test should attempt to answer a research question.
2) That’s how I would do it.

Literature? Hmm, I would start with Keppel, but you’re right–it may not be there. Textbooks generally cover what *to* do, not “can I do this other thing.” 🙂

I would also look under Planned Contrasts, not simple effects. Simple effects are *always* about interpreting significant interactions.

Reply

Nuria

Do you know if there is a difference between η2p and η2, or is it just the same indicator?

Thanks!

Reply

Karen

Nuria,

Do you mean eta squared and partial eta squared? There is a difference: https://www.theanalysisfactor.com/effect-size/

Karen
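
As a quick sketch of the difference, using the standard definitions: eta squared divides the effect's sum of squares by the total sum of squares, while partial eta squared divides it by the effect's sum of squares plus the error sum of squares. The sums of squares below are invented and assume a balanced two-way between-subjects design, where the pieces add up to the total.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variation attributed to the effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Effect variation relative to effect plus error variation only."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares from a balanced two-way ANOVA.
ss = {"A": 20.0, "B": 30.0, "A*B": 10.0, "error": 40.0}
ss_total = sum(ss.values())

eta_a = eta_squared(ss["A"], ss_total)
partial_eta_a = partial_eta_squared(ss["A"], ss["error"])
print(f"eta squared for A:         {eta_a:.3f}")
print(f"partial eta squared for A: {partial_eta_a:.3f}")
```

The two only coincide when there is a single effect in the model; once other effects take up part of the total, the partial version is the larger of the two.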

Reply

Nuria

Thanks a lot!

Reply

Nuria

Thanks Karen for your quick answer.

I don’t really know how to write syntax. Do you know a web page or somewhere I could look up how to do a two-way ANCOVA?

Thank you very much again. Nuria

Reply

Karen

Hi Nuria,

I don’t. I would suggest running the model through the menus and paste the syntax instead of hitting “Okay.” Read the Command Syntax Reference manual for UNIANOVA.

We do go over this in detail in one of my workshops (Running Regressions and ANCOVAs in SPSS GLM), but it’s too involved to explain here. You have to see it.

Karen

Reply

Nuria

Thanks Karen!

Reply

Nuria

Hi Karen,

I would like to ask you a question. Doing an ANCOVA with 3 GROUPS (G1, G2, G3) X 2 GRADES (2nd, 4th) with SPSS, if you get a significant interaction GROUP X GRADE, how can you analyze the interaction? do I need syntax?

Thank you very much! Nuria

Reply

Karen

Hi Nuria,

Yes, to do, say pairwise comparisons across levels of one of the variables, you will need to use syntax. It’s one of those very useful things you can only do in syntax, not the menus.

Karen

Reply

ingrid

Hi Karen,
I love your site and explanations, hope you can help me with this one, I’m confused… I’m running a 3-way mixed model ANOVA: A (within-subjects, 2 levels) x B (within-subjects, 2 levels) x C (between-subjects, 5 levels). I have specific hypotheses about the between-subjects variable (C, which I’ll call group), so I did 4 orthogonal planned contrasts. In the first planned comparison I contrasted 2 groups versus the other 3 groups (design: +3 +3 -2 -2 -2). I used SPSS syntax to do this. I find no significant effect of A x group, a significant effect of B x group, and a significant interaction effect of A x B x group.

So now I’d like to follow up on this significant interaction and test the simple effects, but I can’t find any recommendations about simple effects within a single contrast after planned comparisons. Also no idea how to make SPSS do it. I did do a bunch of separate T-tests to look into the interesting effects (after plotting everything), but the reviewers want me to control for multiple comparisons (and probably rightfully so, since I did do 7 t-tests). Also the separate T-tests are probably sub-optimal, since they use the error term within the 2 (pooled) groups and not the population error (over the 5 groups).

I did 4 paired t-test testing
within C(1), within A(1): B1 versus B2, P < 0.001
within C(1), within A(2): B1 vs B2, P < 0.001
within C(2), within A(1): B1 vs B2, P=0.046
within C(2), within A(2): B1 vs B2, P=0.34
And 3 unpaired t-tests:
within A(1), B2-B1, tested C1 versus C2 (where C1 are the 2 pooled groups and C2 are the 3 pooled groups), P=0.03
within A(2), B2-B1, tested C1 versus C2, P<0.001
within B(1), A2-A1, tested C1 versus C2, P=0.063

So my questions are:
1) is there a better way for follow up analysis instead of the t-tests? I think I need to do simple effects analysis, but since it's after the planned comparisons I already have, I don't know how to do this in SPSS, and if this even makes sense. What should my dfs be for the 7 follow up tests, so I can check if the commands work? The df for the planned contrast is total N (over 5 groups) – 5, but what for the follow up simple effects?
2) What about correction for multiple comparisons? I'm confused in determining which are orthogonal. The 7 follow up tests were planned after looking at the data, to interpret the interaction, so they are post-hoc in a way. Note, I did not correct for the contrasts, since those are orthogonal and were pre-planned, based on my hypotheses. Also, for the other contrasts I don't need any follow up comparisons. All effects are non-significant (which is as should be, since those are the control conditions which should not differ).

Hope this makes sense.
Thanks!
Ingrid

Reply

Karen

Hi Ingrid,

There are just enough specific details in there that I would really have to talk with you in consultation to answer those questions accurately. I have no idea off the top of my head what the best approach would be.

But I’ll try to give you a few nudges that may head you in the right direction. If you need more than that, feel free to set up a consultation.

Your planned contrast sounds like a main effect, not an interaction effect, although I may be misunderstanding it.

It does indeed sound like your t-tests are simple effects tests, but to truly make them simple effects, you need to replace the MSE from the overall ANOVA. That is extremely tricky (and I don’t remember if it can be done) with within-subjects effects because there isn’t a single MSE for the entire model.

You could always just do a Bonferroni correction to adjust for the multiple comparisons. This may lead to type 2 errors, of course, especially for the p values that are just below .05. Actually, any correction will make a p=.046 move beyond .05, and likely the .03 as well. The first two t-tests are definitely orthogonal to each other, as are the third and fourth, but beyond that, I’d have to figure it out as well. 🙂

Karen
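
To see what a correction does to the seven p-values listed in the question, here is a small sketch of the Bonferroni adjustment alongside the Holm step-down adjustment (a common, less conservative alternative that was not mentioned above). The values reported as "P < 0.001" are entered as 0.001, which is an assumption made only so the arithmetic can run.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)   # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# The seven follow-up tests, in the order listed; "<0.001" entered as 0.001.
reported = [0.001, 0.001, 0.046, 0.34, 0.03, 0.001, 0.063]

bonf = bonferroni(reported)
holm_adj = holm(reported)
for p, b, h in zip(reported, bonf, holm_adj):
    print(f"p = {p:<6} Bonferroni = {b:.3f}  Holm = {h:.3f}")

still_sig_bonf = [i for i, p in enumerate(bonf) if p < 0.05]
still_sig_holm = [i for i, p in enumerate(holm_adj) if p < 0.05]
```

Under either adjustment only the three tests reported as P < 0.001 stay below .05; the .046 and .03 results do not survive.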

Reply

James Brown

Hi Karen,

Just to clarify: can you explain why simple effects do not need adjustments for multiple comparisons? If there are any sources other than the Keppel book (which I can’t get hold of at the moment), that would be great. Thank you.

Reply

Karen

Hi James,

The simple answer is the point here is not to compare every possible mean to each other post-hoc. It’s to make several planned comparisons, which should be orthogonal to each other.

It’s when you’re using the same group mean in multiple comparisons that you have to make adjustments.

Karen

Reply

Nuria

Hi Karen,

Do you know if it is possible to obtain a standard deviation higher than your max score?

In a word reading test with max score 40, I got a mean of 39.17 with an SD of 1.10. But I don’t really understand how you can obtain that.

Thanks in advance!

Reply

Pieter

Hi Karen,

Quick question (hopefully you won’t need too much info for this). I have 10 different treatments tested over 5 times, so that would give me a 10×5 design. I did a repeated measures ANOVA and in some cases, do not have an interaction effect. Can you still measure for simple main effects?

By not doing them, the main effects is just too broad to help illustrate the data, and the simple effects allow me to see where significant differences occur. Is this possible?

Thanks for the help in advance!

Reply

Karen

Hi Pieter,

I assume you mean you’d like to compare the treatments at each individual time point, even though there’s no significant interaction and no main effects for treatment.

It really depends on what you want to test. If it really only makes theoretical sense to compare the treatments at each time point, then really, you can do those simple effects tests as a priori contrasts. If you’re just running them post hoc because you didn’t have significant main effects or interactions, that’s problematic.

So you’ll have to really make a case that this analysis is answering the research question. Reviewers will assume you’re doing the latter, so you’ll have to be very convincing. 🙂

Karen

Reply

Luce

Hi Karen,
You seem to answer questions much faster than the teacher I consult for stats, and my thesis advisor is not much better than me for this, so I will try to explain my problem as simply as possible.
I am working on the question of coping style as moderator of the effect of abuse experiences on emotional disorders with a sample of 330 adolescents, (120 girls, 210 boys).
Variables involve types of coping (task-oriented, avoidance, etc.), types of abuse (emotional, physical)…
I was working with a design that first checks whether the Coping x Abuse interaction differs for girls vs. boys:
so first Sex x Coping x Abuse (for each type of abuse and coping, with Sex x Coping, Sex x Abuse, and Coping x Abuse tested before the 3-way).

If the 3-way interaction is significant, it means Coping x Abuse should differ between boys and girls, i.e., significant for one and not the other… So I do 2 separate regressions to test it.
If the 3-way interaction is not significant, then I test Coping x Abuse for the whole sample.
I did all this, found a few 3-way and mostly 2-way interactions, tested the slopes, etc.

But then, exploring outside of this box, so to speak, I found for example that Task-oriented Coping x Emotional Abuse x Sex was not significant, but when I checked the Coping x Abuse interaction in a split file (girls/boys), Coping x Abuse was significant for girls (p=.019) and not for boys (p=.17). In other regressions (other types of coping/abuse), the same happened, but with even more divergent significance levels (.03 vs .50), even though, again, the 3-way interaction was not significant at first.

I wonder, then, if I shouldn’t just do separate regressions for each gender, instead of testing whether Coping x Abuse differs by gender through the Coping x Abuse x Sex interaction. But separating the sample for each regression implies a loss of power. But then the main effects and interaction effects are quite different between boys and girls for most coping and abuse variables…

I wonder if the differences between the samples of boys and girls (N, but also much more abuse in the girl sample), and the different patterns of correlations (some abuse more related to one sex, some to the other), wouldn’t justify doing separate moderation analyses and forgetting about the whole 3-way interaction testing method, which seems to provide dubious results in this situation.

Sorry to be so long, I wanted it to be clear and am quite confused about the best way to analyze this.

Thanks if you have time to think it over
Luce

Reply

Karen

Hi Luce,

Depends on the day how fast I am at responding, but I do my best. 🙂

And yes, thanks for being so clear. It really helps.

Theoretically, you *should* be getting the same results both ways, and if there are any differences, you should have more power, and more significant effects when the samples are combined.

It is of course, often much easier to interpret and explain separate models. What you really lose by running them separately is the ability to compare the coefficients. So by running them separately, you can say, the effect of X is positive for girls, and negative for boys, but you can’t say those coefficients are significantly different.

So if the research question is *about* the girl/boy comparisons, I’d run them together, even if the p-values don’t come out significant. But if you’re really just interested in the coping*abuse interaction, and were just checking for sex differences, it would be fine to run them separately. It’s more like you’re replicating your study with boys and girls.
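To illustrate the point about comparing coefficients, here is a minimal Python sketch. The data, the variable names (x, g, y) and the helper functions are all made up for illustration and have nothing to do with the study described above. It fits one regression per group and one combined model with an interaction term, and shows that the interaction coefficient in the combined model is exactly the difference between the two separate slopes. The sketch only reproduces the point estimate; the test of whether the slopes differ is the significance test on that interaction coefficient, which your statistical software reports for the combined model.

```python
# Hypothetical toy data: x is a predictor, y the outcome, g codes the group
# (0 = boys, 1 = girls). None of these numbers come from a real study.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
g = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 5.2, 4.1, 3.3, 1.9, 1.2]


def ols(rows, target):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(rows, target))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for other in range(k):
            if other != col:
                factor = a[other][col] / a[col][col]
                a[other] = [v - factor * w for v, w in zip(a[other], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit_group(code):
    """Separate regression y = b0 + b1*x using only one group's rows."""
    rows = [[1.0, xi] for xi, gi in zip(x, g) if gi == code]
    target = [yi for yi, gi in zip(y, g) if gi == code]
    return ols(rows, target)


boys = fit_group(0)
girls = fit_group(1)

# Combined model: y = b0 + b1*x + b2*g + b3*(x*g).
combined = ols([[1.0, xi, float(gi), xi * gi] for xi, gi in zip(x, g)], y)

slope_difference = girls[1] - boys[1]
interaction_coef = combined[3]

print("slope for boys: ", round(boys[1], 3))
print("slope for girls:", round(girls[1], 3))
print("difference in slopes:      ", round(slope_difference, 3))
print("interaction coef (combined):", round(interaction_coef, 3))
```

Running the groups separately gives you the two slopes, but only the combined model gives you a standard error and p-value for their difference.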

Karen

Reply

Karen

Hi Nuria,

Not sure what you mean–compare the simple effects between what?

Karen

Reply

Nuria

Some authors (Pardo, Garrido, Ruiz and Martín, 2007) suggest comparing the simple effects with each other by running a one-way ANOVA and then looking at the post hoc tests.

Nuria

Reply

Nuria

Thanks Karen, I will do the simple effects.

Do you think it is a good idea to compare the simple effects with each other afterwards?

Thanks again! Nuria

Reply

Nuria

Thanks Karen for your reply. My question is: what is the best way to analyze the interaction in my study? Simple effects analysis by syntax, or planned comparisons?

Thanks again! Nuria

Reply

Karen

Hi Nuria,

If you already have a significant interaction and it’s a 2×2, I would just do the simple effects.

Karen

Reply

Nuria

Hi!

I would like to ask you a question about the interaction in a repeated measures ANOVA. In my research I have a 3 (intervention method) x 2 (time) design. I ran a repeated measures ANOVA and got a significant main effect for time, but not for method, as well as a significant interaction. What should I do to analyze the interaction? Is a simple main effects analysis required because there is no significant main effect for one of the IVs? Could you look at planned comparisons too?

Thanks!

Reply

Karen

Hi Nuria,

You don’t need a significant main effect to test the simple effects analysis. Simple effects are very similar to planned comparisons–they’re just specific contrasts you do to understand the nature of the interaction. So if you got an interaction for time*method, but no main effect for method, you probably have a crossover interaction.

Karen

Reply

Rohan Puri

Hey Karen,

I have a significant 3 way interaction effect. For 2 way interaction effect, I am aware of using the following syntax code for simple effect analyses and then conducting pairwise comparisons.

/EMMEANS=TABLES(A*B) COMPARE(A) ADJ(BONFERRONI/SIDAK)
/EMMEANS=TABLES(A*B) COMPARE(B) ADJ(BONFERRONI/SIDAK)

However, for a 3 way interaction, is there a method to use code rather than conducting multiple 2 way ANOVAs at each level of the remaining factor manually ?

Thank you, your help is highly appreciated, warm regards

Reply

Karen

Hi Rohan,

The 3-way interaction basically says that the 2 way interactions differ for each value of the 3rd variable. So I would first of all start by graphing it, then look at where the mean differences in the two-ways differ.

You can use /EMMEANS=TABLES(A*B*C) COMPARE(A) ADJ(BONFERRONI/SIDAK)

This will give you the mean difference between each pair of means across A for every combination of B and C. It’s not a direct test, but it can give you an idea of the nature of the interaction.
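As a rough illustration of what "the 2-way interactions differ for each value of the 3rd variable" means, here is a small Python sketch. The 2x2x2 layout and every cell mean in it are hypothetical, chosen only to make the arithmetic easy to follow. It computes the mean difference across A at each combination of B and C, then the A x B contrast at each level of C, then the difference between those two contrasts.

```python
# Hypothetical cell means for a 2x2x2 design, keyed by (A, B, C) levels.
means = {
    (1, 1, 1): 10.0, (2, 1, 1): 14.0,
    (1, 2, 1): 12.0, (2, 2, 1): 11.0,
    (1, 1, 2): 10.0, (2, 1, 2): 13.0,
    (1, 2, 2): 11.0, (2, 2, 2): 14.0,
}


def simple_effect_of_a(b, c):
    """Mean difference A2 - A1 at one combination of B and C."""
    return means[(2, b, c)] - means[(1, b, c)]


def a_by_b_contrast(c):
    """A x B interaction contrast at one level of C:
    (A2 - A1 at B1) minus (A2 - A1 at B2)."""
    return simple_effect_of_a(1, c) - simple_effect_of_a(2, c)


for c in (1, 2):
    for b in (1, 2):
        print(f"A2 - A1 at B{b}, C{c}: {simple_effect_of_a(b, c):+.1f}")
    print(f"A x B contrast at C{c}: {a_by_b_contrast(c):+.1f}")

# The three-way contrast is how much the A x B contrast changes across C.
three_way_contrast = a_by_b_contrast(1) - a_by_b_contrast(2)
print(f"Three-way contrast: {three_way_contrast:+.1f}")
```

In this made-up example there is an A x B pattern at C1 and none at C2, which is the kind of picture a graph of the means would also show. The sketch only does the arithmetic on the means; it does not test anything.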

Karen

Reply

Melissa

Hi Karen,

I have a 3(age group) x 2 (gender) design, which yielded a significant main effect for age, but not for gender, as well as a significant interaction. I then did a Tukey HSD test which told me that each of the age groups was significantly different from each other. I then used the syntax method for calculating simple effects comparing the interaction with age groups, and discovered a significant gender difference in one of my age groups, which didn’t show up obviously in the overall analyses. I got curious, so I isolated each of my genders and ran one-way ANOVAs and then Tukey HSD on each of the genders, and confirmed my finding of a significant difference between gender for one particular age group. My problem is that I obtained different significance values with the simple effects analysis than I did for the Tukey HSD tests with the one-way ANOVAs, and am unsure as to which I should be reporting. I am confused by looking at the output from SPSS for the simple effects analysis as I’m not sure what figure I would be reporting in any case.

I have not read anywhere about verifying further effects with using multiple one-way ANOVAs, and not sure if this is frowned upon. Any help would be greatly appreciated, thanks!

Reply

Karen

Hi Melissa,

This is a great question.

The fact that you have a gender difference in one age group, but not the others is exactly what the significant interaction is telling you. An interaction says that the effect of one factor is not the same at all values of the second factor. In other words, the effect of gender isn’t the same in every age group. It also says the effect of age isn’t the same for both genders.

Okay, so which followup test do you report? Technically, it doesn’t matter whether you compare genders within age group or age groups within gender. The p-values won’t agree as one requires Tukey adjustments and the other doesn’t.

It’s a little simpler to report the gender comparisons within age group, because there are only two genders and three age groups, but sometimes it just makes more conceptual sense to answer your research question doing it the other way. I generally default to whichever comparison makes the most sense intuitively and best answers the research question.

Reply

Oana

Hi Karen,

What do you recommend in the case of 3-way or 4-way interactions that include both repeated and between-participants measures?
Can simple effects be used at all? The approach I have been using so far to deal with (interesting) 4-way interactions is to run two ANOVAs in order to assess the 3-way interaction at each of the two levels of the fourth variable. But then I would have to repeat this for any 3-way interactions that I come across. Is there any way around this?

Thank you

Oana

Reply

Karen

Hi Oana,

Great question. If you’re using a repeated measures GLM approach, it’s much harder to run any simple effects because there isn’t a single MSE for a within subjects effect.

The approach you’re taking is truly a simple effects test of sorts, as long as you’re replacing that MSE. So that sounds great. You may have to repeat it for the 3 way, but once you get down to the 2 way, you can just compare mean differences using EMMeans.

Reply

Cat

Hi Karen,

I have a significant three-way interaction, surgery (2) x treatment (3) x days (4), but no significant two-ways or main effects. I decided to split my file across days, and on day 1 I got a significant surgery x treatment interaction, which I then followed up with simple effects tests. How do I write this up for an article? Do I mention splitting the file, or do I just report that following a significant three-way, simple effects revealed that on Day 1, this and that happened? Basically I’m just wondering what reviewers expect to be reported and what can be omitted because it is implied. Thanks,
Cat

Reply

Karen

Hi Cat,

It’s sometimes hard to say what reviewers want. It depends a lot on the field as well as on the individual. Dissertations require more specifics about the how than journals, in general. Partly because space is more of an issue, and more because there is an assumption that journal readers should know (or find out) what simple effects tests are. – Karen

Reply

Melissa

Hi Karen,
I have a significant 2×2 interaction (both factors between-subjects). My question: when running simple effects tests by splitting the file in SPSS on one IV and using the other as my factor, do I need to correct for familywise error since, essentially, these are post-hoc tests?
Also, how do I approach simple effects testing if I have a 2x2x2 (2 between- & 1 within-subjects)?
Thank you,
Melissa

Reply

Karen

Hi Melissa,

Technically simple effects tests aren’t post-hocs and don’t need multiple comparison adjustments. There is a really great chapter on simple effects testing in Geoffrey Keppel’s Design and Analysis book. I would suggest getting a hold of it.

For a 2x2x2, a significant interaction means the two 2x2s are significantly different. So you need to do simple effects testing separately on each 2×2. This is much easier to explain if I could draw!

Karen

Reply

Rachel

Hi Karen,

Apologies, I found a significant 2×3 interaction of the between subjects factors in the question left above.
Thanks
Rachna

Reply

Karen

Hi Rachel,

You can do simple effects tests of either the three pairwise comparisons (i.e. are the two means different in each of the three levels) or the two three-group comparisons. It really depends on your variable and which makes more sense.

If there isn’t one direction that makes more sense theoretically, doing the three pairwise comparisons is easiest. The fact that both these factors are between subjects means it will be easier in any case. It gets trickier when the within-subjects factor is involved.

A great reference is Geoffrey Keppel’s Design and Analysis book. There is a detailed chapter on simple effects testing.

Karen

Reply

Rachel

Hi Karen,
The design of my experiment involves a 2x2x3 analysis (2 between subjects, 1 within subjects factor). My repeated measures ANOVA revealed no main effects but a significant 2×2 interaction of the between subjects factors. I’m unsure what kind of simple effects analysis I can do to unpack this interaction?
Thanks
Rachel

Reply

JIANG

Hi Karen,

Thank you anyway for taking the time to answer my questions; your forum is very interesting and helpful! For me, the essential thing is that an expert can help me find my errors in understanding and using statistics; whether the number I obtain is just above 0.05 or not doesn’t worry me so much. I dropped some groups not in order to fish for a significant interaction, but just to understand whether it makes sense (is logical) that in one situation I have to examine the simple effect, while in another situation I have no “right” to examine the same phenomenon. In other words, I’m preparing to answer the reviewers if the paper is refused…
I’m reading about Rosenthal’s contrast tests in ANOVA, where I have found an example similar to mine, even though it’s difficult to find higher-way ANOVAs with more than two levels for each factor…
With my respects!
Tao

Reply

JIANG

Hi Karen,

Thank you so much for your answer. But I’m still wondering: I have 5 age classes. If I put all age classes in the analysis, I don’t get an age-related interaction, while if I eliminate classes 2 and 4, I do get an age-related interaction, yet the simple effect for age class 1 is the same in both situations. That means whether I can test the effect for age class 1 depends on whether I have measured classes 2 and 4, which is not very logical. Yes, I was warned that I might be blocked by reviewers, or alternatively that I should associate a well-known person with the publication (yes, we have found, in a well-known journal, statistics applied to only three recorded data points!). But for me, understanding my data and publishing are two different things. In textbooks, it’s rather difficult to find examples of higher-way analyses that are analyzed in depth. But the effect is so clear: if I had measured only the age class 1 population, I could easily claim the effect, but now I have measured too much data.
You explained very clearly that we are testing two different but related hypotheses. My question is whether, to explore my data in depth, I can test these two hypotheses at the same time. I have no intention of hiding the absence of the interaction, but that is another question: the statistically significant dependence between the factors.
Sincerely, thanks!
Tao

Reply

Karen

Hi Tao,

Yes, it’s always a good idea to explore your data deeply enough to know what is going on, whether or not you end up reporting those results. It’s really hard for me to make suggestions about what comparisons are appropriate and what are not. Dropping some groups may be very interesting or it may be data fishing. Basing it solely on significance without considering power, the research questions, and the design isn’t a good idea.

So I’m sorry I can’t be more helpful, but I’d really have to make sure I understood the entire context before I could advise. This is exactly why I have Quick Question consultations. 🙂

Karen

Reply

JIANG

Dear sir,

I usually meet the opposite situation: no significant interaction but a significant simple effect. With my experimental design SEX(2) x STATE(2) x AGE(5) x CATEGORY(3) (sex and age between-subjects), my question is: without a significant interaction (with age), can I still test the simple STATE effects at each AGE, and if I do find a significant effect (planned contrast), how do I interpret it?

Thank you very much for your insight !

Reply

Karen

Hi Jiang,

It really depends on the hypotheses you’re testing. If you set up the factorial design because you really need to test the various situations, then you can’t jump from a non-significant interaction to significant simple effects.

If these really are the contrasts you want to test, though, theoretically there is no problem with testing them directly. In my experience, though, most reviewers will be wary. They will think you’re jumping to the contrasts to avoid the non-significant interaction. So you will have to really make your case for not testing the interaction directly.

Karen

Reply

Amiyah Plyler

I loved your article. Really, thank you! Much obliged.

Reply

Karen

Hi Tony,

You may not need a follow-up test. The regression coefficient for the interaction tells you the size of the difference in the slope of one variable for each one unit change in the other. That may be enough to interpret the interaction.

Karen

Reply

Tony

Hi Karen,

I have a question about a significant interaction between 2 continuous variables as well. For my data, I found a significant interaction between 2 continuous variables, and this interaction was testing a possible moderating effect. In this case, how do I conduct a follow-up test? Also, can I conclude that there is a moderating effect if the follow-up test is not significant?

Thanks!

Reply

FS

Hi Karen,

I am dealing with an interaction between continuous terms. My interaction (between two continuous terms) is significant, but now my professor wants to know whether, at a higher value of one predictor, the mean responses at lower vs. higher levels of the second predictor are significantly different from each other. I have no idea how to do this. Everywhere I look, I only see material on two categorical predictors. I don’t see any examples for an interaction between two continuous terms.
Thank you.

Reply

Karen

Hi FS,

When you have two continuous predictors that interact, the interaction is saying that the slope of one variable differs at every value of the other.

The means your professor wants are really predicted values. You could use the EMMeans command in SPSS GLM (this will only work in syntax, not the menus) or lsmeans with a slice command in SAS PROC GLM to get that comparison. So I would start with reading the manual to see how to do that in your software (you don’t mention which one you use).
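To see what those predicted values look like, here is a minimal Python sketch. The coefficients, the chosen values of the two predictors, and the function names are all hypothetical; in practice the coefficients would come from your fitted model. It shows that the slope of x depends on z, and that the difference between two predicted values at a fixed z is just that slope times the distance between the two x values.

```python
# Hypothetical coefficients from a fitted model
#   y = b0 + b1*x + b2*z + b3*x*z
# These numbers are made up for illustration only.
b0, b1, b2, b3 = 2.0, 0.5, 1.0, 0.25


def predict(x, z):
    """Predicted value of y at given values of the two predictors."""
    return b0 + b1 * x + b2 * z + b3 * x * z


def slope_of_x_at(z):
    """Slope of x when the other predictor is held at z."""
    return b1 + b3 * z


# Assumed "low" and "high" values of x, compared at one high value of z.
x_low, x_high = 1.0, 3.0
z_high = 4.0

pred_low = predict(x_low, z_high)
pred_high = predict(x_high, z_high)
difference = pred_high - pred_low

print("predicted y at low x, high z: ", pred_low)
print("predicted y at high x, high z:", pred_high)
print("difference in predicted values:", difference)
print("slope of x at high z:", slope_of_x_at(z_high))
```

The sketch gives only the point estimates. Whether the difference is significant requires its standard error, which is what the software commands mentioned above provide.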

It’s something we cover in the Interpreting (Even Tricky) Regression Coefficients Workshop. It would take me a while to explain, but you could either look into that workshop:
http://www.theanalysisinstitute.com/workshops/IRC/index.html

or sign up for a Quick Question consultation:
https://www.theanalysisfactor.com/statistical-consulting-services/quick-question-consultations/

if you want me to walk you through it.

Karen

Reply

Karen

Hi Sam,

It’s hard to explain this in the abstract, without looking at the same means together, but I’ll do my best. 🙂

First of all, I would suggest plotting your 12 means. I find interactions much easier to interpret if I can see the patterns. You have to be careful, because what looks different on a graph isn’t always significant, but even so, seeing the relationships among the means is very helpful.

The way to graph this is to have two separate graphs. Each one will be a 2×3 graph of A*C at one value of B (B1 and B2).

The three way interaction (if indeed that’s the one that’s significant) is saying the 2-way interactions in these two graphs are different. Maybe they’re *both* crossovers, but in different directions. If so, you’d have lots of significant interactions, but no main effects.

Your simple effects tests would then be comparing A1 to A2 at each value of C (C1, C2, C3) first in the B1 graph and then in B2.

I hope that helps!
Karen

Reply

Sam

Hello Karen,

I have been reading about this issue throughout today and feel I understand the situation on the 2×2 model but just cannot seem to apply this to my own model, a 2x2x3 mixed design, with A as a within-subjects factor; and B and C as both between-subjects factors.

I have this significant 3-way interaction but no significant simple effects when followed up, and although it is good to see this issue has arisen for others I can only seem to find examples/explanations on 2×2’s.

On the 2×2, as I understand it, I could look to see if it is a crossover interaction by checking whether A1-B1 = 0 and B1-B2 = 0; if one is positive and the other negative, it is explained as a crossover interaction, and it is known where the interaction is because there are only 2 differences in means, which were found to be significant.

On my 2x2x3 model: I have the means for each group, a total of 12. Would I need to do something like this for the mean differences:
A1-B1-C1; A1-B1-C2; A1-B1-C3
A2-B2-C1; A2-B2-C2; A2-B2-C3. Which gives me 6 mean differences scores, and then compare the paired sets, e.g:
A1-B1-C1 compared with A2-B2-C1 — to see if in each of these 3 pairs have the crossovers or not?

If this is the case and I do have crossovers, how would I then go about explaining it? In the answer above and on another forum, it’s mentioned that for anything more than a 2×2 (e.g. a 2×3) the explanation is not as simple. I think my case fits that, so would I explain that I have found the 3-way interaction and that, on further investigation, the interaction is due to a crossover effect of the mean differences, but that because of the model it is unclear exactly where it lies?!
If the crossover is not the case, can I then explain the interaction as due to sampling error or the complex design, i.e. the interaction getting lost in the noise of the data?

Thank you in advance for any time you can spare on this issue,
Kind Regards,
Sam

Reply

David

Thanks Karen

I really appreciate your help, it is very easy to understand.

Reply

David

Hi there,

Thanks for this information, but I’m having some trouble getting my head around the interpretations that are possible. I have found no main effects but a significant interaction:

A = non significant
B = non significant
A x B = significant (p = .03, effect size = .30)
The interaction is disordinal (A1 is higher than A2, but B1 is lower than B2).
No follow up tests are significant when using t tests, and I understand why based upon your discussion above. So I proceeded to do the following
A1-A2 vs B1-B2: significant
But A1-B1 vs A2-B2 was also significant: in fact it resulted in exactly the same t value and significance level. So both of these tests suggested I was somehow testing the same values, although I did not.
Furthermore, assuming that I focus on A1-B1 vs A2-B2 as the comparison I’m interested in, how do I actually interpret and write this up in a valid manner for a scientific journal? Do you have any examples of this in journals I could cite?

Thanks for the help, it is very much appreciated

Reply

Karen

David,

I can’t think of an example in the literature (example, anyone?) — I don’t always see that end of things.

The best way to write it up is to display the means (either with a graph or table–I like graphs, personally) and report the F statistic with p-value. Describe it as a crossover interaction.

Because it’s a 2×2, you don’t technically need any simple effects tests. The only way for the F stat for the interaction to be significant is for the differences in means to be significantly different.

So in your write up, you can focus on either A1-A2 not equal to B1-B2, or on A1-B1 not equal to A2-B2. The interaction says both are true, but usually in research, one of these comparisons is more meaningful.

I think your two t-tests are both testing the exact same thing as the F.

If you had, say, a 2×3 interaction, this wouldn’t be exactly the same.
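Here is a quick way to see why the two t-tests come out identical in a 2×2, sketched in Python. The four cell means are hypothetical and simply use the A1, A2, B1, B2 labels from this thread. Both ways of setting up the comparison rearrange into the same contrast, A1 - A2 - B1 + B2, so they are the same test.

```python
# Hypothetical means for the four cells of a 2x2, using the thread's labels.
cell = {"A1": 5.25, "A2": 4.75, "B1": 4.5, "B2": 5.0}

# The two differences compared in the first view.
a1_minus_a2 = cell["A1"] - cell["A2"]
b1_minus_b2 = cell["B1"] - cell["B2"]

# View 1: is (A1 - A2) different from (B1 - B2)?
first_view = a1_minus_a2 - b1_minus_b2

# View 2: is (A1 - B1) different from (A2 - B2)?
second_view = (cell["A1"] - cell["B1"]) - (cell["A2"] - cell["B2"])

print("A1 - A2:", a1_minus_a2)
print("B1 - B2:", b1_minus_b2)
print("(A1 - A2) - (B1 - B2):", first_view)
print("(A1 - B1) - (A2 - B2):", second_view)
```

With more than two levels on either factor the interaction has more than one degree of freedom, so no single contrast like this captures it.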

Karen

Reply
