Interpreting Interactions: When the F test and the Simple Effects disagree.

The way to follow up on a significant two-way interaction between two categorical variables is to check the simple effects.  Most of the time the simple effects tests give a very clear picture about the interaction.  Every so often, however, you have a significant interaction, but no significant simple effects.  It is not a logical impossibility. They are testing two different, but related hypotheses.

Assume your two independent variables are A and B.  Each has two values: 1 and 2.  The interaction is testing if A1 – B1 = A2 – B2 (the null hypothesis). The simple effects are testing whether A1-B1=0 and A2-B2=0 (null) or not.

If you have a crossover interaction, you can have A1-B1 slightly positive and A2-B2 slightly negative. While neither is significantly different from 0, they are significantly different from each other.

And it is highly useful for answering many research questions to know if the differences in the means in one condition equal the differences in the means for the other. It might be true that it’s not testing a hypothesis you’re interested in, but in many studies, all the interesting effects are in the interactions.
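
To make the crossover case concrete, here is a minimal sketch using made-up summary statistics (not real data). It assumes a balanced 2x2 between-subjects design, reads the shorthand above as the difference between the two cell means at each level, and picks a hypothetical common SD of 1 with 50 cases per cell. The p-values use a normal approximation to the t distribution, which is only reasonable here because the error df is large.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value from a normal approximation (large error df assumed)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def crossover_tests(cell_means, sd, n_per_cell):
    """Test both simple effects and the interaction in a balanced 2x2 design.

    cell_means maps (level_of_first_factor, level_of_second_factor) to a mean.
    All numbers passed in below are hypothetical, chosen only for illustration.
    """
    # Simple effects: difference between the two cell means at each level.
    diff_1 = cell_means[(1, 1)] - cell_means[(1, 2)]
    diff_2 = cell_means[(2, 1)] - cell_means[(2, 2)]

    # A difference of two independent cell means has variance 2 * sd^2 / n.
    se_simple = sd * sqrt(2 / n_per_cell)
    # The interaction contrast (diff_1 - diff_2) involves all four cell means.
    se_interaction = sd * sqrt(4 / n_per_cell)

    return {
        "simple_1": two_sided_p(diff_1 / se_simple),
        "simple_2": two_sided_p(diff_2 / se_simple),
        "interaction": two_sided_p((diff_1 - diff_2) / se_interaction),
    }


# One difference is slightly positive (+0.3), the other slightly negative (-0.3).
means = {(1, 1): 10.3, (1, 2): 10.0, (2, 1): 10.0, (2, 2): 10.3}
p_values = crossover_tests(means, sd=1.0, n_per_cell=50)
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

With these inputs neither simple effect reaches p < .05, while the interaction does. The interaction contrast is twice the size of either simple effect, but its standard error is larger only by a factor of the square root of 2.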



Reader Interactions


  1. Nicolas says

    Dear Karen,

    Thanks for your blog and for this particular post. You said that having a significant interaction effect with no significant simple effects “is not a logical impossibility”. However, I was wondering whether it is very plausible in practice and to what extent such a situation can occur in real life. Indeed, you illustrated your post with a situation in which there is a significant crossover interaction with no significant simple effects, A1-B1 being “slightly positive” and A2-B2 being “slightly negative”. However, in such a situation the size of the interaction would not be very large and would require a very narrow 95% CI to reject H0. Do you have some real data to illustrate such a situation?

    Thanks for your help



  2. Karina Kav says

    Hi Karen,

    Is it possible to give an example of a hypothesis which requires a simple effects analysis and an example of a hypothesis which requires a contrast analysis?

    Thanks in advance,

    Best regards

  3. Saranea says

    Hi Karen,
    Could you please help me with this question? I have run a 3*3 ANOVA. I have also chosen a simple planned contrast within SPSS to compare each variable to the first variable. I am not too sure if these are the results I should be reporting after looking at the main effects and interactions. Please help.

  4. Sophie says

    How do I interpret my results, when the interaction is significant and the means show the effect I want, but testing the simple effects only reveals a p = .80? Can I draw conclusions from my significant interaction or the ‘marginal significant’ p value (I don’t know if it is allowed to say that)?

  5. Emil says

    This website regularly provides amazingly clear/useful advice. Yet again, you helped me climb out of a bind. Keep up the good work!

  6. Nola says

    Hi Karen,
    This website is helpful and much needed for people like me.
    I am trying to determine whether I am meant to do a follow up power analysis for a one way anova and if so, how to interpret it.
    After I had run a 3 (emotional support) by 2 (sex) analysis of variance, the data showed significant main effects and interaction, so I split the file and did a one-way ANOVA on sex (male and female) to explore the difference and contrast. I found that there was no significant difference for males on the DV but there was for females, and I have written a report describing the effect as well as the F-ratio for the non-significant data. So my dilemma is, do I now do a power analysis on the non-significant male data? If so, I would get only a 68% chance of detecting a moderate effect size of η² = .06. If I were to conclude this, I would have to state that there would need to be more male participants in the study; however, I already have 120 male and 120 female, so I can’t really conclude that this is not sufficiently powerful because there are not enough male participants.
    Or do I just state in the conclusion that there is no significant evidence to suggest that males are affected by emotional support?
    Hope I’m making sense. Not sure how to go about this one.

  7. Hege Maasø says

    Hei Karen!
    I really need some help interpreting my findings. I have a multilevel model; the dependent variable is political trust and the most important independent variable is corruption (CPI). The effect of corruption included alone is negative and significant. I also included a cross-level interaction between satisfaction with government and corruption. This makes the corruption variable positive and not significant; the interaction, however, is negative and significant (0.000). I don’t really understand this. How should this be interpreted? Is it right to assume that corruption has a significant effect solely at high values of satisfaction with government? How should I decide if the effect is significant? Really hope you can help me 🙂

  8. Laura says

    Hi Karen,

    Thank you very much for such a useful website. I have run a mixed design anova with time as the within subjects variable (3 time points) and group as the between subjects variable (2 groups). I have found a significant interaction effect but am unsure how to find out at what point the two groups differ. Can you advise on how to proceed? Many thanks.

  9. Lily says

    Hi Karen,
    Really happy to find this blog. I’m confused by the two-way mixed ANOVA results of my study. Would like to seek for help from you :).

    I conducted a 3 x 3 two-way mixed-design ANOVA in SPSS. The within-subject variable is the ‘time point pre- and post-treatment’ and it has 3 levels (pre-treatment, 15 min post- and 30 min post-). The between-subject variable is the ‘disease condition’, which also has 3 levels (healthy control, mild asthma, severe asthma). The dependent variable (DV) is a parameter from a pulmonary function test. The sample size of each group is different (HC=8, MA=10, SA=17). The assumptions are OK (normality, homogeneity of variance and sphericity).

    The results from this two-way mixed anova show that 1) ‘disease condition’ has significant main effect (SA lower than the other two on averaged DV across 3 time points); 2) ‘time point of treatment’ does not have significant main effect; 3) there is no interaction.

    The treatment effect over time in the different groups is the part I am most interested in. I was a little disappointed by results 2) and 3), especially the lack of an interaction. Then I noticed that on the plots, the lung function parameter (DV) did show a clear decreasing trend after treatment in the severe asthma group (HC and MA also showed a decreasing trend, but the slope is much gentler). I don’t know whether the absence of a significant interaction and main effect of time point is due to the small sample size (which might not provide enough statistical power for the two-way mixed ANOVA to pick up the change). So, I split the data according to disease condition and ran 3 independent one-way repeated measures ANOVAs, one in each disease group, to test the change in the lung function parameter across time points. These showed that in the severe asthma group, but not in the mild asthma and healthy groups, there is a significant effect of time. The DV decreases significantly over the 3 time points in severe asthmatics but not in mild asthmatics and healthy controls.

    There seems to be a conflict between the two-way mixed ANOVA and the multiple one-way repeated measures ANOVAs in this study. The two-way mixed ANOVA tends to show that “DV did not differ between time points when ignoring disease condition (from 2)) and the effects of treatment over time on DV in different disease groups are not different (from 3))” while the 3 separate one-way repeated measures ANOVAs seem to show that “treatment significantly alters the DV in the severe asthma group but not in MA and HC, kind of a reflection of a different effect of treatment on DV in different disease groups, i.e. an interaction?”

    I know the two-way mixed ANOVA should be the test of choice; however, I’m worried about the low statistical power due to the small and unequal sample sizes. In addition, the multiple one-way repeated measures ANOVAs do give me the results that I expect. I don’t know whether it is appropriate to perform three independent one-way repeated measures ANOVAs rather than a two-way mixed ANOVA in my study. If you suggest that I report the results of both, could you please give some suggestions for interpretation? Thank you

    Looking forward to your reply

  10. Megan says

    Hi Karen,

    I am going to conduct a 2x2x2 between-subjects ANOVA involving factors A, B and C. I hypothesize that the value of the dependent variable in B1xC1 at the A1 level is significantly higher than B1xC1 at the A2 level, whereas the value in B2xC2 at the A2 level is higher than B2xC2 at the A1 level. The analyses show 2 significant two-way interactions, for BxC and AxC, but the 3-way interaction was non-significant. Is it OK to reject those hypotheses on the basis of the non-significant three-way interaction effect? Does it mean that the patterns of the simple interaction of BxC are the same at the A1 and A2 levels? Or do I need to conduct any other tests to find out what’s happening with regard to my hypotheses?

    Thank you.

    • Karen says

      Hi Megan,

      Yes, the non-significant 3-way says there’s no evidence that the pattern of the BxC interaction differs across the values of A. If you’re actually just comparing a few specific means to answer your RQ, then you may want to do contrasts instead.

  11. emily warren says

    Hi Karen, would really appreciate your help.
    I have used a 2 x 2 x 2 mixed ANOVA (two between subjects, and one within subjects). I have found significant main effects for the two within subjects but not for the between subjects, and no significant interactions. Would it be appropriate to run paired samples t-tests on each group separately to see if the within subject factors are still significant within each group, rather than overall? And also, I am intrigued to see why the between subjects factor did not have a main effect, so would it be appropriate to run an independent samples t-test to compare the two groups in each condition for a finer understanding, or would this yield spurious significant results from running too many analyses?
    Thank you in advance, emily.

    • Karen says

      Hi Emily,

      You’re right to worry about running too many tests. Rather than running a bunch of tests, I would start with graphing means to get an idea of what is going on.

  12. Emma Black says

    Hi Karen,

    Thanks for your website which is really helpful. I ran a 2 x 3 mixed ANOVA, the within factor being time and the between factor age categories. I have found no significant main effects but I do have a significant interaction.

    When I split the file by time points and run one-way ANOVAs, there are no significant differences between age groups at time point one and there are significant differences between the ages at time point two. This was the only way I could see to explore the results further, as the post hoc analysis from my first set of results did not indicate where the differences lay.

    Was this an appropriate follow-up analysis and if not can you point me in the right direction?

    Many thanks in advance,


  13. Maria says

    Hi Karen,
    I have used a 2×2 mixed ANOVA; the within-subjects factor was time and the between-subjects factor was levels of processing. Both factors were significant; however, there was no interaction between them. Is there anything I can report about the interaction, other than to say there was no interaction?

    Thank you

  14. Cat says

    Hi Karen,
    Thanks so much for taking the time to answer all these questions, it is extremely useful!
    Here is my question. I have a significant three way, no significant two-ways or main effects. Based on plots I could tell where the difference was most likely coming from. Split the file, ran a two way…but it just neared significance…p=.066. I ran simple effects just to see what was going on and I had a significant difference between two groups…but I can’t report it, can I? At this point all I have is this significant three way…is that where my analysis ends?

  15. Lee says

    Hi Karen,

    I have read through all of the questions on here and found them very helpful – thanks. However, I am still having difficulty reporting a two-way crossover interaction between two between-groups factors, say Gender (M/F) and Employment (Employed/Unemployed).

    When I graph the interaction I see that the change/slope from G1E1 (MaleEmployed) to G1E2 (MaleUnemployed) is steeper than the G2E1 (FemaleEmployed) to G2E2 (FemaleUnemployed) slope i.e that unemployment has more of an effect on my DV for males than it does on females.

    No matter how I split this up I do not get significant differences between any of the four cells as seems to be recommended here. I have tried:
    – Splitting the file to look at the effect of employment for males and females separately. For both gender groups, there is no significant difference in my DV by employment. And why should there be; the interaction doesn’t tell me that the E1-E2 slope is significant for either of the gender groups. What it tells me is that the E1-E2 slope is not the same BETWEEN the two gender groups.
    – Creating four pairwise comparisons by recoding the two factors into a new variable, i.e. one that labels the following: G1E1=1; G1E2=2; G2E1=3; G2E2=4. A one-way ANOVA fails to show any difference (nor do the post-hocs).

    Neither seems appropriate because what I want to test is the slopes, i.e. whether (G1E1 minus G1E2) is different from (G2E1 minus G2E2). This would be easy to do if the design was repeated measures, because all of the levels would be in different variables that could be subtracted, but this can’t be done when they are all independent measures.

    Any advice much appreciated!

    • Karen says

      Hi Lee,

      This is very common in crossover interactions. Neither slope is different from zero, but they are different from each other.

      The significant interaction IS saying that the two slopes are unequal–you’ve already tested it. If one slope is positive and the other negative, you don’t really need the simple effects tests.
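
The contrast Lee describes, (G1E1 minus G1E2) versus (G2E1 minus G2E2), can be computed directly from four independent groups; it does not need repeated measures. The following is a minimal sketch, assuming a 2x2 between-subjects design with a pooled error term. The function name and the toy scores are hypothetical and are not from the thread.

```python
from math import sqrt
from statistics import mean


def interaction_contrast(cells):
    """Contrast (G1E1 - G1E2) - (G2E1 - G2E2) for four independent groups.

    cells maps the labels 'G1E1', 'G1E2', 'G2E1', 'G2E2' to lists of scores.
    Returns the estimate, its standard error, the t ratio and the error df.
    """
    weights = {"G1E1": 1, "G1E2": -1, "G2E1": -1, "G2E2": 1}
    estimate = sum(w * mean(cells[k]) for k, w in weights.items())

    # Pooled within-cell variance, i.e. the MSE of the 2x2 ANOVA.
    ss_within = sum(
        sum((x - mean(scores)) ** 2 for x in scores) for scores in cells.values()
    )
    df_error = sum(len(scores) for scores in cells.values()) - len(cells)
    mse = ss_within / df_error

    # Variance of a contrast of independent means: MSE * sum(w^2 / n).
    se = sqrt(mse * sum(w ** 2 / len(cells[k]) for k, w in weights.items()))
    return estimate, se, estimate / se, df_error


# Toy scores, invented purely to show the mechanics.
toy = {
    "G1E1": [5, 6, 7],
    "G1E2": [2, 3, 4],
    "G2E1": [4, 5, 6],
    "G2E2": [3, 4, 5],
}
estimate, se, t_ratio, df_error = interaction_contrast(toy)
print(estimate, round(se, 3), round(t_ratio, 3), df_error)
```

In a balanced 2x2 design this t ratio, squared, is the interaction F from the ANOVA table, which is why the significant interaction already answers the "are the slopes different" question.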

        • Abigail says

          Hi Karen,

          Do you have any references to cite that it is not necessary to test for simple effects when the interaction term is significant and one slope is positive and the other negative?

          Thank you so much!

  16. Rebecca says

    Hi Karen.

    I am having difficulty understanding how to interpret some of my repeated measures outputs. I have a condition (2) x time (9) analysis. There was a significant main effect for condition but no significant effect for time and no significant interaction. I didn’t expect differences over time, but I am trying to figure out the interaction. Can I look at contrasts for the interactions or am I done once it’s not significant? Secondly, when writing the results (for publication), can I say there were differences between conditions or is this not true since there was no interaction?

    Thank you!

    • Karen says

      Hi Rebecca,

      1. You’re done. The general consensus is there is no reason to investigate the nature of a non-significant interaction.
      2. Yes, there were differences between conditions. What you’ve found is this difference between conditions exists across all time points.

  17. Rose says

    I have a case in which the F term for the interaction and the simple effects disagree, but in the opposite direction.

    I have a 2(difficulty) X 3(time) design. The interaction did not come out significant. However, the simple effects reflect exactly what I predicted: the one-way ANOVA for time was significant in one difficulty level, but not in the other difficulty level.
    I am afraid that the non-significant interaction will be an obstacle in publishing the results.
    Since this specific direction of the interaction was planned (the time factor would influence the data in one difficulty level but not in the other one), I wondered if there is any alternative, planned analysis that would yield a statistic for an interaction term, with more power (so that the interaction term would hopefully be significant).

    Alternatively, is it meaningful/correct to report the simple effects without reporting the interaction?

    Many thanks!

      • Nicolas says

        Hi Rose and Lily,

        I was looking at the answers to Karen’s post to see whether someone had discussed the case she mentioned (a significant interaction with no significant simple effects). I found your questions unanswered and I thought I could help, so here is my answer. Imagine you had a 2(A) x 2(B) factorial design. Testing the interaction consists of comparing the difference between the simple effect of A for B1 and the simple effect of A for B2 to the predicted effect size under H0, which is usually 0. (Note that this is completely equivalent to comparing the difference between the simple effect of B for A1 and the simple effect of B for A2 to the predicted effect size under H0.) Testing the simple effect of A for B1 consists of comparing the difference between A1 and A2 for B1 to the predicted effect size under H0. The same logic holds for testing other simple effects. Thus, testing the interaction and testing the simple effects address two different questions. Concluding that an interaction is statistically significant at .05 requires that the 95% confidence interval (CI) of the difference between the simple effects excludes the predicted effect size under H0. Concluding that a simple effect is statistically significant at .05 requires that the 95% CI of the simple effect excludes the predicted effect size under H0. Having one statistically significant simple effect and one non-significant simple effect does not necessarily imply that the two simple effects are significantly different from each other. This seems to be the case in your situation. You do not need to test each simple effect separately to conclude you have an interaction effect.

        I am not sure whether I am clear enough or whether it helps. Nevertheless, you can look at this paper from Gelman & Stern (2006):

        Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
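
The point that one significant and one non-significant simple effect need not differ significantly from each other can be checked with a little arithmetic. The sketch below uses hypothetical estimates and standard errors, assumes the two estimates are independent, and uses a normal approximation for the p-values.

```python
from math import sqrt
from statistics import NormalDist


def z_and_p(estimate, se):
    """Return the z ratio and a two-sided normal-approximation p-value."""
    z = estimate / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical simple effects estimated in two independent groups.
effect_a, se_a = 25.0, 10.0
effect_b, se_b = 10.0, 10.0

z_a, p_a = z_and_p(effect_a, se_a)
z_b, p_b = z_and_p(effect_b, se_b)

# The interaction is the difference between the two simple effects.
# For independent estimates the variances add.
z_diff, p_diff = z_and_p(effect_a - effect_b, sqrt(se_a ** 2 + se_b ** 2))

print(f"effect A: z = {z_a:.2f}, p = {p_a:.3f}")
print(f"effect B: z = {z_b:.2f}, p = {p_b:.3f}")
print(f"difference: z = {z_diff:.2f}, p = {p_diff:.3f}")
```

Here the first effect is significant at .05 and the second is not, yet the test of their difference is far from significant. Reporting the two simple effects alone would therefore not support a claim that the effect differs between conditions.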

  18. Maria says


    I have a significant interaction between a 2-level categorical variable (mood: 1 = positive, 0 = neutral) and a continuous variable (length of music played). I would like to do a simple effects test to see if the effect of mood is significant at +1 and -1SD of my continuous variable.

    I started by standardizing the continuous outcome variable (money spent) and the continuous variable (length of music played).

    Then I created the following variables:

    Zbelow = lmusic + 1;
    xzbelow = mood*zbelow;

    Zabove = lmusic - 1;
    xzabove = mood*xzabove

    Then I tried doing the following regressions

    proc reg; model mspent = mood zbelow xzbelow;

    proc reg; model mspent = mood zabove xzabove;

    I am not sure what is going on but both regressions give exactly the same value even though one should test the effect of mood at +1SD of lmusic and the other should test the effect of mood at -1SD of lmusic. Any suggestions?
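
The recentering approach in this question can be sketched on toy data. Adding 1 to a standardized moderator puts zero at -1 SD, so the mood coefficient in that model is the effect of mood at -1 SD; subtracting 1 does the same for +1 SD. The code below is only an illustration: the data are noise-free and invented, and the helper names (`solve`, `ols`, `fit_shifted`) are hypothetical. Note also that the second product term in the question is written as `mood*xzabove`, which looks like a typo for `mood*zabove`.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (fine for tiny examples)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_shifted(data, shift):
    """Fit mspent = b0 + b1*mood + b2*z + b3*mood*z, where z = lmusic + shift."""
    rows = [[1.0, mood, lmusic + shift, mood * (lmusic + shift)]
            for mood, lmusic, _ in data]
    return ols(rows, [mspent for _, _, mspent in data])


# Invented noise-free data: mspent = 2 + 0.5*mood + 0.3*lmusic - 0.4*mood*lmusic
data = [(m, x, 2 + 0.5 * m + 0.3 * x - 0.4 * m * x)
        for m in (0, 1) for x in (-1.5, -0.5, 0.5, 1.5)]

at_minus_1sd = fit_shifted(data, shift=+1)  # z is 0 where lmusic = -1
at_plus_1sd = fit_shifted(data, shift=-1)   # z is 0 where lmusic = +1
print("mood effect at -1 SD:", round(at_minus_1sd[1], 6))
print("mood effect at +1 SD:", round(at_plus_1sd[1], 6))
print("interaction:", round(at_minus_1sd[3], 6), round(at_plus_1sd[3], 6))
```

Recentering is only a reparameterization, so the overall fit (R-squared, model F) and the interaction coefficient are identical in both runs. Only the mood coefficient and the intercept should change, so that is the row of the output to compare.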

  19. D says

    I have a question that I may be able to ask in two ways; the first way might require the least effort on your part. My study compares a treatment and control group on one outcome variable over 3 periods (pre-, post- and follow-up). I use IBM SPSS 20, and I have an interaction but no main effect. I found a resource that recommends a MANOVA syntax (see below) to explore “disordinal interactions” but they seem to refer only to between-groups designs (like gender vs. group), not mixed-model designs. The short question: can I still use the resource, simply using the repeated factor (that is, time) in place of one of the factors (like gender in the above 2 X 2)? I’m guessing “no.” More in-depth question: I’ve set up my data in SPSS to accommodate a mixed model, so Tukey post-hocs can’t be done as there are only 2 groups (treatment and control). Could I run a different post-hoc with SPSS menus? If not, what kind of syntax might you recommend? Or am I just stuck trying to visually interpret the graphs? Thanks.

    Resource says this syntax would work for 2X2.

    MANOVA score BY gender (1,2) treatment (1,2)
    /DESIGN = gender treatment gender BY treatment
    /DESIGN = gender treatment WITHIN gender(1), treatment WITHIN gender(2)
    /DESIGN = treatment gender WITHIN treatment(1), gender WITHIN treatment(2).

    • Karen says

      It’s hard for me to evaluate that code without seeing the full resource. I’m not sure what they’re trying to do in the MANOVA.

      If it were me, I’d run it in MIXED, and include an EMMEANS statement for the interaction, with a COMPARE option. That will allow you to compare the groups at each time point.

      You can never do a post-hoc for a 2-level main effect because you don’t need one. If the F test says there is a difference between the groups, you know which two groups are different.

  20. Nahla Ibrahim says

    I have two questions that I would highly appreciate if you can help with, please.

    y= b0+ b1*D+ b2*X +b3*D*X

    First, the interaction between a dummy variable and a continuous variable is significant at the 10% level. I noticed that some papers do an F test of b2+b3=0 (i.e. testing the sum of the two slopes).
    Is that test necessary when the interaction is significant?

    In my case b2 was negative and b3 was positive, so the F test of the sum was not significant. In all the papers I read that did this F test, the signs of b2 and b3 were both positive. So is this test the right one when we have coefficients of different signs?

    Many thanks in advance


    • Karen says

      Hi Nahla,

      I suspect that those papers want to test if the slope of the comparison group (where D=1) is significantly different from zero. If both b2 and b3 are positive AND if b2 is significantly different from zero, this is overkill.

      But in your case where b2 and b3 are of opposite signs, it would be useful.
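
In the model above, the slope of X is b2 when D=0 and b2+b3 when D=1, so the test of b2+b3=0 asks whether the slope in the D=1 group differs from zero. A minimal sketch of that test follows; the coefficient values, variances and covariance are hypothetical stand-ins for what the regression output's coefficient covariance matrix would supply, and the p-value uses a normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def slope_when_d_is_1(b2, b3, var_b2, var_b3, cov_b2_b3):
    """Slope of X in the D = 1 group for y = b0 + b1*D + b2*X + b3*D*X."""
    estimate = b2 + b3
    # Var(b2 + b3) = Var(b2) + Var(b3) + 2 * Cov(b2, b3)
    se = sqrt(var_b2 + var_b3 + 2 * cov_b2_b3)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # normal approximation
    return estimate, se, p


# Hypothetical values with opposite signs, as in the question above.
estimate, se, p = slope_when_d_is_1(
    b2=-0.4, b3=0.5, var_b2=0.04, var_b3=0.09, cov_b2_b3=-0.03
)
print(f"slope when D=1: {estimate:.2f} (SE {se:.3f}), p = {p:.3f}")
```

Because this is a single-df test, the F statistic reported by software is the square of this ratio. With opposite signs the sum can land near zero, which is a substantive finding (a flat slope in the D=1 group) and not a failure of the test.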

  21. Joe says

    Hi Karen,

    I did an ANCOVA with one IV, which was grade level (3rd and 5th grade); one DV, word reading post-test; and the covariate was word reading pre-test. But then I realised that the assumption is not met, because the IV is not independent of the covariate. Am I right?

    Thank you, Joe

  22. Nuria says

    Hi Karen,

    I did a repeated measures ANOVA with 1 within factor, Time (pre, post), and 2 between factors: Grade (2, 4) and Method (M1, M2 and M3). Should I report all the interactions? I ask because the Time x Grade interaction is not interesting.


  23. Leila says


    I had a quick question about interaction contrasts. I ran a 2(A) x 2(B) MANOVA and have 6 significant interactions. I want to follow them up with interaction contrast (A+ vs A-) vs (B+ vs B-). How do you run the interaction contrast in SPSS? Also, if the contrast is significant, would I follow it up with simple main effects, simple ANOVAs for A+ vs A- and B+ vs B- ?

    • Karen says

      Hi Leila, to answer the second question first, no, you don’t need to follow up with ANOVAs after a contrast.

      To get SPSS to do all this, you have to use the EMMeans statement with a COMPARE option. This is one of those things you can only do in syntax, not the menus. See the Command Syntax Reference for details.

  24. Tia says

    Hi Karen,

    I’m doing a 2×1 factorial between-subjects ANOVA. The two independent variables were gender and age, with attitude behavior as the outcome. My results were ALL non-significant. I’ve tried searching in books and online but nothing much is written on ‘what to do when your main effects AND interaction are both non-significant’. Please help me out: do I still do a simple effects test? What steps do I need to take after having done descriptives and having found a non-significant result?
    Thank you.

    • Karen says

      Hi Tia. The good news is you’re done. Just write it up as is. It’s possible you will still need to report the means, but no follow up tests are required.

  25. Nuria says

    Hi! I made a qualitative fluency scale with 7 items, each item with 4 options. There were 4 raters. To do inter-rater reliability analyses, is it better to use Cronbach’s alpha or the Kappa statistic?

    Thanks in advance!

  26. Anja says

    Hi Karen,
    this seems a great website to hopefully answer my question:
    I have two studies testing a 2*2*2 mixed model design. in Study 1 the 3-way is not significant, in study 2 (more power etc), it is.
    The question is whether I can still interpret the simple effects (pairwise comparisons using the Emmeans syntax), if the 3-way is not significant. Our research is really interested in the actual discrepancy between the different scores and not so much at the actual 3-way interaction.
    – I saw the answer you gave to a similar question before (Jiang), but specifically again:
    1) can we interpret the simple effects if the 3-way is not significant and we are actually not really interested in the 3-way, but only the actual simple effects?
    2) Is the emmeans-syntax (compare… adj (bonferroni)) also the correct method for simple effects in mixed models? (rather than MANOVA… mwithin…)
    –> Where is literature that specifies these kind of particular cases? (i.e. that I could use as references in the paper, and also to understand the details…)

    I would be really grateful for an answer because I already searched multiple books / stats texts etc and consulted with colleagues, but nobody really knows the answer…

    • Karen says

      Hi Anja,

      1) Just like I replied to Jiang, no, but why run a 3-way if you’re not interested in it? Every statistical test should attempt to answer a research question.
      2) That’s how I would do it.

      Literature? Hmm, I would start with Keppel, but you’re right–it may not be there. Textbooks generally cover what *to* do, not “can I do this other thing.” 🙂

      I would also look under Planned Contrasts, not simple effects. Simple effects are *always* about interpreting significant interactions.

  27. Nuria says

    Thanks Karen for your quick answer.

    I don’t really know how to do syntax, do you know a web page or where could I look how to do a two-way ANCOVA?

    Thank you very much again. Nuria

    • Karen says

      Hi Nuria,

      I don’t. I would suggest running the model through the menus and pasting the syntax instead of hitting “OK.” Read the Command Syntax Reference manual for UNIANOVA.

      We do go over this in detail in one of my workshops (Running Regressions and ANCOVAs in SPSS GLM), but it’s too involved to explain here. You have to see it.


  28. Nuria says

    Hi Karen,

    I would like to ask you a question. Doing an ANCOVA with 3 GROUPS (G1, G2, G3) X 2 GRADES (2nd, 4th) with SPSS, if you get a significant interaction GROUP X GRADE, how can you analyze the interaction? do I need syntax?

    Thank you very much! Nuria

    • Karen says

      Hi Nuria,

      Yes, to do, say pairwise comparisons across levels of one of the variables, you will need to use syntax. It’s one of those very useful things you can only do in syntax, not the menus.


  29. ingrid says

    Hi Karen,
    I love your site and explanations, hope you can help me with this one, I’m confused… I’m running a 3-way mixed model ANOVA: A (within subject, 2 levels) x B (within subject, 2 levels) x C (between subjects, 5 levels). I have specific hypotheses about the between-subject variable (C, which I’ll call group), so I did 4 orthogonal planned contrasts. In the first planned comparison I contrasted 2 groups versus the other 3 groups (design: +3 +3 -2 -2 -2), using SPSS syntax. I find no significant effect of A x group, a significant effect of B x group and a significant interaction effect of A x B x group.

    So now I’d like to follow up on this significant interaction and test the simple effects, but I can’t find any recommendations about simple effects within a single contrast after planned comparisons. Also no idea how to make SPSS do it. I did do a bunch of separate T-tests to look into the interesting effects (after plotting everything), but the reviewers want me to control for multiple comparisons (and probably rightfully so, since I did do 7 t-tests). Also the separate T-tests are probably sub-optimal, since they use the error term within the 2 (pooled) groups and not the population error (over the 5 groups).

    I did 4 paired t-test testing
    within C(1), within A(1): B1 versus B2, P < 0.001
    within C(1), within A(2): B1 vs B2, P < 0.001
    within C(2), within A(1): B1 vs B2, P=0.046
    within C(2), within A(2): B1 vs B2, P=0.34
    And 3 unpaired t-tests:
    within A(1), B2-B1, tested C1 versus C2 (where C1 are the 2 pooled groups and C2 are the 3 pooled groups), P=0.03
    within A(2), B2-B1, tested C1 versus C2, P<0.001
    within B(1), A2-A1, tested C1 versus C2, P=0.063

    So my questions are:
    1) is there a better way for follow up analysis instead of the t-tests? I think I need to do simple effects analysis, but since it's after the planned comparisons I already have, I don't know how to do this in SPSS, and if this even makes sense. What should my dfs be for the 7 follow up tests, so I can check if the commands work? The df for the planned contrast is total N (over 5 groups) – 5, but what for the follow up simple effects?
    2) What about correction for multiple comparisons? I'm confused in determining which are orthogonal. The 7 follow up tests were planned after looking at the data, to interpret the interaction, so they are post-hoc in a way. Note, I did not correct for the contrasts, since those are orthogonal and were pre-planned, based on my hypotheses. Also, for the other contrasts I don't need any follow up comparisons. All effects are non-significant (which is as should be, since those are the control conditions which should not differ).

    Hope this makes sense.

    • Karen says

      Hi Ingrid,

      There are just enough specific details in there that I would really have to talk with you in consultation to answer those questions accurately. I have no idea off the top of my head what the best approach would be.

      But I’ll try to give you a few nudges that may head you in the right direction. If you need more than that, feel free to set up a consultation.

      Your planned contrast sounds like a main effect, not an interaction effect, although I may be misunderstanding it.

      It does indeed sound like your t-tests are simple effects tests, but to truly make them simple effects, you need to replace their error term with the MSE from the overall ANOVA. That is extremely tricky (and I don’t remember if it can be done) with within-subjects effects because there isn’t a single MSE for the entire model.

      You could always just do a Bonferroni correction to adjust for the multiple comparisons. This may lead to type 2 errors, of course, especially for the p values that are just below .05. Actually, any correction will make a p=.046 move beyond .05, and likely the .03 as well. The first two t-tests are definitely orthogonal to each other, as are the third and fourth, but beyond that, I’d have to figure it out as well. 🙂
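
As a rough check on what a correction would do to the seven follow-up tests listed in the question, here is a minimal sketch of the Bonferroni and Holm adjustments. The values reported only as "P < 0.001" are entered as 0.001, which is an assumption that treats them as an upper bound; the function names are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment: less conservative than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# The seven p-values in the order listed in the question above.
# Entries reported only as "< 0.001" are entered as 0.001 (an upper bound).
reported = [0.001, 0.001, 0.046, 0.34, 0.03, 0.001, 0.063]

for raw, b, h in zip(reported, bonferroni(reported), holm(reported)):
    print(f"raw {raw:.3f}  Bonferroni {b:.3f}  Holm {h:.3f}")
```

Under either adjustment the three smallest p-values stay below .05, while the .046 and the .03 both move well above it, which matches the expectation in the reply above.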


  30. James Brown says

    Hi Karen,

    Just to clarify: can you explain why simple effects do not need adjustments for multiple comparisons? If there are any sources other than the Keppel book (which I can’t get hold of at the moment), that would be great. Thank you.

    • Karen says

      Hi James,

      The simple answer is that the point here is not to compare every possible mean to every other mean post hoc. It’s to make several planned comparisons, which should be orthogonal to each other.

      It’s when you’re using the same group mean in multiple comparisons that you have to make adjustments.


  31. Nuria says

    Hi Karen,

    Do you know if it is possible to obtain a standard deviation higher than your max score?

    In a word reading test with max score 40, I got a mean of 39.17 with an SD of 1.10. But I don’t really understand how you can obtain that.

    Thanks in advance!

  32. Pieter says

    Hi Karen,

    Quick question (hopefully you won’t need too much info for this). I have 10 different treatments tested over 5 times, so that would give me a 10×5 design. I did a repeated measures ANOVA and in some cases, do not have an interaction effect. Can you still measure for simple main effects?

    Without them, the main effects are just too broad to help illustrate the data, and the simple effects allow me to see where significant differences occur. Is this possible?

    Thanks for the help in advance!

    • Karen says

      Hi Pieter,

      I assume you mean you’d like to compare the treatments at each individual time point, even though there’s no significant interaction and no main effects for treatment.

      It really depends on what you want to test. If it really only makes theoretical sense to compare the treatments at each time point, then really, you can do those simple effects tests as a priori contrasts. If you’re just running them post hoc because you didn’t have significant main effects or interactions, that’s problematic.

      So you’ll have to really make a case that this analysis is answering the research question. Reviewers will assume you’re doing the latter, so you’ll have to be very convincing. 🙂


  33. Luce says

    Hi Karen,
    You seem to answer questions much faster than the teacher I consult for stats, and my thesis advisor is not much better than me for this, so I will try to explain my problem as simply as possible.
    I am working on the question of coping style as moderator of the effect of abuse experiences on emotional disorders with a sample of 330 adolescents, (120 girls, 210 boys).
    Variables involve types of coping (task-oriented, avoidance, etc.), types of abuse (emotional, physical)…
    Was working with a design involving looking first if the interaction Coping x Abuse is different for girls vs boys:
    so first Sex x Coping x Abuse (for each type of abuse, coping, and with Sex x Coping, Sex x Abuse, Coping x Abuse tested before the 3 way)

    If the 3-way interaction is significant, that means Coping x Abuse should be different for boys and girls, i.e. significant for one and not the other… So I do 2 separate regressions to test it.
    If the 3-way interaction is not significant, then I test Coping x Abuse for the whole sample.
    I did all this, found a few 3-way and mostly 2-way interactions, tested the slopes, etc.

    But then, exploring outside of this box, so to speak, I found for example that Task-oriented Coping x Emotional Abuse x Sex was not significant, but when checking the Coping x Abuse interaction in a split file (Girls/Boys), Coping x Abuse was significant for girls (p=.019) and not for boys (p=.17). In other regressions (other types of coping/abuse), the same happened, but with even more different significance levels (.03 vs .50), even though, again, the 3-way interaction was not significant at first.

    I wonder, then if I shouldn’t just do separate regressions for each gender, instead of testing if CopingxAbuse is different depending on gender by testing the CopingxAbusexSex. But separating the sample for each regression implies a loss of power. But then the main effects and interaction effects are quite different between boys and girls for most coping and abuse variables…

    I wonder if the differences between the samples of boys and girls (N, but also much more abuse in the girl sample), and the different patterns of correlations (some abuse more related to one sex, some to the other), wouldn’t justify to do separate moderation analysis and forget about the whole 3 way interaction testing method, which seems to provide dubious results in this situation.

    Sorry to be so long, I wanted it to be clear and am quite confused about the best way to analyze this.

    Thanks if you have time to think it over

    • Karen says

      Hi Luce,

      Depends on the day how fast I am at responding, but I do my best. 🙂

      And yes, thanks for being so clear. It really helps.

      Theoretically, you *should* be getting the same results both ways, and if there are any differences, you should have more power, and more significant effects when the samples are combined.

      It is of course, often much easier to interpret and explain separate models. What you really lose by running them separately is the ability to compare the coefficients. So by running them separately, you can say, the effect of X is positive for girls, and negative for boys, but you can’t say those coefficients are significantly different.

      So if the research question is *about* the girl/boy comparisons, I’d run them together, even if the p-values don’t come out. But if you’re really just interested in the coping*abuse interaction, and were just checking for sex differences, it would be fine to run them separately. It’s more like you’re replicating your study with boys and girls.
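
      The point about comparing coefficients can be sketched with simulated data. This is only an illustration under simplifying assumptions: a single hypothetical predictor `x`, an arbitrary 0/1 coding for sex, made-up effect sizes, and a small helper `ols` written for this example. It shows that the interaction coefficient in the combined model equals the difference between the slopes from the two separate models, which is the quantity the separate models never test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, size=n)   # 0 = boys, 1 = girls (arbitrary coding)
x = rng.normal(size=n)             # hypothetical predictor
y = 1.0 + 0.2 * x + 0.5 * sex + 0.6 * x * sex + rng.normal(size=n)

def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Separate models: one slope per group, but no test of their difference.
slope_boys = ols([x[sex == 0]], y[sex == 0])[1]
slope_girls = ols([x[sex == 1]], y[sex == 1])[1]

# Combined model: the x*sex coefficient is the difference between the slopes.
combined = ols([x, sex, x * sex], y)
interaction_coef = combined[3]

print(slope_girls - slope_boys, interaction_coef)
```

      In statistical software the combined model also reports a standard error for that interaction coefficient, and that is what gives you the test of whether the two slopes differ.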


    • Nuria says

      Some authors (Pardo, Garrido, Ruiz and Martín, 2007) suggest comparing the simple effects with one another by running a one-way ANOVA and then looking at the post hoc tests.


  34. Nuria says

    Thanks Karen, I will do the simple effects.

    Do you think it is a good idea to compare the simple effects with one another afterwards?

    Thanks again! Nuria

  35. Nuria says

    Thanks Karen for your reply. My question is: what is the best way to analyze the interaction in my study? Simple effects analysis by syntax, or planned comparisons?

    Thanks again! Nuria

  36. Nuria says


    I would like to ask you a question about the interaction in repeated measures ANOVA. In my research I have a 3 (intervention method) x 2 (time) design. I ran a repeated measures ANOVA and got a significant main effect for time, but not for method, as well as a significant interaction. What should I do to analyze the interaction? Is a simple effects analysis required because there is no significant main effect for one of the IVs? Could I look at planned comparisons too?


    • Karen says

      Hi Nuria,

      You don’t need a significant main effect to run a simple effects analysis. Simple effects are very similar to planned comparisons–they’re just specific contrasts you do to understand the nature of the interaction. So if you got an interaction for time*method, but no main effect for method, you probably have a crossover interaction.
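
      As a rough illustration of that pattern, here is a small Python sketch with made-up cell means for a 3 (method) x 2 (time) design. All numbers and labels are hypothetical. The methods have identical marginal means, so there is no method main effect, yet the change over time differs by method and the lines cross.

```python
# Hypothetical cell means for a 3 (method) x 2 (time) design.
cell_means = {
    ("method1", "pre"): 10.0, ("method1", "post"): 18.0,
    ("method2", "pre"): 12.0, ("method2", "post"): 16.0,
    ("method3", "pre"): 15.0, ("method3", "post"): 13.0,
}
methods = ["method1", "method2", "method3"]

# Marginal means for method (averaged over time): the main-effect view.
method_means = {m: (cell_means[(m, "pre")] + cell_means[(m, "post")]) / 2
                for m in methods}

# Pre-to-post change within each method: the simple effect of time.
time_change = {m: cell_means[(m, "post")] - cell_means[(m, "pre")]
               for m in methods}

print(method_means)
print(time_change)
```

      Whether differences like these are significant depends on the error term and sample size, which this sketch deliberately leaves out.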


  37. Rohan Puri says

    Hey Karen,

    I have a significant 3 way interaction effect. For 2 way interaction effect, I am aware of using the following syntax code for simple effect analyses and then conducting pairwise comparisons.


    However, for a 3 way interaction, is there a method to use code rather than conducting multiple 2 way ANOVAs at each level of the remaining factor manually ?

    Thank you, your help is highly appreciated, warm regards

    • Karen says

      Hi Rohan,

      The 3-way interaction basically says that the 2 way interactions differ for each value of the 3rd variable. So I would first of all start by graphing it, then look at where the mean differences in the two-ways differ.


      This will give you the mean difference between each pair of means across A for every combination of B & C. It’s not a direct test, but it can give you an idea of the nature of the interaction.
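
      For readers without the SPSS syntax to hand, the same descriptive idea can be sketched in Python. This is not the syntax referred to above, just an illustration with invented scores for a 2 x 2 x 2 design; the level labels and the helper `a_differences` are assumptions made for the example.

```python
from itertools import product
from statistics import mean

# Hypothetical raw scores, keyed by (A level, B level, C level).
scores = {
    ("a1", "b1", "c1"): [5.0, 6.0, 7.0], ("a2", "b1", "c1"): [8.0, 9.0, 10.0],
    ("a1", "b2", "c1"): [7.0, 8.0, 9.0], ("a2", "b2", "c1"): [7.0, 8.0, 9.0],
    ("a1", "b1", "c2"): [6.0, 7.0, 8.0], ("a2", "b1", "c2"): [6.0, 7.0, 8.0],
    ("a1", "b2", "c2"): [9.0, 10.0, 11.0], ("a2", "b2", "c2"): [5.0, 6.0, 7.0],
}

def a_differences(scores, b_levels=("b1", "b2"), c_levels=("c1", "c2")):
    """Mean of a1 minus mean of a2 within every combination of B and C."""
    return {
        (b, c): mean(scores[("a1", b, c)]) - mean(scores[("a2", b, c)])
        for b, c in product(b_levels, c_levels)
    }

diffs = a_differences(scores)
for cell, diff in diffs.items():
    print(cell, diff)
```

      As with the approach above, this only describes the pattern of mean differences; it does not test them.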


  38. Melissa says

    Hi Karen,

    I have a 3(age group) x 2 (gender) design, which yielded a significant main effect for age, but not for gender, as well as a significant interaction. I then did a Tukey HSD test which told me that each of the age groups was significantly different from each other. I then used the syntax method for calculating simple effects comparing the interaction with age groups, and discovered a significant gender difference in one of my age groups, which didn’t show up obviously in the overall analyses. I got curious, so I isolated each of my genders and ran one-way ANOVAs and then Tukey HSD on each of the genders, and confirmed my finding of a significant difference between gender for one particular age group. My problem is that I obtained different significance values with the simple effects analysis than I did for the Tukey HSD tests with the one-way ANOVAs, and am unsure as to which I should be reporting. I am confused by looking at the output from SPSS for the simple effects analysis as I’m not sure what figure I would be reporting in any case.

    I have not read anywhere about verifying further effects with using multiple one-way ANOVAs, and not sure if this is frowned upon. Any help would be greatly appreciated, thanks!

    • Karen says

      Hi Melissa,

      This is a great question.

      The fact that you have a gender difference in one age group, but not the others is exactly what the significant interaction is telling you. An interaction says that the effect of one factor is not the same at all values of the second factor. In other words, the effect of gender isn’t the same in every age group. It also says the effect of age isn’t the same for both genders.

      Okay, so which followup test do you report? Technically, it doesn’t matter whether you compare genders within age group or age groups within gender. The p-values won’t agree as one requires Tukey adjustments and the other doesn’t.

      It’s a little simpler to report the gender comparisons within age group, because there are only two genders and three age groups, but sometimes it just makes more conceptual sense to answer your research question doing it the other way. I generally default to whichever comparison makes the most sense intuitively and best answers the research question.

  39. Oana says

    Hi Karen,

    What do you recommend in the case of 3-way or 4-way interactions that include both repeated and between-participants measures?
    Can simple effects be used at all? The approach I have been using so far to deal with (interesting) 4-way interactions is to run two ANOVAs in order to assess the 3-way interaction at each of the two levels of the fourth variable. But then I would have to repeat this for any 3-way interactions that I come across. Is there any way around this?

    Thank you


    • Karen says

      Hi Oana,

      Great question. If you’re using a repeated measures GLM approach, it’s much harder to run any simple effects because there isn’t a single MSE for a within subjects effect.

      The approach you’re taking is truly a simple effects test of sorts, as long as you’re replacing that MSE. So that sounds great. You may have to repeat it for the 3-way, but once you get down to the 2-way, you can just compare mean differences using EMMeans.

  40. Cat says

    Hi Karen,

    I have a significant three-way interaction surgery(2) X treatment(3) X days (4) but no significant two-ways or main effects. I decided to split my file across days and on day 1, I got a significant surgery X treatment interaction which I then pursued to investigate with simple effects tests…how do I type this up for an article? Do I mention splitting the file or do I just report that following a significant three-way, simple effects revealed that on Day 1, this and that happened…? basically I’m just wondering what is expected by reviewers to be reported and what can be omitted because it is implied. thanks,

    • Karen says

      Hi Cat,

      It’s sometimes hard to say what reviewers want; it varies a lot by field as well as by individual. Dissertations generally require more specifics about the how than journals do, partly because space is more of an issue in journals, and more so because there is an assumption that journal readers should know (or find out) what simple effects tests are. – Karen

  41. Melissa says

    Hi Karen,
    I have a significant 2×2 interaction (2 between-subjects factors). My question: when running simple effects tests by splitting the file in SPSS on one IV and using the other as my factor, do I need to correct for familywise error since, essentially, these are post-hoc tests?
    Also, how do I approach simple effects testing if I have a 2x2x2 (2 between- & 1 within-subjects)?
    Thank you,

    • Karen says

      Hi Melissa,

      Technically simple effects tests aren’t post-hocs and don’t need multiple comparison adjustments. There is a really great chapter on simple effects testing in Geoffrey Keppel’s Design and Analysis book. I would suggest getting a hold of it.

      For a 2x2x2, a significant interaction means the two 2x2s are significantly different. So you need to do simple effects testing separately on each 2×2. This is much easier to explain if I could draw!


  42. Rachel says

    Hi Karen,

    Apologies, I found a significant 2×3 interaction of the between subjects factors in the question left above.

    • Karen says

      Hi Rachel,

      You can do simple effects tests of either the three pairwise comparisons (i.e. are the two means different in each of the three levels) or the two three-group comparisons. It really depends on your variable and which makes more sense.

      If there isn’t one direction that makes more sense theoretically, doing the three pairwise comparisons is easiest. The fact that both these factors are between subjects means it will be easier in any case. It gets trickier when the within-subjects factor is involved.

      A great reference is Geoffrey Keppel’s Design and Analysis book. There is a detailed chapter on simple effects testing.


  43. Rachel says

    Hi Karen,
    The design of my experiment involves a 2x2x3 analysis (2 between-subjects factors, 1 within-subjects factor). My repeated measures ANOVA revealed no main effects but a significant 2×2 interaction of the between-subjects factors. I’m unsure what kind of simple effects analysis I can do to unpack this interaction.

  44. JIANG says

    Hi Karen,

    Thank you anyway for taking the time to answer my questions; your forum is very interesting and helpful! For me, the essential thing is that an expert can help me find my errors in understanding and using statistics; whether the resulting number is just above 0.05 or not doesn’t worry me so much. I dropped some groups not to fish for a significant interaction but just to understand whether it makes sense (logically) that in one situation I have to examine the simple effect while in another situation I have no “right” to examine the same phenomenon. In other words, I’m preparing my answer to the reviewers in case the paper is refused….
    I’m reading about Rosenthal’s contrast tests in ANOVA, where I have found an example similar to mine, even though it’s difficult to find higher-way ANOVA examples with more than two levels for each factor…
    With my respects!

  45. JIANG says

    Hi Karen,

    Thank you so much for your answer. But I’m still wondering: I have 5 age classes. If I put all age classes in the analysis, I don’t get an age-related interaction, while if I eliminate classes 2 and 4, I do get an age-related interaction, even though the simple effect for age class 1 is the same in both situations. That means whether I can test the effect for age class 1 depends on whether I have measured classes 2 and 4, which is not so logical. Yes, I was warned that I might be blocked by reviewers, or alternatively advised to associate a well-known person with the publication (yes, we have found, in a well-known journal, statistics applied to three recorded data points!). But for me, understanding my data and publishing are two different things. In textbooks, it’s rather difficult to find higher-way analysis examples that are analyzed in depth. But the effect is so clear: if I had measured just the age class 1 population, I could easily claim the effect, but now I have measured too much data.
    You have explained very clearly that we are testing two different, related hypotheses. My question is whether, to explore my data in depth, I can test these two hypotheses at the same time. I have no intention of hiding the absence of the interaction, but that is another question: the statistically significant dependence between the factors.
    Sincerely, thanks!

    • Karen says

      Hi Tao,

      Yes, it’s always a good idea to explore your data deeply enough to know what is going on, whether or not you end up reporting those results. It’s really hard for me to make suggestions about what comparisons are appropriate and what are not. Dropping some groups may be very interesting or it may be data fishing. Basing it solely on significance without considering power, the research questions, and the design isn’t a good idea.

      So I’m sorry I can’t be more helpful, but I’d really have to make sure I understood the entire context before I could advise. This is exactly why I have Quick Question consultations. 🙂


  46. JIANG says

    Dear sir,

    I usually meet the opposite situation: no significant interaction but a significant simple effect. With my experimental design SEX(2)xSTATE(2)xAGE(5)xCATEGORY(3) (sex and age between-subjects), my question is: without a significant interaction (with age), can I still test the simple STATE effects at each AGE, and if I do find a significant effect (planned contrast), how should I interpret it?

    Thank you very much for your insight !

    • Karen says

      Hi Jiang,

      It really depends on the hypotheses you’re testing. If you set up the factorial design because you really need to test the various situations, then you can’t jump from a non-significant interaction to significant simple effects.

      If these really are the contrasts you want to test, though, theoretically there is no problem with testing them directly. In my experience, though, most reviewers will be wary. They will think you’re jumping to the contrasts to avoid the non-significant interaction. So you will have to really make your case for not testing the interaction directly.


  47. Karen says

    Hi Tony,

    You may not need a follow-up test. The regression coefficient for the interaction tells you the size of the difference in the slope of one variable for each one-unit change in the other. That may be enough to interpret the interaction.
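
    That reading of the interaction coefficient can be written out in a few lines of Python. The model form y = b0 + b_x*X + b_z*Z + b_xz*X*Z is the usual two-predictor interaction model; the coefficient values and the helper name `simple_slope` are hypothetical, chosen only for illustration.

```python
def simple_slope(b_x, b_xz, z):
    """Slope of X at a given value of Z in y = b0 + b_x*X + b_z*Z + b_xz*X*Z."""
    return b_x + b_xz * z

# Hypothetical coefficients, for illustration only.
b_x, b_xz = 0.50, 0.20

slopes = {z: simple_slope(b_x, b_xz, z) for z in (0, 1, 2)}
for z, slope in slopes.items():
    print(f"slope of X when Z = {z}: {slope:.2f}")
```

    Each one-unit step in Z shifts the slope of X by exactly `b_xz`, which is all the interaction coefficient is saying.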


  48. Tony says

    Hi Karen,

    I have a question about a significant interaction between 2 continuous variables as well. For my data, I found a significant interaction between 2 continuous variables, and this interaction was testing a possible moderating effect. In this case, how do I conduct a follow-up test? Also, can I conclude that there is a moderating effect if the follow-up test is not significant?


  49. FS says

    Hi Karen,

    I am dealing with an interaction between continuous terms. My interaction (between two continuous terms) is significant, but now my professor is interested in knowing whether, at a higher value of one predictor, the mean responses at lower vs higher levels of the 2nd predictor are significantly different from each other. I have no idea how to do this. Everywhere I look, I only see material for two categorical predictors. I don’t see any examples of an interaction between two continuous terms.
    thank you.

    • Karen says

      Hi FS,

      When you have two continuous predictors that interact, the interaction is saying that the slope of one variable differs at every value of the other.

      The means your professor wants are really predicted values. You could use the EMMeans command in SPSS GLM (this will only work in syntax, not the menus) or the LSMEANS with a slice command in SAS PROC GLM to get that comparison. So I would start with reading the manual to see how to do that in your software (you don’t mention which one you use).

      It’s something we cover in the Interpreting (Even Tricky) Regression Coefficients Workshop. It would take me a while to explain, but you could either look into that workshop or sign up for a Quick Question consultation if you want me to walk you through it.
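
      To show what "predicted values" means here, this is a minimal Python sketch, not the SPSS or SAS procedure itself. The coefficients are hypothetical defaults, and the "high" and "low" values of each predictor are arbitrary choices made for the example (in practice people often pick something like one standard deviation above and below the mean).

```python
def predict(x, z, b0=2.0, b_x=0.5, b_z=0.3, b_xz=0.2):
    """Predicted response from y = b0 + b_x*x + b_z*z + b_xz*x*z.

    The default coefficients are hypothetical, for illustration only.
    """
    return b0 + b_x * x + b_z * z + b_xz * x * z

high_x = 1.0                # a "high" value of the first predictor
low_z, high_z = -1.0, 1.0   # "low" and "high" values of the second predictor

pred_low = predict(high_x, low_z)
pred_high = predict(high_x, high_z)
difference = pred_high - pred_low

print(pred_low, pred_high, difference)
```

      This gives only the point predictions and their difference. Testing whether that difference is significant needs its standard error, which is what the software commands mentioned above provide.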


  50. Karen says

    Hi Sam,

    It’s hard to explain this in the abstract, without looking at the same means together, but I’ll do my best. 🙂

    First of all, I would suggest plotting your 12 means. I find interactions much easier to interpret if I can see the patterns. You have to be careful, because what looks different on a graph isn’t always significant, but even so, seeing the relationships among the means is very helpful.

    The way to graph this is to have two separate graphs. Each one will be a 2×3 graph of A*C at one value of B (B1 and B2).

    The three way interaction (if indeed that’s the one that’s significant) is saying the 2-way interactions in these two graphs are different. Maybe they’re *both* crossovers, but in different directions. If so, you’d have lots of significant interactions, but no main effects.

    Your simple effects tests would then be comparing A1 to A2 at each value of C (C1, C2, C3) first in the B1 graph and then in B2.

    I hope that helps!

  51. Sam says

    Hello Karen,

    I have been reading about this issue throughout today and feel I understand the situation on the 2×2 model but just cannot seem to apply this to my own model, a 2x2x3 mixed design, with A as a within-subjects factor; and B and C as both between-subjects factors.

    I have this significant 3-way interaction but no significant simple effects when followed up, and although it is good to see this issue has arisen for others I can only seem to find examples/explanations on 2×2’s.

    On the 2×2, as I understand it, I could look to see if it is a crossover interaction, and see if A1-B1 = 0 and if A2-B2 = 0; if one is positive and the other negative, it is explained as a crossover interaction, and it is known where the interaction is because there are only 2 differences in means, which were found to be significantly different.

    On my 2x2x3 model: I have the means for each group, a total of 12. Would I need to do something like this for the mean differences:
    A1-B1-C1; A1-B1-C2; A1-B1-C3
    A2-B2-C1; A2-B2-C2; A2-B2-C3. Which gives me 6 mean differences scores, and then compare the paired sets, e.g:
    A1-B1-C1 compared with A2-B2-C1 — to see if in each of these 3 pairs have the crossovers or not?

    If this is the case and I do have crossovers, how would I then go about explaining it? In the answer above on here and on another forum, it’s mentioned that with anything more than the 2×2 (e.g. a 2×3) the case is not as simple to explain. I think my case fits that, so would I explain that I have found the 3-way interaction, that with further investigation the interaction is due to a crossover effect of the mean differences, but that due to the model it is unclear exactly where it lies?!
    If the crossover is not the case, can I then explain the interaction as due to sampling error or complex design, i.e. the interaction getting lost in the noise of the data?

    Thank you in advance for any time you can spare on this issue,
    Kind Regards,

  52. David says

    Hi there,

    Thanks for this information, but I’m having some trouble getting my head around the interpretations that are possible. I have found no main effects but a significant interaction.

    A = non significant
    B = non significant
    A x B = significant (p. = .03, effect size = .30)
    The interaction is disordinal (A1 is higher than A2, but B1 is lower than B2).
    No follow up tests are significant when using t tests, and I understand why based upon your discussion above. So I proceeded to do the following
    A1-A2 vs B1-B2: significant
    But A1-B1 vs A2-B2 was also significant: in fact it resulted in exactly the same t value and significance level. So both of these tests suggested I was somehow testing the same values, although I did not.
    Furthermore, assuming that I focus on A1-B1 vs A2-B2 as the comparison I’m interested in, how do I actually interpret and write this up in a valid manner for a scientific journal? Do you have any examples of this in journals I could cite?

    Thanks for the help, it is very much appreciated

    • Karen says


      I can’t think of an example in the literature (example, anyone?) — I don’t always see that end of things.

      The best way to write it up is to display the means (either with a graph or table–I like graphs, personally) and report the F statistic with p-value. Describe it as a crossover interaction.

      Because it’s a 2×2, you don’t technically need any simple effects tests. The only way for the F stat for the interaction to be significant is for the differences in means to be significantly different.

      So in your write up, you can focus on either A1-A2 not equal to B1-B2, or on A1-B1 not equal to A2-B2. The interaction says both are true, but usually in research, one of these comparisons is more meaningful.

      I think your two t-tests are both testing the exact same thing as the F.

      If you had, say, a 2×3 interaction, this wouldn’t be exactly the same.
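
      The reason the two t-tests agree exactly can be checked with a few lines of arithmetic. The sketch below uses the cell labels from the question (A1, A2, B1, B2) with made-up means; both ways of grouping the differences reduce to the same contrast, A1 - A2 - B1 + B2.

```python
import math

def difference_of_differences(A1, A2, B1, B2):
    """Return the 2x2 interaction contrast computed both ways."""
    first = (A1 - A2) - (B1 - B2)    # A1-A2 compared with B1-B2
    second = (A1 - B1) - (A2 - B2)   # A1-B1 compared with A2-B2
    return first, second

# Hypothetical cell means with a crossover: A1 > A2 but B1 < B2.
first, second = difference_of_differences(A1=5.2, A2=4.1, B1=3.9, B2=5.0)
print(first, second)
print(math.isclose(first, second))
```

      Because a 2×2 interaction has a single degree of freedom, there is only this one contrast to test, however the four means are paired up.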

