Testing and Dropping Interaction Terms in Regression and ANOVA Models

In a regression model, should you drop interaction terms if they’re not significant?

In an ANOVA, adding interaction terms still leaves the main effects as main effects.  That is, as long as the data are balanced, the main effects and the interactions are independent.  The main effect is still telling you if there is an overall effect of that variable after accounting for other variables in the model.
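The independence of main effects and interactions in balanced data can be checked numerically. The following is a minimal sketch, assuming a simulated balanced 2x2 design with effect coding (-1/+1), which is the kind of coding ANOVA software typically uses; the data, the cell size, and the helper `ols` are made up for illustration.

```python
import numpy as np

# Balanced 2x2 design: 5 observations per cell, effect coded (-1 / +1).
a = np.repeat([-1.0, -1.0, 1.0, 1.0], 5)
b = np.repeat([-1.0, 1.0, -1.0, 1.0], 5)

# Simulated response (hypothetical effect sizes).
rng = np.random.default_rng(0)
y = 10 + 2 * a + 1 * b + 0.5 * a * b + rng.normal(0, 1, size=a.size)

def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

main_only = ols([a, b], y)          # [intercept, A, B]
with_inter = ols([a, b, a * b], y)  # [intercept, A, B, A:B]

# With balanced data and effect coding the design columns are orthogonal,
# so the main-effect estimates do not change when the interaction is added.
print(main_only[1:3])
print(with_inter[1:3])
```

The two printed pairs of main-effect estimates agree. If the cells were unbalanced, or the factors were dummy coded (0/1) instead, the columns would no longer be orthogonal and the estimates would shift when the interaction is added.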

But in regression, adding interaction terms makes the coefficients of the lower order terms conditional effects, not main effects.  That means that the effect of one predictor is conditional on the value of the other.  The coefficient of the lower order term isn’t the effect of that term.  It’s the effect only when the other term in the interaction equals 0.
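A small simulation makes the "effect when the other term equals 0" point concrete. This is a sketch under assumed, simulated data (the variables `x`, `z`, `y` and the helper `fit` are hypothetical): centering `z` moves its zero point, which changes the coefficient of `x` but not the interaction coefficient.

```python
import numpy as np

# Simulated data; z is far from 0 in this sample.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, n)
z = rng.normal(5, 1, n)
y = 1 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(0, 1, n)

def fit(x, z, y):
    """OLS coefficients in the order [intercept, x, z, x:z]."""
    X = np.column_stack([np.ones_like(y), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_raw = fit(x, z, y)
b_centered = fit(x, z - z.mean(), y)

# Coefficient of x: slope at z = 0 (raw) vs. slope at the mean of z (centered).
print(b_raw[1], b_centered[1])

# The interaction coefficient is unchanged by centering.
print(b_raw[3], b_centered[3])

# The centered x coefficient equals the raw conditional slope evaluated at mean z.
print(b_raw[1] + b_raw[3] * z.mean())
```

Both fits describe exactly the same model. Only the value of `z` at which the `x` coefficient is evaluated differs, which is why that coefficient should be read as a conditional effect rather than an overall one.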

So if an interaction isn’t significant, should you drop it?

If you are just checking for the presence of an interaction to make sure you are specifying the model correctly, go ahead and drop it.  The interaction uses up df and changes the meaning of the lower order coefficients and complicates the model.  So if you were just checking for it, drop it.
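One common way to check for an interaction is to compare the models with and without it. The sketch below, on simulated data with no true interaction (all names and values are assumptions for illustration), computes the nested-model F statistic by hand and shows the one residual df that the interaction term uses up.

```python
import numpy as np

# Simulated data with no true interaction between x and z.
rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1 + 0.5 * x + 0.3 * z + rng.normal(size=n)

def rss_and_df(X, y):
    """Residual sum of squares and residual degrees of freedom for an OLS fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), len(y) - X.shape[1]

ones = np.ones(n)
rss_reduced, df_reduced = rss_and_df(np.column_stack([ones, x, z]), y)
rss_full, df_full = rss_and_df(np.column_stack([ones, x, z, x * z]), y)

# Number of extra parameters in the full model (here, the single interaction term).
q = df_reduced - df_full

# F statistic for the nested comparison: improvement per extra parameter
# divided by the residual variance of the full model.
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
print(df_reduced, df_full, f_stat)
```

Comparing `f_stat` to an F distribution with `q` and `df_full` degrees of freedom gives the p-value; that step is left out here to keep the sketch free of extra dependencies.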

But if you actually hypothesized an interaction that wasn’t significant, leave it in the model.  The insignificant interaction means something in this case: it helps you evaluate your hypothesis.  Taking it out can do more damage through specification error than leaving it in does through the loss of df.

The same is true in ANOVA models.

And as always, leave in any lower order terms, significant or not, for any higher order terms in the model.  That means you have to leave in all insignificant two-way interactions for any significant 3-ways.

 



Comments

  1. Luo says

    I came across the following problem.
    An ordered logistic regression: Y= B1 *SEX+ B2*AGE+…….
    SEX is significant, but AGE not. This is good for me, because this confirms my hypothesis.

    I wanted to test another hypothesis, so I add an interaction term SEX*AGE to the regression. But now the interaction term is not significant. This is good for me, because the other hypothesis is confirmed. But meanwhile SEX is not significant.

    How should I deal with this kind of situation? Can you help me?

  2. Maria says

    Hi, thanks a lot for this amazing website!
    What should I do if my main hypothesis is the three-way interaction and it is significant, but it becomes non-significant when I add the two-way interactions (which are not significant either)? How can I justify keeping only the three-way? Is it allowed? Why is this happening?
    Thanks a lot for your answer!

    • Jason S says

      Always include the lower order terms; otherwise the model misattributes to the higher order interaction variance that is actually explained by lower order terms (i.e., single factors and 2-way interactions). In your case, it sounds like the model is underpowered and the results become non-significant as more df are used. Unfortunately, that just means the result is not reliable.

  3. Matt says

    Hi
    I’m still not sure how I should interpret this

    I have a model that includes an interaction term.

    r-squared = 0.7726
    gper15 p = 0.13
    iv p = 0.00
    gper15 iv interaction p = 0.009
    constant p = 0.00

    How do I interpret the effect of gper15? Is this essentially telling me that gper15 is only significant when iv is greater than 0?

  4. Richard Anderson says

    I’m not sure I can make sense of this blog post.

    You wrote: “But in regression, adding interaction terms makes the coefficients of the lower order terms conditional effects, not main effects. That means that the effect of one predictor is conditional on the value of the other.”

    However, “the effect of one predictor is conditional on the value of the other” is precisely what “interaction” means.

    • Karen Grace-Martin says

      Hi Richard,

      Yes, indeed. I was just reiterating what interaction means there. The rest of that paragraph tells you the rest of the story: what the variable’s coefficient actually measures. It isn’t the entire effect of that predictor.

  5. Sarah says

    Hi, thank you very much for this informative content!
    Nevertheless, I have a question: in a multiple regression model, a (hypothesized) three-way-interaction was significant (b = 0.45, p = 0.033). When I tested this model vs. the model without the interaction-term (only main effects), the F-Test was not significant (delta R-squared = 0.04, F(4,112) = 1.431772, p = 0.2281229). Does this mean that the interaction term does not explain additional variance? Then my question would be: is my data consistent with the hypothesized three-way-interaction or is it not?
    Best regards and thanks for your help!

    • Karen Grace-Martin says

      Hi Sarah,

      This can get complicated. It’s possible that the three way is significant in the presence of the two-way interactions. But on its own, it isn’t adding a significant amount of variance. What you should do in this situation is hard to say without having a conversation about your research questions, design, and seeing your output.

  6. Hassen says

    Hi,
    I ran the data in SAS and found that the overall multiple regression model is not significant, and none of the coefficients is significant either. What should I do first to analyze my data?

  7. Kev Cogburn says

    I have time series data for males and females. I ran separate linear regressions (one for males, the other for females) and obtained the slopes and intercepts for both. I then ran a subsequent regression using sex, time, and sex*time and obtained the slopes and intercepts for each term. I noticed that these slopes and intercepts matched the output of the original / separate regressions. However, the interaction in this case was not significant, so it was removed and the regression was run once more. Is there a reason why the slopes and intercepts resulting from this analysis do NOT match those obtained from the original regression? Which regression output is actually “true” in this example? Thanks.

    • Karen Grace-Martin says

      Hi Kev,

      You don’t want to think of one model as “true” and another as “false.” No model exactly represents what is happening in the population, but you are trying to find one that is the most reasonable. When you take out the interaction, you are essentially setting it equal to zero. In other words, you are forcing the male and female slopes to be equal. Both running the groups separately and including an interaction allow each slope to be estimated uniquely.
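      To see this numerically, here is a minimal sketch with simulated data (the data, the 0/1 coding of sex, and the helper `ols` are made up for illustration, not taken from the question above).

```python
import numpy as np

# Simulated data: 10 time points per group.
rng = np.random.default_rng(3)
time = np.tile(np.arange(10.0), 2)
sex = np.repeat([0.0, 1.0], 10)  # 0 = male, 1 = female (dummy coded)
y = 2 + 0.5 * time + 1.0 * sex + 0.2 * sex * time + rng.normal(0, 0.5, time.size)

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones_like(y)

# Separate regressions, one per group: [intercept, slope].
m = sex == 0
male = ols(np.column_stack([ones[m], time[m]]), y[m])
female = ols(np.column_stack([ones[~m], time[~m]]), y[~m])

# One model with the interaction: [intercept, sex, time, sex:time].
full = ols(np.column_stack([ones, sex, time, sex * time]), y)
male_from_full = np.array([full[0], full[2]])
female_from_full = np.array([full[0] + full[1], full[2] + full[3]])

# Without the interaction there is one shared slope: [intercept, sex, time].
reduced = ols(np.column_stack([ones, sex, time]), y)

print(male, male_from_full)      # same intercept and slope
print(female, female_from_full)  # same intercept and slope
print(reduced[2])                # single slope forced on both groups
```

      The interaction model reproduces the separate per-group lines, while the model without the interaction returns a single compromise slope that matches neither group exactly.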

  8. Tomas Andriotti says

    Hi ,

    I have a model with an interaction term between a continuous variable and a discrete variable with 3 levels, which I dummy-coded with level 1 as the reference. So I have 2 interaction terms, one for each comparison among the discrete variable’s levels: level 2 versus 1 and level 3 versus 1. If one of the comparisons is not significant and the other is, does this mean there is effect modification or not?

    Tomas

  9. KC Tam says

    Hi Karen,

    First, I just want to say THANK YOU for the website. I cannot overestimate how much I’ve learned from your posts. As a psychologist working with small sample sizes (great apes) and based in a highly diverse anthropology department, it is incredibly difficult for me to find proper stats training tailored to my research needs. But ever since I found your website, I’ve been recommending it to all my colleagues. So, thank you!!

    In the post, you gave two answers to the question “So if an interaction isn’t significant, should you drop it?”

    If you are doing model selection, then drop it.
    If you are doing hypothesis testing and have hypothesized this interaction, then don’t.

    I wonder if there is a 3rd situation:
    Should I drop it if I am doing hypothesis testing but did NOT hypothesize the interaction?

    For example, I am interested in (1) a treatment x gender x country of origin 3-way interaction as well as (2) treatment itself. So all the 2-way interactions are there not because of a priori hypothesis. If the 3-way interaction is not significant, how should I test the effect of treatment then? Drop all 3- and 2-way interactions?

    Thank you!

  10. Benito says

    Dear Karen

    I have a main factor that is directly proportional to the response. I identified the significant and non-significant factors and interactions and then dropped the non-significant ones. When I ran the DOE again with only the significant ones, this main factor remained positive, as expected. Strangely, the estimated coefficient in uncoded units for this factor became negative. The low and high levels for this factor are positive (110 and 130). Is it possible for this to happen? Why? Can the presence of interactions cause this? PS: all the factors and their levels are positive. Thanks in advance!

  11. Mo says

    Hi Karen!

    Thank you for the very useful article about interpretation of interaction terms. I know that if we are looking at the coefficients, those “main effects” are not main effects once the interaction is in the model. They’re marginal effects. Coefficients can be used to interpret the magnitude of marginal effects. I wonder, how can I determine the statistical significance of the marginal effects? Thank you in advance.

  12. Theo van Erp says

    Dear Karen,

    For some reason I am not able to see the other 11 comments to this post.

    However, the following reference strongly suggests that non-significant interaction terms should NOT be included in statistical models.

    Engqvist, L. (2005). The mistreatment of covariate interaction terms in linear model analyses of behavioural and evolutionary ecology studies. Animal Behaviour, 70, 967–971.

    I would appreciate if you could comment on the discrepancy between your view and that presented in the article.

    Best,
    Theo

  13. Matt says

    Hi Karen,

    Thanks for a great article. I don’t quite understand the reason to leave in an insignificant term – how does it help you evaluate your hypothesis? The only hypothesis it would help evaluate is…whether there is interaction of those terms. Aren’t you more concerned with discerning the relationships (whatever they are) of the covariates to the dependent variable? Said another way: when and why would you NOT be trying to specify the model correctly?

    I suppose your answer will have to do with the specification error you mentioned, with which I am only basically familiar.

  14. Bushra Shafqat says

    Hi!
    Hopefully you are doing well. It was indeed a thoughtful read, but I was just wondering: if I get a non-significant interaction term but the overall model (ANOVA) is significant, should I drop the interaction term from the regression model?
    anxiously waiting for your reply

    Regards..

  15. Tom says

    Dear Karen,

    In my analysis I am interested in a three-way interaction including time. Basically it is X1*Time*X2; my dependent interacts with time and is dependent on the levels of my moderator. However, I also hypothesize a two-way interaction between X1 and time. This interaction is not significant, but my three-way interaction is significant. I know that you cannot interpret main effects when you include a two-way interaction. But what about interpreting two-way interactions when you include a three-way interaction? Thank you in advance,

    Tom

    • Karen says

      Hi Tom,

      You can often interpret a main effect when there are interactions. It depends on the pattern of means.

      This is also true for two way and three way interactions. Depending on the nature of the three-way, the two way may or may not make sense on its own. You are hypothesizing a two way interaction for a reason, so does it still make sense in the presence of the three way?

      • Cristina says

        Hi Karen and Tom,

        Thank you very much for the useful website!
        I have a question in line with the one that Tom posted. I am testing for two- and three-way interactions in my model (Time*InterventionGroup*Factor). I am of course also interested in the Time*Group interaction but I am not interested in the InterventionGroup*Factor interaction. Can I leave it out of the model? Or is it not recommended to do so, as part of it is in another interaction?
        Hope it is clear. Thanks a lot in advance!
        Regards,
        Cristina

  16. muj says

    Hi, I was just wanting to know: what if my main effects turn insignificant or reverse the sign of the coefficient (such that it is counter-intuitive to theory), but the interaction terms are now significant and absolutely as you would expect them to be theoretically? I am not sure how to interpret this then.

  17. Lukas K. says

    Dear Karen,

    thank you for the article. It was very helpful. But I am still a bit confused, and it would be great if you could give me a hint. I have a regression model with 4 variables. Two of them do not have a significant coefficient, nor do they contribute to the adj.R2 or F / Fsign, so I dropped them. Further, there is one variable which has the greatest explanatory power. The last one is insignificant but adds something to adj.R2. So I thought I would test whether there is some interaction going on, and as it turned out the interaction is insignificant with a beta of 0. In this case, would it be reasonable to drop the interaction term and the idea of an interaction?

    Thank you Lukas

  18. saurabh jangid says

    Thank you, sir/ma’am, lots and lots. I was highly confused regarding the same point, but your last line made everything clear, the one about dropping lower order terms for higher order interactions: leave in an insignificant 2-way interaction for a significant 3-way interaction, and any significant main effect in 1 way for a significant 2-way interaction, as it consumes degrees of freedom in type III error.
    Thanks a lot. I was just writing back what I interpreted from the article.
    Thanks a ton, thank you very much. saurabh jangir

