In a regression model, should you drop interaction terms if they’re not significant?
In an ANOVA, adding interaction terms still leaves the main effects as main effects. That is, as long as the data are balanced, the main effects and the interactions are independent. The main effect is still telling you if there is an overall effect of that variable after accounting for other variables in the model.
But in regression, adding interaction terms makes the coefficients of the lower order terms conditional effects, not main effects. That means that the effect of one predictor is conditional on the value of the other. The coefficient of the lower order term isn’t the effect of that term. It’s the effect only when the other term in the interaction equals 0.
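A small simulated sketch of that point (hypothetical variable names and coefficients, noiseless data so the fit recovers the true values exactly): once the interaction is in the model, the lower order coefficient is the slope of that predictor only where the other predictor equals 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Noiseless data: the effect of x1 is 2 when x2 = 0, and 2 + 4*x2 otherwise
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# b1 is the conditional effect of x1 *at x2 = 0*, not an overall main effect
print(round(b1, 2), round(b1 + b3 * 1.0, 2))   # 2.0 6.0
```

The coefficient `b1` is not "the effect of x1"; the effect of x1 is `b1 + b3*x2`, a different number at every value of x2.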
So if an interaction isn’t significant, should you drop it?
If you are just checking for the presence of an interaction to make sure you are specifying the model correctly, go ahead and drop it. The interaction uses up degrees of freedom, changes the meaning of the lower order coefficients, and complicates the model. So if you were only checking for it, drop it.
But if you actually hypothesized an interaction that wasn’t significant, leave it in the model. The insignificant interaction means something in this case: it helps you evaluate your hypothesis. Taking it out can do more damage in specification error than it will in the loss of df.
The same is true in ANOVA models.
And as always, leave in any lower order terms, significant or not, that are part of any higher order terms in the model. That means you have to leave in all insignificant two-way interactions for any significant three-ways.
Hi, thanks a lot for this amazing website!
What should I do if my main hypothesis is the three-way interaction and it is significant, but it becomes non-significant when I add the two-way interactions (which are also not significant)? How can I justify that I want to keep only the three-way? Is it allowed? Why is this happening?
Thanks a lot for your answer!
Jason S says
Always include the lower order terms; otherwise, variance that is explained by the lower order terms (i.e., single factors and two-way interactions) gets mis-attributed to the higher order interaction. In your case, it sounds like the model is under-powered, and results become non-significant as more df are used. Unfortunately, that just means it’s not reliable.
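Jason’s point about mis-attribution can be checked with a small simulation (all numbers hypothetical). With uncentered predictors, the product term is correlated with the main effects, so leaving the main effects out lets the product term absorb variance that isn’t interaction at all:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
a = rng.normal(2.0, 1.0, n)     # NOT mean-centered, so a*b2 correlates with b2
b2 = rng.normal(0.0, 1.0, n)
# True model has main effects only -- there is no interaction at all
y = 2 * a + 2 * b2 + rng.normal(0, 1, n)

ones = np.ones(n)
full = np.linalg.lstsq(np.column_stack([ones, a, b2, a * b2]),
                       y, rcond=None)[0]
no_mains = np.linalg.lstsq(np.column_stack([ones, a * b2]),
                           y, rcond=None)[0]

# With main effects in the model, the interaction coefficient is near zero;
# without them, the product term soaks up variance that belongs to b2
print(round(full[3], 2), round(no_mains[1], 2))
```

The product term looks like a sizable "interaction effect" in the second fit, even though the data contain none.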
I’m still not sure how I should interpret this
I have a model that includes an interaction term.
R-squared = 0.7726
gper15: p = 0.13
iv: p = 0.00
gper15 * iv interaction: p = 0.009
constant: p = 0.00
How do I interpret the effect of gper15? Is this essentially telling me that gper15 is only significant when iv is greater than 0?
Karen Grace-Martin says
To interpret the interaction, you need the regression coefficients. The interaction tells you that the effect of gper15 is different for different values of iv.
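To put numbers on that: the conditional ("simple") slope of gper15 at a chosen value of iv is b_gper15 + b_interaction * iv, and its standard error follows exactly from the coefficient covariance matrix, since it is a linear combination of coefficients. A sketch with simulated data (the names gper15 and iv come from the comment; the coefficients are made up):

```python
import numpy as np

# Simulated stand-ins for gper15 and iv; coefficients below are hypothetical
rng = np.random.default_rng(2)
n = 150
gper15 = rng.normal(size=n)
iv = rng.normal(size=n)
y = 0.3 * gper15 + 1.0 * iv + 0.8 * gper15 * iv + rng.normal(size=n)

X = np.column_stack([np.ones(n), gper15, iv, gper15 * iv])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)   # coefficient covariance matrix

# Simple slope of gper15 at a chosen iv, with its standard error:
# slope = b1 + b3*iv,  var = var(b1) + iv^2 * var(b3) + 2*iv*cov(b1, b3)
for iv_val in (-1.0, 0.0, 1.0):
    slope = b[1] + b[3] * iv_val
    se = np.sqrt(cov[1, 1] + iv_val**2 * cov[3, 3] + 2 * iv_val * cov[1, 3])
    print(f"iv = {iv_val:+.1f}: slope = {slope:.2f}, z = {slope / se:.2f}")
```

The slope of gper15 can be non-significant at iv = 0 (which is all the gper15 p-value in the output tests) yet clearly significant at other values of iv.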
Richard Anderson says
I’m not sure I can make sense of this blog post.
You wrote: “But in regression, adding interaction terms makes the coefficients of the lower order terms conditional effects, not main effects. That means that the effect of one predictor is conditional on the value of the other.”
However, “the effect of one predictor is conditional on the value of the other” is precisely what “interaction” means.
Karen Grace-Martin says
Yes, indeed. I was just reiterating what interaction means there. The rest of that paragraph tells you the rest of the story: what the variable’s coefficient actually measures. It isn’t the entire effect of that predictor.
Hi, thank you very much for this informative content!
Nevertheless, I have a question: in a multiple regression model, a (hypothesized) three-way interaction was significant (b = 0.45, p = 0.033). When I tested this model against the model without the interaction terms (only main effects), the F-test was not significant (delta R-squared = 0.04, F(4, 112) = 1.43, p = 0.228). Does this mean that the interaction term does not explain additional variance? Then my question would be: is my data consistent with the hypothesized three-way interaction or is it not?
Best regards and thanks for your help!
Karen Grace-Martin says
This can get complicated. It’s possible that the three way is significant in the presence of the two-way interactions. But on its own, it isn’t adding a significant amount of variance. What you should do in this situation is hard to say without having a conversation about your research questions, design, and seeing your output.
I ran the data in SAS, and I find that the overall multiple regression model is not significant and that none of the coefficients are significant. What should I do first to analyze my data?
Kev Cogburn says
I have time series data for males and females. I ran separate linear regressions (one for males, the other for females) and obtained the slopes and intercepts for both. I then ran a subsequent regression using sex, time, and sex*time and obtained the slopes and intercepts for each term. I noticed that these slopes and intercepts matched the output of the original, separate regressions. However, the interaction in this case was not significant, so it was removed and the regression was run once more. Is there a reason why the slopes and intercepts resulting from this analysis do NOT match those obtained from the original regressions? Which regression output is actually “true” in this example? Thanks.
Karen Grace-Martin says
You don’t want to think of one model as “true” and another as “false.” No model exactly represents what is happening in the population, but you are trying to find one that is the most reasonable. When you take out the interaction, you are essentially setting it equal to zero. In other words, you are forcing the male and female slopes to be equal. Both running the groups separately and including an interaction allow each slope to be estimated uniquely.
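A quick simulated check of both claims (all numbers hypothetical; sex coded 0/1): the model with the sex*time interaction reproduces the two separate regressions exactly, while dropping the interaction forces a single common slope.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.tile(np.arange(10.0), 2)          # time, repeated for each group
sex = np.repeat([0.0, 1.0], 10)          # 0 = female, 1 = male (made-up coding)
y = 1 + 0.5 * t + sex * (2 + 1.5 * t) + rng.normal(0, 0.1, 20)

# Separate regressions per group (np.polyfit returns [slope, intercept])
bf = np.polyfit(t[sex == 0], y[sex == 0], 1)
bm = np.polyfit(t[sex == 1], y[sex == 1], 1)

# Combined model with the sex*time interaction reproduces both groups exactly
X = np.column_stack([np.ones(20), sex, t, sex * t])
b = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.isclose(b[2], bf[0])           # female slope = time coefficient
assert np.isclose(b[2] + b[3], bm[0])    # male slope = time + interaction

# Dropping the interaction forces one common slope on both groups
br = np.linalg.lstsq(X[:, :3], y, rcond=None)[0]
print(round(b[2], 2), round(b[2] + b[3], 2), round(br[2], 2))
```

The common slope from the reduced model falls between the two group-specific slopes: it is a compromise the model is forced into once the interaction is set to zero.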
Tomas Andriotti says
I have a model with an interaction term between a continuous variable and a discrete variable with 3 levels, which I dummy-coded, choosing level 1 as the reference. So I have 2 interaction terms, one for each comparison among the discrete variable’s levels: level 2 versus 1 and level 3 versus 1. If one of the level comparisons shows a non-significant result and the other is significant, does this mean there is effect modification or not?
I would say there is an interaction in that case.
KC Tam says
First, I just want to say THANK YOU for the website. I cannot overstate how much I’ve learned from your posts. As a psychologist working with small sample sizes (great apes) and based in a highly diverse anthropology department, it is incredibly difficult for me to find proper stats training tailored to my research needs. But ever since I found your website, I’ve been recommending it to all my colleagues. So, thank you!!
In the post, you gave two answers to the question “So if an interaction isn’t significant, should you drop it?”
If you are doing model selection, then drop it.
If you are doing hypothesis testing and have hypothesized this interaction, then don’t.
I wonder if there is a 3rd situation:
Should I drop it if I am doing hypothesis testing but did NOT hypothesize the interaction?
For example, I am interested in (1) a treatment x gender x country of origin 3-way interaction as well as (2) treatment itself. So all the 2-way interactions are there not because of a priori hypothesis. If the 3-way interaction is not significant, how should I test the effect of treatment then? Drop all 3- and 2-way interactions?
I have a main factor that is directly proportional to the response. I identified the significant and non-significant factors and interactions and then dropped the non-significant ones. When I ran the DOE again with only the significant ones, this main factor remained positive, as expected. Strangely, the estimated coefficient for uncoded units of this factor became negative. The low and high levels for this factor are positive (110 and 130). Is it possible for this to happen? Why? Can the presence of interactions cause this? P.S.: all the factors and their levels are positive. Thanks in advance!
Thank you for the very useful article about interpretation of interaction terms. I know that if we are looking at the coefficients, those “main effects” are not main effects once the interaction is in the model. They’re marginal effects, and the coefficients can be used to interpret the magnitude of the marginal effects. I wonder, how can I determine the statistical significance of the marginal effects? Thank you in advance.
Theo van Erp says
For some reason I am not able to see the other 11 comments to this post.
However, the following reference strongly suggests that non-significant interaction terms should NOT be included in statistical models.
Engqvist, L. (2005). The mistreatment of covariate interaction terms in linear model analyses of behavioural and evolutionary ecology studies. Animal Behaviour, 70, 967–971.
I would appreciate if you could comment on the discrepancy between your view and that presented in the article.
Thanks for a great article. I don’t quite understand the reason to leave in a non-significant term: how does it help you evaluate your hypothesis? The only hypothesis it would help evaluate is whether there is an interaction of those terms. Aren’t you more concerned with discerning the relationships (whatever they are) of the covariates to the dependent variable? Said another way: when and why would you NOT be trying to specify the model correctly?
I suppose your answer will have to do with the specification error you mentioned, with which I am only basically familiar.
Bushra Shafqat says
Hopefully you are doing well. It was indeed a thoughtful read, but I was just wondering: if I get a non-significant interaction term but the overall model (ANOVA) is significant, should I drop the interaction term from the regression model?
Anxiously waiting for your reply!
In my analysis I am interested in a three-way interaction including time. Basically it is X1*Time*X2: my dependent variable’s relationship with time depends on the levels of my moderator. However, I also hypothesize a two-way interaction between X1 and time. This interaction is not significant, but my three-way interaction is significant. I know that you cannot interpret main effects when you include a two-way interaction. But what about interpreting two-way interactions when you include a three-way interaction? Thank you in advance,
You can often interpret a main effect when there are interactions. It depends on the pattern of means.
This is also true for two way and three way interactions. Depending on the nature of the three-way, the two way may or may not make sense on its own. You are hypothesizing a two way interaction for a reason, so does it still make sense in the presence of the three way?
Hi Karen and Tom,
Thank you very much for the useful website!
I have a question in line with the one that Tom posted. I am testing for two- and three-way interactions in my model (Time*InterventionGroup*Factor). I am of course also interested in the Time*Group interaction, but I am not interested in the InterventionGroup*Factor interaction. Can I leave it out of the model? Or is it not recommended to do so because it is part of another interaction?
Hope it is clear. Thanks a lot in advance!
Karen Grace-Martin says
You generally want to leave a 2-way interaction in when you have a 3-way. If you take it out, you are setting it to 0.
Hi, I was just wanting to know: what if my main effects turn insignificant or reverse the sign of the coefficient (such that it is counter-intuitive to theory), but the interaction terms are now significant and exactly as you would expect them to be theoretically? I am not sure how to interpret this.
If you’re looking at the coefficients, those “main effects” are not main effects once the interaction is in the model. They’re marginal effects. See this: https://www.theanalysisfactor.com/interpreting-lower-order-coefficients-when-the-model-contains-an-interaction/
I know that the coefficients of main effects can’t be interpreted separately anymore once the interaction is in the model. But how can I interpret the fact that the main effect turns insignificant when there’s a significant interaction?
Lukas K. says
Thank you for the article. It was very helpful, but I am still a bit confused and it would be great if you could give me a hint. I have a regression model with 4 variables. Two of them do not have a significant coefficient, nor do they contribute to the adjusted R-squared or the F test. So I dropped them. Further, there is one variable which has the greatest explanatory power. The last one is insignificant but adds something to the adjusted R-squared. So I thought I would test whether there is some interaction going on, and as it turned out, the interaction is insignificant with a beta of 0. In this case, would it be reasonable to drop the interaction term and the idea of an interaction?
Thank you Lukas
Yes the interaction tests something completely different from the main effects.
saurabh jangid says
Thank you so much! I was highly confused about this same point, but your last line made everything clear: don’t drop lower order terms for higher order interactions. Leave in a non-significant two-way interaction for a significant three-way interaction, and any main effect for a significant two-way interaction, as these terms use degrees of freedom in Type III tests.
Thanks a lot. I was just writing back what I interpreted from the article. Thank you very much! Saurabh Jangir
You’re welcome. Glad it was helpful. 🙂