Testing and Dropping Interaction Terms in Regression and ANOVA Models

In a Regression model, should you drop interaction terms if they’re not significant?

In an ANOVA, adding interaction terms still leaves the main effects as main effects. That is, as long as the data are balanced, the main effects and the interactions are independent. The main effect is still telling you if there is an overall effect of that variable after accounting for other variables in the model.
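To make the balanced-data point concrete, here is a minimal sketch on simulated data. It assumes a 2x2 design with 5 observations per cell and effect (sum-to-zero) coding; the names `a`, `b`, `ab`, and `fit` are made up for illustration, and the numbers do not come from any real study.

```python
import numpy as np

# Balanced 2x2 design, 5 observations per cell, effect (sum-to-zero) coding.
a = np.repeat([-1.0, 1.0], 10)               # factor A: 10 observations per level
b = np.tile(np.repeat([-1.0, 1.0], 5), 2)    # factor B: 5 observations per cell
ab = a * b                                   # interaction column

rng = np.random.default_rng(0)               # simulated, illustrative data only
y = 2.0 + 1.5 * a - 0.5 * b + 0.8 * ab + rng.normal(size=a.size)

def fit(*cols):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

main_only = fit(a, b)        # model with main effects only
with_inter = fit(a, b, ab)   # model with main effects and the interaction

# In a balanced design the coded columns are mutually orthogonal
# (every dot product printed here is zero)...
print(a @ b, a @ ab, b @ ab)
# ...so the main-effect estimates are the same with or without the interaction.
print(main_only[1:3])
print(with_inter[1:3])
```

With unbalanced cell sizes the columns are no longer orthogonal, and the main-effect estimates would generally shift when the interaction is added.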

But in regression, adding interaction terms makes the coefficients of the lower order terms conditional effects, not main effects. That means that the effect of one predictor is conditional on the value of the other. The coefficient of the lower order term isn’t the effect of that term. It’s the effect only when the other term in the interaction equals 0.
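A short simulated example may help show what "the effect only when the other term equals 0" means in practice. This is a sketch under assumptions: the variables `x1` and `x2`, the helper `fit_interaction`, and all numeric values are invented for illustration. It fits the same interaction model twice, once with `x2` as is and once with `x2` centered at its mean.

```python
import numpy as np

rng = np.random.default_rng(1)               # simulated, illustrative data only
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(loc=10.0, size=n)            # x2 = 0 lies far outside the data
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.2 * x1 * x2 + rng.normal(size=n)

def fit_interaction(x1, x2, y):
    """Least-squares fit of y on intercept, x1, x2, and x1*x2."""
    X = np.column_stack([np.ones_like(y), x1, x2, x1 * x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

raw = fit_interaction(x1, x2, y)                    # x2 on its original scale
centered = fit_interaction(x1, x2 - x2.mean(), y)   # x2 centered at its mean

# The x1 coefficient is the slope of x1 where the other predictor equals 0:
# at x2 = 0 in the raw fit, at the mean of x2 in the centered fit.
print("x1 coefficient, raw x2:     ", raw[1])
print("x1 coefficient, centered x2:", centered[1])
# The interaction coefficient itself does not change with centering.
print("interaction, raw vs centered:", raw[3], centered[3])
# The two x1 coefficients are linked by the interaction coefficient.
print("raw slope moved to mean(x2): ", raw[1] + raw[3] * x2.mean())
```

Both fits describe exactly the same model. Only the point at which the x1 slope is reported changes, which is why the lower order coefficient is a conditional effect rather than an overall one.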

So if an interaction isn’t significant, should you drop it?

If you are just checking for the presence of an interaction to make sure you are specifying the model correctly, go ahead and drop it. The interaction uses up df, changes the meaning of the lower order coefficients, and complicates the model. So if you were just checking for it, drop it.

But if you actually hypothesized an interaction that wasn’t significant, leave it in the model. The insignificant interaction means something in this case: it helps you evaluate your hypothesis. Taking it out can do more damage through specification error than leaving it in will through the loss of df.

The same is true in ANOVA models.
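It can help to see what dropping an interaction does mechanically: removing the term is the same as fixing its coefficient at zero, which forces every group to share one slope. The sketch below uses simulated data with a dummy-coded `group` and a numeric `time`; those names, the helper `ols`, and the numbers are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)               # simulated, illustrative data only
n = 50
time = np.tile(np.arange(n, dtype=float), 2)
group = np.repeat([0.0, 1.0], n)             # dummy code: 0 = reference group
y = 3.0 + 0.4 * time + 1.0 * group + 0.1 * group * time + rng.normal(size=2 * n)

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones_like(y)
full = ols(np.column_stack([ones, time, group, group * time]), y)
reduced = ols(np.column_stack([ones, time, group]), y)   # interaction dropped

# Separate regressions, one per group.
g0 = group == 0
sep0 = ols(np.column_stack([ones[g0], time[g0]]), y[g0])
sep1 = ols(np.column_stack([ones[~g0], time[~g0]]), y[~g0])

# With the interaction, each group keeps its own slope, and the slopes
# match the separate regressions.
print("group 0 slope:", full[1], "separate fit:", sep0[1])
print("group 1 slope:", full[1] + full[3], "separate fit:", sep1[1])
# Without it, both groups are forced onto a single common slope.
print("common slope after dropping the interaction:", reduced[1])
```

Whether that constraint is reasonable is the question the decision to drop the term is really answering.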

And as always, leave in any lower order terms, significant or not, for any higher order terms in the model. That means you have to leave in all insignificant two-way interactions for any significant 3-ways.
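As a quick way to check which terms this rule keeps in the model, the sketch below lists every lower order term implied by a single higher order interaction. The function name `required_terms` and the factor labels A, B, and C are hypothetical and used only for illustration.

```python
from itertools import combinations

def required_terms(highest_order_term):
    """Return every lower order term implied by one interaction term."""
    factors = tuple(highest_order_term)
    terms = []
    # Orders 1 (single factors) up to one below the interaction itself.
    for order in range(1, len(factors)):
        terms.extend(combinations(factors, order))
    return terms

lower = required_terms(("A", "B", "C"))
print(lower)
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C')]
```

So a significant A*B*C interaction keeps all three single factors and all three two-way interactions in the model, significant or not.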


Comments

Luo says

August 31, 2022 at 8:52 pm

I came across the following problem.
An ordered logistic regression: Y = B1*SEX + B2*AGE + …
SEX is significant, but AGE not. This is good for me, because this confirms my hypothesis.

I wanted to test another hypothesis, so I added an interaction term SEX*AGE to the regression. But now the interaction term is not significant. This is good for me, because the other hypothesis is confirmed. But meanwhile SEX is no longer significant.

How do I deal with this kind of situation? Can you help me?

Maria says

January 28, 2020 at 5:11 am

Hi, thanks a lot for this amazing website!
What should I do if my main hypothesis is the three-way interaction and it is significant, but it becomes non-significant when I add the two-way interactions (which are not significant either)? How can I justify keeping only the three-way? Is that allowed? Why is this happening?
Thanks a lot for your answer!

- Jason S says
  
  May 2, 2021 at 4:39 pm
  
  Always include the lower order terms. Otherwise the model misattributes to the higher order interaction variance that is actually explained by lower order terms (i.e. single factors and 2-way interactions). In your case, it sounds like the model is underpowered and the results become insignificant as more df are used. Unfortunately, that just means the result is not reliable.
  
Matt says

October 2, 2019 at 6:35 pm

Hi
I’m still not sure how I should interpret this

I have a model that includes an interaction term.

r-squared = 0.7726
gper15 p = 0.13
iv p =0.00
gper15 iv interaction p = 0.009
constant p = 0.00

How do I interpret the effect of gper15? Is this essentially telling me that gper15 is only significant when iv is greater than 0?

- Karen Grace-Martin says
  
  October 28, 2019 at 10:10 am
  
  Hi Matt,
  
  To interpret the interaction, you need the regression coefficients. The interaction tells you that the effect of gper15 is different for different values of iv.
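  
  As a sketch, assuming the model has the usual form with a product term (the symbols b0 to b3 are placeholders, not values from the output above):
  
  $$\hat{y} = b_0 + b_1\,\text{gper15} + b_2\,\text{iv} + b_3\,(\text{gper15} \times \text{iv})$$
  
  the effect of gper15 at a given value of iv is
  
  $$\frac{\partial \hat{y}}{\partial\,\text{gper15}} = b_1 + b_3\,\text{iv}$$
  
  Under that assumption, the p-value reported for gper15 tests only b1, which is the effect of gper15 when iv equals 0. It does not say whether gper15 matters at other values of iv.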
  
Richard Anderson says

May 29, 2019 at 4:29 pm

I’m not sure I can make sense of this blog post.

You wrote: “But in regression, adding interaction terms makes the coefficients of the lower order terms conditional effects, not main effects. That means that the effect of one predictor is conditional on the value of the other.”

However, “the effect of one predictor is conditional on the value of the other” is precisely what “interaction” means.

- Karen Grace-Martin says
  
  May 31, 2019 at 11:22 am
  
  Hi Richard,
  
  Yes, indeed. I was just reiterating what interaction means there. The rest of that paragraph tells you the rest of the story: what the variable’s coefficient actually measures. It isn’t the entire effect of that predictor.
  
Sarah says

January 21, 2019 at 1:53 pm

Hi, thank you very much for this informative content!
Nevertheless, I have a question: in a multiple regression model, a (hypothesized) three-way-interaction was significant (b = 0.45, p = 0.033). When I tested this model vs. the model without the interaction-term (only main effects), the F-Test was not significant (delta R-squared = 0.04, F(4,112) = 1.431772, p = 0.2281229). Does this mean that the interaction term does not explain additional variance? Then my question would be: is my data consistent with the hypothesized three-way-interaction or is it not?
Best regards and thanks for your help!

- Karen Grace-Martin says
  
  March 4, 2019 at 11:30 am
  
  Hi Sarah,
  
  This can get complicated. It’s possible that the three way is significant in the presence of the two-way interactions. But on its own, it isn’t adding a significant amount of variance. What you should do in this situation is hard to say without having a conversation about your research questions, design, and seeing your output.
  
Hassen says

November 17, 2018 at 12:54 pm

Hi,
I ran the data in SAS and found that the overall multiple regression model is not significant, and none of the coefficients is significant either. What should I do first to analyze my data?

Kev Cogburn says

July 10, 2018 at 1:18 pm

I have time series data for males and females. I ran separate linear regressions (one for males, the other for females) and obtained the slopes and intercepts for both. I then ran a subsequent regression using sex, time, and sex*time and obtained the slopes and intercepts for each term. I noticed that these slopes and intercepts matched the output of the original / separate regressions. However, the interaction in this case was not significant, so it was removed and the regression was run once more. Is there a reason why the slopes and intercepts resulting from this analysis do NOT match those obtained from the original regression? Which regression output is actually “true” in this example? Thanks.

- Karen Grace-Martin says
  
  October 12, 2018 at 10:21 am
  
  Hi Kev,
  
  You don’t want to think of one model as “true” and another as “false.” No model exactly represents what is happening in the population, but you are trying to find one that is the most reasonable. When you take out the interaction, you are essentially setting it equal to zero. In other words, you are forcing the male and female slopes to be equal. Both running the groups separately and including an interaction allow each slope to be estimated uniquely.
  
Tomas Andriotti says

November 11, 2017 at 3:26 am

Hi ,

I have a model with an interaction term between a continuous variable and a discrete variable with 3 levels, which I dummy-coded with level 1 as the reference. So I have 2 interaction terms, one for each comparison among the discrete variable’s levels: level 2 versus 1 and level 3 versus 1. If one of the level comparisons shows a non-significant result and the other shows a significant one, would this mean there is or is not effect modification?

Tomas

- Karen says
  
  January 29, 2018 at 12:20 pm
  
  I would say there is an interaction in that case.
  
KC Tam says

March 12, 2017 at 9:35 pm

Hi Karen,

First, I just want to say THANK YOU for the website. I cannot overestimate how much I’ve learned from your posts. As a psychologist working with small sample sizes (great apes) and based in a highly diverse anthropology department, it is incredibly difficult for me to find proper stats training tailored to my research needs. But ever since I found your website, I’ve been recommending it to all my colleagues. So, thank you!!

In the post, you gave two answers to the question “So if an interaction isn’t significant, should you drop it?”

If you are doing model selection, then drop it.
If you are doing hypothesis testing and have hypothesized this interaction, then don’t.

I wonder if there is a 3rd situation:
Should I drop it If I am doing hypothesis testing but did NOT hypothesize the interaction?

For example, I am interested in (1) a treatment x gender x country of origin 3-way interaction as well as (2) treatment itself. So all the 2-way interactions are there not because of a priori hypothesis. If the 3-way interaction is not significant, how should I test the effect of treatment then? Drop all 3- and 2-way interactions?

Thank you!

Benito says

February 16, 2017 at 6:50 pm

Dear Karen

I have a main factor that is directly proportional to the response. I identified the significant and non-significant factors and interactions and then dropped the non-significant ones. When I ran the DOE again with only the significant ones, this main factor remained positive, as expected. Strangely, the estimated coefficient in uncoded units for this factor became negative. The low and high levels for this factor are positive (110 and 130). Is this possible? Why? Can the presence of interactions cause this? PS: all the factors and their levels are positive. Thanks in advance!

Mo says

February 25, 2016 at 8:05 am

Hi Karen!

Thank you for the very useful article about interpreting interaction terms. I know that if we are looking at the coefficients, those “main effects” are not main effects once the interaction is in the model. They’re marginal effects. The coefficients can be used to interpret the magnitude of the marginal effects. I wonder how I can determine the statistical significance of the marginal effects. Thank you in advance.

Theo van Erp says

October 23, 2015 at 11:28 am

Dear Karen,

For some reason I am not able to see the other 11 comments to this post.

However, the following reference strongly suggests that non-significant interaction terms should NOT be included in statistical models.

Engqvist, L. (2005). The mistreatment of covariate interaction terms in linear model analyses of behavioural and evolutionary ecology studies. Animal Behaviour, 70, 967–971.

I would appreciate if you could comment on the discrepancy between your view and that presented in the article.

Best,
Theo

Matt says

April 22, 2015 at 5:31 pm

Hi Karen,

Thanks for a great article. I don’t quite understand the reason to leave in an insignificant term – how does it help you evaluate your hypothesis? The only hypothesis it would help evaluate is…whether there is interaction of those terms. Aren’t you more concerned with discerning the relationships (whatever they are) of the covariates to the dependent variable? Said another way: when and why would you NOT be trying to specify the model correctly?

I suppose your answer will have to do with the specification error you mentioned, with which I am only basically familiar.

Bushra Shafqat says

August 1, 2014 at 7:09 pm

Hi!
I hope you are doing well. This was indeed a thoughtful read, but I was just wondering: if I get an insignificant interaction term but the overall model (ANOVA) is significant, should I drop the interaction term from the regression model?
Anxiously waiting for your reply.

Regards..

Tom says

July 3, 2014 at 8:05 am

Dear Karen,

In my analysis I am interested in a three-way interaction including time. Basically it is X1*Time*X2; my dependent variable interacts with time and is dependent on the levels of my moderator. However, I also hypothesize a two-way interaction between X1 and time. This interaction is not significant, but my three-way interaction is significant. I know that you cannot interpret main effects when you include a two-way interaction. But what about interpreting two-way interactions when you include a three-way interaction? Thank you in advance,

Tom

- Karen says
  
  July 14, 2014 at 11:50 am
  
  Hi Tom,
  
  You can often interpret a main effect when there are interactions. It depends on the pattern of means.
  
  This is also true for two way and three way interactions. Depending on the nature of the three-way, the two way may or may not make sense on its own. You are hypothesizing a two way interaction for a reason, so does it still make sense in the presence of the three way?
  
  - Cristina says
    
    December 20, 2018 at 7:02 am
    
    Hi Karen and Tom,
    
    Thank you very much for the useful website!
    I have a question in line with the one that Tom posted. I am testing for two- and three-way interactions in my model (Time*InterventionGroup*Factor). I am of course also interested in the Time*Group interaction, but I am not interested in the InterventionGroup*Factor interaction. Can I leave it out of the model? Or is it not recommended to do so, as part of it is in another interaction?
    Hope it is clear. Thanks a lot in advance!
    Regards,
    Cristina
    
    - Karen Grace-Martin says
      
      March 4, 2019 at 11:45 am
      
      Hi Cristina,
      
      You generally want to leave a 2-way interaction in when you have a 3-way. If you take it out, you are setting it to 0.
      
muj says

February 27, 2014 at 11:28 am

Hi, I was just wanting to know: what if my main effects turn insignificant or reverse the sign of the coefficient (such that it is counter-intuitive to theory), but the interaction terms are now significant and absolutely as you would expect them to be theoretically? I am not sure how to interpret this.

- Karen says
  
  March 10, 2014 at 5:22 pm
  
  Hi Muj,
  
  If you’re looking at the coefficients, those “main effects” are not main effects once the interaction is in the model. They’re marginal effects. See this: https://www.theanalysisfactor.com/interpreting-lower-order-coefficients-when-the-model-contains-an-interaction/
  
  - May says
    
    August 25, 2015 at 1:33 pm
    
    Hi Karen,
    
    I know that the coefficients of main effects can’t be interpreted separately anymore once the interaction is in the model. But how can I interpret the fact that the main effect turns insignificant when there’s a significant interaction?
    
    May
    
Lukas K. says

November 30, 2013 at 12:22 pm

Dear Karen,

Thank you for the article. It was very helpful. But I am still a bit confused, and it would be great if you could give me a hint. I have a regression model with 4 variables. Two of them do not have a significant coefficient, nor do they contribute to the adj. R2 or F / F sig., so I dropped them. Further, there is one variable which has the greatest explanatory power. The last one is insignificant but adds something to adj. R2. So I thought I would test whether there is some interaction going on, and as it turned out, the interaction is insignificant with a beta of 0. In this case, would it be reasonable to drop the interaction term and the idea of an interaction?

Thank you Lukas

- Karen says
  
  December 3, 2013 at 11:11 am
  
  Hi Lukas,
  
  Yes, the interaction tests something completely different from the main effects.
  
saurabh jangid says

April 2, 2011 at 4:31 am

Thank you, sir/ma’am, lots and lots. I was highly confused regarding this same point, but your last line made everything clear: the part about dropping lower order terms for higher order interactions. Leave in an insignificant 2-way interaction for a significant 3-way interaction, and leave in any main effect for a significant 2-way interaction, as it consumes degrees of freedom in type III error...
Thanks a lot. I was just writing back what I interpreted from the article.
Thanks a ton, thank you very much. saurabh jangir

- Karen says
  
  May 13, 2011 at 10:50 am
  
  You’re welcome. Glad it was helpful. 🙂
  