When to leave insignificant effects in a model

by Karen Grace-Martin

You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model.

One effect of leaving in insignificant predictors is on p-values: each one uses up precious degrees of freedom (df) in a small sample. But if your sample isn’t small, the effect is negligible.
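To put rough numbers on that, here is a minimal sketch, not a full power analysis. It assumes an ordinary linear model with an intercept and an extra predictor that explains nothing, so the error sum of squares stays the same while the residual df drops by one. The function names and sample sizes are made up for illustration.

```python
def residual_df(n_obs, n_predictors):
    """Residual degrees of freedom for a linear model with an intercept."""
    return n_obs - n_predictors - 1


def mse_inflation(n_obs, n_predictors):
    """Factor by which MSE = SSE / df grows when one more predictor that
    explains nothing (SSE unchanged) is added to the model."""
    df_before = residual_df(n_obs, n_predictors)
    df_after = residual_df(n_obs, n_predictors + 1)
    if df_after <= 0:
        raise ValueError("not enough observations for this many predictors")
    return df_before / df_after


# Hypothetical sample sizes, each starting from 5 predictors.
small = mse_inflation(n_obs=15, n_predictors=5)    # 9 df -> 8 df
large = mse_inflation(n_obs=500, n_predictors=5)   # 494 df -> 493 df

print(f"n = 15:  MSE inflated by a factor of {small:.3f}")
print(f"n = 500: MSE inflated by a factor of {large:.3f}")
```

With 15 observations the error variance estimate grows by about 12%, which widens every standard error in the model. With 500 observations the change is a fraction of a percent.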

The bigger effect is on interpretation, and really the cases below are about whether leaving an effect in aids interpretation. Models can get so cluttered that it’s hard to figure out what’s going on, and it makes sense to eliminate effects that aren’t serving a purpose. But even insignificant effects can have a purpose.

Here are three situations where there is a purpose in showing that specific predictors were not significant and in estimating their coefficients anyway:

1. Expected control variables.  You need to show that you’ve controlled for them.

In many fields, there are control variables that everyone expects to see.

  • Age in medical studies
  • Race, income, education in sociological studies
  • Socioeconomic status in education studies

The examples go on and on.

If you take these expected controls out, you will just get criticism for not including them.  And it may be interesting to show that in this sample and with these variables, these controls weren’t significant.

2. Predictors you have specific hypotheses about.

Another case is when the point of the model is to test a specific predictor: you have a hypothesis about it, and it’s meaningful to show that it’s not significant. In that case, I would leave it in, even if it’s not significant.

3. Items involved in higher-order terms

When you take out a term that is involved in something higher, like a two-way interaction that is part of a three-way interaction, you actually change the meaning of the higher-order term. The sum of squares for each higher-order term is based on comparisons to specific means and represents variation around those means.

If you take out the lower-order term, that variation has to be covered somewhere, and it’s usually not where you expect it. For example, a two-way interaction represents the variation in cell means around the main effect means. But if the variation between the main effect means isn’t measured with a main effect term, it ends up in the interaction, and the interaction no longer reflects the same variation it would if the main effect were in the model.

So it’s not that it’s wrong, but it changes the meaning of the interaction.  For that reason, most people recommend leaving those lower-order effects in.
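A small numeric sketch can make this concrete. It assumes a 2x2 design with one observation per cell, dummy (0/1) coding, and made-up cell means that are purely additive, so the true interaction is zero. The variable names and values are hypothetical, and the fit uses NumPy’s least-squares solver rather than any particular statistics package.

```python
import numpy as np

# Hypothetical cell means for a 2x2 design, purely additive:
# factor A adds 4, factor B adds 2, and there is no interaction.
a = np.array([0, 0, 1, 1])          # dummy-coded factor A
b = np.array([0, 1, 0, 1])          # dummy-coded factor B
y = np.array([10.0, 12.0, 14.0, 16.0])

intercept = np.ones_like(y)
ab = a * b                          # interaction column

# Full model: intercept + A + B + A*B
X_full = np.column_stack([intercept, a, b, ab])
full_coefs = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Reduced model: the main effect of B is dropped, the interaction stays
X_reduced = np.column_stack([intercept, a, ab])
reduced_coefs = np.linalg.lstsq(X_reduced, y, rcond=None)[0]

print("full    (intercept, A, B, A*B):", np.round(full_coefs, 3))
print("reduced (intercept, A, A*B):   ", np.round(reduced_coefs, 3))
```

In the full model the interaction coefficient is 0, as it should be for additive data. Once B is dropped, the interaction coefficient becomes 2: the variation that belonged to B has moved into the interaction, which now means something different.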

The main point here is there are often good reasons to leave insignificant effects in a model. The p-values are just one piece of information. You may be losing important information by automatically removing everything that isn’t significant.


9 comments

Bhargav October 14, 2015 at 7:19 pm

Hey Karen,
While building a model, when I add a function or an interaction effect, my adjusted R2 value increases. However, the term that I have added is not significant. In this case, can I keep the term, or should I omit it?

Joshua June 17, 2015 at 4:53 pm

I’m running an ARDL model for factors affecting the growth of exports, but surprisingly world price has the wrong sign in the long run, although it has the correct sign in the short run. However, it is insignificant in both cases. Kindly advise on whether I can just ignore it and go ahead with writing my thesis. All the post-estimation diagnostics are OK.

Nitzan March 18, 2015 at 6:19 am

Hi Karen,
I am performing a linear regression analysis in EViews, and unfortunately half of my variables have p-values larger than 0.05. Even after I dropped one variable because of multicollinearity, the p-values of the other variables didn’t change.
My research deals with the effect of explanatory variables such as R&D expenses, company size, repeated/new partner in alliances, total alliances, and some more, on the number of patents per year in the pharma industry. My question is: what is the implication in this kind of case? What can I do? This is a case study on 3 companies and the sample size is 66 observations.
Thank you in advance
Nitzan.

Kate September 5, 2012 at 4:09 am

Hi Karen,
A great article – however I’m having trouble applying it to my own data.
In cox regression survival analysis with two categorical (binary) IVs/factors, I try to include an interaction term between my two factors and all significance “disappears” – to explain:
CR Output (SPSS): With no interaction term specified:
Factor 1 B=-0.354, Sig=0.288
Factor 2 B=-0.753, Sig=0.025
So it looks like Factor 2 has a significant effect on my outcome variable.
However,
CR Output (SPSS): Including an interaction term:
Factor 1 B=-0.124, Sig=0.786
Factor 2 B=-0.528, Sig=0.246
Factor1*Factor2 B=-0.496, Sig=0.609
No significant effects of anything! 🙁

The research question asks about the effects of each factor individually, and we are also interested to see if there is an interaction between factors. I am unsure how to report this result, seeing as there does seem to be a significant effect for Factor 2, but I think I do need to leave the interaction term in the model, which makes the Factor 2 effect insignificant.

Can I make any statements about how Factor 1 and (more importantly) Factor 2 affect survival while explaining there was no significant interaction between the two? Or is it reasonable to say that we tested for an interaction and didn’t find a significant one, and go back to the model without the interaction term?

Cheers

Karen September 11, 2012 at 4:44 pm

Hi Kate,

You don’t mention the coding of the factors, but I’m going to guess they’re dummy coded. When they are, and you have an interaction in the model, the values of the “main effects” are not main effects. So for example, Factor 1 B in the first model is the effect of F1 at ANY value of F2. But in the second model, Factor 1 B is the effect of factor 1 ONLY when Factor 2=0.

Here’s an article I wrote on it:
http://www.theanalysisfactor.com/interpreting-interactions-in-regression/

The other option is to use effect coding (1/-1) instead of dummy coding (1/0). It will allow each first-order effect to be at the mean of the other, aka, a main effect, but the Bs themselves aren’t very interpretable. This is what ANOVA uses.

JC April 24, 2012 at 4:18 pm

So if I retain those insignificant terms in my final model, do I need to interpret them? For example, in multiple regression I found age was not significant. Do I need to interpret this in the usual way?

Thanks.

Karen April 25, 2012 at 4:04 pm

Hi JC,

You would interpret it as a null effect. So if you had a coefficient that was b=2.5, but insignificant, you would interpret it as not reliably different from 0. So you would not worry if, for example, the sign was opposite what you expected.

Karen

Marc Fey April 12, 2012 at 6:12 pm

How do I reach your blog? I keep getting close and then turned away. I’d like to read 528.

Marc

Karen April 13, 2012 at 12:00 pm

Hi Marc,

You’re on the blog. What do you mean by 528? It’s possible something got renamed when we did our website update last fall. I tried to catch everything, but maybe you found something I missed.

Karen
