You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model.
One effect of leaving in insignificant predictors is on p-values: each one uses up a precious degree of freedom, which matters in small samples. But if your sample isn't small, the effect is negligible.
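To make that concrete, here is a quick sketch of how the critical t-value (and so every p-value in the model) depends on residual degrees of freedom. The sample and predictor counts are hypothetical, just to illustrate the scale of the effect:

```python
from scipy.stats import t

# Two-sided critical t-value at alpha = .05 for a few residual df.
# Hypothetical numbers: n = 15 with 4 predictors leaves df = 10;
# adding two more insignificant predictors drops that to df = 8.
for df in (8, 10, 100):
    print(f"df = {df:>3}: critical t = {t.ppf(0.975, df):.3f}")
```

Going from 10 to 8 residual df pushes the critical t-value noticeably higher, while at 100 df the same change would be invisible in the third decimal.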
The bigger effect is on interpretation, and the situations below are really about whether leaving insignificant effects in aids interpretation. Models can get so cluttered that it's hard to figure out what's going on, and it makes sense to eliminate effects that aren't serving a purpose. But even insignificant effects can have a purpose.
So here are three situations where there is a purpose in showing that specific predictors were not significant, and in measuring their coefficients anyway:
1. Expected control variables. You need to show that you’ve controlled for them.
In many fields, there are control variables that everyone expects to see.
- Age in medical studies
- Race, income, education in sociological studies
- Socioeconomic status in education studies
The examples go on and on.
If you take these expected controls out, you will just get criticism for not including them. And it may be interesting to show that in this sample and with these variables, these controls weren’t significant.
2. Predictors you have specific hypotheses about.
Another example is when the point of the model is to test a specific predictor: you have a hypothesis about it, and it's meaningful to show that it's not significant. In that case, I would leave it in, even when it isn't significant.
3. Items involved in higher-order terms
When you take out a term that is involved in a higher-order term, like a two-way interaction that is part of a three-way interaction, you actually change the meaning of the higher-order term. The sum of squares for each higher-order term is based on comparisons to specific means and represents variation around those means.
If you take out the lower-order term, that variation has to be covered somewhere, and it's usually not where you expect it. For example, a two-way interaction represents the variation in cell means around the main effect means. But if the variation between the main effect means isn't measured with a main effect term, it ends up in the interaction, and that interaction no longer reflects the same variation it would if the main effect were in the model.
So it’s not that it’s wrong, but it changes the meaning of the interaction. For that reason, most people recommend leaving those lower-order effects in.
The main point here is there are often good reasons to leave insignificant effects in a model. The p-values are just one piece of information. You may be losing important information by automatically removing everything that isn’t significant.