When NOT to Center a Predictor Variable in Regression

There are two reasons to center predictor variables in any type of regression analysis–linear, logistic, multilevel, etc.

1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables (the ones that were multiplied).

2. To make interpretation of parameter estimates easier.

I was recently asked: when is centering NOT a good idea?

Well, basically when it doesn’t help.

For reason #1, it will only help if you have multiplicative terms in a model.  If you don’t have any multiplicative terms–no interactions or polynomials–centering isn’t going to help.

For reason #2, centering especially helps interpretation of parameter estimates (coefficients) when:

a) you have an interaction in the model,

b) particularly if that interaction includes a continuous and a dummy-coded categorical variable, and

c) the continuous variable does not have a meaningful value of 0, or

d) 0 is a real value, but there is another, more meaningful value such as a threshold point.  (For example, if you’re doing a study on the amount of time parents work, with a predictor of Age of Youngest Child, an Age of 0 is meaningful and will be in the data set, but centering at 5, when kids enter school, might be more meaningful.)

So when NOT to center:

1. If all continuous predictors have a meaningful value of 0.

2. If you have no interaction terms involving that predictor.

3. And if there are no other values that are particularly meaningful as a centering point.

 


Comments

  1. Raza says

    I have panel data with a multicollinearity problem (high VIF).

    1- I don’t have any interaction terms or dummy variables
    2- I just want to reduce the multicollinearity and improve the coefficients

    Would it be helpful to center all of my explanatory variables, just to resolve the multicollinearity (huge VIF values)?

    Thank you

  2. Michiel says

    Dear Karen,

    Is it necessary to create mean-centered versions of the dummy variables when you are creating interactions between two dummy variables?

    Kind regards,
    Michiel

  3. Pascal says

    Hello and thank you for your explanation.
    There is still something that I don’t understand about centering in interactions, though. Let’s take the Bacteria (B) and Sun (S) example, assuming they are continuous variables with no possible 0 values. If we want to introduce an interaction in a regression, it is recommended to mean-center both variables. But then low values of B and S will become negative once centered, and therefore their product will be positive. In other words, lowB-lowS will have the same impact as highB-highS. What am I missing here? Thanks

    • Karen Grace-Martin says

      Hi Pascal,

      Since Sun is categorical, we wouldn’t center it. But even if you have two numerical predictors and center both, it doesn’t mean that lowB-lowS has the same *mean* as highB-highS. The coefficient of the interaction term will not change if both predictors are centered. The interaction always measures the *change* in the effect (aka slope) of one variable for each one-unit change in the other.

  4. Scott Stanley says

    For those who might be interested (and this is not dealing with the complexity of multilevel models for questions about centering), Hayes (2017) has a great section (9.1) starting on page 304 about the impact of centering predictors when you are testing moderation (i.e., when you have an interaction term in a regression equation), which is an example of when KGM says above it may be useful. He notes that centering will not change anything about testing the interaction term itself. It will only change what happens with the two variables that go into the product. So, assume variable X and variable W, and an interaction XW (W = moderator in Hayes’s notation): centering X and W will not impact the test or interpretation of the term for XW. It will change what you get for the CONDITIONAL results for X and W, however. Centering changes the interpretation of the conditional beta for X from what happens to Y with a change of 1 unit on X among those with a value of 0 (zero) on W to what happens to Y with a change of 1 unit on X among those at the mean of W.

    I highly recommend that book as well as the treatment of this question in the simpler, non-MLM cases. He also notes, consistent with what KGM says above, that centering can only be of much use at all (at least in non-MLM settings) if there is a multiplicative term or an interpretational issue, and apparently not because it changes the interaction test but because centering can make conditional effects that are nonsensical (e.g., one variable cannot be zero in the real world) more interpretable. He says centering does indeed reduce the collinearity between X and XW, for example, but that collinearity is not really an issue when interpreting the finding for XW in the model, which of course is the whole point of the moderation test. However, he notes that centering may still be useful if the VIF for XW is so high that the software you are using will not run the model, even though the collinearity itself is not a problem for XW.

    As an aside, Hayes takes a dim view of people messing much with interpreting the conditional effects when you have an interaction term, in any case, because people often misconstrue them as main effects.

  5. Steve says

    Is it always necessary to center variables when using multilevel analysis (especially when it is a logit)? Might one be able to not center (especially when centering seems to change the significance of relevant variables)? Thanks

    • Oliver says

      Hi Steve,

      Similar to you, I also had some multilevel models in which Level 2 predictors became non-significant once these predictors were (grand-mean) centered. People keep saying that it will only change the intercept value, but it’s not true. It seems from my experience that a Level 2 predictor that is initially significant may no longer be significant after being centered. This is frustrating, especially when you’re not interested in interpreting the meaning of the intercept…

      Any idea, Karen ?

  6. Antenor says

    Hello Karen,

    Good explanation, it was helpful to me. I was just wondering if you have a reference where I can find these statements, some paper I can cite in scientific papers.

  7. Elise says

    Does centering a variable change how you interpret the results? For example, if a Beta is positive or negative after a variable is centered?

    • Karen Grace-Martin says

      Hi Yan,

      That question has a very complicated answer. Most of the time, though, binary variables are dummy coded. If they are, then they have a specific meaning that works well in interactions. So you can change that coding to something that resembles centering for very specific reasons. But most of the time they are left as is.

