There are two reasons to center predictor variables in any type of regression analysis–linear, logistic, multilevel, etc.
1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables (the ones that were multiplied).
2. To make interpretation of parameter estimates easier.
I was recently asked: when is centering NOT a good idea?
Well, basically when it doesn’t help.
For reason #1, it will only help if you have multiplicative terms in a model. If you don’t have any multiplicative terms–no interactions or polynomials–centering isn’t going to help.
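A minimal sketch of reason #1, using made-up data (the variables `x` and `z` are hypothetical and not from any real study). It compares the correlation between a predictor and its square before and after mean-centering, and checks that the correlation between two separate predictors is not affected by centering. Because the illustrative `x` is symmetric around its mean, the centered correlation with the squared term drops to essentially zero; with skewed data it would typically shrink but not vanish.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical predictor with no values near 0 (e.g. ages 20 to 60).
x = [float(v) for v in range(20, 61)]
# Hypothetical second predictor, made up purely for illustration.
z = [(v * 7) % 11 + 0.1 * v for v in x]

xc = [v - mean(x) for v in x]  # mean-centered x
zc = [v - mean(z) for v in z]  # mean-centered z

# Correlation between a variable and its own square (a polynomial term).
r_raw = pearson(x, [v ** 2 for v in x])
r_centered = pearson(xc, [v ** 2 for v in xc])

# Correlation between two distinct predictors, before and after centering.
r_xz_raw = pearson(x, z)
r_xz_centered = pearson(xc, zc)

print(f"corr(x, x^2) raw:      {r_raw:.3f}")
print(f"corr(x, x^2) centered: {r_centered:.3f}")
print(f"corr(x, z) raw:        {r_xz_raw:.3f}")
print(f"corr(x, z) centered:   {r_xz_centered:.3f}")
```

The last two numbers match because subtracting a constant from a variable does not change its correlation with a different variable. Only the relationship with a product or power term is affected.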
For reason #2, centering especially helps interpretation of parameter estimates (coefficients) when:
a) you have an interaction in the model,
b) particularly if that interaction includes a continuous and a dummy-coded categorical variable, and
c) the continuous variable does not contain a meaningful value of 0, or
d) 0 is a real value, but there is another, more meaningful value such as a threshold point. (For example, if you’re doing a study on the amount of time parents work, with a predictor of Age of Youngest Child, an Age of 0 is meaningful and will be in the data set, but centering at 5, when kids enter school, might be more meaningful.)
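The threshold example in (d) can be sketched as follows. The data are invented for illustration (the `age` and `hours` values are hypothetical), and the model is a simple one-predictor regression. Centering Age of Youngest Child at 5 leaves the slope alone and moves the intercept so that it describes families whose youngest child is 5 rather than a newborn.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) for a one-predictor least-squares fit."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical data: age of youngest child and weekly hours a parent works.
age = [0, 1, 2, 3, 5, 6, 8, 10, 12, 15]
hours = [18, 20, 22, 25, 31, 32, 35, 37, 38, 40]

# Raw predictor: the intercept is the predicted hours when age == 0.
b0_raw, b1_raw = simple_ols(age, hours)

# Centered at the threshold of 5: the intercept is the predicted hours
# when the youngest child is 5.
age_c5 = [a - 5 for a in age]
b0_c5, b1_c5 = simple_ols(age_c5, hours)

print(f"raw:           intercept={b0_raw:.2f}, slope={b1_raw:.2f}")
print(f"centered at 5: intercept={b0_c5:.2f}, slope={b1_c5:.2f}")
```

The centered intercept equals the raw model's prediction at age 5 (`b0_raw + 5 * b1_raw`), so nothing about the fit changes. Only the reference point of the intercept does.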
So when NOT to center:
1. If all continuous predictors have a meaningful value of 0.
2. If you have no interaction terms involving that predictor.
3. And if there are no values that are particularly meaningful.
I have panel data with a multicollinearity problem (high VIF values).
1- I don’t have any interaction terms, and dummy variables
2- I just want to reduce the multicollinearity and improve the coefficients
Would it be helpful to center all of my explanatory variables, just to resolve the issue of multicollinearity (huge VIF values)?
Is it necessary to create centered-mean variables for the dummy variables when you are creating interactions between two dummy variables?
Karen Grace-Martin says
Hello and thank you for your explanation.
There is still something that I don’t understand about centering in interactions, though. Let’s take the Bacteria (B) and Sun (S) example, assuming they are continuous variables with no possible 0 values. If we want to introduce an interaction in a regression, it is recommended to mean-center both variables. But then low values on B and S will become negative once centered, and therefore their product will be positive. In other words, lowB-lowS will have the same impact as highB-highS. What am I missing here? Thanks
Karen Grace-Martin says
Since Sun is categorical, we wouldn’t center it. But even if you have two numerical predictors and center both, it doesn’t mean that lowB-lowS has the same *mean* as highB-highS. The coefficient of the interaction term will not change if both predictors are centered. The interaction always measures the *change* in the effect (aka slope) of one variable for each one-unit change in the other.
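Here is a rough sketch of that point with simulated data. Everything in it is an assumption made for illustration: B and S are both treated as numeric, the "true" coefficients are invented, and `ols` and `predict` are small hypothetical helpers written for this example rather than functions from a statistics library. It fits the same interaction model with raw and with mean-centered predictors, then compares the interaction coefficient and the predicted means for a low-low and a high-high combination.

```python
import random
from statistics import mean

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, ...] via normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(coefs, b, s):
    """Predicted mean of y for given (possibly centered) values of B and S."""
    return coefs[0] + coefs[1] * b + coefs[2] * s + coefs[3] * b * s

random.seed(0)
n = 200
# Simulated predictors with no possible 0 values (assumed numeric).
B = [random.uniform(1, 10) for _ in range(n)]
S = [random.uniform(1, 6) for _ in range(n)]
# Invented data-generating model, for illustration only.
y = [2 + 0.5 * b + 1.5 * s + 0.3 * b * s + random.gauss(0, 1)
     for b, s in zip(B, S)]

raw = ols([B, S, [b * s for b, s in zip(B, S)]], y)

Bc = [b - mean(B) for b in B]
Sc = [s - mean(S) for s in S]
cen = ols([Bc, Sc, [b * s for b, s in zip(Bc, Sc)]], y)

print(f"interaction coefficient, raw:      {raw[3]:.4f}")
print(f"interaction coefficient, centered: {cen[3]:.4f}")

# Low-low and high-high have the same product term after centering,
# but their predicted means differ because of the other terms.
low = predict(cen, -3.0, -2.0)
high = predict(cen, 3.0, 2.0)
print(f"predicted mean at lowB-lowS:   {low:.2f}")
print(f"predicted mean at highB-highS: {high:.2f}")
```

The product term is the same for both combinations, but it is only one piece of the prediction. The lower-order terms carry opposite signs for the two cases, which is why the predicted means are not equal.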
Thank you so much!
Scott Stanley says
For those who might be interested (and this is not dealing with the complexity of multilevel models for questions about centering), Hayes (2017) has a great section (9.1) starting on page 304 about the impact of centering predictors when you are testing moderation (i.e., when you have an interaction term in a regression equation), which is an example of when KGM says above it may be useful. He notes that centering will not change anything about the test of the interaction term itself. It will only change what happens with the two variables that go into the product. So, assume variable X and variable W, and an interaction XW (W = moderator in Hayes’s notation): centering X and W will not impact the test or interpretation of the term for XW. It will change what you get for the CONDITIONAL results for X and W, however. Centering changes the interpretation of the conditional beta for X from what happens to Y with a change of 1 unit on X among those with a value of 0 (zero) on W to what happens to Y with a change of 1 unit on X among those at the mean of W.
I highly recommend that book as well as the treatment of this question in the simpler, non-MLM cases. He also notes, consistent with what KGM says above, that centering can only be of much use (at least in a non-MLM setting) if there is a multiplicative term or an interpretational issue, and apparently not because it changes the interaction test but because centering can make otherwise nonsensical conditional effects (e.g., when one variable cannot be zero in the real world) more interpretable. He says centering does indeed reduce the collinearity between X and XW, for example, but that this collinearity is not really an issue when interpreting the finding for XW in the model, which, of course, is the whole point of the moderation test. However, he notes that centering may still be useful if the VIF for XW is so high that the software you are using will not run the model, even though the collinearity for XW is not itself a problem.
As an aside, Hayes takes a dim view of people messing much with interpreting the conditional effects when you have an interaction term, in any case, because people often misconstrue them as main effects.
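The change in the conditional coefficients described in this comment can be worked out algebraically. The coefficient names b_0 through b_3 and the subscript c for centered variables are notation assumed here for illustration, not taken from Hayes. Writing X = X_c + X̄ and W = W_c + W̄ and expanding the product:

```latex
\begin{align*}
Y &= b_0 + b_1 X + b_2 W + b_3 XW \\
  &= b_0 + b_1 (X_c + \bar{X}) + b_2 (W_c + \bar{W})
     + b_3 (X_c + \bar{X})(W_c + \bar{W}) \\
  &= (b_0 + b_1 \bar{X} + b_2 \bar{W} + b_3 \bar{X}\bar{W})
     + (b_1 + b_3 \bar{W})\, X_c
     + (b_2 + b_3 \bar{X})\, W_c
     + b_3\, X_c W_c
\end{align*}
```

The coefficient on the product stays b_3, while the coefficient on X becomes b_1 + b_3 W̄, which is the slope of X at the mean of W instead of at W = 0. A conditional coefficient can therefore change size, sign, or significance after centering even though the model's fit and the interaction test are unchanged.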
Karthik Srinivasan says
Read this article: http://psycnet.apa.org/journals/met/12/2/121/. It answers all your questions.
Dear Karen, some claims you make in this article are not true. Please see http://orm.sagepub.com/content/15/3/339.abstract for more information.
Thank you, Thank you. This was just what I needed.
Is it always necessary to center variables when using multilevel analysis (especially when it is a logit)? Might one be able to not center (especially when it seems to change the significance of relevant variables)? Thanks
Similar to you, I also had some multilevel models in which Level 2 predictors became non-significant once these predictors were (grand-mean) centered. People keep saying that it will only change the intercept value, but it’s not true. It seems from my experience that a Level 2 predictor that is initially significant may no longer be significant after being centered. This is frustrating, especially when you’re not interested in interpreting the meaning of the intercept…
Any idea, Karen ?
The article is openly available via Google Scholar.
Good explanation, it was helpful to me. I was just wondering if you have a reference for these statements, some paper I can cite in scientific papers.
Does centering a variable change how you interpret the results? For example, if a Beta is positive or negative after a variable is centered?
Centering a variable won’t change its own coefficient.
It will change the intercept, which may or may not be meaningful.
It can also change other coefficients if the centered variable is involved in an interaction.
I centered my independent variables to reduce collinearity and some of my variables went from being significant before centering to not significant after. The variables are all involved in interactions, so your last statement caught my eye. Can you recommend any resources for me to follow up on centering and interactions?
Karen Grace-Martin says
Sure. I would start here: https://www.theanalysisfactor.com/interpreting-interactions-in-regression/
Should we center a binary variable if we have an interaction between a binary variable and a continuous variable?
I would love to know the answer to this as well.
Karen Grace-Martin says
That question has a very complicated answer. Most of the time, though, binary variables are dummy coded. If they are, then they have a specific meaning that works well in interactions. So you can change that coding to something that resembles centering for very specific reasons. But most of the time they are left as is.