The Analysis Factor

Centering for Multicollinearity Between Main effects and Quadratic terms

by Karen Grace-Martin

One of the most common causes of multicollinearity is multiplying predictor variables to create an interaction term or a quadratic or higher-order term (X squared, X cubed, etc.).

Why does this happen?  When all the X values are positive, higher values produce high products and lower values produce low products, so the product variable is highly correlated with the component variable.  (If the values were all negative, the same thing would happen, but the correlation would be negative.)  A very simple example will clarify.

In a small sample, say you have the following values of a predictor variable X, sorted in ascending order:

2, 4, 4, 5, 6, 7, 7, 8, 8, 8

It is clear to you that the relationship between X and Y is not linear, but curved, so you add a quadratic term, X squared (X²), to the model.  The values of X squared are:

4, 16, 16, 25, 36, 49, 49, 64, 64, 64

The correlation between X and X² is .987, almost perfect.
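The correlation quoted above can be checked in a few lines. This is a minimal sketch using NumPy (an assumption; the post does not say which software was used), with the ten X values from the example.

```python
import numpy as np

# The ten sample values of X from the example above
x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8], dtype=float)
x_sq = x ** 2  # the quadratic term

# Pearson correlation between X and X squared
r_raw = np.corrcoef(x, x_sq)[0, 1]
print(round(r_raw, 3))  # 0.987
```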

Plot of X vs. X squared

To remedy this, you center X at its mean.  The mean of X is 5.9, so to center X, I create a new variable XCen = X - 5.9.

These are the values of XCen:

-3.90, -1.90, -1.90, -.90, .10, 1.10, 1.10, 2.10, 2.10, 2.10

Now, the values of XCen squared are:

15.21, 3.61, 3.61, .81, .01, 1.21, 1.21, 4.41, 4.41, 4.41

The correlation between XCen and XCen² is -.54: still not 0, but much more manageable, and definitely low enough to not cause severe multicollinearity.  This works because values at the low end of the scale are now negative with large absolute values, so their squares are large too, just like those at the high end.
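The centered version can be checked the same way. Again this is only a sketch with NumPy (an assumed tool), using the same ten values of X.

```python
import numpy as np

x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8], dtype=float)

# Center X at its mean (5.9), then square the centered values
x_cen = x - x.mean()
x_cen_sq = x_cen ** 2

# Pearson correlation between centered X and its square
r_cen = np.corrcoef(x_cen, x_cen_sq)[0, 1]
print(round(r_cen, 2))  # -0.54
```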

The scatterplot between XCen and XCen2 is:

Plot of Centered X vs. Centered X squared

If the values of X had been symmetric around their mean rather than skewed, this would be a perfectly balanced parabola, and the correlation would be 0.
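To illustrate the symmetric case, here is a small sketch with a made-up sample (these values are hypothetical, not from the post) whose deviations from the mean mirror each other. The covariance between a centered variable and its square equals the mean of the cubed deviations, which cancels out when the values are symmetric.

```python
import numpy as np

# Hypothetical sample, symmetric around its mean of 6
x_sym = np.array([3, 4, 5, 5, 6, 6, 7, 7, 8, 9], dtype=float)

x_sym_cen = x_sym - x_sym.mean()
r_sym = np.corrcoef(x_sym_cen, x_sym_cen ** 2)[0, 1]

# The cubed deviations sum to zero, so the correlation is zero
# (up to floating-point error)
print(abs(r_sym) < 1e-12)
```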


Comments

  1. Pamela Ferguson says

    April 15, 2021 at 1:37 pm

    Hi, I have an interaction between a continuous and a categorical predictor that results in multicollinearity in my multivariable linear regression model for those 2 variables as well as their interaction (VIFs all around 5.5). Whenever I see information on remedying the multicollinearity by subtracting the mean to center the variables, both variables are continuous. Should I convert the categorical predictor to numbers and subtract the mean? Thanks!

    • Karen Grace-Martin says

      July 16, 2021 at 12:12 pm

      Hi Pamela,

      The equivalent of centering for a categorical predictor is to code it .5/-.5 instead of 0/1.
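As a rough illustration of this coding, the sketch below compares the correlation between a two-level predictor and its interaction term under 0/1 coding with raw X versus .5/-.5 coding with centered X. The group assignment is hypothetical, and the X values are borrowed from the post purely for convenience.

```python
import numpy as np

x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8], dtype=float)
# Hypothetical two-level categorical predictor, alternating groups
group_01 = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype=float)

# 0/1 dummy coding with uncentered X
inter_dummy = x * group_01
r_dummy = np.corrcoef(group_01, inter_dummy)[0, 1]

# .5/-.5 coding with mean-centered X
group_pm = group_01 - 0.5
x_cen = x - x.mean()
inter_coded = x_cen * group_pm
r_coded = np.corrcoef(group_pm, inter_coded)[0, 1]

print(round(r_dummy, 2))     # high correlation under 0/1 coding
print(abs(r_coded) < 1e-9)   # essentially zero under .5/-.5 coding
```

In this balanced toy example the correlation drops to zero; with unbalanced groups it would generally shrink rather than vanish.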

  2. Kamo says

    March 21, 2019 at 12:47 pm

    I teach a multiple regression course. I tell my students not to worry about centering for two reasons.
    1. It doesn’t work for a cubic equation.
    2. Whether they center or not, we get identical results (t, F, predicted values, etc.).
    Any comments?

    • Karen Grace-Martin says

      March 21, 2019 at 3:26 pm

      Hi Kamo,

      You’re right that it won’t help these two things. The biggest help is for interpretation of either linear trends in a quadratic model or intercepts when there are dummy variables or interactions. See these:

      https://www.theanalysisfactor.com/interpret-the-intercept/
      https://www.theanalysisfactor.com/glm-in-spss-centering-a-covariate-to-improve-interpretability/

    • Charlie says

      October 13, 2021 at 1:28 pm

      If you center and reduce multicollinearity, isn’t that affecting the t values?
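One way to explore this question is to fit the same quadratic model with raw and centered X and compare the pieces. The sketch below uses NumPy least squares and a made-up response `y` (hypothetical; the post gives no Y values). The fitted values and the quadratic coefficient come out the same, while the linear coefficient changes, because after centering it describes the slope at the mean of X rather than at X = 0. The test of the highest-order term is therefore unaffected, while the test of the linear term answers a different question.

```python
import numpy as np

x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8], dtype=float)
# Hypothetical response values, for illustration only
y = np.array([3, 7, 8, 9, 10, 10, 11, 10, 11, 9], dtype=float)
x_cen = x - x.mean()

def fit_quadratic(pred, resp):
    """Least-squares fit of resp on [1, pred, pred^2]."""
    design = np.column_stack([np.ones_like(pred), pred, pred ** 2])
    coefs, *_ = np.linalg.lstsq(design, resp, rcond=None)
    return coefs, design @ coefs

coef_raw, fitted_raw = fit_quadratic(x, y)
coef_cen, fitted_cen = fit_quadratic(x_cen, y)

print(np.allclose(fitted_raw, fitted_cen))    # same predicted values
print(np.isclose(coef_raw[2], coef_cen[2]))   # same quadratic coefficient
print(coef_raw[1], coef_cen[1])               # linear coefficients differ
```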

  3. Julien says

    April 7, 2015 at 6:04 am

    Does it really make sense to use that technique in an econometric context?

    To me the square of a mean-centered variable has a different interpretation than the square of the original variable. Imagine your X is number of years of education and you look for a squared effect on income: say, the higher X, the higher the marginal impact on income. So you want to link the squared value of X to income. If X goes from 2 to 4, the impact on income is supposed to be smaller than when X goes from 6 to 8, for example. When capturing it with a squared value, we account for this nonlinearity by giving more weight to higher values. A move of X from 2 to 4 becomes a move from 4 to 16 (+12), while a move from 6 to 8 becomes a move from 36 to 64 (+28). If we center, a move of X from 2 to 4 becomes a move from 15.21 to 3.61 (-11.60), while a move from 6 to 8 becomes a move from 0.01 to 4.41 (+4.40). So moves at higher values of education become smaller, and they have less weight in the effect, if my reasoning is good. It seems to me that we capture other things when centering.
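A short piece of algebra may help with this concern. Writing m for the mean of X and b_0, b_1, b_2 for the coefficients of the centered model (these symbols are introduced here for illustration and do not appear in the post), expanding the centered quadratic shows that it describes the same curve in X, with the same coefficient on X squared and only the intercept and linear coefficient re-expressed:

```latex
\begin{align*}
b_0 + b_1 (X - m) + b_2 (X - m)^2
  &= (b_0 - b_1 m + b_2 m^2) + (b_1 - 2 b_2 m)\, X + b_2 X^2
\end{align*}
```

So the change in the squared term alone, as computed in the comment above, is offset by the change in the linear term. The predicted difference in income between two education levels is the same under either parameterization.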

  4. george says

    July 12, 2013 at 5:37 pm

    I have a question on calculating the threshold value, that is, the value at which the quadratic relationship turns. The formula for the turning point is x = -b/(2a), following from ax² + bx + c. My question is this: when using mean-centered quadratic terms, do you add the mean back to calculate the turning point on the non-centered scale (for purposes of interpretation when writing up results and findings)?

    • Karen says

      July 15, 2013 at 3:50 pm

      Yes, the x you’re calculating is the centered version. So to get that value on the uncentered X, you’ll have to add the mean back in.
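The add-the-mean-back step can be sketched as follows, using `np.polyfit` and a made-up response `y` (hypothetical values; the post gives none). The turning point computed from the centered fit, shifted by the mean, should match the one computed directly from the uncentered fit.

```python
import numpy as np

x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8], dtype=float)
# Hypothetical response values, for illustration only
y = np.array([3, 7, 8, 9, 10, 10, 11, 10, 11, 9], dtype=float)
x_mean = x.mean()

# np.polyfit returns coefficients from the highest power down: a, b, c
a_cen, b_cen, c_cen = np.polyfit(x - x_mean, y, deg=2)
turn_centered = -b_cen / (2 * a_cen)        # turning point on the centered scale
turn_original = turn_centered + x_mean      # add the mean back

# Turning point from the uncentered fit, for comparison
a_raw, b_raw, c_raw = np.polyfit(x, y, deg=2)
turn_direct = -b_raw / (2 * a_raw)

print(np.isclose(turn_original, turn_direct))
```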

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.