Linear Regression Analysis – 3 Common Causes of Multicollinearity and What to Do About Them

by Karen Grace-Martin

Multicollinearity in regression is one of those issues that strikes fear into the hearts of researchers. You’ve heard about its dangers in statistics classes, and colleagues and journal reviewers question your results because of it.

Multicollinearity is simply redundancy in the information contained in predictor variables. If the redundancy is moderate, it usually only affects the interpretation of regression coefficients. But if it is severe (at or near perfect redundancy), it causes the model to “blow up.” (And yes, that’s a technical term.)

But the reality is that there are only five situations where it commonly occurs. And three of them have very simple solutions. These are:

1. Improper dummy coding.

When you change a categorical variable into dummy variables, you should end up with one fewer dummy variable than you had categories. That’s because the last category is already indicated by having a 0 on all of the other dummy variables. In a model with an intercept, including a dummy for the last category just adds redundant information, resulting in multicollinearity. So always check your dummy coding if it seems you’ve got a multicollinearity problem.
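
Here is a minimal sketch of that redundancy in Python with NumPy. The three-level variable `group` and its values are made up for illustration. It compares the rank of a design matrix that has an intercept plus a dummy for every category against one that drops a category as the reference.

```python
import numpy as np

# Hypothetical categorical predictor with three levels.
group = np.array(["a", "b", "c", "a", "b", "c", "a", "c"])
levels = np.unique(group)  # sorted unique levels: a, b, c

# One 0/1 column per level (one dummy too many).
all_dummies = (group[:, None] == levels[None, :]).astype(float)
intercept = np.ones((len(group), 1))

# Intercept plus all three dummies: the dummies add up to the intercept column.
X_bad = np.hstack([intercept, all_dummies])
# Intercept plus k - 1 dummies: the dropped level "a" is the reference category.
X_good = np.hstack([intercept, all_dummies[:, 1:]])

rank_bad = np.linalg.matrix_rank(X_bad)
rank_good = np.linalg.matrix_rank(X_good)

print(X_bad.shape[1], rank_bad)    # 4 columns but rank 3
print(X_good.shape[1], rank_good)  # 3 columns and rank 3
```

When the rank is lower than the number of columns, at least one column is perfectly redundant, which is exactly the situation that makes the model blow up.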

2. Including a predictor that is computed from other predictors.

For example, I once had a client who was trying to test if larger birds had a higher probability of finding a mate. This bird had a special tail, and he wondered if the size of the whole bird or the tail was more helpful to the bird in finding a mate. To compare them, he put three measures of size into the model: body length, tail length, and total length of bird. Because total length was the sum of the first two, the model blew up. The fix is to include any two of the measures, but not all three.
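
To see why the third measure breaks things, here is a small sketch with simulated numbers. These are not the client's data, and the means and spreads are arbitrary assumptions. Since total length is an exact sum of the other two, the design matrix loses a rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Simulated measurements (arbitrary units, made up for illustration).
body = rng.normal(20.0, 2.0, n)
tail = rng.normal(8.0, 1.5, n)
total = body + tail  # computed from the other two predictors

ones = np.ones(n)
X_all = np.column_stack([ones, body, tail, total])  # all three size measures
X_two = np.column_stack([ones, body, tail])         # only two of them

rank_all = np.linalg.matrix_rank(X_all)
rank_two = np.linalg.matrix_rank(X_two)

# body + tail - total is zero for every bird, so this combination of
# columns carries no information at all.
redundant = X_all @ np.array([0.0, 1.0, 1.0, -1.0])

print(rank_all, "of", X_all.shape[1], "columns are independent")
print(rank_two, "of", X_two.shape[1], "columns are independent")
print(np.allclose(redundant, 0.0))
```

Any two of the three measures give a full-rank matrix; which two to keep depends on the question you want the coefficients to answer.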

3. Using the same or nearly the same variable twice.

A similar situation occurs when two measures of the same concept are included in a model. Sometimes researchers want to see which predicts an outcome better. For example, does personal income or household income predict stress level better? If they are both just measuring income, combine them into a single income variable using Principal Components Analysis.
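
As a rough sketch of what combining the two measures could look like, the code below runs a Principal Components Analysis on two simulated income variables using NumPy's SVD. The data, the variable names, and the choice to standardize before extracting the component are all assumptions for illustration; statistical software will do the same job with a built-in procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated incomes in thousands (made up; strongly related by construction).
personal = rng.normal(50.0, 10.0, n)
household = personal + rng.normal(20.0, 5.0, n)

# Standardize each variable so neither dominates because of its scale.
X = np.column_stack([personal, household])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via the singular value decomposition of the standardized data.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)  # share of variance for each component

# The sign of a component is arbitrary; orient it so higher means more income.
loadings = Vt[0] if Vt[0].sum() > 0 else -Vt[0]
income_score = Z @ loadings  # single combined income variable

print("share of variance in first component:", round(float(explained[0]), 3))
print("correlation with personal income:",
      round(float(np.corrcoef(income_score, personal)[0, 1]), 3))
```

The `income_score` column would then go into the regression in place of the two original income variables.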
