Interpreting Linear Regression Coefficients

Interpreting Lower Order Coefficients When the Model Contains an Interaction

February 23rd, 2009

A Linear Regression Model with an interaction between two predictors (X1 and X2) has the form: 

Y = B0 + B1*X1 + B2*X2 + B3*X1*X2.

It doesn’t really matter if X1 and X2 are categorical or continuous, but let’s assume they are continuous for simplicity.

One important concept is that B1 and B2 are not main effects, the way they would be if (more…)
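As a quick illustration of that point, here is a minimal sketch in Python's statsmodels (the data and coefficient values are simulated, my own example rather than the post's). With the interaction in the model, B1 is the slope of X1 only at X2 = 0, not an overall main effect:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 500
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # true model: Y = 2 + 1.0*X1 + 0.5*X2 + 0.7*X1*X2 + noise
    y = 2 + 1.0 * x1 + 0.5 * x2 + 0.7 * x1 * x2 + rng.normal(scale=0.5, size=n)

    df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
    # x1:x2 adds the product term; the x1 coefficient is the slope at x2 = 0
    print(smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit().params)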


When NOT to Center a Predictor Variable in Regression

February 9th, 2009

There are two reasons to center predictor variables in any type of regression analysis–linear, logistic, multilevel, etc.

1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables (the ones that were multiplied).

2. To make interpretation of parameter estimates easier.

I was recently asked: when is centering NOT a good idea? (more…)
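As a quick check on reason 1, here is a minimal numpy sketch (simulated all-positive predictors, my own example):

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.uniform(5, 15, size=1000)   # all-positive predictor
    x2 = rng.uniform(5, 15, size=1000)   # all-positive predictor

    # raw component vs. its product term: strongly correlated (about .7 here)
    print(np.corrcoef(x1, x1 * x2)[0, 1])

    # centered component vs. product of centered components: near 0
    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
    print(np.corrcoef(x1c, x1c * x2c)[0, 1])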


Order affects Regression Parameter Estimates in SPSS GLM

February 6th, 2009

I just discovered something in SPSS GLM that I never knew.

When you have an interaction in the model, the order in which you enter terms into the Model statement affects which parameter estimates SPSS gives you.

The default in SPSS is to automatically create interaction terms among all the categorical predictors.  But if you want fewer than all those interactions, or if you want to put in an interaction involving a continuous variable, you need to choose Model–>Custom Model.

In the specific example of an interaction between a categorical and continuous variable, to interpret this interaction you need to output Regression Coefficients. Do this by choosing  Options–>Regression Parameter Estimates.

If you put the main effects into the model first, followed by the interactions, you will find the usual output: the regression coefficient (column B) for the continuous variable is the slope for the reference group. The coefficient for the interaction in each other category tells you the difference between that category's slope and the reference group's slope. The interaction coefficient for the reference group itself is 0.

What I was surprised to find is that if the interactions are put into the model first, you don’t get that.

Instead, the coefficient for the interaction in each category is the actual slope for that group, NOT the difference.

This is actually quite useful: it saves a bit of calculating, and you get a p-value for whether each slope differs from 0. However, it also means you have to be careful that you know what each parameter estimate is actually estimating.
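The same two parameterizations exist outside SPSS. As a rough analog (a sketch in Python's statsmodels with simulated data, not the SPSS procedure itself), whether the continuous variable appears as its own main-effect term determines whether the interaction coefficients are slope differences or per-group slopes:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 200
    g = rng.integers(0, 2, size=n)   # two groups; group 0 is the reference
    x = rng.normal(size=n)
    y = 1 + 0.5 * x + 1.0 * g + 0.8 * g * x + rng.normal(scale=0.5, size=n)
    df = pd.DataFrame({"y": y, "x": x, "g": g})

    # x entered as a main effect: the interaction coefficient is the
    # DIFFERENCE between group 1's slope and the reference group's slope
    print(smf.ols("y ~ C(g) + x + C(g):x", data=df).fit().params)

    # x appearing only inside the interaction: each interaction
    # coefficient is that group's ACTUAL slope
    print(smf.ols("y ~ C(g) + C(g):x", data=df).fit().params)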



Interpreting Interactions in Regression

January 19th, 2009

Adding interaction terms to a regression model has real benefits. It greatly expands your understanding of the relationships among the variables in the model. And you can test more specific hypotheses.  But interpreting interactions in regression takes understanding of what each coefficient is telling you.

The example from Interpreting Regression Coefficients was a model of the height of a shrub (Height) based on the amount of bacteria in the soil (Bacteria) and whether the shrub is located in partial or full sun (Sun). Height is measured in cm, Bacteria is measured in thousand per ml of soil, and Sun = 0 if the plant is in partial sun, and Sun = 1 if the plant is in full sun.

(more…)
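To make those variables concrete, here is a minimal sketch of fitting such a model in Python's statsmodels (the data and coefficient values are simulated, not the original example's):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 100
    bacteria = rng.uniform(0, 10, size=n)   # thousands per ml of soil
    sun = rng.integers(0, 2, size=n)        # 0 = partial sun, 1 = full sun
    height = (42 + 2.0 * bacteria + 11 * sun
              + 0.75 * bacteria * sun + rng.normal(scale=3, size=n))

    df = pd.DataFrame({"Height": height, "Bacteria": bacteria, "Sun": sun})
    # the Bacteria coefficient is the slope for partial-sun plants (Sun = 0)
    print(smf.ols("Height ~ Bacteria + Sun + Bacteria:Sun", data=df).fit().params)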


SPSS GLM: Choosing Fixed Factors and Covariates

December 30th, 2008

The beauty of the Univariate GLM procedure in SPSS is that it is so flexible.  You can use it to analyze regressions, ANOVAs, ANCOVAs with all sorts of interactions, dummy coding, etc.

The downside of this flexibility is that it is often confusing what to put where and what it all means.

So here’s a quick breakdown.

The Dependent Variable is, I hope, pretty straightforward: put in your continuous dependent variable.

Fixed Factors are categorical independent variables.  It does not matter if the variable is (more…)
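The factor/covariate distinction shows up in model formulas generally. As a rough analog (hypothetical variable names, Python's statsmodels rather than the SPSS dialog), a Fixed Factor is dummy-coded with C() while a Covariate enters as a single numeric slope:

    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical data: 'treatment' plays the Fixed Factor role,
    # 'age' the Covariate, 'score' the continuous dependent variable
    df = pd.DataFrame({
        "score":     [10.0, 12.5, 9.0, 14.0, 11.0, 13.5],
        "treatment": ["a", "b", "a", "b", "a", "b"],
        "age":       [23, 31, 27, 45, 36, 29],
    })
    # C() dummy-codes the categorical factor; age gets one slope
    print(smf.ols("score ~ C(treatment) + age", data=df).fit().params)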


Centering for Multicollinearity Between Main effects and Quadratic terms

December 10th, 2008

One of the most common causes of multicollinearity is multiplying predictor variables to create an interaction term or a quadratic or higher-order term (X squared, X cubed, etc.).

Why does this happen?  When all the X values are positive, higher values produce higher products and lower values produce lower products, so the product variable is highly correlated with its component variable.  (If the values are all negative, the same thing happens, but the correlation is negative.)  A very simple example will clarify.

In a small sample, say you have the following values of a predictor variable X, sorted in ascending order:

2, 4, 4, 5, 6, 7, 7, 8, 8, 8

It is clear to you that the relationship between X and Y is not linear, but curved, so you add a quadratic term, X squared (X2), to the model.  The values of X squared are:

4, 16, 16, 25, 36, 49, 49, 64, 64, 64

The correlation between X and X2 is .987–almost perfect.
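You can verify that correlation directly; a minimal numpy check on the same ten values:

    import numpy as np

    x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8])
    print(np.corrcoef(x, x ** 2)[0, 1])   # about .987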

[Figure: plot of X vs. X squared]

To remedy this, you simply center X at its mean.  The mean of X is 5.9, so to center X, create a new variable XCen = X - 5.9.

These are the values of XCen:

-3.90, -1.90, -1.90, -.90, .10, 1.10, 1.10, 2.10, 2.10, 2.10

Now, the values of XCen squared are:

15.21, 3.61, 3.61, .81, .01, 1.21, 1.21, 4.41, 4.41, 4.41

The correlation between XCen and XCen2 is -.54: still not 0, but much more manageable, and definitely low enough not to cause severe multicollinearity.  Centering works because the low end of the scale now has large absolute values, so its square becomes large.
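Again this is easy to verify; continuing the same numpy check with the centered values:

    import numpy as np

    x = np.array([2, 4, 4, 5, 6, 7, 7, 8, 8, 8])
    x_cen = x - x.mean()                          # mean of x is 5.9
    print(np.corrcoef(x_cen, x_cen ** 2)[0, 1])   # about -.54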

The scatterplot between XCen and XCen2 is:

[Figure: plot of Centered X vs. Centered X squared]

If the values of X had been symmetric around their mean, this would be a perfectly balanced parabola, and the correlation would be 0.
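For instance (my own example, not from the post), with values symmetric about their mean the correlation vanishes entirely:

    import numpy as np

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # symmetric about its mean of 0
    print(np.corrcoef(x, x ** 2)[0, 1])         # 0, up to floating point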
