linear regression

SPSS GLM: Choosing Fixed Factors and Covariates

December 30th, 2008 by

The beauty of the Univariate GLM procedure in SPSS is that it is so flexible.  You can use it to analyze regressions, ANOVAs, ANCOVAs with all sorts of interactions, dummy coding, etc.

The down side of this flexibility is it is often confusing what to put where and what it all means.

So here’s a quick breakdown.

The dependent variable I hope is pretty straightforward.  Put in your continuous dependent variable.

Fixed Factors are categorical independent variables.  It does not matter if the variable is (more…)


Centering for Multicollinearity Between Main effects and Quadratic terms

December 10th, 2008 by

One of the most common causes of multicollinearity is when predictor variables are multiplied to create an interaction term or a quadratic or higher order terms (X squared, X cubed, etc.).

Why does this happen?  When all the X values are positive, higher values produce high products and lower values produce low products.  So the product variable is highly correlated with the component variable.  I will do a very simple example to clarify.  (Actually, if they are all on a negative scale, the same thing would happen, but the correlation would be negative).

In a small sample, say you have the following values of a predictor variable X, sorted in ascending order:

2, 4, 4, 5, 6, 7, 7, 8, 8, 8

It is clear to you that the relationship between X and Y is not linear, but curved, so you add a quadratic term, X squared (X2), to the model.  The values of X squared are:

4, 16, 16, 25, 49, 49, 64, 64, 64

The correlation between X and X2 is .987–almost perfect.

Plot of X vs. X squared
Plot of X vs. X squared

To remedy this, you simply center X at its mean.  The mean of X is 5.9.  So to center X, I simply create a new variable XCen=X-5.9.

These are the values of XCen:

-3.90, -1.90, -1.90, -.90, .10, 1.10, 1.10, 2.10, 2.10, 2.10

Now, the values of XCen squared are:

15.21, 3.61, 3.61, .81, .01, 1.21, 1.21, 4.41, 4.41, 4.41

The correlation between XCen and XCen2 is -.54–still not 0, but much more managable.  Definitely low enough to not cause severe multicollinearity.  This works because the low end of the scale now has large absolute values, so its square becomes large.

The scatterplot between XCen and XCen2 is:

Plot of Centered X vs. Centered X squared
Plot of Centered X vs. Centered X squared

If the values of X had been less skewed, this would be a perfectly balanced parabola, and the correlation would be 0.

Tonight is my free teletraining on Multicollinearity, where we will talk more about it.  Register to join me tonight or to get the recording after the call.

 


Regression Through the Origin

November 13th, 2008 by

I just wanted to follow up on my last post about Regression without Intercepts.Stage 2

Regression through the Origin means that you purposely drop the intercept from the model.  When X=0, Y must = 0.

The thing to be careful about in choosing any regression model is that it fit the data well.  Pretty much the only time that a regression through the origin will fit better than a model with an intercept is if the point X=0, Y=0 is required by the data.

Yes, leaving out the intercept will increase your df by 1, since you’re not estimating one parameter.  But unless your sample size is really, really small, it won’t matter.  So it really has no advantages.