Regression models

Is Multicollinearity the Bogeyman?

April 8th, 2009

Multicollinearity occurs when two or more predictor variables in a regression model are redundant.  It is a real problem, and it can do terrible things to your results.  However, the dangers of multicollinearity seem to have been so drummed into students’ minds that they have created a panic.

True multicollinearity (the kind that messes things up) is pretty uncommon.  High correlations among predictor variables may indicate multicollinearity, but they are NOT a reliable sign that it exists, and they do not necessarily indicate a problem.  How high is too high depends on (more…)
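Rather than eyeballing correlations, it is worth running an actual collinearity diagnostic.  As a minimal sketch (the data set and variable names here are hypothetical), the VIF and TOL options on the MODEL statement in SAS Proc Reg print a variance inflation factor and a tolerance for each predictor:

PROC REG DATA=MYDATA;
MODEL Y = X1 X2 X3 / VIF TOL;
RUN;

Large VIFs (equivalently, tiny tolerances) are the warning sign to watch for, not the pairwise correlations themselves.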


Regression Models: How do you know you need a polynomial?

April 3rd, 2009

A polynomial term–a quadratic (squared) or cubic (cubed) term–turns a linear regression model into a curve.  But because it is X that is squared or cubed, not the Beta coefficient, it still qualifies as a linear model.  This makes it a nice, straightforward way to model curves without having to fit complicated non-linear models.

But how do you know if you need one–when a linear model isn’t the best model? (more…)
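To make that concrete, here is a minimal sketch in SAS (the data set and variable names are hypothetical): compute the squared term in a data step, then enter it as just another predictor.

DATA POLY;
SET MYDATA; /* hypothetical input data set */
XSQ = X*X; /* quadratic term */
RUN;

PROC REG DATA=POLY;
MODEL Y = X XSQ; /* a curve in X, but still linear in the Betas */
RUN;

The fit is now a parabola in X, yet ordinary least squares still applies, because the model is linear in its coefficients.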


Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups, Part 2

March 31st, 2009

Part 1 outlined one issue in deciding whether to put a categorical predictor variable into Fixed Factors or Covariates in SPSS GLM.  That issue dealt with how SPSS automatically creates dummy variables from any variable in Fixed Factors.

There is another key default to keep in mind. SPSS GLM will automatically create interactions between any and all variables you specify as Fixed Factors.

If you put 5 variables in Fixed Factors, you’ll get a lot of interactions. SPSS will automatically create all 2-way, 3-way, 4-way, and even a 5-way interaction among those 5 variables. (more…)
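In other words, SPSS’s default amounts to requesting a full factorial model.  For readers who think in syntax, here is a sketch of the same idea in SAS Proc GLM (factors A, B, and C are hypothetical), where the bar operator spells out every main effect and interaction at once:

PROC GLM DATA=MYDATA;
CLASS A B C;
MODEL Y = A|B|C; /* expands to A B A*B C A*C B*C A*B*C */
RUN;

With five factors, the same shorthand would generate all 26 interaction terms, which is exactly the explosion described above.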


Logistic Regression Models: Reversed odds ratios in SAS Proc Logistic–Use ‘Descending’

March 18th, 2009

If you’ve ever been puzzled by odds ratios in a logistic regression that seem backward, stop banging your head on the desk.

Odds are (pun intended) you ran your analysis in SAS Proc Logistic.

Proc Logistic has a strange (I couldn’t say odd again) little default.  If your dependent variable Y is coded 0 and 1, SAS will model the probability of Y=0.  Most of us are trying to model the probability that Y=1.  So, yes, your results ARE backward, but only because SAS is testing a hypothesis opposite yours.

Luckily, SAS made the solution easy.  Simply add the ‘Descending’ option right in the Proc Logistic command line.  For example:

PROC LOGISTIC DESCENDING;
MODEL Y = X1 X2;
RUN;

All of your parameter estimates (B) will reverse sign, and each odds ratio becomes its reciprocal; the p-values will not be affected.
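SAS also offers a more explicit alternative: the EVENT= option on the response variable in the MODEL statement names the modeled event directly.  A sketch:

PROC LOGISTIC;
MODEL Y(EVENT='1') = X1 X2;
RUN;

Either way, the point is the same: make sure SAS is modeling the probability of the outcome you actually care about.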



Why ANOVA and Linear Regression are the Same Analysis

March 11th, 2009

If your graduate statistical training was anything like mine, you learned ANOVA in one class and Linear Regression in another.  My professors would often say things like “ANOVA is just a special case of Regression,” but they would give vague answers when pressed.

It was not until I started consulting that I realized how closely related ANOVA and regression are.  They’re not only related, they’re the same thing.  Not a quarter and a nickel, but two sides of the same coin.

So here is a very simple example that shows why.  When someone showed me this, a light bulb went on, even though I already knew both ANOVA and multiple linear (more…)
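The classic demonstration runs the same two-group comparison both ways and watches the output match.  Here is a minimal sketch in SAS (data set, GROUP coding, and variable names are all hypothetical):

PROC GLM DATA=MYDATA;
CLASS GROUP;
MODEL Y = GROUP; /* one-way ANOVA */
RUN;

DATA CODED;
SET MYDATA;
G1 = (GROUP = 1); /* dummy variable: 1 if GROUP=1, 0 otherwise (assumes GROUP is numeric 0/1) */
RUN;

PROC REG DATA=CODED;
MODEL Y = G1; /* same F and p as the ANOVA; intercept = mean of group 0, B = difference in means */
RUN;

Same F statistic, same p-value, same conclusion: one analysis wearing two outfits.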


Testing and Dropping Interaction Terms in Regression and ANOVA models

February 26th, 2009

In a Regression model, should you drop interaction terms if they’re not significant?

In an ANOVA, adding interaction terms still leaves the main effects as main effects.  That is, as long as the data are balanced, the main effects and the interactions are independent.  The main effect is still telling (more…)
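For concreteness, the test in question comes from fitting the model with the product term and examining its F test.  A minimal sketch in SAS Proc GLM (factors A and B are hypothetical):

PROC GLM DATA=MYDATA;
CLASS A B;
MODEL Y = A B A*B; /* main effects plus the interaction */
RUN;

If the A*B term is not significant, the candidate reduced model is simply MODEL Y = A B; whether you should actually refit without it is exactly the question at issue.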