Regression models

Why Logistic Regression for Binary Response?

May 5th, 2009

Logistic regression models can seem pretty overwhelming to the uninitiated.  Why not use a regular regression model?  Just turn Y into an indicator variable: Y=1 for success and Y=0 for failure.

For some good reasons.

1. It doesn’t make sense to model Y as a linear function of the parameters because Y has only two values.  You just can’t make a line out of that (at least not one that fits the data well).

2. The predicted values from a linear model can be any positive or negative number, not just 0 or 1 (see the sketch after this list).

3. The values 0 and 1 are arbitrary.  The important part is not to predict the numerical value of Y, but the probability that success or failure occurs, and the extent to which that probability depends on the predictor variables.
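To see point 2 in action, here is a minimal sketch (in Python with statsmodels, on made-up data; all names and numbers are just for illustration) of a straight-line fit to a 0/1 outcome producing impossible predictions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Made-up data: the probability of success rises with x.
x = np.linspace(-4, 4, 200)
p = 1 / (1 + np.exp(-1.5 * x))   # true S-shaped relationship
y = rng.binomial(1, p)           # observed 0/1 responses

# Ordinary least squares on the 0/1 indicator.
ols = sm.OLS(y, sm.add_constant(x)).fit()
fitted = ols.predict(sm.add_constant(x))

# The fitted line wanders below 0 and above 1: impossible values
# for something that is supposed to be a probability.
print(fitted.min(), fitted.max())
```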

So okay, you say.  Why not model a simple transformation of Y instead, like the probability of success (the probability that Y=1)?

Well, that doesn’t work so well either.

Why not?

1. The right-hand side of the equation can be any number, but the left-hand side can only range from 0 to 1.

2. It turns out the relationship is not linear, but rather follows an S-shaped (or sigmoidal) curve.

To obtain a linear relationship, we need to transform this response, Pr(success), as well.

As luck would have it, there are a few functions that:

1. are not restricted to values between 0 and 1

2. will form a linear relationship with our parameters

These functions include:

Arcsine

Probit

Logit
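For reference, here is a minimal sketch (Python with NumPy and SciPy, made-up values) of how these three transformations are usually defined, applied to a probability p:

```python
import numpy as np
from scipy.stats import norm

p = np.array([0.1, 0.5, 0.9])

# Arcsine (angular): arcsin of the square root of p.
arcsine = np.arcsin(np.sqrt(p))

# Probit: the inverse of the standard normal CDF.
probit = norm.ppf(p)

# Logit: the natural log of the odds, p / (1 - p).
logit = np.log(p / (1 - p))

# Note: applying the logit to the S-curve p = 1/(1 + exp(-(b0 + b1*x)))
# gives back exactly b0 + b1*x, which is what makes the relationship linear.
print(arcsine, probit, logit)
```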

All three of these work about equally well, but (believe it or not) the Logit function is the easiest to interpret.

But as it turns out, you can’t just run the transformation and then do a regular linear regression on the transformed data.  That would be way too easy, but it would also give inaccurate results.  Logistic regression instead estimates the parameters by maximum likelihood, which gives better results: better meaning unbiased, with lower variances.
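As an illustration (Python with statsmodels, simulated data with made-up coefficients), here is a logistic regression fit by maximum likelihood, along with the exponentiated coefficients that make the logit easy to interpret:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated data with a known logistic relationship.
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)

# Logistic regression estimates the parameters by maximum likelihood.
logit_fit = sm.Logit(y, sm.add_constant(x)).fit()
print(logit_fit.params)         # estimates near the true 0.5 and 1.2

# Exponentiated coefficients are odds ratios: the multiplicative change
# in the odds that Y=1 for a one-unit increase in x.
print(np.exp(logit_fit.params))
```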



SPSS GLM or Regression? When to use each

April 23rd, 2009

Regression models are just a subset of the General Linear Model, so you can use GLM procedures to run regressions.  That’s usually what I use.

But in SPSS, each of the GLM and Regression procedures has options that aren’t available in the other.  How do you decide when to use GLM and when to use Regression?

GLM has these options that Regression doesn’t: (more…)


Checking Assumptions in ANOVA and Linear Regression Models: The Distribution of Dependent Variables

April 10th, 2009

Here’s a little reminder for those of you checking assumptions in regression and ANOVA:

The assumptions of normality and homogeneity of variance for linear models are not about Y, the dependent variable.  (If you think I’m either stupid, crazy, or just plain nit-picking, read on.  This distinction really is important.) (more…)
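To make the distinction concrete, here is a small simulated sketch (Python with NumPy, made-up numbers): the dependent variable itself is badly non-normal, yet the residuals, which are what the assumption is actually about, behave perfectly well:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two groups with very different means and normal errors.
group = rng.integers(0, 2, size=1000)            # group indicator, 0 or 1
y = 10 * group + rng.normal(0, 1, size=1000)     # strong group effect

# Marginal Y is bimodal and would fail any normality check.
# The residuals around each group mean, however, are normal.
group_means = np.where(group == 1, y[group == 1].mean(), y[group == 0].mean())
residuals = y - group_means

print(np.mean(residuals), np.std(residuals))     # roughly 0 and 1
```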


The Distribution of Independent Variables in Regression Models

April 9th, 2009

I often hear concern about the non-normal distributions of independent variables in regression models, and I am here to ease your mind.

There are NO assumptions in any linear model about the distribution of the independent variables.  Yes, you only get meaningful parameter estimates from nominal (unordered categories) or numerical (continuous or discrete) independent variables.  But no, the model makes no assumptions about them.  They do not need to be normally distributed or continuous.

It is useful, however, to understand the distribution of predictor variables to find influential outliers or concentrated values.  A highly skewed independent variable may be made more symmetric with a transformation.
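As a small illustrative sketch (Python with NumPy and SciPy, simulated data), a log transformation is one common way to make a strongly right-skewed predictor more symmetric:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

# A strongly right-skewed predictor, like income data often is.
x = rng.lognormal(mean=0, sigma=1, size=1000)

print(skew(x))           # large positive skew
print(skew(np.log(x)))   # close to 0: roughly symmetric
```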



Is Multicollinearity the Bogeyman?

April 8th, 2009

Multicollinearity occurs when two or more predictor variables in a regression model are redundant.  It is a real problem, and it can do terrible things to your results.  However, the dangers of multicollinearity seem to have been so drummed into students’ minds that they have created a panic.

True multicollinearity (the kind that messes things up) is pretty uncommon.  High correlations among predictor variables may indicate multicollinearity, but they are NOT a reliable sign that it exists, and they do not necessarily indicate a problem.  How high is too high depends on (more…)
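One common way to quantify this kind of redundancy among predictors is the variance inflation factor (VIF).  Here is a minimal sketch (Python with statsmodels, simulated data where one predictor is nearly a copy of another):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)

# Two predictors, the second nearly a copy of the first.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)    # highly redundant with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF for each predictor (column 0 is the constant, so skip it).
for i in (1, 2):
    print(variance_inflation_factor(X, i))   # both very large here
```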


Regression Models: How do you know you need a polynomial?

April 3rd, 2009

A polynomial term, such as a quadratic (squared) or cubic (cubed) term, turns a linear regression model into a curve.  But because it is X that is squared or cubed, not the Beta coefficient, the model still qualifies as a linear model.  This makes polynomial terms a nice, straightforward way to fit curves without having to use complicated non-linear models.
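Here is a minimal sketch (Python with statsmodels, made-up data) of adding a quadratic term; notice that the Betas still enter the model linearly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Simulated data with a true curved relationship.
x = rng.uniform(-3, 3, size=300)
y = 2 + 1.5 * x - 0.8 * x**2 + rng.normal(size=300)

# The design matrix contains x and x squared; the coefficients are linear.
X = sm.add_constant(np.column_stack([x, x**2]))
quad_fit = sm.OLS(y, X).fit()
print(quad_fit.params)    # estimates near 2, 1.5, and -0.8
```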

But how do you know if you need one, that is, when a straight line isn’t the best model? (more…)