Regression models

Logistic Regression Models: Reversed odds ratios in SAS Proc Logistic–Use ‘Descending’

March 18th, 2009 by

If you’ve ever been puzzled by odds ratios in a logistic regression that seem backward, stop banging your head on the desk.

Odds are (pun intended) you ran your analysis in SAS Proc Logistic.

PROC LOGISTIC has a strange (I couldn’t say odd again) little default. If your dependent variable Y is coded 0 and 1, SAS will model the probability of Y=0. Most of us are trying to model the probability that Y=1. So, yes, your results ARE backward, but only because SAS is modeling the opposite event from the one you care about.

Luckily, SAS made the solution easy. Simply add the ‘Descending’ option right in the PROC LOGISTIC statement. For example:

/* DESCENDING makes SAS model P(Y=1) instead of the default P(Y=0) */
PROC LOGISTIC DESCENDING;
MODEL Y = X1 X2;
RUN;

All of your parameter estimates (B) will reverse signs, and each odds ratio will become its reciprocal, although p-values will not be affected.
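If you want to convince yourself of the sign flip without SAS at hand, here is a minimal sketch in plain Python (not SAS). It fits a one-predictor logistic regression by Newton-Raphson twice, once modeling Y=1 and once modeling Y=0. The data and the `fit_logistic` helper are made up for illustration; this is not how PROC LOGISTIC is implemented internally.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the information matrix (h)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: one predictor and a 0/1 outcome
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

# Model the probability that Y=1 (what most of us want)
b0, b1 = fit_logistic(xs, ys)

# Model the probability that Y=0 (the SAS default) by swapping the coding
r0, r1 = fit_logistic(xs, [1 - y for y in ys])

print("Modeling Y=1:", b0, b1, "odds ratio:", math.exp(b1))
print("Modeling Y=0:", r0, r1, "odds ratio:", math.exp(r1))
```

The two fits carry the same information: the coefficients differ only in sign, and the two odds ratios multiply to 1.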

 



Why ANOVA and Linear Regression are the Same Analysis

March 11th, 2009 by

If your graduate statistical training was anything like mine, you learned ANOVA in one class and Linear Regression in another. My professors would often say things like “ANOVA is just a special case of Regression,” but give vague answers when pressed.

It was not until I started consulting that I realized how closely related ANOVA and regression are.  They’re not only related, they’re the same thing.  Not a quarter and a nickel–different sides of the same coin.

So here is a very simple example that shows why.  When someone showed me this, a light bulb went on, even though I already knew both ANOVA and multiple linear (more…)
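To make the claim concrete, here is a small sketch of my own in plain Python (it is not the example from the full post, and the data are made up). It runs a one-way ANOVA on two groups and then a simple regression on a 0/1 dummy variable for group membership, and compares the results.

```python
from statistics import mean

# Hypothetical outcome scores for two groups
control = [4.0, 5.0, 6.0, 5.0]
treated = [7.0, 9.0, 8.0, 8.0]

# --- ANOVA view: compare between-group and within-group variation ---
grand_mean = mean(control + treated)
ss_between = (len(control) * (mean(control) - grand_mean) ** 2
              + len(treated) * (mean(treated) - grand_mean) ** 2)
ss_within = (sum((y - mean(control)) ** 2 for y in control)
             + sum((y - mean(treated)) ** 2 for y in treated))
df_error = len(control) + len(treated) - 2
f_anova = (ss_between / 1) / (ss_within / df_error)

# --- Regression view: regress Y on a dummy (0 = control, 1 = treated) ---
x = [0.0] * len(control) + [1.0] * len(treated)
y = control + treated
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
ss_model = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)
ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
f_regression = (ss_model / 1) / (ss_resid / (len(y) - 2))

print("Group means:", mean(control), mean(treated))
print("Intercept and slope:", intercept, slope)
print("F from ANOVA:", f_anova, "F from regression:", f_regression)
```

The intercept is the control group mean, the slope is the difference between the two group means, and the two F statistics agree. With more than two groups the same idea holds, using one dummy variable per non-reference group.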


Testing and Dropping Interaction Terms in Regression and ANOVA models

February 26th, 2009 by

In a Regression model, should you drop interaction terms if they’re not significant?

In an ANOVA, adding interaction terms still leaves the main effects as main effects.  That is, as long as the data are balanced, the main effects and the interactions are independent.  The main effect is still telling (more…)


Interpreting Lower Order Coefficients When the Model Contains an Interaction

February 23rd, 2009 by

A Linear Regression Model with an interaction between two predictors (X1 and X2) has the form: 

Y = B0 + B1*X1 + B2*X2 + B3*X1*X2

It doesn’t really matter if X1 and X2 are categorical or continuous, but let’s assume they are continuous for simplicity.

One important concept is that B1 and B2 are not main effects, the way they would be if (more…)
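One way to see this is to plug numbers into the model. The sketch below is plain Python with made-up coefficients (nothing here is estimated from data). It shows that the effect of X1 on Y is B1 + B3*X2, so B1 by itself is the effect of X1 only when X2 = 0.

```python
def predict(b0, b1, b2, b3, x1, x2):
    """Predicted Y from the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_of_x1(b1, b3, x2):
    """Change in Y per one-unit increase in X1, holding X2 fixed."""
    return b1 + b3 * x2

# Hypothetical coefficients, chosen only for illustration
b0, b1, b2, b3 = 2.0, 0.5, 1.0, 0.25

# The effect of X1 depends on where X2 sits
slope_at_0 = slope_of_x1(b1, b3, 0.0)  # equals B1
slope_at_4 = slope_of_x1(b1, b3, 4.0)  # equals B1 + 4*B3

# Check against the model: raise X1 from 2 to 3 while holding X2 fixed
diff_at_0 = predict(b0, b1, b2, b3, 3.0, 0.0) - predict(b0, b1, b2, b3, 2.0, 0.0)
diff_at_4 = predict(b0, b1, b2, b3, 3.0, 4.0) - predict(b0, b1, b2, b3, 2.0, 4.0)

print("Effect of X1 when X2 = 0:", slope_at_0, diff_at_0)
print("Effect of X1 when X2 = 4:", slope_at_4, diff_at_4)
```

The same logic applies to B2, which is the effect of X2 only when X1 = 0.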


Problems Caused by Categorizing Continuous Variables

February 20th, 2009 by

I just came across this great article by Frank Harrell: Problems Caused by Categorizing Continuous Variables

It’s from the Vanderbilt University biostatistics department, so the examples are all medical, but the points hold for any field.

It goes right along with my recent post, Continuous and Categorical Variables: The Trouble with Median Splits.

 


3 Reasons Psychology Researchers should Learn Regression

February 17th, 2009 by

Back when I was doing psychology research, I knew ANOVA pretty well. I’d taken a number of courses on it and could run it backward and forward. I kept hearing about ANCOVA, but in every ANOVA class that was the last topic on the syllabus, and we always ran out of time.

The other thing that drove me crazy was that those stats professors kept saying “ANOVA is just a special case of Regression.” I could not for the life of me figure out why or how.

It was only when I switched over to statistics that I finally took a regression class and figured out what ANOVA was all about. And it was only when I started consulting, and seeing hundreds of different ANOVA and regression models, that I finally made the connection.

But if you don’t have the driving curiosity about ANOVA and regression, why should you, as a researcher in Psychology, Education, or Agriculture, who is trained in ANOVA, want to learn regression?  There are 3 main reasons.

1. There are many, many continuous independent variables and covariates that need to be included in models. Without the tools to analyze them as continuous, you are left forcing them into ANOVA using an arbitrary technique like median splits. At best, you’re losing power. At worst, you’re not publishing your article because you’re missing real effects.

2. Having a solid understanding of the General Linear Model in its various forms equips you to really understand your variables and their relationships.  It allows you to try a model different ways–not for data fishing, but for discovering the true nature of the relationships.  Having the capacity to add an interaction term or a squared term  allows you to listen to your data and makes you a better researcher.

3. The multiple linear regression model is the basis for many other statistical techniques–logistic regression, multilevel and mixed models, Poisson regression, survival analysis, and so on. Each of these is a step (or small leap) beyond multiple regression. If you’re still struggling with what it means to center variables or interpret interactions, learning one of these other techniques becomes arduous, if not painful.
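The point about median splits in reason 1 is easy to check with a quick simulation. This is a rough sketch in plain Python on simulated data: the sample size, the seed, and the strength of the relationship are arbitrary choices of mine. It compares how strongly Y relates to a continuous X with how strongly it relates to the same X after a median split.

```python
import random
from statistics import mean, median

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

random.seed(2009)  # arbitrary seed so the run is repeatable

# Simulate a continuous predictor and an outcome that depends on it
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

# Median split: recode X as 0 (at or below the median) or 1 (above it)
cutoff = median(x)
x_split = [1.0 if xi > cutoff else 0.0 for xi in x]

r_continuous = pearson_r(x, y)
r_split = pearson_r(x_split, y)

print("Correlation using continuous X:", r_continuous)
print("Correlation using median-split X:", r_split)
```

The relationship is the same in both cases, but the split version looks weaker because the split throws away information about X. A weaker apparent effect means less power to detect it.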

Having guided thousands of researchers through their statistical analysis over the past 10 years, I am convinced that having a strong, intuitive understanding of the general linear model in its variety of forms is the key to being an effective and confident statistical analyst.  You are then free to learn and explore other methodologies as needed.