Regression models

Making Dummy Codes Easy to Keep Track of

January 14th, 2010 by

Here’s a little tip.

When you construct dummy variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results.

Make the coding inherent in the dummy variable’s name.

So instead of a variable named Gender with values of 1=Female and 0=Male, call the variable Female.

Instead of a set of dummy variables named MaritalStatus1 (1=Married, 0=not married) and MaritalStatus2 (1=Divorced, 0=not divorced), with Single as the reference category, name the same variables Married and Divorced.

And if you’re new to dummy coding, this has the extra bonus of making the dummy coding intuitive.  It’s just a set of yes/no variables about all but one of your categories.
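If you build dummies in code rather than by hand, the same naming habit can be automated. The sketch below is a hypothetical helper, `make_named_dummies`, written in plain Python with no statistics package assumed. It creates one yes/no column for every category except the reference category you name, and labels each column with the category it flags.

```python
def make_named_dummies(values, reference):
    """Return one 0/1 indicator column per category except `reference`.

    Hypothetical helper: each column is named after the category it flags,
    so a 1 in the column "Married" literally means "yes, married".
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical marital-status data, with Single as the reference category
marital = ["Single", "Married", "Divorced", "Married", "Single"]
dummies = make_named_dummies(marital, reference="Single")
for name, column in dummies.items():
    print(name, column)
# Divorced [0, 0, 1, 0, 0]
# Married [0, 1, 0, 1, 0]
```

The reference category (Single) gets no column at all: someone with 0 on both Married and Divorced is single.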



Interpreting Regression Coefficients in Models other than Ordinary Linear Regression

January 5th, 2010 by

Someone who registered for my upcoming Interpreting (Even Tricky) Regression Models workshop asked if the content applies to logistic regression as well.

The short answer: yes.

The long-winded, detailed explanation of why this is true, plus the one caveat:

One of the greatest things about regression models is that they all have the same setup: …


Confusing Statistical Term #4: Hierarchical Regression vs. Hierarchical Model

December 21st, 2009 by

This one is relatively simple: very similar names for two totally different concepts.

Hierarchical Models (aka Hierarchical Linear Models or HLM) are a type of linear regression model in which the observations fall into hierarchical, or completely nested, levels.

Hierarchical Models are a type of Multilevel Model.

So what is a hierarchical data structure, which requires a hierarchical model?

The classic example is data from children nested within schools.  The dependent variable could be something like math scores, and the predictors a whole host of things measured about the child and the school.

Child-level predictors could be things like GPA, grade, and gender. School-level predictors could be things like total enrollment, private vs. public, and mean SES.

Because multiple children are measured from the same school, their measurements are not independent.  Hierarchical modeling takes that into account.
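One common way to quantify how non-independent the children within a school are is the intraclass correlation (ICC): roughly, the share of total variance that lies between schools rather than within them. The sketch below uses the classic one-way ANOVA estimator and assumes a balanced design (the same number of children in every school); the function name and scores are hypothetical. In practice, mixed-model software estimates the between-school and within-school variances directly by (restricted) maximum likelihood instead.

```python
def icc_oneway(groups):
    """ICC(1) from a one-way ANOVA, assuming every group has the same size k.

    groups: list of lists, e.g. math scores for the children in each school.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    n_groups = len(groups)
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / k for g in groups]

    # Between-school and within-school sums of squares
    ssb = k * sum((m - grand_mean) ** 2 for m in group_means)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    msb = ssb / (n_groups - 1)
    msw = ssw / (n_groups * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical scores for 3 children in each of 3 schools
schools = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(icc_oneway(schools), 3))  # 0.897
```

An ICC near 0 means knowing a child's school tells you little about the child's score; a value this high means children in the same school are far more alike than children in different schools, which is exactly the dependence an ordinary regression would ignore.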

Hierarchical regression is a model-building technique that can be used with any regression model. It is the practice of building successive linear regression models, each adding more predictors.

For example, one common practice is to start by adding only demographic control variables to the model. In the next model, you can add predictors of interest to see if they predict the DV above and beyond the effect of the controls.

You’re actually building separate but related models in each step. But SPSS has a nice feature that compares the models and tests whether each successive model fits significantly better than the one before it.

So hierarchical regression is really a series of regular old OLS regression models: nothing fancy.
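The comparison between steps boils down to the standard F test for nested OLS models. The sketch below is a plain-Python illustration, not SPSS's own code: it fits each step by least squares, then reports the R² change and F = [(RSS_step1 − RSS_step2)/(p_step2 − p_step1)] / [RSS_step2/(n − p_step2)], where p counts the intercept. The helper names and the data are hypothetical, and the p-value (which needs an F distribution with the returned degrees of freedom) is left out to keep the sketch standard-library only.

```python
def ols_rss(X, y):
    """Residual sum of squares and parameter count for an OLS fit.

    X: list of rows of predictor values; an intercept column is added here.
    Assumes the predictors are not perfectly collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Normal equations X'X b = X'y, stored as an augmented matrix
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(p):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [v - factor * w for v, w in zip(a[i], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - f) ** 2 for yi, f in zip(y, fitted)), p


def r2_change_test(X_step1, X_step2, y):
    """Compare step 1 (controls) with step 2 (controls + predictors of interest)."""
    rss1, p1 = ols_rss(X_step1, y)
    rss2, p2 = ols_rss(X_step2, y)
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2_change = (rss1 - rss2) / tss
    f_change = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
    return r2_change, f_change, (p2 - p1, n - p2)


# Hypothetical data: one demographic control (age), one predictor of interest
age = [23, 35, 31, 45, 52, 28, 39, 60, 41, 33]
motiv = [3, 5, 4, 6, 7, 2, 5, 8, 6, 4]
score = [60, 72, 65, 80, 88, 55, 74, 95, 81, 66]

step1 = [[a] for a in age]
step2 = [[a, m] for a, m in zip(age, motiv)]
r2_change, f_change, df = r2_change_test(step1, step2, score)
print(f"R^2 change = {r2_change:.3f}, F change = {f_change:.2f} on df {df}")
```

Each step is just an ordinary regression; the only extra ingredient is comparing the two residual sums of squares.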

Confusing Statistical Terms #1: Independent Variable

Confusing Statistical Terms #2: Alpha and Beta

Confusing Statistical Terms #3: Levels



Chi-square test vs. Logistic Regression: Is a fancier test better?

November 9th, 2009 by

I recently received this email, which I thought was a great question, and one of wider interest…

Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of association in applied statistical analysis.  Why is using regression, or logistic regression, “better” than doing bivariate analysis such as Chi-square?

I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted for-controlled by- model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi-Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.

My professors don’t seem to be able to give me a simple, justified answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.

Thank you!
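In the simplest case the two approaches answer the same question. With a single binary predictor, the logistic regression slope is exactly the log of the 2×2 table's odds ratio, and the score test of that slope equals the Pearson chi-square. The sketch below shows the first fact numerically: a hypothetical 2×2 table, a Pearson chi-square computed by hand, and a logistic regression fit by Newton-Raphson on the grouped counts. The helper names are illustrative only. What the regression form adds is the ability to put control variables or continuous predictors into the same model.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def logistic_one_binary_predictor(table, iterations=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson.

    x = 1 for row 0 and x = 0 for row 1; the outcome is 1 for column 0.
    Works on grouped counts instead of one record per person.
    """
    (a, b), (c, d) = table
    groups = [(1.0, a, a + b), (0.0, c, c + d)]  # (x, events, group size)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, events, size in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - size * p      # score contribution
            w = size * p * (1.0 - p)       # information weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score (2x2 inverse)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical table: rows = exposed / unexposed, columns = outcome yes / no
table = [[20, 10], [10, 20]]
print(round(pearson_chi_square(table), 3))  # 6.667
b0, b1 = logistic_one_binary_predictor(table)
print(round(math.exp(b1), 3))  # 4.0, the odds ratio (20*20)/(10*10)
```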



Have you Wondered how using SPSS Burns Calories?

October 30th, 2009 by

Maybe I’ve noticed it more because I’m getting ready for next week’s SPSS in GLM workshop. Just this week, I’ve had a number of encounters with people struggling with SPSS, and with GLM in particular.

Number 1: I read this in a technical report by Patrick Burns comparing SPSS to R:

“SPSS is notorious for its attitude of ‘You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.’”

And while I still prefer SPSS, I had to laugh because the anonymous person Burns …


6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

September 17th, 2009 by

Linear models (both OLS regression and ANOVA) are quite robust to departures from the assumptions of normality and constant variance. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates.

But you need to check the assumptions anyway, because some departures are so far off that the p-values become inaccurate.  And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.
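A common remedial measure for a right-skewed DV is a log transformation. The sketch below is only an illustration under assumed conditions: simulated lognormal data (seeded so it is reproducible), an intercept-only model, and a hand-rolled `skewness` function. In an intercept-only model the residuals are just y minus its mean, so their skewness is the skewness of y itself. With real data you would inspect residuals from your actual model, and a log is not a cure-all.

```python
import math
import random


def skewness(values):
    """Moment estimator of skewness: mean cubed deviation / sd^3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(1)
# Hypothetical right-skewed DV (simulated lognormal values)
y = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
log_y = [math.log(v) for v in y]

# Intercept-only model: residuals = y - mean(y), so residual skewness = skewness(y)
print(f"skewness of y:      {skewness(y):.2f}")
print(f"skewness of log(y): {skewness(log_y):.2f}")
```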

But sometimes you can’t.

Sometimes it’s because the dependent variable just isn’t appropriate for a linear model. The …