Linear Regression

Confusing Statistical Term #4: Hierarchical Regression vs. Hierarchical Model

December 21st, 2009 by Karen Grace-Martin

This one is relatively simple. Very similar names for two totally different concepts.

Hierarchical Models (aka Hierarchical Linear Models or HLM) are a type of linear regression model in which the observations fall into hierarchical, or completely nested, levels.

Hierarchical Models are a type of Multilevel Model.

So what is a hierarchical data structure, which requires a hierarchical model?

The classic example is data from children nested within schools. The dependent variable could be something like math scores, and the predictors a whole host of things measured about the child and the school.

Child-level predictors could be things like GPA, grade, and gender. School-level predictors could be things like total enrollment, private vs. public, and mean SES.

Because multiple children are measured from the same school, their measurements are not independent. Hierarchical modeling takes that into account.
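To see what "not independent" means in practice, here is a minimal sketch in Python that simulates math scores for children nested within schools and estimates the intraclass correlation (ICC), the share of score variance that sits at the school level. The function names, sample sizes, and standard deviations are all made up for illustration; none of them come from a real data set. An ICC clearly above zero is a sign that children from the same school resemble each other, which is exactly the situation a hierarchical model is built for.

```python
import random
import statistics


def simulate_scores(n_schools=30, n_children=20, school_sd=5.0, child_sd=10.0, seed=42):
    """Return one list of simulated math scores per school (all values hypothetical)."""
    rng = random.Random(seed)
    schools = []
    for _ in range(n_schools):
        # This effect is shared by every child in the school; it creates the dependence.
        school_effect = rng.gauss(0.0, school_sd)
        schools.append(
            [70.0 + school_effect + rng.gauss(0.0, child_sd) for _ in range(n_children)]
        )
    return schools


def intraclass_correlation(groups):
    """One-way ANOVA estimate of the ICC, assuming equally sized groups."""
    k = len(groups[0])  # children per school
    n_groups = len(groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    group_means = [statistics.fmean(g) for g in groups]
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


scores = simulate_scores()
icc = intraclass_correlation(scores)
print(f"Estimated ICC: {icc:.2f}")
```

This only diagnoses the nesting; it does not fit a hierarchical model. Fitting one would normally be done with a mixed-model procedure in your statistical software.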

Hierarchical regression, by contrast, is a model-building technique that can be used with any regression model. It is the practice of building successive linear regression models, each adding more predictors.

For example, one common practice is to start by adding only demographic control variables to the model. In the next model, you can add predictors of interest, to see if they predict the DV above and beyond the effect of the controls.

You’re actually building separate but related models in each step. But SPSS has a nice function where it will compare the models, and actually test if successive models fit better than previous ones.

So hierarchical regression is really a series of regular old OLS regression models–nothing fancy, really.
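The two-step process above can be sketched in Python with NumPy. This is an illustration on simulated data, not a reproduction of SPSS's procedure: the variable names (age, female, stress), the effect sizes, and the helper functions fit_ols and f_change are all hypothetical. Step 1 fits the controls only, Step 2 adds the predictor of interest, and the F statistic for the change in fit tests whether the added predictor explains variance above and beyond the controls.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return (residual sum of squares, R-squared)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return rss, 1.0 - rss / tss


def f_change(rss_reduced, rss_full, n, p_full, q_added):
    """F statistic for adding q_added predictors to a nested model.

    p_full is the number of predictors (excluding the intercept) in the larger model.
    """
    return ((rss_reduced - rss_full) / q_added) / (rss_full / (n - p_full - 1))


# Simulated data: every number here is invented for the example.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(40, 10, n)
female = rng.integers(0, 2, n).astype(float)
stress = rng.normal(0, 1, n)
y = 50 + 0.2 * age + 1.0 * female + 3.0 * stress + rng.normal(0, 5, n)

# Step 1: demographic controls only.
rss1, r2_1 = fit_ols(np.column_stack([age, female]), y)
# Step 2: controls plus the predictor of interest.
rss2, r2_2 = fit_ols(np.column_stack([age, female, stress]), y)

F = f_change(rss1, rss2, n, p_full=3, q_added=1)
print(f"R-squared step 1: {r2_1:.3f}")
print(f"R-squared step 2: {r2_2:.3f}")
print(f"F change on (1, {n - 3 - 1}) df: {F:.2f}")
```

To turn the F statistic into a p-value, compare it to an F distribution with q_added and n - p_full - 1 degrees of freedom, for example with scipy.stats.f.sf if SciPy is available.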

Confusing Statistical Terms #1: Independent Variable

Confusing Statistical Terms #2: Alpha and Beta

Confusing Statistical Terms #3: Levels


Have you Wondered how using SPSS Burns Calories?

October 30th, 2009 by Karen Grace-Martin

Maybe I’ve noticed it more because I’m getting ready for next week’s GLM in SPSS workshop. Just this week, I’ve run into a number of people struggling with SPSS, and with GLM in particular.

Number 1: I read this in a technical report by Patrick Burns comparing SPSS to R:

“SPSS is notorious for its attitude of ‘You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.’”

And while I still prefer SPSS, I had to laugh because the anonymous person Burns (more…)


3 Mistakes Data Analysts Make in Testing Assumptions in GLM

September 1st, 2009 by Karen Grace-Martin

I know you know it–those assumptions in your regression or ANOVA model really are important. If they’re not met adequately, all your p-values are inaccurate, wrong, useless.

But, and this is a big one, linear models are robust to departures from those assumptions. Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.

You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy. Right?

I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?

No, they really don’t. (I promise!) And learning how far you can push those robust assumptions isn’t so hard, with some training and a little practice. Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:

1. They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met. And it’s so nice having a p-value, right? Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.

The only problem is that many of these tests ignore that robustness. They find that every distribution is non-normal and heteroskedastic. They’re good tools, but these hammers think every data set is a nail. You want to use the hammer when needed, but don’t hammer everything.

2. They assume everything is robust anyway, so they don’t test anything. It’s easy to do. And once again, it probably works out much of the time. Except when it doesn’t.

Yes, the GLM is robust to deviations from some of the assumptions. But not all the way, and not all the assumptions. You do have to check them.

3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.

This is partially because many of these “assumptions” need to be checked, but they’re not really model assumptions; they’re data issues. And it’s also partially because sometimes the assumptions have been taken to their logical conclusions. That textbook author is trying to make it more logical for you. But sometimes that just leads you to testing the related, but wrong, thing. It works out most of the time, but not always.


Beyond Median Splits: Meaningful Cut Points

June 26th, 2009 by Karen Grace-Martin

I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.

But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily. Essentially, this means you need to be (more…)


Likert Scale Items as Predictor Variables in Regression

May 22nd, 2009 by Karen Grace-Martin

I was recently asked whether it’s okay to treat a Likert scale as a continuous predictor in a regression model. Here’s my reply. In the question, the researcher asked about logistic regression, but the same answer applies to all regression models.

1. There is a difference between a Likert scale item (a single 1-7 scale, e.g.) and a full Likert scale, which is composed of multiple items. If it is a full Likert scale, with a combination of multiple items, go ahead and treat it as numerical. (more…)


SPSS GLM or Regression? When to use each

April 23rd, 2009 by Karen Grace-Martin

Regression models are just a subset of the General Linear Model, so you can use GLM procedures to run regressions. It is what I usually use.

But in SPSS there are options available in the GLM and Regression procedures that aren’t available in the other. How do you decide when to use GLM and when to use Regression?

GLM has these options that Regression doesn’t: (more…)
