OptinMon

Confusing Statistical Term #4: Hierarchical Regression vs. Hierarchical Model

December 21st, 2009 by

This one is relatively simple.  Very similar names for two totally different concepts.Stage 2

Hierarchical Models (aka Hierarchical Linear Models or HLM) are a type of linear regression models in which the observations fall into hierarchical, or completely nested levels.

Hierarchical Models are a type of Multilevel Models.

So what is a hierarchical data structure, which requires a hierarchical model?

The classic example is data from children nested within schools.  The dependent variable could be something like math scores, and the predictors a whole host of things measured about the child and the school.

Child-level predictors could be things like GPA, grade, and gender. School-level predictors could be things like: total enrollment, private vs. public, mean SES.

Because multiple children are measured from the same school, their measurements are not independent.  Hierarchical modeling takes that into account.

Hierarchical regression is a model-building technique in any regression model. It is the practice of building successive linear regression models, each adding more predictors.

For example, one common practice is to start by adding only demographic control variables to the model.   In the next model, you can add predictors of interest, to see if they predict the DV above and beyond the effect of the controls.

You’re actually building separate but related models in each step.  But SPSS has a nice function where it will compare the models, and actually test if successive models fit better than previous ones.

So hierarchical regression is really a series of regular old OLS regression models–nothing fancy, really.

Confusing Statistical Terms #1: Independent Variable

Confusing Statistical Terms #2: Alpha and Beta

Confusing Statistical Terms #3: Levels

 


Confusing Statistical Terms #2: Alpha and Beta

December 11th, 2009 by

Oh so many years ago I had my first insight into just how ridiculously confusing all the statistical terminology can be for novices.

I was TAing a two-semester applied statistics class for graduate students in biology.  It started with basic hypothesis testing and went on through to multiple regression.

It was a cross-listed class, meaning there were a handful of courageous (or masochistic) undergrads in the class, and they were having trouble keeping (more…)


Series on Confusing Statistical Terms

December 3rd, 2009 by

One of the biggest challenges in learning statistics and data analysis is learning the lingo.  It doesn’t help that half of the notation is in Greek (literally).

The terminology in statistics is particularly confusing because often the same word or symbol is used to mean completely different concepts.

I know it feels that way, but it really isn’t a master plot by statisticians to keep researchers feeling ignorant.

Really.

It’s just that a lot of the methods in statistics were created by statisticians working in different fields–economics, psychology, medicine, and yes, straight statistics.  Certain fields often have specific types of data that come up a lot and that require specific statistical methodologies to analyze.

Economics needs time series, psychology needs factor analysis.  Et cetera, et cetera.

But separate fields developing statistics in isolation has some ugly effects.

Sometimes different fields develop the same technique, but use different names or notation.

Other times different fields use the same name or notation on different techniques they developed.

And of course, there are those terms with slightly different names, often used in similar contexts, but with different meanings. These are never used interchangeably, but they’re easy to confuse if you don’t use this stuff every day.

And sometimes, there are different terms for subtly different concepts, but people use them interchangeably.  (I am guilty of this myself).  It’s not a big deal if you understand those subtle differences.  But if you don’t, it’s a mess.

And it’s not just fields–it’s software, too.

SPSS uses different names for the exact same thing in different procedures.  In GLM, a continuous independent variable is called a Covariate.  In Regression, it’s called an Independent Variable.

Likewise, SAS has a Repeated statement in its GLM, Genmod, and Mixed procedures.  They all get at the same concept there (repeated measures), but they deal with it in drastically different ways.

So once the fields come together and realize they’re all doing the same thing, people in different fields or using different software procedures, are already used to using their terminology.  So we’re stuck with different versions of the same word or method.

So anyway, I am beginning a series of blog posts to help clear this up.  Hopefully it will be a good reference you can come back to when you get stuck.

We’ve expanded on this list with a member training, if you’re interested.

If you have good examples, please post them in the comments.  I’ll do my best to clear things up.

 

Why Statistics Terminology is Especially Confusing

Confusing Statistical Term #1: Independent Variable

Confusing Statistical Terms #2: Alpha and Beta

Confusing Statistical Term #3: Levels

Confusing Statistical Terms #4: Hierarchical Regression vs. Hierarchical Model

Confusing Statistical Term #5: Covariate

Confusing Statistical Term #6: Factor

Same Statistical Models, Different (and Confusing) Output Terms

Confusing Statistical Term #7: GLM

Confusing Statistical Term #8: Odds

Confusing Statistical Term #9: Multiple Regression Model and Multivariate Regression Model

Confusing Statistical Term #10: Mixed and Multilevel Models

Confusing Statistical Terms #11: Confounder

Six terms that mean something different statistically and colloquially

Confusing Statistical Term #13: MAR and MCAR Missing Data

 


Sharing SPSS Output across Versions

November 18th, 2009 by

If you’ve ever tried sharing SPSS output with your collaborators, advisor, or statistical consultant, you have surely noticed that the output is often not compatible across different versions of SPSS.

And if you work in a company where everyone is working on the same site license, it’s not a problem.  But if you’re collaborating with colleagues at different universities on different upgrade schedules, you might run into some problems.

It’s true that most software programs aren’t back-compatible.  You can’t read documents created in newer versions in older versions of software.

But SPSS’s sharing capabilities are more, um, interesting.

The syntax and data files are back and forward-compatible across many versions, at least since v9 or so.  (I don’t (more…)


Chi-square test vs. Logistic Regression: Is a fancier test better?

November 9th, 2009 by

I recently received this email, which I thought was a great question, and one of wider interest…

Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of associations in applied statistical analysis.  Why is using regression, or logistic regression “better” than doing bivariate analysis such as Chi-square?

I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted for-controlled by- model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi- Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.

My professors don’t seem to be able to give me a simple justified
answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.

Thank you!

(more…)


Have you Wondered how using SPSS Burns Calories?

October 30th, 2009 by

Maybe I’ve noticed it more because I’m getting ready for next week’s SPSS in GLM workshop. Just this week, I’ve had a number of experiences with people’s struggle with SPSS, and GLM in particular.

Number 1: I read this in a technical report by Patrick Burns comparing SPSS to R:

“SPSS is notorious for its attitude of ‘You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.’ “

And while I still prefer SPSS, I had to laugh because the anonymous person Burns (more…)