3 Mistakes Data Analysts Make in Testing Assumptions in GLM

I know you know it–those assumptions in your regression or ANOVA model really are important.  If they’re not met adequately, all your p-values are inaccurate, wrong, useless.

But, and this is a big one, linear models are robust to departures from those assumptions.  Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.

You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy.  Right?

I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?

No, they really don’t.   (I promise!)  And learning how far you can push those robust assumptions isn’t so hard, with some training and a little practice.  Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:

1.  They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met.  And it’s so nice having a p-value, right?  Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.

The only problem is that many of these tests ignore that robustness.  They find that every distribution is non-normal and heteroskedastic.  They’re good tools, but  these hammers think every data set is a nail.  You want to use the hammer when needed, but don’t hammer everything.

2.They assume everything is robust anyway, so they don’t test anything. It’s easy to do.  And once again, it probably works out much of the time.  Except when it doesn’t.

Yes, the GLM is robust to deviations from some of the assumptions.  But not all the way, and not all the assumptions.  You do have to check them.

3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.

This is partially because many of these “assumptions”  need to be checked, but they’re not really model assumptions, they’re data issues.  And it’s also partially because sometimes the assumptions have been taken to their logical conclusions.  That textbook author is trying to make it more logical for you.  But sometimes that just leads you to testing the related, but wrong thing.  It works out most of the time, but not always.

 

Four Critical Steps in Building Linear Regression Models
While you’re worrying about which predictors to enter, you might be missing issues that have a big impact your analysis. This training will help you achieve more accurate results and a less-frustrating model building experience.

Reader Interactions


Leave a Reply

Your email address will not be published. Required fields are marked *

Please note that, due to the large number of comments submitted, any questions on problems related to a personal study/project will not be answered. We suggest joining Statistically Speaking, where you have access to a private forum and more resources 24/7.