Karen Grace-Martin

3 Mistakes Data Analysts Make in Testing Assumptions in GLM

September 1st, 2009 by

I know you know it–those assumptions in your regression or ANOVA model really are important.  If they’re not met adequately, all your p-values are inaccurate, wrong, useless.

But, and this is a big one, linear models are robust to departures from those assumptions.  Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.

You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy.  Right?

I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?

No, they really don’t. (I promise!) And learning how far you can push those assumptions isn’t so hard, with some training and a little practice. Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:

1.  They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met.  And it’s so nice having a p-value, right?  Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.

The only problem is that many of these tests ignore that robustness.  They find that every distribution is non-normal and heteroskedastic.  They’re good tools, but  these hammers think every data set is a nail.  You want to use the hammer when needed, but don’t hammer everything.

2. They assume everything is robust anyway, so they don’t test anything. It’s easy to do.  And once again, it probably works out much of the time.  Except when it doesn’t.

Yes, the GLM is robust to deviations from some of the assumptions.  But not all the way, and not all the assumptions.  You do have to check them.

3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.

This is partially because many of these “assumptions”  need to be checked, but they’re not really model assumptions, they’re data issues.  And it’s also partially because sometimes the assumptions have been taken to their logical conclusions.  That textbook author is trying to make it more logical for you.  But sometimes that just leads you to testing the related, but wrong thing.  It works out most of the time, but not always.

 


Quick-R: A guide for SPSS, SAS, and Stata Users

August 20th, 2009 by

If you are an SPSS, SAS, or Stata user and find yourself needing to use R (I mean, it’s free), I just found this great website: http://statmethods.net/index.html.

 


To Compare Regression Coefficients, Include an Interaction Term

August 14th, 2009 by

Just yesterday I got a call from a researcher who was reviewing a paper.  She didn’t think the authors had run their model correctly, but wanted to make sure.  The authors had run the same logistic regression model separately for each sex because they expected that the effects of the predictors were different for men and women.

On the surface, there is nothing wrong with this approach.  It’s completely legitimate to consider men and women as two separate populations and to model each one separately.

As often happens, the problem was not in the statistics, but in what they were trying to conclude from them. The authors went on to compare the two models, and specifically to compare the coefficients for the same predictors across the two models.

Uh-oh. Can’t do that.

If you’re just describing the values of the coefficients, fine.  But if you want to compare the coefficients AND draw conclusions about their differences, you need a p-value for the difference.

Luckily, this is easy to get.  Simply include an interaction term between Sex (male/female) and any predictor whose coefficient you want to compare.  If you want to compare all of them because you believe that all predictors have different effects for men and women, then include an interaction term between sex and each predictor.  If you have 6 predictors, that means 6 interaction terms.

In such a model, if Sex is a dummy variable (and it should be), two things happen:

1. The coefficient for each predictor becomes the coefficient for that variable ONLY for the reference group.

2. The interaction term between sex and each predictor represents the DIFFERENCE in the coefficients between the reference group and the comparison group.  If you want to know the coefficient for the comparison group, you have to add the coefficients for the predictor alone and that predictor’s interaction with Sex.

The beauty of this approach is that the p-value for each interaction term gives you a significance test for the difference in those coefficients.
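Here is a minimal sketch of those two points, under some loud assumptions: the data are simulated, the variable names (`x`, `female`, `y`) are made up for illustration, and it uses ordinary linear regression rather than the logistic regression from the paper so it can run with nothing but the Python standard library. The dummy-coding logic is the same. The p-value uses a large-sample normal approximation; real statistical software would report it for you.

```python
import random
from statistics import NormalDist


def ols(X, y):
    """Ordinary least squares via the normal equations.

    Returns (coefficients, standard errors).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y | I]; Gauss-Jordan turns it into
    # [I | coefficients | inverse of X'X].
    aug = [xtx[i] + [xty[i]] + [float(i == j) for j in range(k)]
           for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    beta = [aug[i][k] for i in range(k)]
    inv = [aug[i][k + 1:] for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


# Simulated data (an assumption): the slope of x is 1.0 for men, 2.5 for women.
rng = random.Random(0)
rows = []
for _ in range(400):
    female = rng.randint(0, 1)          # dummy: 0 = male (reference), 1 = female
    x = rng.gauss(0, 1)
    slope = 2.5 if female else 1.0
    rows.append((x, female, 0.5 + slope * x + rng.gauss(0, 1)))


def slope_for_group(g):
    """Slope of x from a model fit to one sex only."""
    sub = [(x, y) for x, f, y in rows if f == g]
    beta, _ = ols([[1.0, x] for x, _ in sub], [y for _, y in sub])
    return beta[1]


slope_male = slope_for_group(0)
slope_female = slope_for_group(1)

# One model with the interaction: intercept, x, female, x * female.
X = [[1.0, x, f, x * f] for x, f, _ in rows]
y = [r[2] for r in rows]
beta, se = ols(X, y)
b_x, b_inter = beta[1], beta[3]

# Large-sample (normal approximation) test of the interaction coefficient.
z = b_inter / se[3]
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print("slope, men only:        ", slope_male)
print("coefficient for x:      ", b_x)
print("slope, women only:      ", slope_female)
print("x + interaction:        ", b_x + b_inter)
print("interaction and p-value:", b_inter, p_value)
```

The coefficient for `x` matches the slope from the men-only model, and `x` plus the interaction matches the slope from the women-only model. The separate models can’t give you the last line: a test of whether those two slopes differ.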

 


Essentials of Craft: How to Become a Skilled and Confident Statistical Analyst

August 12th, 2009 by

After nearly twenty years of helping researchers hone their statistical skills to become better data analysts, I’ve had a few insights about what that process looks like.

The one thing you don’t need to become a great data analyst is some innate statistical genius. That kind of fixed mindset will undermine the growth in your statistical skills.

So to start your journey to become a skilled and confident statistical analyst, you need: (more…)


Is the mean always greater than the median in a right skewed distribution?

July 3rd, 2009 by

One of the basic tenets of statistics that every student learns in about the second week of intro stats is that in a skewed distribution, the mean is closer to the tail than the median is.

So in a right skewed distribution (the tail points right on the number line), the mean is higher than the median.

It’s a rule that makes sense, and I have to admit, I never questioned it.

But a great article in the Journal of Statistics Education shows that it really only holds in idealized, unimodal, continuous distributions:  http://jse.amstat.org/v13n2/vonhippel.html.
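To see how the rule can fail, here is a small sketch with a made-up discrete data set (my own numbers for illustration, not taken from the article). Most values sit at 0 and 1, with a thin tail out at 4. The third central moment is positive, so the distribution is right skewed, yet the mean lands below the median.

```python
from statistics import mean, median

# Hypothetical data: 35 zeros, 60 ones, and a short right tail of 5 fours.
data = [0] * 35 + [1] * 60 + [4] * 5

m = mean(data)
med = median(data)

# The sign of the third central moment gives the direction of the skew:
# positive means right skewed.
third_moment = sum((v - m) ** 3 for v in data) / len(data)

print("mean:", m)
print("median:", med)
print("third central moment:", third_moment)
```

The few large values are enough to make the skew positive, but the pile of zeros pulls the mean below the median of 1.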

 


On Puzzles, Statistics, Algorithms, and Understanding

July 1st, 2009 by

My 8-year-old son got a Rubik’s cube in his Christmas stocking this year.

I had gotten one as a birthday present when I was about 10.  It was at the height of the craze and I was so excited.

I distinctly remember bursting into tears when I discovered that my little sister had sneaked off to play with it and messed it up the day I got it.  I knew I would soon mess it up to an unsolvable point myself, but I was still relishing the fun of creating patterns in the 9 squares, then getting it back to 6 sides of single-colored perfection.  (I loved patterns even then.) (more…)