Karen Grace-Martin

A Resource for SPSS Algorithms

September 25th, 2009 by Karen Grace-Martin

As a data analyst, you will occasionally need to know how your software package is calculating the statistics.

SPSS makes the algorithms for many of its tests available at:

Don’t expect them to be user-friendly if you’re not a statistician: these are the actual equations SPSS is using. But some have more detailed explanations than others, and sometimes you just need to make sure that the equation SPSS is using is indeed the same one your nicely detailed text describes. This can be really useful when there are different versions of a test.


6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

September 17th, 2009 by Karen Grace-Martin

Linear models (both OLS regression and ANOVA) are quite robust to departures from the assumptions of normality and constant variance. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates.

But you need to check the assumptions anyway, because some departures are so far off that the p-values become inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.

But sometimes you can’t.

Sometimes it’s because the dependent variable just isn’t appropriate for a linear model.


3 Mistakes Data Analysts Make in Testing Assumptions in GLM

September 1st, 2009 by Karen Grace-Martin

I know you know it: those assumptions in your regression or ANOVA model really are important. If they’re not met adequately, all your p-values are inaccurate, wrong, useless.

But, and this is a big one, linear models are robust to departures from those assumptions. Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.

You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy. Right?

I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?

No, they really don’t. (I promise!) And learning how far you can push those robust assumptions isn’t so hard, with some training and a little practice. Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:

1. They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met. And it’s so nice having a p-value, right? Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.

The only problem is that many of these tests ignore that robustness. They find that every distribution is non-normal and heteroskedastic. They’re good tools, but these hammers think every data set is a nail. You want to use the hammer when needed, but don’t hammer everything.

2. They assume everything is robust anyway, so they don’t test anything. It’s easy to do. And once again, it probably works out much of the time. Except when it doesn’t.

Yes, the GLM is robust to deviations from some of the assumptions. But not all the way, and not all the assumptions. You do have to check them.

3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.

This is partially because many of these “assumptions” need to be checked, but they’re not really model assumptions, they’re data issues. And it’s also partially because sometimes the assumptions have been taken to their logical conclusions. That textbook author is trying to make it more logical for you. But sometimes that just leads you to testing the related, but wrong thing. It works out most of the time, but not always.


Quick-R: A guide for SPSS, SAS, and Stata Users

August 20th, 2009 by Karen Grace-Martin

If you are an SPSS, SAS, or Stata user who finds yourself needing to use R (I mean, it’s free), I just found this great website: http://statmethods.net/index.html.


To Compare Regression Coefficients, Include an Interaction Term

August 14th, 2009 by Karen Grace-Martin

Just yesterday I got a call from a researcher who was reviewing a paper. She didn’t think the authors had run their model correctly, but wanted to make sure. The authors had run the same logistic regression model separately for each sex because they expected that the effects of the predictors were different for men and women.

On the surface, there is nothing wrong with this approach. It’s completely legitimate to consider men and women as two separate populations and to model each one separately.

As often happens, the problem was not in the statistics, but in what they were trying to conclude from them. The authors went on to compare the two models, and specifically to compare the coefficients for the same predictors across the two models.

Uh-oh. Can’t do that.

If you’re just describing the values of the coefficients, fine. But if you want to compare the coefficients AND draw conclusions about their differences, you need a p-value for the difference.

Luckily, this is easy to get. Simply include an interaction term between Sex (male/female) and any predictor whose coefficient you want to compare. If you want to compare all of them because you believe that all predictors have different effects for men and women, then include an interaction term between sex and each predictor. If you have 6 predictors, that means 6 interaction terms.

In such a model, if Sex is a dummy variable (and it should be), two things happen:

1. the coefficient for each predictor becomes the coefficient for that variable ONLY for the reference group.

2. the interaction term between sex and each predictor represents the DIFFERENCE in the coefficients between the reference group and the comparison group. If you want to know the coefficient for the comparison group, you have to add the coefficients for the predictor alone and that predictor’s interaction with Sex.

The beauty of this approach is that the p-value for each interaction term gives you a significance test for the difference in those coefficients.
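To see those two points in numbers, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data are simulated, it uses ordinary least squares rather than logistic regression so that it needs nothing beyond the standard library, and the `ols` helper and every variable name are made up for this example. It does not compute the p-value for the interaction term; a statistics package would report that next to the coefficient.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

random.seed(0)

# Simulated data (an assumption of this sketch): the true slope of x is 1.0
# in the reference group (sex = 0) and 2.5 in the comparison group (sex = 1).
rows = []
for i in range(200):
    sex = i % 2
    x = random.uniform(0, 10)
    y = 3.0 + 0.5 * sex + (1.0 + 1.5 * sex) * x + random.gauss(0, 1)
    rows.append((sex, x, y))

def fit_group(g):
    """Fit y = a + b*x using only the rows where sex == g."""
    sub = [(x, y) for s, x, y in rows if s == g]
    return ols([[1.0, x] for x, _ in sub], [y for _, y in sub])

b_ref = fit_group(0)   # separate model, reference group
b_cmp = fit_group(1)   # separate model, comparison group

# One model with the dummy, the predictor, and their interaction
intercept, sex_coef, x_coef, interaction_coef = ols(
    [[1.0, s, x, s * x] for s, x, _ in rows],
    [y for _, _, y in rows],
)

print("slope, reference group (separate fit): ", b_ref[1])
print("slope, comparison group (separate fit):", b_cmp[1])
print("x coefficient (interaction model):     ", x_coef)
print("interaction coefficient:               ", interaction_coef)
print("x + interaction:                       ", x_coef + interaction_coef)
```

In this sketch the coefficient for `x` in the interaction model matches the reference group's slope from its separate fit, and the interaction coefficient matches the difference between the two separate slopes. The single model estimates the same slopes as the two separate models, and in a statistics package it also gives you a test of their difference.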


Is the mean always greater than the median in a right skewed distribution?

July 3rd, 2009 by Karen Grace-Martin

One of the basic tenets of statistics that every student learns in about the second week of intro stats is that in a skewed distribution, the mean is closer to the tail than the median is.

So in a right skewed distribution (the tail points right on the number line), the mean is higher than the median.

It’s a rule that makes sense, and I have to admit, I never questioned it.

But a great article in the Journal of Statistics Education shows that it really only holds in idealized, unimodal, continuous distributions: http://jse.amstat.org/v13n2/vonhippel.html.
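A small discrete example makes the exception concrete. The counts below are made up for illustration (they are not taken from the article), and the skewness is the simple moment-based version; this is a sketch, not a reproduction of the article's analysis.

```python
from statistics import mean, median

# Made-up counts for illustration (not data from the article):
# value -> number of observations
counts = {0: 35, 1: 50, 2: 10, 3: 4, 4: 1}
data = [value for value, n in counts.items() for _ in range(n)]

m = mean(data)
med = median(data)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5
m2 = sum((v - m) ** 2 for v in data) / len(data)
m3 = sum((v - m) ** 3 for v in data) / len(data)
skewness = m3 / m2 ** 1.5

print(f"mean = {m:.2f}, median = {med}, skewness = {skewness:.2f}")
```

Here the tail points right and the skewness is positive, yet the mean (0.86) is below the median (1). The half of the observations piled up at the median value are enough to break the rule.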
