ANOVA

Interpreting (Even Tricky) Regression Coefficients – A Quiz

January 15th, 2010 by Karen Grace-Martin

Here’s a little quiz:

True or False?

1. When you add an interaction to a regression model, you can still evaluate the main effects of the terms that make up the interaction, just like in ANOVA.

2. The intercept is usually meaningless in a regression model. (more…)


Have You Wondered How Using SPSS Burns Calories?

October 30th, 2009 by Karen Grace-Martin

Maybe I’ve noticed it more because I’m getting ready for next week’s GLM in SPSS workshop. Just this week, I’ve had a number of encounters with people struggling with SPSS, and with GLM in particular.

Number 1: I read this in a technical report by Patrick Burns comparing SPSS to R:

“SPSS is notorious for its attitude of ‘You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.’”

And while I still prefer SPSS, I had to laugh because the anonymous person Burns (more…)


3 Mistakes Data Analysts Make in Testing Assumptions in GLM

September 1st, 2009 by Karen Grace-Martin

I know you know it: those assumptions in your regression or ANOVA model really are important. If they’re not met adequately, all your p-values are inaccurate, wrong, useless.

But, and this is a big one, linear models are robust to departures from those assumptions. Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.

You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy. Right?

I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?

No, they really don’t. (I promise!) And learning how far you can push those assumptions before the robustness runs out isn’t so hard, with some training and a little practice. Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:

1. They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met. And it’s so nice having a p-value, right? Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.

The only problem is that many of these tests ignore that robustness. They find that every distribution is non-normal and heteroskedastic. They’re good tools, but these hammers think every data set is a nail. You want to use the hammer when needed, but don’t hammer everything.

2. They assume everything is robust anyway, so they don’t test anything. It’s easy to do. And once again, it probably works out much of the time. Except when it doesn’t.

Yes, the GLM is robust to deviations from some of the assumptions. But not all the way, and not all the assumptions. You do have to check them.

3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.

This is partly because many of these “assumptions” do need to be checked, but they’re not really model assumptions; they’re data issues. And it’s also partly because sometimes the assumptions have been taken to their logical conclusions. That textbook author is trying to make it more logical for you. But sometimes that just leads you to testing the related, but wrong, thing. It works out most of the time, but not always.
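To make the first mistake concrete, here is a rough sketch of why a significance test for normality can flag a deviation that is too small to matter. The post doesn’t name a particular test, so the Jarque-Bera statistic is used here purely as an assumed stand-in, and the skewness value of 0.2 and the two sample sizes are made-up numbers for illustration. The statistic grows in proportion to the sample size, so the same mild skew goes from “fine” to “significant” just by collecting more data.

```python
import math


def jarque_bera_p(n, skewness, kurtosis):
    """Return (JB statistic, approximate p-value) from summary values.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), compared to a chi-square with 2 df.
    For 2 df the chi-square upper tail is exactly exp(-x / 2).
    Note: the chi-square approximation is rough in small samples.
    """
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical values: identical mild skew, normal-like kurtosis,
# and only the sample size differs.
MILD_SKEW = 0.2
NORMAL_KURTOSIS = 3.0

jb_small, p_small = jarque_bera_p(50, MILD_SKEW, NORMAL_KURTOSIS)
jb_large, p_large = jarque_bera_p(5000, MILD_SKEW, NORMAL_KURTOSIS)

print(f"n=50:   JB={jb_small:.2f}, p={p_small:.3g}")
print(f"n=5000: JB={jb_large:.2f}, p={p_large:.3g}")
```

The shape of the data is the same in both cases. Only the test’s power to detect the deviation has changed, which is why a p-value alone can’t tell you whether the deviation is big enough to hurt your model.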


3 Situations When it Makes Sense to Categorize a Continuous Predictor in a Regression Model

July 24th, 2009 by Karen Grace-Martin

In many research fields, a common practice is to categorize continuous predictor variables so they work in an ANOVA. This is often done with a median split, which divides the sample into two categories: the “high” values above the median and the “low” values below the median.
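As a minimal sketch of what a median split does, here it is in Python using only the standard library. The function name, the example scores, and the choice to put values exactly equal to the median in the “low” group are all assumptions made for illustration; they are not from any particular package.

```python
from statistics import median


def median_split(values):
    """Label each value 'high' or 'low' relative to the sample median.

    Assumption: values exactly equal to the median are labeled 'low'.
    """
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


# Hypothetical scores on a continuous predictor.
scores = [12, 15, 19, 22, 30, 41]
groups = median_split(scores)

for score, group in zip(scores, groups):
    print(score, group)
```

Notice that 22 and 41 land in the same “high” category even though they are far apart, while 19 and 22 land in different categories even though they are close together.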

Reasons Not to Categorize a Continuous Predictor

There are many reasons why this isn’t such a good idea: (more…)


New version released of Amelia II: A Program for Missing Data

June 30th, 2009 by Karen Grace-Martin

A new version of Amelia II, a free package for multiple imputation, was released today. Amelia II is available in two versions. One is an R package, and the other, AmeliaView, is a GUI that does not require any knowledge of the R programming language. They both use the same underlying algorithms and both require having R installed.

At the Amelia II website, you can download Amelia II (did I mention it’s free?!), download R, get the very useful User’s Guide, join the Amelia listserv, and get information about multiple imputation.

If you want to learn more about multiple imputation:

Read more articles at this website
Watch my webinar on Approaches to Missing Data
Get the list of recommended resources. There are some really good, easy to read articles, books, and websites in this list.


Beyond Median Splits: Meaningful Cut Points

June 26th, 2009 by Karen Grace-Martin

I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.

But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily. Essentially, this means you need to be (more…)
