The Problem with Using Tests for Statistical Assumptions

by Karen Grace-Martin

Share

Every statistical model and hypothesis test has assumptions.

And yes, if you’re going to use a statistical test, you need to check whether those assumptions are reasonable to whatever extent you can.

Some assumptions are easier to check than others. Some are so obviously reasonable that you don’t need to do much to check them most of the time. And some have no good way of being checked directly, so you have to use situational clues.

There are so many nuances with assumptions as well: depending on a lot of the details of your particular study and data set, violations of some assumptions may be more or less serious. It depends on a lot of details: sample sizes, imbalance in the data across groups, whether the study is exploratory or confirmatory, etc.

And here’s the kicker: the simple rules your stats professor told you to use to test assumptions were absolutely sufficient when you were learning about tests and assumptions. But now that you’re doing real data analysis? It’s time to dig into the details.

So here are some guidelines about checking assumptions that should help you make decisions about which to check and when to conclude that you’ve checked enough.

Before you begin

  1. make sure you understand what the assumptions are and what they mean. There is a lot of misinformation and vague information out there about assumptions.
  2. Don’t forget that when assumptions violations are serious, it will call into question all your results. This really is important.

Don’t rely on a single statistical test to decide if another test’s assumptions have been met.

There are many tests, like Levene’s test for homogeneity of variance, the Kolmogorov-Smirnov test for normality, the Bartlett’s test for sphericity, whose main usage is to test the assumptions of another test.

These tests probably have other uses, but this is how I’ve generally seen them used.

These tests provide useful information about whether an assumption is being met. But they’re just one piece of information that you should use to decide if the assumption is reasonable.

Because, nuances.

Let’s use that same example of Levene’s test of homogeneity of variance. It’s often used in ANOVA as a sole decision criterion. And software makes it so easy.

But it’s too simple.

  1. It relies too much on p-values, and therefore, sample sizes. If the sample size is large, Levene’s will have a smaller p-value than if the sample size is small, given the same variances.So it’s very likely that you’re overstating a problem with the assumption in large samples and understating it in small samples. You can’t ignore the actual size difference in the variances when making this decision. So sure, look at the p-value, but also look at the actual variances and how much bigger some are than others. (In other words, actually look at the effect size, not just the p-value).
  2. The ANOVA is generally considered robust to violations of this assumption when sample sizes across groups are equal. So even if Levene’s is significant, moderately different variances may not be a problem in balanced data sets. Keppel (1992) suggests that a good rule of thumb is that if sample sizes are equal, robustness should hold until the largest variance is more than 9 times the smallest variance.
  3. This robustness goes away the more unbalanced the samples are. So you need to use judgment here, taking into account both the imbalance and the actual difference in variances.

Gather Evidence Instead

Use the results of the Levene’s test as one piece of evidence you’ll use to make a decision.

In addition to that and the ratio rule of thumb, the other pieces of info could potentially include:

  • A graph of your data. Is there an obvious difference in spread across groups?
  • A different test of the same assumption, if it exists. (Hartley’s F-max is another one for equal variances). See if the results match.

Consider each piece of evidence within the wider data context.

Now make a judgment. Take into account the sample sizes when interpreting p-values from any tests.

I know it’s not comfortable to make judgments based on uncertain information. Experience helps here.

But remember, in data analysis, it’s impossible not to.

Be transparent in what you did and on what evidence you based your decision.

Leave a Comment

Please note that, due to the large number of comments submitted, any comments on problems related to a personal study/project will not be answered. We suggest joining Statistically Speaking, where you have access to answers and more resources 24/7.

Previous post:

Next post: