I am reviewing your notes from your workshop on assumptions. You have made it very clear how to analyze normality for regressions, but I could not find how to determine normality for ANOVAs. Do I check for normality for each independent variable separately? Where do I get the residuals? What plots do I run? Thank you!

I received this great question this morning from a past participant in my Assumptions of Linear Models workshop.

It’s one of those quick questions without a quick answer. Or rather, without a quick and useful answer. The quick answer is:

Do it exactly the same way. All of it.

The longer, useful answer is this:

The assumptions are exactly the same for ANOVA and regression models. The normality assumption is that residuals follow a normal distribution. You usually see it like this:

*ε ~ i.i.d. N(0, σ²)*

But what it’s really getting at is the distribution of Y|X. That’s Y given the value of X. Because X values are considered fixed, they have no distributions. Residuals have the same distribution as Y|X, just centered at zero instead of at the mean of Y for that value of X. If residuals are normally distributed, it means that Y is normally distributed within a value of X (not necessarily overall).
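To spell that step out in symbols, here is a short sketch. The notation μ(x), the mean of Y at X = x, is introduced here for illustration and is not from the workshop notes:

$$
Y_i = \mu(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)
\quad\Longrightarrow\quad
Y_i \mid X = x_i \;\sim\; N\big(\mu(x_i), \sigma^2\big)
$$

So within any one value of X, Y has the same normal shape and variance as the errors, only shifted to that value's mean.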

The only difference between the models is that ANOVAs generally have only categorical predictor variables, whereas regressions tend to have mostly continuous ones. So while the assumption is the same, it plays out differently.

When predictors are continuous, it’s impossible to check for normality of Y separately for each individual value of X. There are too many values of X and there is usually only one observation at each value of X. So you have to use the residuals to check normality.

But when predictors are categorical, there are usually just a few values of X (the categories), and there are many observations at each value of X. So you’ll often see the normality assumption for an ANOVA stated as:

“The distribution of Y within each group is normally distributed.” It’s the same thing as Y|X and *in this context,* it’s the same as saying the residuals are normally distributed.

The concept of a residual seems strange in an ANOVA, and often in that context, you’ll hear them called “errors” instead of “residuals.” But they’re the same thing. It’s the distance between the actual value of Y and the mean value of Y for a specific value of X. Those distances have the same distribution as the Ys within that group.
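Here is a minimal sketch of that calculation in Python, assuming a one-way ANOVA in which each observation's fitted value is simply its group mean. The data and the `anova_residuals` name are made up for illustration. With a covariate, or with a model that omits interactions, the fitted values are no longer plain group means, so you would save the residuals from your statistical software instead.

```python
from statistics import mean

# Hypothetical data: response values (Y) keyed by group label (X).
groups = {
    "control": [4.1, 5.0, 4.6, 5.3],
    "low": [5.9, 6.4, 6.1, 5.6],
    "high": [7.2, 6.8, 7.9, 7.3],
}


def anova_residuals(groups):
    """Return each observation's distance from its own group mean."""
    residuals = {}
    for label, values in groups.items():
        group_mean = mean(values)
        residuals[label] = [y - group_mean for y in values]
    return residuals


residuals = anova_residuals(groups)

# Pool the residuals across groups so they can be checked all together.
pooled = [r for values in residuals.values() for r in values]
```

Within each group the residuals sum to zero, so pooling them removes the differences in group means and leaves only the within-group spread.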

So in ANOVA, you actually have two options for testing normality. If there really are many values of Y for each value of X (each group), and there really are only a few groups (say, four or fewer), go ahead and check normality separately for each group.

But if you have many groups (a 2x2x3 ANOVA has 12 groups) or if there are few observations per group (it’s hard to check normality on only 20 data points), it’s often easier to just use the residuals and check them all together.

If you have a continuous covariate in the model as well, you’ve just lost option one, and residuals are the only way to go.

All GLM procedures have an option to save residuals. Once you do, run the same QQ plots to check normality as you would in regression.
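If you want to see what a QQ plot is doing under the hood, here is a rough sketch using only Python's standard library. The `qq_points` function, the `(i - 0.5) / n` plotting positions, and the residual values are all assumptions chosen for illustration. Statistical packages may use slightly different plotting positions.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each sorted residual with the normal quantile expected at its rank."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Reference normal distribution with the residuals' own mean and SD.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [reference.inv_cdf(p) for p in probs]
    return list(zip(theoretical, ordered))


# Hypothetical residuals saved from a fitted model.
saved = [-0.65, 0.25, -0.15, 0.55, -0.1, 0.4, 0.1, -0.4, -0.1, -0.5, 0.6, 0.0]
points = qq_points(saved)
```

Plot the first element of each pair against the second. If the points fall close to a straight line, the residuals are approximately normal; systematic curvature at the ends suggests skew or heavy tails.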


Hi Karen, thank you so much for this post. I got quite confused about the normality assumption and this cleared it up like nothing I have read so far! I have a few questions left, however. If I understood you correctly, you say that in ANOVA you can check the distribution of Y for each group as well as check the distribution of residuals in order to assess the normality assumption. 1) Why should you choose to look at the distribution of Y with few groups instead of always choosing to look at the distribution of residuals, regardless of the number of groups? And 2) If you look at residuals, do you check the distribution of residuals PER group or overall? Thanks so much!

Hello, I was just wondering: if you are testing for normality in different levels of the IV, would you use Shapiro-Wilk if each level has fewer than 50 cases, or does the total number of cases have to be less than 50?

Hi Sharon,

I don’t use Shapiro-Wilk. I use QQ plots regardless of sample size.

Hi Karen,

I want to run an ANCOVA using R so as to evaluate the effect of a factor (which can be sex, age, area, etc., with several levels male/female, adult/subadult, a/b/c/d, etc.) on the relationship between length (covariate) and weight data (response variable). Prior to that, I have to check both normality and homogeneity of variances assumptions. Thanks to your enlightening post I now know that it is the residuals and not the variables that must fulfill normality.

But my question is, how am I supposed to check it? I mean, should I run the Shapiro-Wilk test for the logweight~loglength regression in each level of the factor? Putting sex as an example, for males and females separately? As there are some factors in my dataset which have many levels, I wonder if there is a quicker (although statistically correct) way to do so.

Thank you in advance!

Hi!

Can you tell me if it matters that the data are non-normal? Or do we just have to verify the normality of the residuals?

Yes. https://www.theanalysisfactor.com/assumptions-about-residuals/

What can you do if normality is not fulfilled for the residuals when running an ANOVA and multiple comparisons of means using S-PLUS?

Hi Ruth,

It depends on exactly what kind of ANOVA you’re doing and how the data are non-normal (i.e., bimodal, skewed, really, really skewed). I’d need more information to tell you what to do, but here are some options:

1. Kruskal-Wallis test

2. Transformation of Y

3. Some other type of model entirely. See this: https://www.theanalysisfactor.com/when-dependent-variables-are-not-fit-for-glm-now-what/

Hi Baris,

I am honestly not sure if the number of levels itself affects Levene’s test. I don’t generally use it myself.

But I can tell you, those two tests are comparing two different things. The first is comparing 4 variances to each other. The other is comparing 20. So it’s entirely reasonable to believe that those 4 variances are similar, but once you split the data into 20 groups, not all 20 variances are the same.

Karen

Dear Karen,

Thank you for your insightful blog posts. My question is not related to the normality assumption; it is about the other ANOVA assumption, equal variances. I want to run a two-way ANOVA using SPSS (it’s unbalanced). The first factor has 4 levels and the second has 5 levels. When I run a one-way ANOVA with the 4-level factor and a log-transformed DV, Levene’s test has a p-value > .05; but when I introduce the second factor, Levene’s test has a p-value < .05. Could the reason be that the factors have too many levels? Thank you.

Regards,