Linear models (both OLS regression and ANOVA) are quite robust to departures from the assumptions of normality and constant variance. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonably accurate.
But you need to check the assumptions anyway, because some departures are so severe that the p-values become inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.
But sometimes you can’t.
Sometimes it’s because the dependent variable just isn’t appropriate for a general linear model (GLM). The dependent variable, Y, doesn’t have to be normal for the residuals to be normal, since the distribution of Y also reflects the values of the X’s.
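To make that last point concrete, here is a minimal simulation sketch. It is an illustration under assumed values, not part of any particular analysis: a binary X with a made-up effect of 10 and normal errors with standard deviation 1. The helper names `ols_fit` and `excess_kurtosis` are hypothetical and written just for this example.

```python
import random
import statistics

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def excess_kurtosis(values):
    """Excess kurtosis: close to 0 for normally distributed data."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumed setup: X is a binary group indicator, the errors are normal.
x = [i % 2 for i in range(2000)]
y = [10.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Y has two separate humps (one per group), so it is far from normal,
# while the residuals are what is left after removing the effect of X.
y_kurtosis = excess_kurtosis(y)
resid_kurtosis = excess_kurtosis(residuals)
print(f"excess kurtosis of Y:         {y_kurtosis:.2f}")
print(f"excess kurtosis of residuals: {resid_kurtosis:.2f}")
```

In this setup Y itself is clearly bimodal, so its excess kurtosis is well below 0, while the residuals come out close to 0, as normal data would. Excess kurtosis is only one rough summary; in practice you would also look at a histogram or Q-Q plot of the residuals.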
But Y does have to be continuous, unbounded, and measured on an interval or ratio scale.
If you go through the Steps to Statistical Modeling, Step 3 is: Choose the variables for answering your research questions and determine their level of measurement. Part of the reason for doing this is to save yourself from running a linear model on a dependent variable (DV) that just isn’t appropriate and will never meet assumptions. These include DVs that are:
- Discrete counts, bounded at 0, where 0 is often the most common value
- Zero-inflated, where even if the rest of the distribution looks normal, there is a huge spike at 0
- Censored or truncated, including time-to-event variables
- A proportion, which is bounded at 0 and 1, or a percentage, which is bounded at 0 and 100
If you have one of these, Stop. Do not pass Go. Do not run a linear model.
Hopefully you noticed this at Step 3, not when you’re checking assumptions.
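To see why a bounded count DV causes trouble, here is a small simulated sketch. Everything in it is assumed for illustration: the counts are drawn from a Poisson distribution whose mean, exp(1.5 - x), shrinks as a single predictor x grows, and `poisson_draw` and `ols_fit` are hypothetical helpers written for this example. The point is only to show what a straight line does with data that cannot go below 0.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(7)  # fixed seed so the sketch is repeatable
n = 1000

# Assumed data-generating process: x runs from 0 to 4 and the
# expected count falls from about 4.5 to about 0.08.
x = [4.0 * i / (n - 1) for i in range(n)]
y = [poisson_draw(rng, math.exp(1.5 - xi)) for xi in x]

intercept, slope = ols_fit(x, y)
prediction_at_max_x = intercept + slope * max(x)
share_of_zeros = sum(1 for yi in y if yi == 0) / n

print(f"smallest observed count:    {min(y)}")
print(f"share of zeros in the DV:   {share_of_zeros:.2f}")
print(f"linear prediction at x = 4: {prediction_at_max_x:.2f}")
```

Under these assumptions the fitted line predicts a negative count at the top of the observed range of x, even though no count can be below 0. No amount of residual checking fixes that, because the problem is in the DV itself.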
If you’d like to learn more about planning a data analysis based on the types of variables involved, check out our webinar recording: The First 3 Steps to Statistical Modeling: How to Clarify the Research Question, the Design, and the Variables.
And if you want to learn all the ins and outs of dealing with logistic regression, check out our 8-hour live workshop Binary, Ordinal, and Multinomial Logistic Regression.