If you’ve compared two textbooks on linear models, chances are, you’ve seen two different lists of assumptions.
I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.
1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.
2. Some authors are writing for introductory classes and, rightfully so, don’t want to confuse students with too many abstract, and sometimes untestable, assumptions. So they write them in more concrete terms that aren’t incorrect, but aren’t the core assumptions, either.
3. Some authors are writing for very specific fields or research situations, like experiments or survey data analysis. They state the assumptions in terms specific to that analysis, not the more general forms. For example, the assumptions of ANOVA are the same as those for regression, although they’re often written in a more specific form.
4. Likewise, sometimes the logical implication of an assumption is more interesting or important to a specific field, or is just generally easier to test. So rather than writing the assumption itself, the author writes the implication. Logically, they’re really the same thing. But they can look totally different, and it can make you look at someone’s list and say “hey, they left something out!”
So what are they, really?
The Explicit Assumptions
These assumptions are explicitly stated by the model:
- The errors are independent of each other
- The errors are normally distributed
- The errors have a mean of 0 at all values of X
- The errors have constant variance
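To make the four explicit assumptions concrete, here is a minimal simulation sketch. It assumes a simple one-predictor model; the parameter values (`beta0`, `beta1`, `sigma`), the sample size, and the random seed are all made up for illustration and do not come from any real data set.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

n = 500
x = rng.uniform(0, 10, size=n)        # predictor values, treated as known
beta0, beta1, sigma = 2.0, 0.5, 1.0   # hypothetical "true" parameters

# One independent draw per observation covers all four explicit assumptions:
# independent, normally distributed, mean 0 at every x, constant variance.
errors = rng.normal(loc=0.0, scale=sigma, size=n)
y = beta0 + beta1 * x + errors

# Ordinary least squares fit using a design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals are the estimates of the unobserved errors.
residuals = y - X @ coef
print("estimated intercept and slope:", coef)
print("mean of residuals:", residuals.mean())
```

Because the errors are generated exactly as the model states, the estimates should land near the chosen parameters. Note that the residuals average to zero by construction whenever the model includes an intercept, so that check alone says nothing about whether the errors have mean 0 at every value of X.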
The Implicit Assumptions
These assumptions aren’t explicitly stated, but the specification of the model implies them. This is the way I’ve summarized them; they can be written with different terminology, of course.
- All X are fixed and are measured without error
- The model is linear in the parameters
- The predictors and response are specified correctly
- There is a single source of unmeasured random variance
If you’ve heard of an assumption that is not on this list, chances are it is a logical extension of one of these core assumptions.
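The "linear in the parameters" assumption is often misread as "linear in X". As a small sketch of the difference, the example below fits a curve that is quadratic in `x` but still a linear model, because each coefficient enters the equation additively. The coefficients and the noise-free data are hypothetical, chosen only so the result is easy to verify.

```python
import numpy as np

x = np.linspace(-3, 3, 50)

# Hypothetical noise-free response: curved in x, but linear in the
# three parameters (1.0, 2.0, -0.5).
y = 1.0 + 2.0 * x - 0.5 * x**2

# The squared term is just another column of the design matrix, so
# ordinary least squares applies without modification.
X = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print("recovered coefficients:", coef)
```

By contrast, a model such as y = b0 * exp(b1 * x) is not linear in the parameters, since b1 sits inside the exponential, and it cannot be written as a design matrix times a coefficient vector without transforming it first.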
Fabio Valeri says
For linear regression, the assumption that the residuals/errors are normally distributed is not mandatory. It is only useful if you want to use standard errors to compute p-values and confidence intervals. An alternative way to compute CIs and p-values would be bootstrapping.
S Chapman says
That is correct, normality of data (or errors) is not mandatory for the simple linear model to be useful. The Ordinary Least Squares method does not make any distributional assumptions.
IM CHIU says
The use of “residuals” in the Explicit Assumption can be misleading. The linear model makes major assumptions about the “error” term. The “residuals” are the estimates of the “errors”.
Karen Grace-Martin says
Thanks IM, I updated that.