If you’ve compared two textbooks on linear models, chances are, you’ve seen two different lists of assumptions.

I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.

1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.

2. Some authors are writing for introductory classes and, rightfully so, don’t want to confuse students with too many abstract, and sometimes untestable, assumptions. So they write them in more concrete terms that aren’t incorrect but aren’t the core assumptions, either.

3. Some authors are writing for very specific fields or research situations, like experiments or survey data analysis. They state the assumptions in terms specific to that analysis, not the more general forms. For example, the assumptions of ANOVA are the same as those for regression, although they’re often written in a more specific form.

4. Likewise, sometimes the logical implication of an assumption is more interesting or important to a specific field, or is just generally easier to test. So rather than writing the assumption itself, authors write the implication. Logically, they’re really the same thing. But they can look totally different, and it can make you look at someone’s list and say “hey, they left something out!”

So what are they, really?

### The Explicit Assumptions

These assumptions are explicitly stated by the model:

- The residuals are independent
- The residuals are normally distributed
- The residuals have a mean of 0 at all values of X
- The residuals have constant variance
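To make the list above concrete, here is a minimal sketch in plain Python. It simulates data from a single-predictor model whose error term is independent, normal, mean zero, and constant in variance, then fits the line and inspects the residuals. The function names (`simulate`, `fit_ols`, `residuals`) and all parameter values are made up for illustration; this is not a substitute for proper diagnostic plots.

```python
import random
import statistics


def simulate(n, beta0, beta1, sigma, seed=0):
    """Draw (x, y) from y = beta0 + beta1*x + e, with independent e ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed y minus fitted y for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Illustrative values only (assumptions, not taken from any real data set)
xs, ys = simulate(n=2000, beta0=1.0, beta1=2.0, sigma=1.5)
b0, b1 = fit_ols(xs, ys)
res = residuals(xs, ys, b0, b1)

# Rough constant-variance check: compare residual spread in the
# lower and upper halves of x
pairs = sorted(zip(xs, res))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]
variance_ratio = statistics.variance(high) / statistics.variance(low)

print("estimated intercept and slope:", round(b0, 3), round(b1, 3))
print("overall residual mean:", statistics.fmean(res))
print("variance ratio (upper half / lower half):", round(variance_ratio, 3))
```

A variance ratio near 1 is consistent with constant variance. Note that when the model includes an intercept, the overall residual mean is zero by construction, so that printout checks the arithmetic rather than the assumption; the assumption itself is about the mean at each value of X.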

### The Implicit Assumptions

These assumptions aren’t explicitly stated, but the specification of the model implies them. This is the way I’ve summarized them; they can be written with different terminology, of course.

- All X are fixed and are measured without error
- The model is linear in the parameters
- The predictors and response are specified correctly
- There is a single source of unmeasured random variance
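As a small illustration of what “linear in the parameters” means, the pair of equations below contrasts a model that meets it with one that does not. The symbols ($y_i$, $x_i$, $\beta_j$, $\varepsilon_i$) are standard notation chosen here, not taken from any particular textbook.

```latex
% Linear in the parameters: each beta is multiplied by a known function of x
% and the terms are added, even though the curve is not a straight line in x
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i

% Not linear in the parameters: beta_1 sits inside the exponent
y_i = \beta_0 e^{\beta_1 x_i} + \varepsilon_i
```

The first model can be fit as an ordinary linear model with $x$ and $x^2$ as predictors; the second cannot be written that way as it stands.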

If you’ve heard of an assumption that isn’t on this list, chances are it is a logical extension of one of these core assumptions.

### Comments

For linear regression, the assumption that the residuals/errors are normally distributed is not mandatory. It is only useful if you want to use standard errors to compute p-values and confidence intervals. An alternative way to compute CIs and p-values would be bootstrapping.

The use of “residuals” in the Explicit Assumptions can be misleading. The linear model makes major assumptions about the “error” term. The “residuals” are the estimates of the “errors”.
