The Assumptions of Linear Models: Explicit and Implicit

If you’ve compared two textbooks on linear models, chances are, you’ve seen two different lists of assumptions.

I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.

1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.

2. Some authors are writing for introductory classes and, rightfully, don’t want to confuse students with too many abstract, and sometimes untestable, assumptions. So they write them in more concrete terms that aren’t incorrect, but aren’t the core assumptions, either.

3. Some authors are writing for very specific fields or research situations, like experiments or survey data analysis. They state the assumptions in terms specific to that analysis, not the more general forms. For example, the assumptions of ANOVA are the same as those for regression, although they’re often written in a more specific form.

4. Likewise, sometimes the logical implication of an assumption is more interesting or important to a specific field, or is just generally easier to test. So rather than writing the assumption itself, the author writes the implication. Logically, they’re really the same thing. But they can look totally different, and it can make you look at someone’s list and say “hey, they left something out!”

So what are they, really?

The Explicit Assumptions

These assumptions are explicitly stated by the model:

The errors are independent of each other
The errors are normally distributed
The errors have a mean of 0 at all values of X
The errors have constant variance
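For reference, here is one common way to write the model so that all four assumptions appear in a single statement. The notation (p predictors, coefficients β, error variance σ²) is an assumption chosen for illustration and is not taken from any particular textbook:

```latex
Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_p X_{pi} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```

Read this way, “iid” carries independence, N carries normality, the 0 carries the zero mean at every value of X, and the single σ² with no subscript i carries constant variance.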

The Implicit Assumptions

These assumptions aren’t stated explicitly, but the specification of the model implies them. This is the way I’ve summarized them; they can be written with different terminology, of course.

All X are fixed and are measured without error
The model is linear in the parameters
The predictors and response are specified correctly
There is a single source of unmeasured random variance

If you’ve heard of an assumption that isn’t on this list, chances are it is a logical extension of one of these core assumptions.
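“Linear in the parameters” is the implicit assumption that most often gets misread as “the relationship with X must be a straight line.” The sketch below, which uses simulated data and made-up coefficient values purely for illustration, fits a curved relationship with ordinary least squares. It works because the model is still a weighted sum of the coefficients, even though one column of the design matrix is X squared.

```python
import numpy as np

# Simulated data (illustrative values only): a curved relationship in x,
# with independent, normal, mean-zero, constant-variance errors.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# The design matrix has columns 1, x, x^2. The model is nonlinear in x
# but linear in the coefficients, so ordinary least squares applies.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals are the estimates of the unobserved errors.
residuals = y - X @ beta_hat

print("estimated coefficients:", beta_hat)
print("mean of residuals:", residuals.mean())
```

By contrast, a model such as Y = β0 + exp(β1 X) + error would not be linear in the parameters, because β1 sits inside the exponential and cannot be pulled out as a multiplier on a column of the design matrix.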

Comments

Fabio Valeri says

November 17, 2017 at 1:50 pm

For linear regression, the assumption that the residuals/errors are normally distributed is not mandatory. It is only useful if you want to use standard errors to compute p-values and confidence intervals. An alternative way to compute CIs and p-values would be bootstrapping.

- S Chapman says
  
  December 6, 2022 at 6:07 am
  
  That is correct, normality of data (or errors) is not mandatory for the simple linear model to be useful. The Ordinary Least Squares method does not make any distributional assumptions.
  
IM CHIU says

December 18, 2015 at 3:54 pm

The use of “residuals” in the Explicit Assumptions can be misleading. The linear model makes major assumptions about the “error” term. The “residuals” are the estimates of the “errors”.

- Karen Grace-Martin says
  
  August 13, 2020 at 11:57 am
  
  Thanks IM, I updated that.
