The Assumptions of Linear Models: Explicit and Implicit

If you’ve compared two textbooks on linear models, chances are you’ve seen two different lists of assumptions.

I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.

1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.

2. Some authors are writing for introductory classes and, rightfully so, don’t want to confuse students with too many abstract, and sometimes untestable, assumptions. So they write them in more concrete terms that aren’t incorrect but aren’t the core assumptions, either.

3. Some authors are writing for very specific fields or research situations, like experiments or survey data analysis.  They state the assumptions in terms specific to that analysis, not the more general forms.  For example, the assumptions of ANOVA are the same as those for regression, although they’re often written in a more specific form.

4. Likewise, sometimes the logical implication of an assumption is more interesting or important to a specific field, or is simply easier to test. So rather than writing the assumption itself, the author writes the implication. Logically, they’re the same thing. But they can look totally different, which can make you look at someone’s list and say “hey, they left something out!”
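To make the ANOVA point in item 3 concrete, here is a minimal sketch in Python with NumPy. The data are made up (three hypothetical groups of 10 observations), and `anova_f` and `regression_f` are illustrative helper names, not functions from any library. The first computes the classic one-way ANOVA F statistic from between- and within-group sums of squares; the second fits the same data as an ordinary regression with an intercept and 0/1 dummy variables for all but one group. The two F statistics agree, because they are the same linear model written two ways.

```python
import numpy as np

# Hypothetical data: three groups with made-up means, 10 observations each.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 10)
y = np.array([5.0, 6.0, 7.5])[groups] + rng.normal(0.0, 1.0, groups.size)

def anova_f(y, groups):
    """One-way ANOVA F statistic from between/within-group sums of squares."""
    levels = np.unique(groups)
    grand = y.mean()
    ss_between = sum(y[groups == g].size * (y[groups == g].mean() - grand) ** 2
                     for g in levels)
    ss_within = sum(((y[groups == g] - y[groups == g].mean()) ** 2).sum()
                    for g in levels)
    df_between, df_within = len(levels) - 1, y.size - len(levels)
    return (ss_between / df_between) / (ss_within / df_within)

def regression_f(y, groups):
    """Overall F test of a regression on an intercept plus dummy (0/1) columns
    for every group except the first, which serves as the reference group."""
    levels = np.unique(groups)
    X = np.column_stack([np.ones(y.size)] +
                        [(groups == g).astype(float) for g in levels[1:]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_resid = ((y - X @ beta) ** 2).sum()
    ss_total = ((y - y.mean()) ** 2).sum()
    p = X.shape[1] - 1  # number of non-intercept parameters
    return ((ss_total - ss_resid) / p) / (ss_resid / (y.size - p - 1))

print(anova_f(y, groups), regression_f(y, groups))  # the two values agree
```

The fitted values from the dummy-coded regression are just the group means, so the regression and residual sums of squares equal the between- and within-group sums of squares, and the same error assumptions apply to both.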

So what are they, really?

The Explicit Assumptions

These assumptions are explicitly stated by the model:

  1. The errors are independent of each other
  2. The errors are normally distributed
  3. The errors have a mean of 0 at all values of X
  4. The errors have constant variance
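One way to see what these four assumptions describe is to simulate data that satisfy them by construction. The sketch below is a hypothetical setup (the parameter values, sample size, and seed are arbitrary choices, not from any particular analysis): it draws independent, normal, mean-0, constant-variance errors, fits ordinary least squares with NumPy, and compares the residuals to the true errors they estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 10, n)              # X values treated as fixed and known
beta0, beta1, sigma = 2.0, 0.5, 1.0    # hypothetical "true" parameters

# Errors that satisfy the four explicit assumptions:
# independent, normal, mean 0 at every x, constant variance sigma**2.
errors = rng.normal(0.0, sigma, n)
y = beta0 + beta1 * x + errors

# Ordinary least squares fit with an intercept column.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b                  # residuals are estimates of the errors

print("estimates:", b)                 # close to (2.0, 0.5)
print("mean residual:", residuals.mean())
print("corr(residuals, errors):", np.corrcoef(residuals, errors)[0, 1])
```

Note that the mean residual comes out as zero (up to floating-point rounding) whenever the model includes an intercept, so it can’t confirm assumption 3, which is about the mean of the errors at every value of X. Looking at the residuals plotted against X is a more informative check.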

The Implicit Assumptions

These assumptions aren’t explicitly stated, but the specification of the model implies them. This is how I’ve summarized them; they can be written with different terminology, of course.

  1. All X are fixed and are measured without error
  2. The model is linear in the parameters
  3. The predictors and response are specified correctly
  4. There is a single source of unmeasured random variance
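To illustrate why the first implicit assumption matters, here is a small simulation, again a sketch under stated assumptions rather than a general result for every situation. It assumes classical measurement error: the error added to X is independent of the true X and of the model error. Under that assumption, the OLS slope shrinks toward zero by roughly var(X) / (var(X) + var(measurement error)). All values (true slope 2, unit variances, sample size, seed) are hypothetical, and `ols_slope` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1 = 2.0                                # hypothetical true slope

x_true = rng.normal(0.0, 1.0, n)           # var(x_true) = 1
y = 1.0 + beta1 * x_true + rng.normal(0.0, 1.0, n)

# Observed X includes independent measurement error with variance 1,
# violating "All X are fixed and are measured without error".
x_observed = x_true + rng.normal(0.0, 1.0, n)

def ols_slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / sample var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(ols_slope(x_true, y))      # near 2.0
print(ols_slope(x_observed, y))  # near 2.0 * 1 / (1 + 1) = 1.0, attenuated
```

Nothing about the residuals from the second fit would look unusual, which is part of why this assumption is easy to overlook: the problem lives in how X was measured, not in the error term you can inspect.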

If you’ve heard of an assumption that isn’t on this list, chances are it’s a logical extension of one of these core assumptions.


Comments

  1. Fabio Valeri says

    For linear regression, the assumption that the residuals/errors are normally distributed is not mandatory. It is only useful if you want to use standard errors to compute p-values and confidence intervals. An alternative way to compute CIs and p-values would be bootstrapping.

    • S Chapman says

      That is correct, normality of data (or errors) is not mandatory for the simple linear model to be useful. The Ordinary Least Squares method does not make any distributional assumptions.

  2. IM CHIU says

    The use of “residuals” in the Explicit Assumptions can be misleading. The linear model makes major assumptions about the “error” term. The “residuals” are the estimates of the “errors”.

