# Assumptions of Linear Models are about Residuals, not the Response Variable

I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the residuals or the response variable.

The asker had a situation where Y, the response, was not normally distributed, but the residuals were.

Quick Answer:  It’s just the residuals.

In fact, if you look at any (good) statistics textbook on linear models, you’ll see the assumptions stated right below the model:

ε ~ i.i.d. N(0, σ²)

That ε is the residual term (and it ought to have an i subscript, one for each individual). The i.i.d. means the residuals are independent and identically distributed: they all have the same distribution, which is defined right afterward.

You’ll notice there is no similar statement about Y. ε’s distribution is influenced by Y’s, which is why Y has to be continuous, unbounded, and measured on an interval or ratio scale.

But Y’s distribution is also influenced by the X’s. ε’s isn’t. That’s why you can get a normal distribution for ε but a lopsided, chunky, or just plain weird-looking Y.
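To see this in action, here is a minimal simulation sketch in Python with NumPy. Everything in it is made up for illustration: the exponential predictor, the coefficients 1 and 3, the sample size, and the `skewness` helper are assumptions, not anything from the original question. The point is only that a skewed X can produce a lopsided Y even when the errors are perfectly normal.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data: a strongly right-skewed predictor and normal errors.
x = rng.exponential(scale=1.0, size=n)
eps = rng.normal(loc=0.0, scale=1.0, size=n)
y = 1.0 + 3.0 * x + eps

# Ordinary least squares using a design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat


def skewness(v):
    """Sample skewness: roughly 0 for a symmetric distribution."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))


y_skew = skewness(y)
resid_skew = skewness(residuals)
print(f"skewness of Y:         {y_skew:.2f}")
print(f"skewness of residuals: {resid_skew:.2f}")
```

In a run like this, Y should come out clearly right-skewed while the residuals' skewness sits near zero. A histogram or Q-Q plot of each would show the same contrast.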

Bruce Weaver June 11, 2013 at 2:30 pm

Hello Karen. Thanks for this nice post on an issue that often confuses people. I have two minor comments. First, I think you meant to say interval OR ratio scale in the second to last paragraph. Second, I think it is useful (at least for more advanced users of statistics) to point out the important distinction between errors and residuals, as in this Wikipedia page:

The i.i.d. N(0, σ²) assumption applies to the errors, not the residuals. For example, if you give me n-1 of the residuals from your regression model, I can work out the last one, because they must sum to 0. So the residuals are not truly independent. The unobservable errors, on the other hand, can be truly independent.

Once again, thanks for a great post.

Cheers,
Bruce
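Bruce's point about the residuals summing to zero can be checked numerically. The sketch below is a hedged illustration, assuming ordinary least squares with an intercept (the sum-to-zero property depends on the intercept being in the model). The data, seed, and coefficients are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
n = 20

# Hypothetical data for a simple regression with independent normal errors.
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

# OLS fit with an intercept column in the design matrix.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# With an intercept, the residuals sum to zero (up to floating-point error),
# so the last residual is determined by the other n - 1.
total = residuals.sum()
recovered_last = -residuals[:-1].sum()

print(f"sum of residuals:        {total:.2e}")
print(f"actual last residual:    {residuals[-1]:.6f}")
print(f"recovered last residual: {recovered_last:.6f}")
```

The simulated errors were drawn independently, yet the fitted residuals satisfy this linear constraint, which is the distinction between errors and residuals drawn in the comment above.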

Yashwanth October 21, 2016 at 11:55 am

Thanks Bruce. The post left me confused about errors versus residuals. Your comment restored my faith in my understanding.

Kevin March 7, 2013 at 1:36 am

Hi Karen,

Since Y = E(Y) + ε, and E(Y) is a constant (a function of the X’s and betas), this should imply that the variance, independence, and distributional assumptions on ε apply to Y as well. Am I right to say this?