Assumptions of Linear Models are about Residuals, not the Response Variable

by Karen Grace-Martin


I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the residuals or the response variable.

The asker had a situation where Y, the response, was not normally distributed, but the residuals were.

Quick Answer:  It’s just the residuals.

In fact, if you look at any (good) statistics textbook on linear models, you’ll see the assumptions stated right below the model:

ε~ i.i.d. N(0, σ²)

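That ε is the residual term (strictly speaking, the error term, which the residuals from a fitted model estimate), and it ought to have an i subscript, one for each individual. The i.i.d. means every residual is independent and identically distributed. They all have the same distribution, which is defined right afterward: normal, with a mean of 0 and a constant variance σ².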

You’ll notice there is nothing similar about Y.  ε’s distribution is influenced by Y’s, which is why Y has to be continuous, unbounded, and measured on an interval or ratio scale.

But Y’s distribution is also influenced by the X’s.  ε’s isn’t.  That’s why you can get a normal distribution for ε, but lopsided, chunky, or just plain weird-looking Y.
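If you want to see this for yourself, here is a minimal simulation sketch in Python (using NumPy). Everything in it is made up for illustration: the two-group predictor, the coefficients 1 and 6, the sample size, and the small excess_kurtosis helper are all assumptions, not anything from the original question. The errors are drawn as i.i.d. N(0, 1), but because the two groups have very different means, Y comes out with two humps.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 5000
# Hypothetical predictor: a 0/1 group indicator (say, control vs. treatment).
x = rng.integers(0, 2, size=n).astype(float)
# Errors are i.i.d. N(0, 1), exactly what the assumption asks for.
eps = rng.normal(loc=0.0, scale=1.0, size=n)
# Response: group means are 1 and 7, so Y's distribution has two humps.
y = 1.0 + 6.0 * x + eps

# Fit the linear model by ordinary least squares and compute the residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

def excess_kurtosis(v):
    """Sample excess kurtosis; roughly 0 for a normal distribution."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 4) - 3.0)

print("excess kurtosis of Y:        ", round(excess_kurtosis(y), 2))
print("excess kurtosis of residuals:", round(excess_kurtosis(resid), 2))
```

In a run like this, Y's excess kurtosis should be clearly negative (flat and two-humped, nothing like a bell curve), while the residuals' should sit close to 0. A histogram of each would show the same thing more vividly.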

Learn more about each of the assumptions of linear models (regression and ANOVA), so they make sense, in our new On Demand workshop: Assumptions of Linear Models.
