The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Assumptions of Linear Models are about Residuals, not the Response Variable

by Karen Grace-Martin

I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the residuals or the response variable.

The asker had a situation where Y, the response, was not normally distributed, but the residuals were.

Quick Answer:  It’s just the residuals.

In fact, if you look at any (good) statistics textbook on linear models, you’ll see the assumptions stated right below the model:

ε ~ i.i.d. N(0, σ²)

That ε is the residual term (and it ought to have an i subscript, one for each individual). The i.i.d. means every residual is independent of the others and identically distributed. They all follow the same distribution, which is defined right afterward: normal, with mean 0 and constant variance σ².

You’ll notice there is nothing similar about Y.  ε’s distribution is influenced by Y’s, which is why Y has to be continuous, unbounded, and measured on an interval or ratio scale.

But Y’s distribution is also influenced by the X’s.  ε’s isn’t.  That’s why you can get a normal distribution for ε, but lopsided, chunky, or just plain weird-looking Y.
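To make this concrete, here is a small simulation sketch (mine, not from the post; all numbers are made up for illustration). X is strongly skewed, the errors are normal, and the fitted residuals come out roughly normal even though Y does not:

```python
import random
import statistics

random.seed(42)
n = 10_000

# X is strongly right-skewed, so Y = 2 + 3X + e inherits the skew
x = [random.expovariate(1.0) for _ in range(n)]
e = [random.gauss(0, 1) for _ in range(n)]       # errors: i.i.d. N(0, 1)
y = [2 + 3 * xi + ei for xi, ei in zip(x, e)]

# Ordinary least squares for one predictor (closed-form slope and intercept)
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def skewness(v):
    """Sample skewness: third standardized central moment."""
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return statistics.fmean([((vi - m) / s) ** 3 for vi in v])

print(f"skew(Y)         = {skewness(y):.2f}")      # clearly skewed
print(f"skew(residuals) = {skewness(resid):.2f}")  # near 0: roughly normal
```

A histogram of y here would look lopsided, while a histogram of resid would look like the familiar bell curve.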

Learn more about each of the assumptions of linear models (regression and ANOVA), so they make sense, in our On Demand workshop: Assumptions of Linear Models.




Tagged With: Assumptions, linear model

Related Posts

  • ANCOVA Assumptions: When Slopes are Unequal
  • The Assumptions of Linear Models: Explicit and Implicit
  • Why ANOVA is Really a Linear Regression, Despite the Difference in Notation
  • Using Pairwise Comparisons to Help you Interpret Interactions in Linear Regression


Comments

  1. Tobia says

    June 11, 2019 at 4:06 am

    Hi Karen. Thanks for the post.

    I just have one question to make sure I got your point. If I cannot reject the null of normally distributed residuals (for example, running an Anderson-Darling test), does this imply that p-values in an OLS regression are right and reliable, even though the Xs and Ys are not normal?

    Cheers,
    Tobia

    • Karen Grace-Martin says

      August 22, 2019 at 1:42 pm

      Hi Tobia, No. You really don’t want to use a hypothesis test to check assumptions. See:
      https://www.theanalysisfactor.com/the-problem-with-tests-for-statistical-assumptions/
      https://www.theanalysisfactor.com/anatomy-of-a-normal-probability-plot/

  2. Bruce Weaver says

    June 11, 2013 at 2:30 pm

    Hello Karen. Thanks for this nice post on an issue that often confuses people. I have two minor comments. First, I think you meant to say interval OR ratio scale in the second to last paragraph. Second, I think it is useful (at least for more advanced users of statistics) to point out the important distinction between errors and residuals, as in this Wikipedia page:

    http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics

    The i.i.d. N(0, σ²) assumption applies to the errors, not the residuals. For example, if you give me n-1 of the residuals from your regression model, I can work out the last one, because they must sum to 0. So the residuals are not truly independent. The unobservable errors, on the other hand, can be truly independent.

    Once again, thanks for a great post.

    Cheers,
    Bruce
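Bruce’s constraint can be demonstrated directly. A sketch with made-up data (not from the comment): fit a simple regression with an intercept, and the residuals are forced to sum to zero, so n-1 of them pin down the nth.

```python
import random
import statistics

random.seed(1)
n = 50
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

# Fit simple OLS (the model includes an intercept)
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# With an intercept, the residuals sum to (numerically) zero...
print(sum(resid))
# ...so the first n-1 residuals determine the last one exactly
last = -sum(resid[:-1])
print(abs(last - resid[-1]) < 1e-9)  # True
```

The unobservable errors carry no such constraint, which is exactly the errors-vs-residuals distinction Bruce describes.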

    • Yashwanth says

      October 21, 2016 at 11:55 am

Thanks Bruce. The answer got me confused about errors versus residuals. This comment restored my faith in my understanding.

  3. Kevin says

    March 7, 2013 at 1:36 am

    Hi Karen,

Since Y = E(Y) + ε, and E(Y) is a constant (a function of the X’s and betas), this should imply that the variance, independence, and distributional assumptions on ε apply to Y as well. Am I right to say this?

    • Karen says

      March 7, 2013 at 9:51 am

      Hi Kevin,

One small change that makes all the difference: Y = E(Y|X) + e. If every individual had the same value of X, then yes, the distribution of Y would match that of e. Since they generally differ, the Y’s are affected by the X’s but the residuals aren’t.

      The distribution of Y|X is the same as the distribution of e, but the distribution of Y isn’t necessarily. I’ve seen many data sets where Y is skewed, but e is normal.

      Karen
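A quick simulation sketch of Karen’s point (illustrative group means and sizes, mine, not from the reply): with a binary X, Y|X is normal within each group, yet the marginal distribution of Y is bimodal.

```python
import random
import statistics

random.seed(7)

# Y | X is normal within each group...
g0 = [random.gauss(0, 1) for _ in range(5000)]       # Y | X=0 ~ N(0, 1)
g1 = [10 + random.gauss(0, 1) for _ in range(5000)]  # Y | X=1 ~ N(10, 1)

# ...but the marginal distribution of Y (both groups pooled) is bimodal:
# its spread is dominated by the gap between the group means
y = g0 + g1
print(statistics.pstdev(g0))  # about 1
print(statistics.pstdev(g1))  # about 1
print(statistics.pstdev(y))   # about 5.1, i.e. sqrt(1 + 25)
```

The residuals here are just the within-group deviations, so they look normal even though a histogram of Y shows two separate humps.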



Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.
