The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers


6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

by Karen Grace-Martin 11 Comments

The assumptions of normality and constant variance in a linear model (both OLS regression and ANOVA) are quite robust to departures.  That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates.

But you need to check the assumptions anyway, because some departures are so large that the p-values become inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.
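
As a rough illustration of a remedial measure, here is a minimal sketch using simulated data. It assumes a log transformation as the remedy (the post does not name a specific one), and the helper names `ols_residuals` and `skewness` are made up for this example. It fits a simple least-squares line to a right-skewed Y and to log(Y), then compares the skewness of the residuals.

```python
import math
import random

def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Sample skewness: roughly 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)  # fixed seed so the sketch is repeatable
x = [rng.uniform(0, 2) for _ in range(2000)]

# Simulated DV (an assumption for illustration): the errors are normal
# on the log scale, so Y itself is right-skewed.
y = [math.exp(1.0 + 0.5 * xi + rng.gauss(0, 0.5)) for xi in x]

skew_raw = skewness(ols_residuals(x, y))
skew_log = skewness(ols_residuals(x, [math.log(yi) for yi in y]))

print(f"residual skewness, raw Y: {skew_raw:.2f}")
print(f"residual skewness, log Y: {skew_log:.2f}")
```

In this simulation the residuals from the raw-scale fit are clearly right-skewed, while the residuals from the log-scale fit are close to symmetric. Skewness is only one rough summary; a residual plot or Q-Q plot is the more usual check.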

But sometimes you can’t.

Sometimes it’s because the dependent variable just isn’t appropriate for a linear model. The dependent variable, Y, doesn’t have to be normally distributed itself, since its distribution is affected by the X’s.

The errors do.

But the distribution of the errors is related to the distribution of Y. So Y does have to be continuous, unbounded, and measured on an interval or ratio scale.
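
To make the distinction between Y and the errors concrete, here is a small simulated sketch. The data, the group effect of 10, and the helper name `excess_kurtosis` are all assumptions for illustration. Y is strongly bimodal because of a two-level predictor, yet the residuals after fitting that predictor look normal.

```python
import random

def excess_kurtosis(values):
    """Excess kurtosis: about 0 for a normal distribution,
    strongly negative for a two-humped (bimodal) one."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical two-group predictor X (0 or 1) with a large effect on Y.
group = [i % 2 for i in range(2000)]
y = [10.0 * g + rng.gauss(0, 1) for g in group]

# With a single two-level predictor, the least-squares fitted value for
# each observation is simply its group mean.
group_mean = {
    g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
    for g in (0, 1)
}
residuals = [yi - group_mean[gi] for yi, gi in zip(y, group)]

kurt_y = excess_kurtosis(y)
kurt_resid = excess_kurtosis(residuals)

print(f"excess kurtosis of Y:         {kurt_y:.2f}")
print(f"excess kurtosis of residuals: {kurt_resid:.2f}")
```

Here Y fails any normality check badly, but that is entirely due to X. Once the group means are removed, what is left is close to normal, which is what the assumption is actually about.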

If you go through the Steps to Statistical Modeling, Step 3 is: Choose the variables for answering your research questions and determine their level of measurement. Part of the reason for doing this is to save yourself from running a linear model on a DV that just isn’t appropriate and will never meet assumptions.

Some of these include DVs that are:

  • Categorical
  • Ordinal
  • Discrete counts, bounded at 0, which is often the most common value
  • Zero Inflated, where even if the rest of the distribution looks normal, there is a huge spike in the distribution at 0.
  • Censored or truncated, including time to event variables
  • a Proportion, which is bounded at 0 and 1, or a percentage, which is bounded at 0 and 100.

If you have one of these, Stop.  Do not pass Go.  Do not run a linear model.
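
One way to see the problem with a bounded DV is the sketch below. It assumes a hypothetical S-shaped "true" proportion curve (the curve and the helper name `fit_line` are made up for illustration), fits a straight line to it by least squares, and checks whether the fitted values stay inside the 0 to 1 range.

```python
import math

def fit_line(x, y):
    """Intercept and slope of the simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical predictor values and true proportions (an assumption):
# the proportion rises in an S-shape and flattens out near 0 and 1.
x = list(range(-4, 5))
p = [1 / (1 + math.exp(-2 * xi)) for xi in x]

intercept, slope = fit_line(x, p)
predicted = [intercept + slope * xi for xi in x]

# Fitted values that fall outside the possible range of a proportion.
impossible = [
    (xi, round(pi, 3)) for xi, pi in zip(x, predicted) if pi < 0 or pi > 1
]
print(impossible)
```

Every true proportion lies strictly between 0 and 1, but the straight line keeps going past both bounds at the extremes of X. A linear model has no way to know that the bound exists.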

Hopefully you noticed this at Step 3, not when you’re checking assumptions, which is Step 11.

But luckily, there are other types of regression procedures available for all of these variables.

Tagged With: Assumptions, categorical outcome, categorical variable, Censored, Constant Variance, dependent variable, Discrete Counts, normality, ordinal variable, Proportion, Truncated, Zero Inflated


Comments

  1. Alberto says

    October 7, 2016 at 4:21 am

    Hi, I have recently completed a log regression of 1 categorical variable vs 4 dependent variables. I have found the z score and chi values for these regressions; however, now I would like to know how I could rank the values within these variables to find “confidence intervals”, i.e., if the value of the dependent variable is above X value, what is the confidence that this will cause the categorical variable to be “yes” or “no”, for example.
    Thanks
    Alberto

  2. alen owen says

    June 9, 2016 at 7:52 am

    How can I change non-normal data into normal data in order to be suitable for GLM?

  3. Mark says

    April 3, 2016 at 11:46 am

    Hello. I would like to run a regression where the independent variable is continuous but values cannot be greater than 1 or less than -1. I also have six categorical variables with 3 levels each. What sort of regression can I run for this?

    Thanks Mark

  4. Anees Khan says

    March 12, 2016 at 6:37 am

    Kindly help: is there any normality assumption required for ratio and dummy independent variables? I’m confused.

    thanks

    Anees

    • Karen says

      March 26, 2016 at 1:15 pm

      Hi Anees,

      There are no distributional assumptions for Independent Variables in a regression. See this: https://www.theanalysisfactor.com/the-distribution-of-independent-variables-in-regression-models-2/

  5. jenny says

    February 27, 2015 at 2:19 pm

    Help! I’m currently trying to run a 2x2x2x2 mixed factorial anova with 4 IVs and accuracy/success rates (described as %) in SPSS. My data is anything but normally distributed but I also don’t know which transformation to use to make it better. Any ideas would be so much appreciated!

  6. Peter Flom says

    October 18, 2012 at 2:48 pm

    Nice post.

    “Unbounded” is interesting. If the bounds are very far from the mean (in standardized terms), it can be OK. Take, for example, weight of human adults. This has a lower bound. It certainly can’t be less than 0! Yet that’s fine, because that is so far from the mean.

    • Karen says

      October 22, 2012 at 9:36 am

      Thanks, Peter.

      I agree. I think it’s not even that the bound is so far from the mean, but rather that it’s so far from any data points. The practical problem is when you get ceiling and floor effects, that is, when a lot of observations are butted up against the bound.

      It’s similar to the idea of using a linear regression instead of logistic, when all the probabilities are in the middle (say between .2 and .8). Because the sigmoidal logistic regression function is linear in the middle, you’ll get pretty much the same results. It’s close to 1 and 0 (the bounds) where logistic regression can accommodate the fact that the relationship isn’t linear.

      Karen



Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.