The Analysis Factor

When to Check Model Assumptions

by Karen Grace-Martin

Like the chicken and the egg, there’s a question about which comes first: running a model or testing assumptions? Unlike the chicken-and-egg question, this one has an easy answer.

There are two types of assumptions in a statistical model.  Some are distributional assumptions about the residuals.  Examples include independence, normality, and constant variance in a linear model.

Others are about the form of the model.  These include linearity and the inclusion of the right predictors.

You can get clues about whether most of these assumptions will be met before running a model. But you can’t check them.

All the distributional assumptions of linear models are about the residuals.  Many of the others can be checked by looking at residuals.

And you can’t get residuals until you run a model.
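To make the ordering concrete, here is a minimal sketch in plain Python. The data are simulated and the helper `fit_line` is a hypothetical name introduced only for this illustration. It shows that residuals are defined relative to fitted values, so they simply do not exist until a model has been fit.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(1)  # fixed seed so the sketch is reproducible
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]  # simulated, not real data

# The fit has to come first: residuals are observed minus fitted values.
intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"slope = {slope:.2f}, mean residual = {sum(residuals) / len(residuals):.4f}")
```

Everything you would later plot or test (normality, constant variance) takes `residuals` as its input, which is why those checks cannot come before the model run.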

In the steps I use for running a model, testing assumptions is Step 11.  Running an initial model is Step 9.  Here is the full list, in case you haven’t seen it.

  1. Write out research questions in theoretical and operational terms
  2. Design the study or define the design
  3. Choose the variables for answering the research questions and determine their level of measurement
  4. Write an analysis plan
  5. Calculate sample size estimations
  6. Collect, code, enter, and clean data
  7. Create new variables
  8. Run Univariate and Bivariate Statistics
  9. Run an initial model
  10. Refine predictors and check model fit
  11. Test assumptions
  12. Check for and resolve data issues
  13. Interpret Results
  14. Communicate Results

A Big Fat Caveat

So don’t start running normal probability plots or checking variances before you are reasonably sure you have what is close to a final model.

But that doesn’t mean you should put a lot of work into model refinement without a reasonable idea of whether the model is appropriate for the data.

You want to be thinking about the most appropriate type and form of model from the very beginning.

If you’ve done the foundational work in the early steps, testing assumptions is about looking for minor deviations, not major transgressions.

The Design

In Step 2, you defined the design. You checked for things like repeated measures, pairing, cluster sampling, or nested factors.  Any of these would make residuals non-independent.

If any of these design issues exist in your data, you’re not going to apply a linear model and only notice non-independence once you get to the 11th step.

Instead, you’d choose a model that accounts for the non-independence.

So yes, you should still check whether the non-independence exists in the data during Step 11.  (Sometimes it doesn’t, even though the design indicates it’s likely.)  But you should look for it and incorporate it into the analysis plan much, much earlier.
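As a rough illustration of what that Step 11 check might look like, the sketch below simulates clustered data, fits a line that ignores the clustering, and then estimates the intraclass correlation (ICC) of the residuals. The simulation settings and the helper name `icc` are assumptions made up for this example. The ICC formula used is the standard one-way ANOVA version for balanced groups.

```python
import random

rng = random.Random(2)
n_clusters, per_cluster = 30, 10

# Simulated clustered data: observations in a cluster share one random shift.
cluster_ids, x, y = [], [], []
for c in range(n_clusters):
    shift = rng.gauss(0, 2)
    for _ in range(per_cluster):
        xi = rng.uniform(0, 10)
        cluster_ids.append(c)
        x.append(xi)
        y.append(1.0 + 0.5 * xi + shift + rng.gauss(0, 1))

# Fit a simple linear model that ignores the clustering.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def icc(values, groups, k):
    """One-way ANOVA intraclass correlation for balanced groups of size k."""
    grand = sum(values) / len(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    msb = k * sum((m - grand) ** 2 for m in means.values()) / (len(means) - 1)
    msw = (sum((v - means[g]) ** 2 for v, g in zip(values, groups))
           / (len(values) - len(means)))
    return (msb - msw) / (msb + (k - 1) * msw)


icc_clustered = icc(residuals, cluster_ids, per_cluster)

# For contrast: shuffling the labels breaks the link between residual and cluster.
shuffled_ids = cluster_ids[:]
rng.shuffle(shuffled_ids)
icc_shuffled = icc(residuals, shuffled_ids, per_cluster)

print(f"ICC by cluster = {icc_clustered:.2f}, ICC with shuffled labels = {icc_shuffled:.2f}")
```

A residual ICC well above zero says that observations within a cluster resemble each other more than the model accounts for. A value near zero would be the "sometimes it doesn't" case.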

The Scales of Measurement

In Step 3, you defined the measurement scales of all variables.

Remember, an outcome variable (Y) does not have to be normally distributed for a linear model’s assumptions to be met.

The residuals do.

So don’t bother running tests of normality on Y. All that will do is make you panic unnecessarily if it’s a bit skewed.  (But do look at the distribution in an upcoming step, before you run a single model.)

Since the predictor variables (the Xs) affect the shape of Y’s distribution, it’s possible for the residuals to be normally distributed even when Y isn’t.

But this can only happen if Y is continuous, unbounded, and measured on an interval or ratio scale.

If any of these conditions fails, it’s nearly impossible to get normally distributed residuals, even with remedial transformations.
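The point that residuals can be well behaved even when Y is not can be seen in a small simulation. This is only a sketch under made-up assumptions: the predictor is drawn from a skewed (exponential) distribution, the errors are normal, and skewness is used as a crude stand-in for a full look at the distribution.

```python
import random


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(3)
x = [rng.expovariate(1.0) for _ in range(5000)]       # a skewed predictor
y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]    # normal errors around the line

# Fit the line and compute residuals.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_y = skewness(y)
skew_residuals = skewness(residuals)
print(f"skewness of Y = {skew_y:.2f}, skewness of residuals = {skew_residuals:.2f}")
```

Y inherits the skew of X, so a normality test on Y would raise an alarm. The residuals, which are what the assumption is actually about, come out close to symmetric.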

Types of variables that will generally fail these criteria include:

  • Categorical Variables, both nominal and ordinal.
  • Count Variables, which are often distributed as Poisson or Negative Binomial.

So if you find your outcome variable isn’t continuous, run a more appropriate initial model.
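Here is a hedged sketch of what goes wrong when a count outcome is pushed through a linear model anyway. The counts are simulated from a Poisson distribution with a mean that grows with X (the coefficients are invented for the example), and `poisson_draw` is a small helper written here because Python's standard library has no Poisson sampler. The sketch does not fit a Poisson model. It only shows the symptom in the linear model's residuals.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


rng = random.Random(4)
x = [rng.uniform(0, 3) for _ in range(1000)]
counts = [poisson_draw(math.exp(0.5 + 0.8 * xi), rng) for xi in x]

# Fit an (inappropriate) straight line to the counts.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(counts) / n
slope = (sum((xi - mean_x) * (ci - mean_y) for xi, ci in zip(x, counts))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
residuals = [ci - (intercept + slope * xi) for xi, ci in zip(x, counts)]

# Compare residual spread in the lower and upper halves of X.
ordered = [r for _, r in sorted(zip(x, residuals))]
half = n // 2
var_low = statistics.pvariance(ordered[:half])
var_high = statistics.pvariance(ordered[half:])
print(f"residual variance: low X = {var_low:.2f}, high X = {var_high:.2f}")
```

Because a Poisson variable's variance equals its mean, the residual spread grows as the expected count grows. That is a constant-variance violation you can predict from the measurement scale alone, before running anything.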

Run Descriptive Statistics First

Likewise, in Step 8, you ran univariate and bivariate descriptive statistics (and, even better, graphs) on all variables you planned to use in your model.

The univariate graphs will illuminate any distributional hiccups, even in continuous data.  Skew may not be a problem, especially if it’s not extreme.  But other distributional issues can be.  Here you’re looking for issues like:

  • Zero-inflated data, which have a huge spike in the distribution at 0. They are common in count variables, but can occur with any distribution.
  • Censored or truncated data, which have full information only for some values. The distribution gets cut off for some values, at one or both ends.
  • Proportions, which are bounded at 0 and 1 and become problematic if much of the data are close to the bounds.
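A graph is the best way to spot the issues in the list above, but a few summary numbers can flag them too. The function below is a hypothetical screening helper, not an established procedure: the name `screen_outcome`, the 0.05 tolerance, and the choice of summaries are all assumptions for illustration. A pile-up at the minimum or maximum only hints at censoring. It does not prove it.

```python
def screen_outcome(values, tol=0.05):
    """Return simple shares that hint at zero inflation, censoring, or bounded data."""
    n = len(values)
    lo, hi = min(values), max(values)
    report = {
        "share_zero": sum(v == 0 for v in values) / n,      # spike at 0?
        "share_at_min": sum(v == lo for v in values) / n,   # pile-up at lower end?
        "share_at_max": sum(v == hi for v in values) / n,   # pile-up at upper end?
    }
    # Only meaningful when the variable looks like a proportion.
    if lo >= 0 and hi <= 1:
        report["share_near_bounds"] = sum(
            v <= tol or v >= 1 - tol for v in values
        ) / n
    return report


zero_heavy = [0, 0, 0, 0, 0, 0, 2, 3, 5, 1]           # made-up count data
capped = [10, 10, 10, 10, 3, 5, 7, 8]                 # made-up scores capped at 10
proportions = [0.01, 0.02, 0.99, 0.50, 0.97, 0.03]    # made-up proportions

for name, data in [("zero_heavy", zero_heavy), ("capped", capped),
                   ("proportions", proportions)]:
    print(name, screen_outcome(data))
```

Large shares in any of these summaries are a cue to look at the histogram and to consider a model built for that kind of outcome before the initial model run.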

The bivariate graphs will help you see any non-linearity in relationships and give you inklings of non-constant variance. This will allow you to incorporate these issues into the initial model run in Step 9, before you even get to checking assumptions.
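If you want numbers to go with the scatterplot, one rough option is to slice X into bins and compare the mean and spread of Y across them. The sketch below uses simulated data with a curved relationship and growing noise. `binned_summary` is a hypothetical helper, and the within-bin standard deviation is inflated by any trend inside the bin, so treat it as an inkling rather than a test.

```python
import random
import statistics


def binned_summary(x, y, n_bins=4):
    """Sort by x, split into equal-sized bins, return (mean x, mean y, SD y) per bin."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    rows = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        xs = [xi for xi, _ in pairs[start:end]]
        ys = [yi for _, yi in pairs[start:end]]
        rows.append((statistics.mean(xs), statistics.mean(ys), statistics.stdev(ys)))
    return rows


rng = random.Random(5)
x = [rng.uniform(0, 4) for _ in range(400)]
# Simulated: quadratic trend, with noise that grows as x grows.
y = [xi ** 2 + rng.gauss(0, 0.2 + 0.5 * xi) for xi in x]

rows = binned_summary(x, y)
for mean_x, mean_y, sd_y in rows:
    print(f"mean x = {mean_x:.2f}   mean y = {mean_y:.2f}   SD y = {sd_y:.2f}")
```

Steps between bin means that keep growing suggest curvature, and a standard deviation that climbs across bins suggests non-constant variance. Either one is worth building into the initial model, for example with a polynomial term or a transformation.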

When you finally do check the assumptions, you may still have some surprises. But they will be the kind you can remedy, not the kind that forces you to start over.

Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.