• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
The Analysis Factor

The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

  • our programs
    • Membership
    • Online Workshops
    • Free Webinars
    • Consulting Services
  • statistical resources
  • blog
  • about
    • Our Team
    • Our Core Values
    • Our Privacy Policy
    • Employment
    • Collaborate with Us
  • contact
  • login

regression assumptions

The Assumptions of Linear Models: Explicit and Implicit

by Karen Grace-Martin  4 Comments

If you’ve compared two textbooks on linear models, chances are, you’ve seen two different lists of assumptions.Stage 2

I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.

1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.

2. Some authors are writing for introductory classes, and rightfully so, don’t want to confuse students with too many abstract, and sometimes untestable, [Read more…] about The Assumptions of Linear Models: Explicit and Implicit

Tagged With: GLM, linear model, regression assumptions

Related Posts

  • Assumptions of Linear Models are about Residuals, not the Response Variable
  • Continuous and Categorical Variables: The Trouble with Median Splits
  • Member Training: The Link Between ANOVA and Regression
  • Why ANOVA is Really a Linear Regression, Despite the Difference in Notation

3 Mistakes Data Analysts Make in Testing Assumptions in GLM

by Karen Grace-Martin  Leave a Comment

I know you know it–those assumptions in your regression or ANOVA model really are important.  If they’re not met adequately, all your p-values are inaccurate, wrong, useless.

But, and this is a big one, linear models are robust to departures from those assumptions.  Meaning, they don’t have to fit exactly for p-values to be accurate, right, and useful.

You’ve probably heard both of these contradictory statements in stats classes and a million other places, and they are the kinds of statements that drive you crazy.  Right?

I mean, do statisticians make this stuff up just to torture researchers? Or just to keep you feeling stupid?

No, they really don’t.   (I promise!)  And learning how far you can push those robust assumptions isn’t so hard, with some training and a little practice.  Over the years, I’ve found a few mistakes researchers commonly make because of one, or both, of these statements:

1.  They worry too much about the assumptions and over-test them. There are some nice statistical tests to determine if your assumptions are met.  And it’s so nice having a p-value, right?  Then it’s clear what you’re supposed to do, based on that golden rule of p<.05.

The only problem is that many of these tests ignore that robustness.  They find that every distribution is non-normal and heteroskedastic.  They’re good tools, but  these hammers think every data set is a nail.  You want to use the hammer when needed, but don’t hammer everything.

2.They assume everything is robust anyway, so they don’t test anything. It’s easy to do.  And once again, it probably works out much of the time.  Except when it doesn’t.

Yes, the GLM is robust to deviations from some of the assumptions.  But not all the way, and not all the assumptions.  You do have to check them.

3. They test the wrong assumptions. Look at any two regression books and they’ll give you a different set of assumptions.

This is partially because many of these “assumptions”  need to be checked, but they’re not really model assumptions, they’re data issues.  And it’s also partially because sometimes the assumptions have been taken to their logical conclusions.  That textbook author is trying to make it more logical for you.  But sometimes that just leads you to testing the related, but wrong thing.  It works out most of the time, but not always.
Bookmark and Share

tn_assum_lmLearn more about each of the assumptions of linear models–regression and ANOVA–so they make sense–in our new On Demand workshop: Assumptions of Linear Models.

 

Tagged With: Heteroskedacticity, regression assumptions, testing assumptions, testing normality

Related Posts

  • When to Check Model Assumptions
  • The Assumptions of Linear Models: Explicit and Implicit
  • Member Training: The Link Between ANOVA and Regression
  • Can Likert Scale Data ever be Continuous?

Outliers: To Drop or Not to Drop

by Karen Grace-Martin  24 Comments

Should you drop outliers? Outliers are one of those statistical issues that everyone knows about, but most people aren’t sure how to deal with.  Most parametric statistics, like means, standard deviations, and correlations, and every statistic based on these, are highly sensitive to outliers.

And since the assumptions of common statistical procedures, like linear regression and ANOVA, are also based on these statistics, outliers can really mess up your analysis.

Despite all this, as much as you’d like to, it is NOT acceptable to drop an observation just because it is an outlier.  They can be legitimate observations and are sometimes the most interesting ones.  It’s important to investigate the nature of the outlier before deciding.

  1. If it is obvious that the outlier is due to incorrectly entered or measured data, you should drop the outlier:

    For example, I once analyzed a data set in which a woman’s weight was recorded as 19 lbs.  I knew that was physically impossible.  Her true weight was probably 91, 119, or 190 lbs, but since I didn’t know which one, I dropped the outlier.

    This also applies to a situation in which you know the datum did not accurately measure what you intended.  For example, if you are testing people’s reaction times to an event, but you saw that the participant is not paying attention and randomly hitting the response key, you know it is not an accurate measurement.

  2. If the outlier does not change the results but does affect assumptions, you may drop the outlier.  But note that in a footnote of your paper.

    Neither the presence nor absence of the outlier in the graph below would change the regression line:

    graph-1

  3. More commonly, the outlier affects both results and assumptions.  In this situation, it is not legitimate to simply drop the outlier.  You may run the analysis both with and without it, but you should state in at least a footnote the dropping of any such data points and how the results changed.

    graph-2

  4. If the outlier creates a strong association, you should drop the outlier and should not report any association from your analysis.

    In the following graph, the relationship between X and Y is clearly created by the outlier.  Without it, there is no relationship between X and Y, so the regression coefficient does not truly describe the effect of X on Y.

    graph-3

So in those cases where you shouldn’t drop the outlier, what do you do?

One option is to try a transformation.  Square root and log transformations both pull in high numbers.  This can make assumptions work better if the outlier is a dependent variable and can reduce the impact of a single point if the outlier is an independent variable.

Another option is to try a different model.  This should be done with caution, but it may be that a non-linear model fits better.  For example, in example 3, perhaps an exponential curve fits the data with the outlier intact.

Whichever approach you take, you need to know your data and your research area well.  Try different approaches, and see which make theoretical sense.

Tagged With: dropping outliers, outliers, regression assumptions, transformation

Related Posts

  • Best Practices for Data Preparation
  • Four Weeds of Data Analysis That are Easy to Get Lost In
  • What It Really Means to Remove an Interaction From a Model
  • Member Training: Data Cleaning

Primary Sidebar

This Month’s Statistically Speaking Live Training

  • Member Training: The Link Between ANOVA and Regression

Upcoming Workshops

    No Events

Upcoming Free Webinars

TBA

Quick links

Our Programs Statistical Resources Blog/News About Contact Log in

Contact

Upcoming

Free Webinars Membership Trainings Workshops

Privacy Policy

Search

Copyright © 2008–2023 The Analysis Factor, LLC.
All rights reserved.

The Analysis Factor uses cookies to ensure that we give you the best experience of our website. If you continue we assume that you consent to receive cookies on all websites from The Analysis Factor.
Continue Privacy Policy
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT