A Reason to Not Drop Outliers

I recently got this question from a consulting client:

I’ve got 12 out of 645 cases with Mahalanobis distances above the critical value, so I removed them and reran the analysis, only to find that another 10 cases were now above it. I removed these, and another 10 appeared, and so on, until I had removed over 100 cases from my analysis! Surely this can’t be right!?! Do you know any way around this? It is really slowing down my analysis and I have no idea how to sort this out!!

And this was my response:

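I wrote an article about dropping outliers. As you’ll see, you can’t just drop outliers without a REALLY good reason. Being influential is not in itself a good enough reason to drop data.

There is also a reason the removals never stop. Each time you drop cases, the mean and covariance matrix are recalculated from a narrower sample, so the remaining cases sit farther out relative to the new, tighter spread, and a fresh batch crosses the same critical value. With data whose tails are heavier than a multivariate normal’s, this can continue until a large share of the sample is gone.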
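To see why the removals kept going, here is a minimal sketch in Python. It assumes only two variables (so the covariance inverse can be written out by hand) and a cutoff at the chi-square critical value with alpha = 0.001 and 2 degrees of freedom, which for 2 df equals -2 ln(alpha). The data are hypothetical and simulated: most cases are ordinary bivariate normal draws, and about 10% come from the same process with three times the spread. None of the simulated cases is an error. The names `mahalanobis_sq`, `iterative_trim`, and `simulate` are illustrative, not from any library.

```python
import math
import random


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean,
    using the sample covariance of the same points (two variables only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [
        ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
        for x, y in points
    ]


def iterative_trim(points, alpha=0.001, max_rounds=100):
    """Repeatedly drop every case above the chi-square(2) critical value,
    re-estimating the mean and covariance from the survivors each round.
    Returns the number flagged in each round and the surviving cases."""
    # For 2 degrees of freedom the upper-alpha chi-square quantile is -2*ln(alpha).
    cutoff = -2.0 * math.log(alpha)
    flagged_per_round = []
    for _ in range(max_rounds):
        d2 = mahalanobis_sq(points)
        keep = [p for p, d in zip(points, d2) if d <= cutoff]
        flagged = len(points) - len(keep)
        flagged_per_round.append(flagged)
        if flagged == 0:
            break
        points = keep
    return flagged_per_round, points


def simulate(n=645, seed=1):
    """Hypothetical heavy-tailed data: about 90% of cases have unit spread,
    about 10% have three times the spread. No case is a data-entry error."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        s = 3.0 if rng.random() < 0.10 else 1.0
        x = rng.gauss(0.0, s)
        y = 0.5 * x + rng.gauss(0.0, s)
        pts.append((x, y))
    return pts


if __name__ == "__main__":
    counts, kept = iterative_trim(simulate())
    print("flagged per round:", counts)
    print("cases dropped in total:", 645 - len(kept))
```

Because the mean and covariance are re-estimated from the trimmed sample each round, cases that passed the cutoff earlier can fail it later. On heavy-tailed data like this the loop can run for several rounds; the exact counts depend on the seed. A real analysis with more variables would need the chi-square quantile for that many degrees of freedom (for example from `scipy.stats.chi2.ppf`).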

Comments

  1. Craig slinkman says

    There is another cause of outliers. The data point may be an indication that the model is inadequate and that something is seriously wrong. Examples of this are a missing predictor variable or an incorrect functional form.

    Therefore, be careful about dropping outliers from your data set. I am retired and no longer have my books, but I suggest you look at Sanford Weisberg’s Applied Linear Regression.

  2. soma says

    Hey! Thanks for sharing this!
    I have a question and unfortunately I couldn’t find the answer. I hope someone can help here.
    I know how to remove outliers, but what I don’t understand is: should I only remove them based on my dependent variable, or should I do it for the other variables too? For example, if I want to estimate salary from age and education, …
    should I remove records that are outliers in age?

  3. Meenu says

    Hi Karen, the newsletter and the dropping outliers link do not seem to be available. Is there a way I can get the answer to the question posted above?

    Thanks
    Meenu
