# outliers

### Member Training: Data Cleaning

June 1st, 2020

Data cleaning is a critical part of any data analysis. Without properly prepared data, the analysis can yield inaccurate results. Correcting errors later in the analysis adds to the time, effort, and cost of the project.

### Outliers and Their Origins

November 11th, 2016

Outliers are one of those realities of data analysis that no one can avoid.

Those pesky extreme values cause biased parameter estimates, non-normality in otherwise beautifully normal variables, and inflated variances.

Everyone agrees that outliers cause trouble with parametric analyses. But not everyone agrees that they’re always a problem, or what to do about them even if they are.

Sometimes a nonparametric or robust alternative is available — and sometimes not.

There are a number of approaches in statistical analysis for dealing with outliers and the problems they create. It’s common for committee members or Reviewer #2 to have very strong opinions that there is one and only one good approach.

Two approaches that I’ve commonly seen are: 1) delete outliers from the sample, or 2) winsorize them (i.e., replace the outlier value with one that is less extreme).

The problem with both of these “solutions” is that they cause problems of their own: biased parameter estimates, and valid values that are underweighted or eliminated.
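To make the two approaches concrete, here is a minimal Python sketch of deleting versus winsorizing. The sample values, the cutoffs of 10 and 25, and the helper names `trim` and `winsorize` are all hypothetical, chosen only for illustration; they are not taken from the post.

```python
import statistics


def trim(values, lower, upper):
    """Approach 1: delete every value outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def winsorize(values, lower, upper):
    """Approach 2: replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sample: nine typical values and one extreme value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

# Hypothetical cutoffs, picked by hand for this illustration.
low, high = 10, 25

trimmed = trim(sample, low, high)
winsorized = winsorize(sample, low, high)

# Compare sample size, mean, and standard deviation under each approach.
for label, data in [("original", sample), ("trimmed", trimmed), ("winsorized", winsorized)]:
    print(f"{label:>10}: n={len(data)}  "
          f"mean={statistics.mean(data):.2f}  sd={statistics.stdev(data):.2f}")
```

Both approaches pull the mean and standard deviation toward the bulk of the data. If the extreme value was a valid observation, that shift is the bias described above: trimming discards the observation entirely, and winsorizing keeps it but understates it.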

### Member Training: Working with Truncated and Censored Data

July 1st, 2016

Statistically speaking, when we see a continuous outcome variable we often worry about outliers and how these extreme observations can impact our model.

But have you ever had an outcome variable with no outliers because there was a boundary value at which accurate measurements couldn’t be or weren’t recorded?

Examples include:

• Income data where all values above \$100,000 are recorded as “\$100k or greater”
• Soil toxicity ratings where the device cannot measure values below 1 ppm
• Number of arrests where there are no zeros because the data set came from police records where all participants had at least one arrest

These are all examples of data that are truncated or censored. Failing to account for the truncation or censoring will produce biased estimates.
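The difference between the two can be sketched with simulated data. In the Python example below, the lognormal parameters, the seed, and the \$100,000 limit applied to simulated incomes are assumptions made for illustration, not values from the webinar. Censoring keeps every record but stores the limit in place of the true value; truncation leaves the out-of-range records out of the data set altogether.

```python
import random
import statistics

random.seed(0)

# Hypothetical "true" incomes; the lognormal parameters are arbitrary.
incomes = [random.lognormvariate(11, 0.6) for _ in range(10_000)]

LIMIT = 100_000  # boundary value used for this illustration

# Censored: every record is kept, but values above the limit are
# recorded as the limit itself ("$100k or greater").
censored = [min(x, LIMIT) for x in incomes]

# Truncated: records above the limit never enter the data set.
truncated = [x for x in incomes if x <= LIMIT]

print(f"true mean      : {statistics.mean(incomes):,.0f}  (n={len(incomes)})")
print(f"censored mean  : {statistics.mean(censored):,.0f}  (n={len(censored)})")
print(f"truncated mean : {statistics.mean(truncated):,.0f}  (n={len(truncated)})")
```

A naive mean computed from either the censored or the truncated version falls below the mean of the full data, which is a simple instance of the bias that arises when the boundary is ignored.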

This webinar will discuss what truncated and censored data are and how to identify them.

There are several different models that are used with this type of data. We will go over each model and discuss which type of data is appropriate for each model.

We will then compare the results of models that account for truncated or censored data to those that do not. This comparison shows how much the wrong model choice can affect the results.

Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.

### Incorporating Graphs in Regression Diagnostics with Stata

May 24th, 2016

You put a lot of work into preparing and cleaning your data. Running the model is the moment of excitement.

You look at your tables and interpret the results. But first you remember that one or more variables had a few outliers. Did these outliers impact your results?
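One simple way to answer that question is to refit the model without the suspect observation and see how much the estimates move. The Python sketch below does this for a simple linear regression. The data and the helper `fit_line` are hypothetical, and this illustrates the general idea only; it is not the Stata graphing workflow the post describes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: nine points close to y = 2x, plus one point
# that is extreme in x and far from that line.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.9, 18.0, 10.0]

intercept_all, slope_all = fit_line(xs, ys)
intercept_without, slope_without = fit_line(xs[:-1], ys[:-1])

print(f"all points         : slope={slope_all:.3f}  intercept={intercept_all:.3f}")
print(f"without last point : slope={slope_without:.3f}  intercept={intercept_without:.3f}")
```

A large change in the slope after dropping a single observation suggests that the observation is influential. Whether it should be kept is a separate question that the refit cannot answer.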

### Five things you need to know before learning Structural Equation Modeling

March 14th, 2016

By Manolo Romero Escobar

If you already know the principles of general linear modeling (GLM), you are on the right path to understanding Structural Equation Modeling (SEM).

As you saw in my previous post, SEM offers the flexibility of adding paths between predictors in a way that would take several GLM models and still leave you with unanswered questions.

It also helps you use latent variables (as you will see in future posts).

GLM is just one of the pieces of the puzzle to fit SEM to your data. You also need to have an understanding of:
(more…)

### Member Training: Outliers and Influential Points

September 1st, 2013

Outliers. There are as many opinions on what to do about them as there are causes for them.

In this webinar, we’ll explore the different types of outliers, methods for figuring out which type you have, whether they’re influential, and what to do about them.

