I probably don’t need to tell you what missing data does to your analysis.
If you have any experience with missing data, you know it really messes things up. The thing is, it’s not a data issue like skewness or non-normality that you can just ignore. It’s going to affect your analysis. Ignoring it still means choosing a way of dealing with missing data; you’re just choosing the default method.
Depending on which statistical software you’re using, and the patterns and percentage of missing data, the default may or may not be a perfectly acceptable way of dealing with the missing data.
But in data analysis, it’s always better if you understand the defaults, know what they’re doing, and decide for yourself if it’s the best approach.
And up until about 10 years ago, there weren’t many other options. There was listwise deletion and there was imputation. But many of the imputation methods were pretty sketchy. So it was a “damned if you do, damned if you don’t” kind of situation.
But it’s different now.
In August 1999, just a month after I started at the Statistical Consulting office at Cornell, I saw a talk by Joe Schafer at the Joint Statistical Meetings about multiple imputation. I was blown away. It seemed too good to be true: it solved pretty much all of the problems with missing data.
So I read all that I could, attended a week-long mini-class, and tried it all out.
It turned out that at that time, you had to use special stand-alone software to implement it, and all the packages I tried were a bit clunky to use.
Luckily, statistical software has caught up. In the meantime, a few new studies have shown that some of the restrictive assumptions of multiple imputation aren’t as restrictive as they first seemed. So it’s easier and more accurate than ever.
It’s also become clear that some of those old methods aren’t always as horrible as they seemed: there are some situations in which listwise deletion works just fine.
But it pays to know the difference, and how to implement not just multiple imputation, but maximum likelihood approaches, which also give great outcomes and are a bit easier to use.
So I am once again offering an online workshop on missing data: Effectively Dealing with Missing Data Without Biasing Your Results. It includes 8 hours of instruction and 5 hours of Q&A, and we’ll go through all the approaches for dealing with missing data in detail:
- what they are
- the advantages and disadvantages of each
- how to implement them in various statistical software
- the data and analysis situations in which it’s best to use each one
- how to figure out which situation you’re in
If you have any questions, feel free to contact me. You can get more details and register here.