There are a number of things to consider when choosing an approach to missing data. But before you can weigh the options, you need to be clear about what you’re aiming for.
There are three criteria we’re aiming for with any missing data technique:
1. Unbiased parameter estimates: Whether you’re estimating means, regressions, or odds ratios, you want your parameter estimates to be accurate representations of the actual population parameters. In statistical terms, that means the estimates should be unbiased. If all the assumptions of your statistical test are met, the sample is randomly selected, and no data were missing, you can be confident that estimates are unbiased. But missing data (and many of the techniques for handling it) can mess with that nice property.
2. Adequate power: Case deletion (dropping cases with missing data) lowers sample size, and therefore lowers power, at least in theory. However, if you don’t have a problem with bias and you are getting significant results, you have adequate power to detect your effects. End of story.
3. Accurate standard errors, and therefore p-values and confidence intervals: In the world of statistical inference, not just description, we need not only the parameter estimates to be accurate, but also the standard errors of those estimates. Many approaches to missing data, including single imputation of any type, underestimate standard errors. This means p-values are too small, confidence intervals too narrow, and you, the researcher, end up claiming effects that aren’t really there.
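To see the third criterion in action, here is a minimal NumPy sketch (with made-up simulated data, assuming values are missing completely at random) showing why single imputation with the mean shrinks the standard error: the imputed values sit exactly at the mean, adding nothing to the variance, while the sample size is inflated back to its full count.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=200)     # complete (hypothetical) data
mask = rng.random(200) < 0.3         # ~30% missing completely at random
x_obs = x[~mask]

# Standard error of the mean using only observed cases (complete-case analysis)
se_complete_case = x_obs.std(ddof=1) / np.sqrt(len(x_obs))

# Single (mean) imputation: every missing value replaced by the observed mean
x_imputed = x.copy()
x_imputed[mask] = x_obs.mean()
se_mean_imputed = x_imputed.std(ddof=1) / np.sqrt(len(x_imputed))

# The mean-imputed SE is smaller: constant fill-ins add no variance,
# yet n is inflated to the full sample size
print(se_complete_case, se_mean_imputed)
```

Because the imputed deviations are all zero and n grows, the imputed standard error is always smaller than the complete-case one here, so p-values computed from it would be too optimistic.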
Modern approaches like Multiple Imputation and Full Information Maximum Likelihood meet all three criteria for many missing data problems. But simpler techniques can adequately meet them as well in specific situations.
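The key idea behind multiple imputation can be sketched in a few lines of NumPy. This is a deliberately simplified illustration, not a production implementation: the imputations are drawn from a plain normal model fit to the observed values (real MI software also propagates uncertainty in the imputation model's parameters), and the pooled standard error combines within- and between-imputation variance via Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=200)     # complete (hypothetical) data
mask = rng.random(200) < 0.3         # ~30% missing completely at random
x_obs = x[~mask]

m = 20                               # number of imputed datasets
estimates, within_vars = [], []
for _ in range(m):
    # Simplified imputation model: draw from a normal fit to observed data
    draws = rng.normal(x_obs.mean(), x_obs.std(ddof=1), size=mask.sum())
    xi = x.copy()
    xi[mask] = draws
    estimates.append(xi.mean())                  # estimate from this dataset
    within_vars.append(xi.var(ddof=1) / len(xi)) # its squared standard error

estimates = np.array(estimates)

# Rubin's rules: pool the point estimates and the two variance components
q_bar = estimates.mean()             # pooled estimate of the mean
u_bar = np.mean(within_vars)         # average within-imputation variance
b = estimates.var(ddof=1)            # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b
pooled_se = np.sqrt(total_var)

print(q_bar, pooled_se)
```

The between-imputation term is what single imputation throws away: by adding it back, the pooled standard error honestly reflects the extra uncertainty that comes from not knowing the missing values.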