The Second Problem with Mean Imputation

by Karen Grace-Martin

A previous post discussed the first reason not to use mean imputation as a way of dealing with missing data: it does not preserve the relationships among variables.

A second reason is that any type of single imputation underestimates the error variation in any statistic computed from the imputed data.  Because the imputations are themselves estimates, there is some error associated with them.  But your statistical software doesn’t know that.  It treats the imputed values as real data.
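
To see this in action, here is a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration, not something from a real study: the simulated variable, the sample size of 200, the roughly 40% missing rate, and the helper name `standard_error`. It compares the standard error of the mean from the observed values alone with the one your software would report after mean imputation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical complete data: 200 draws from a normal distribution
n = 200
x = rng.normal(loc=50.0, scale=10.0, size=n)

# Delete roughly 40% of the values completely at random
missing = rng.random(n) < 0.4
observed = x[~missing]

# Mean imputation: every missing value is replaced by the observed mean
imputed = x.copy()
imputed[missing] = observed.mean()

def standard_error(values):
    """Standard error of the mean, as software would compute it."""
    return values.std(ddof=1) / np.sqrt(len(values))

se_observed = standard_error(observed)  # uses only the real data
se_imputed = standard_error(imputed)    # treats imputed values as real

print(f"Observed cases: {len(observed)} of {n}")
print(f"SE from observed data only:  {se_observed:.3f}")
print(f"SE after mean imputation:    {se_imputed:.3f}")
```

The imputed values sit exactly at the mean, so they add nothing to the sum of squared deviations while still counting toward the sample size. The standard error after mean imputation therefore comes out smaller than the one based on the observed cases, even though no new information was added.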

Ultimately, because your standard errors are too low, so are your p-values.  Now you’re more likely to make Type I errors without realizing it.

A better approach?  Multiple Imputation or Full Information Maximum Likelihood.
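
For a sense of how Multiple Imputation puts the missing uncertainty back, here is a sketch of the standard pooling formulas, usually called Rubin's rules. The notation is an assumption chosen for illustration: m is the number of imputed data sets, \hat{Q}_i is the estimate from the i-th one, and U_i is its squared standard error.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m} \bigl(\hat{Q}_i - \bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
```

The pooled standard error is the square root of T. The term B measures how much the estimates vary from one imputed data set to the next, which is exactly the imputation error that single imputation leaves out.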
