Most Multiple Imputation methods assume multivariate normality, so a common question is how to impute missing values for categorical variables.
Paul Allison, one of my favorite authors of statistical information for researchers, did a study that showed that the most common method actually gives worse results than listwise deletion. (Did I mention I’ve used it myself?)
What is the bad method?
1. Dummy code the variable
2. Impute a continuous value. This will generally be between 0 and 1.
3. Round off to either 0 or 1, based on whether the imputed value is below or above .5.
As Allison discovered, this method generally leads to biased results and incorrect standard errors.
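To make the three steps concrete, here is a minimal Python sketch of the round-off approach. It is an illustration only, not the procedure from Allison's study: the function names (dummy_code, impute_and_round), the single predictor x, and the simple linear regression used for the imputation model are all my own assumptions. A real multiple imputation would also draw the regression parameters and produce several completed data sets.

```python
import random

def dummy_code(labels, one_label):
    """Step 1: code a two-category variable as 0/1, keeping None for missing."""
    return [None if lab is None else int(lab == one_label) for lab in labels]

def impute_and_round(x, d, seed=0):
    """Steps 2 and 3 for a 0/1 variable d with missing values (None).

    x is a fully observed numeric predictor (hypothetical, for illustration).
    Returns (continuous, rounded): the imputed values before and after rounding.
    """
    rng = random.Random(seed)
    obs = [(xi, di) for xi, di in zip(x, d) if di is not None]
    n = len(obs)

    # Ordinary least squares of the dummy on x, using complete cases only.
    mean_x = sum(xi for xi, _ in obs) / n
    mean_d = sum(di for _, di in obs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)
    sxy = sum((xi - mean_x) * (di - mean_d) for xi, di in obs)
    slope = sxy / sxx
    intercept = mean_d - slope * mean_x
    resid_var = sum((di - (intercept + slope * xi)) ** 2 for xi, di in obs) / (n - 2)
    resid_sd = resid_var ** 0.5

    continuous, rounded = [], []
    for xi, di in zip(x, d):
        if di is None:
            # Step 2: impute a continuous value (prediction plus normal noise).
            value = intercept + slope * xi + rng.gauss(0, resid_sd)
        else:
            value = float(di)
        continuous.append(value)
        # Step 3: round to 0 or 1 at the .5 cutoff (exactly .5 goes to 1 here).
        rounded.append(1 if value >= 0.5 else 0)
    return continuous, rounded

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
d = dummy_code(["no", "no", "yes", None, "yes", "yes"], one_label="yes")
continuous, rounded = impute_and_round(x, d, seed=1)
```

The continuous list is what you get if you stop after step 2; the rounded list is the result of the full three-step method.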
What to do instead?
Allison compared this approach to four others, each of which generally gave more accurate results, at least under some conditions.
1. Listwise deletion
2. Imputation of the continuous variable without rounding (just leave off step 3).
3. Logistic Regression imputation
4. Discriminant Analysis imputation
These last two generally performed best, but they work only in limited situations.
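For comparison, here is a rough sketch of the idea behind logistic regression imputation: fit a logistic model to the complete cases, then draw each missing value as a 0 or 1 from its predicted probability, so no rounding is ever needed. Again, this is a simplified illustration under my own assumptions (one hypothetical predictor x, a plain gradient-ascent fit, made-up function names), not the software procedure Allison evaluated. A fuller multiple imputation would also account for uncertainty in the coefficients and repeat the draws across several data sets.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, steps=5000, lr=0.1):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by plain gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            g0 += yi - p          # gradient of the log-likelihood w.r.t. b0
            g1 += (yi - p) * xi   # gradient of the log-likelihood w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def impute_logistic(x, d, seed=0):
    """Fill missing 0/1 values (None) in d with draws from the fitted model."""
    rng = random.Random(seed)
    obs_x = [xi for xi, di in zip(x, d) if di is not None]
    obs_d = [di for di in d if di is not None]
    b0, b1 = fit_logistic(obs_x, obs_d)

    completed = []
    for xi, di in zip(x, d):
        if di is None:
            p = sigmoid(b0 + b1 * xi)
            # The imputed value is a Bernoulli draw, so it is already 0 or 1.
            completed.append(1 if rng.random() < p else 0)
        else:
            completed.append(di)
    return completed

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 2.5]
d = [0, 0, 1, 0, 1, 1, None]
completed = impute_logistic(x, d, seed=1)
```

Because the imputed values come straight from the model as categories, this avoids the rounding step that causes the trouble in the first method.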
Access the full article here.