One important consideration in choosing a missing data approach is the missing data mechanism, because different approaches make different assumptions about the mechanism.
Each of the three mechanisms describes one possible relationship between the propensity of data to be missing and values of the data, both missing and observed.
Missing Completely at Random, MCAR, means there is no relationship between (more…)
Both of the methods discussed here require that the data are missing at random, meaning that missingness is not related to the missing values themselves once the observed data are taken into account. If this assumption holds, resulting estimates (i.e., regression coefficients and standard errors) will be unbiased with no loss of power.
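To make the missing-at-random assumption concrete, here is a minimal simulation sketch. It is not taken from either method discussed in the post. The variable names, the coefficients, and the logistic missingness models are all assumptions chosen for illustration. It generates a predictor `x` and an outcome `y`, deletes values of `y` under three mechanisms, and compares what the remaining cases show.

```python
import numpy as np


def simulate_mechanisms(n=20000, seed=0):
    """Delete y under MCAR, MAR, and MNAR, then summarize the observed cases.

    All numbers here (slope 0.8, noise 0.6, logistic coefficients) are
    illustrative assumptions, not values from the original post.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                        # always observed
    y = 0.8 * x + rng.normal(scale=0.6, size=n)   # subject to missingness

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Probability that y is missing under each mechanism.
    p_missing = {
        "mcar": np.full(n, 0.3),              # unrelated to any data
        "mar": logistic(-1.0 + 1.5 * x),      # depends only on observed x
        "mnar": logistic(-1.0 + 1.5 * y),     # depends on y itself
    }

    results = {"full": {"mean_y": y.mean(), "slope": np.polyfit(x, y, 1)[0]}}
    for name, p in p_missing.items():
        observed = rng.random(n) >= p         # True where y is kept
        results[name] = {
            "mean_y": y[observed].mean(),
            "slope": np.polyfit(x[observed], y[observed], 1)[0],
        }
    return results


if __name__ == "__main__":
    for name, stats in simulate_mechanisms().items():
        print(f"{name:5s} mean_y={stats['mean_y']: .3f} slope={stats['slope']: .3f}")
```

In this setup, the observed mean of `y` should stay close to the full-data mean only under MCAR. Under MAR the observed mean drifts, but the relationship between `x` and `y` is still recoverable from the observed cases, which is the kind of information that methods assuming missing at random rely on. Under MNAR, both the mean and the slope are distorted.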
Do you find quizzes irresistible? I do.
Here’s a little quiz about working with missing data:
1. Imputation is really just making up data to artificially inflate results. It’s better to just drop cases with missing data than to impute.
2. I can just impute the mean for any missing data. It won’t affect results, and improves power.
3. Multiple Imputation is fine for the predictor variables in a statistical model, but not for the response variable.
4. Multiple Imputation is always the best way to deal with missing data.
5. When imputing, it’s important that the imputations be plausible data points.
6. Missing data isn’t really a problem if I’m just doing simple statistics, like chi-squares and t-tests.
7. The worst thing that missing data does is lower sample size and reduce power.
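If you would rather test statement 2 than take anyone's word for it, here is a small sketch you could run before checking the answers. The data are simulated, and the sample size, missingness rate, and coefficients are arbitrary assumptions. It compares the spread of a variable, and its correlation with a predictor, before and after mean imputation.

```python
import numpy as np


def mean_imputation_demo(n=5000, missing_rate=0.3, seed=1):
    """Compare complete data with a mean-imputed copy.

    Values are deleted completely at random here to keep the example
    simple; the numbers are illustrative assumptions only.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)

    missing = rng.random(n) < missing_rate    # True where y is deleted
    y_imputed = y.copy()
    y_imputed[missing] = y[~missing].mean()   # fill every gap with one value

    return {
        "sd_complete": y.std(ddof=1),
        "sd_imputed": y_imputed.std(ddof=1),
        "corr_complete": np.corrcoef(x, y)[0, 1],
        "corr_imputed": np.corrcoef(x, y_imputed)[0, 1],
    }


if __name__ == "__main__":
    for name, value in mean_imputation_demo().items():
        print(f"{name:14s} {value:.3f}")
```

Compare the complete-data and imputed versions of each statistic, then decide for yourself whether "it won’t affect results" holds up.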
In my last post, I gave a little quiz about missing data. This post has the answers.
If you want to try it yourself before you see the answers, go here. (It’s a short quiz, but if you’re like me, you find testing yourself irresistible.)
In choosing an approach to missing data, there are a number of things to consider. But you need to keep in mind what you’re aiming for before you can even consider which approach to take.
There are three criteria we’re aiming for with any missing data technique:
1. Unbiased parameter estimates: Whether you’re estimating means, regressions, or odds ratios, you want your parameter estimates to be accurate representations of the actual population parameters. In statistical terms, that means the estimates should be unbiased. If all the (more…)