Do you find quizzes irresistible? I do.
Here’s a little quiz about working with missing data:
True or False?
1. Imputation is really just making up data to artificially inflate results. It’s better to just drop cases with missing data than to impute.
2. I can just impute the mean for any missing data. It won’t affect results, and it improves power.
3. Multiple Imputation is fine for the predictor variables in a statistical model, but not for the response variable.
4. Multiple Imputation is always the best way to deal with missing data.
5. When imputing, it’s important that the imputations be plausible data points.
6. Missing data isn’t really a problem if I’m just doing simple statistics, like chi-squares and t-tests.
7. The worst thing that missing data does is lower sample size and reduce power.
They’re all false.
(I’ll explain why in the next post.)
These are some of the misconceptions about missing data that I’ve come across among researchers over the years.
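A small simulation with made-up data shows one reason statement 2 is false. This is a minimal sketch that assumes the data are missing completely at random and uses only Python's standard library. The sample size, the 30% missing rate, and the helper `mean_impute` are illustrative choices, not part of the quiz. Filling every gap with the observed mean leaves the mean unchanged, but it shrinks the standard deviation: each imputed value contributes zero deviation while the sample size grows.

```python
import random
import statistics

def mean_impute(values):
    """Hypothetical helper: replace every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

random.seed(1)
# Made-up data: 200 draws from a normal distribution with mean 50 and SD 10.
data = [random.gauss(50, 10) for _ in range(200)]
# Each value independently has a 30% chance of being missing (MCAR).
with_missing = [None if random.random() < 0.3 else v for v in data]

observed = [v for v in with_missing if v is not None]
imputed = mean_impute(with_missing)

print("n observed:", len(observed), "| n after imputation:", len(imputed))
print("Mean of observed:", round(statistics.mean(observed), 2))
print("Mean after imputation:", round(statistics.mean(imputed), 2))
print("SD of observed:", round(statistics.stdev(observed), 2))
print("SD after mean imputation:", round(statistics.stdev(imputed), 2))
```

Because the analysis then treats the filled-in values as real observations, standard errors come out too small, so the apparent gain in power is an artifact of the imputation.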
Anthony Brown says
I find this issue extremely important, and it is ridiculous that instructors in some basic and intermediate statistics courses do not stress the importance of cleaning the data or of newer methods for dealing with missing data. Prior knowledge of data preparation is essential to the generalizability of one’s analysis and measurement results.
For example, a fellow dissertation colleague is proposing to conduct a multiple regression analysis and refuses to consider preparing her data. She seems to think that because the scale she is using does not go into specifics such as data preparation, it is not needed. She will simply collect data and apply the scale, thinking her parametric analysis will give her substantive results. Perhaps so, but will the results be valid? Perhaps, but I doubt it.
Ironically, when I mentioned your training and my interest in completing Stats III, she insisted that she is not interested in being a statistician. However, now that she is being told to dig deeper into her analysis and is confronted with terms she does not understand, she calls me, and I repeatedly ask her how she is going to clean her data, let alone deal with missing data points.
Her response is, “I am not at that point or level, and when I get there I will research that stuff.” Well, she thinks I will be there for her.
Missing data imputation is an essential tool, but it is of little use to someone who does not understand, or does not want to accept, why it is more effective than simply using mean replacement or deletion methods. It makes more sense when one knows how to clean data manually; the mechanics of the tool are then easier to grasp.
Knowing the type of missingness and the acceptable limits of missingness is important.
Don’t get me started, because, as you know, I will stop making sense: my mind moves faster than I type, I will hit send before rereading, and the rest is, well, History against MY-story.
Hello Karen, how are you doing?
Anthony H Brown
Thanks for the impassioned response. Hopefully, your colleague will find out sooner or later that she does need solid statistical training even if she doesn’t want to be a statistician. And hopefully she won’t be up against an important deadline when it happens. 🙂
You’re right about data cleaning, of course, even beyond missing data. The integrity of your data is absolutely vital.
I just started to write a very long response about a personal experience I had with data integrity, but I think it’s actually a whole post. Lol.
But I’m glad that you get it. You will be well served.