I don’t need to tell you, missing data stinks. After getting stuck on a big problem with missing data many years ago, I started studying what to do about it in a big way.
Answers to Questions from the Missing Data Webinar
These questions were originally asked in a live webinar. We didn’t get through all the questions, so I’m answering many of them in this series. If you want to listen to the full webinar, you can get the recording on this page. It’s free.
Approaches to Dealing with Missing Data
Missing Data in the Context of Data Analysis
by Paul Allison
by Joseph Schafer
This book is the basis
by Roderick Little
This is the Missing
- Allison, P.D. (2000). Multiple Imputation for Missing Data: A Cautionary Tale. Sociological Methods and Research, 28, 301-309.
- Allison, P.D. (1987) Estimation of linear models with incomplete data. In C. Clogg [Ed.] Sociological Methodology. San Francisco: Jossey Bass, 71-103.
- Graham, J. W., & Hofer, S. M. (2000). Mulitple imputation in multivariate research. In T. D. Little, K. U. Schnabel, & J. Baumert, (Eds.), Modeling longitudinal and multiple-group data: Practical issues, applied approaches, and specific examples. Hillsdale, NJ: Erlbaum.This chapter is a very user-friendly description of the use of Joe Schafer’s NORM program, with an illustrative empirical example. (Also see Schafer & Olsen — below — for the same kind of information).
- Graham, J. W., Hofer, S.M., Donaldson, S.I., MacKinnon, D.P., & Schafer, J.L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle, & S. West (Eds.), The science of prevention: methodological advances from alcohol and substance abuse research. (pp. 325-366). Washington, D.C.: American Psychological Association.In the context of an empirical example, this chapter discusses, and illustrates the pros and cons of four acceptable, and readily available methods: (a) raw data maximum likelihood with Amos; (b) multiple imputation with NORM; (c) multiple imputation with EMCOV; and (d) EM algorithm (with EMCOV) and bootstrap. We show how the following “old” methods fall very short of desiriable treatment of missing data (listwise deletion, pairwise deletion, mean substitution).
- Graham, J.W. & Donaldson, S.I. (1993) Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of follow-up data. Journal of Applied Psychology, 78, 119-128
- Horton, N. J. & Lipsitz, S.R. (2001). Multiple Imputation in Practice: comparison of Software Packages for Regression Models with Missing Variables. The American Statistician, 55, 244-254.
- Horton, N.J. & Kleinman, K.P. (2007). Much Ado about Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models. The American Statistician, 61, 79-90.
- Muthén, B.O., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462
- Schafer, J.L. & Graham, J.W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177.
- Wothke, W. (2000) Longitudinal and multi-group modeling with missing data. (Adobe pdf format) In T.D. Little, K.U. Schnabel, and J. Baumert [Eds.] Modeling longitudinal and multiple group data: Practical issues, applied approaches and specific examples. Mahwah, NJ: Lawrence Erlbaum Associates. (Reproduced with permission).
- Schafer, J.L. & Graham, J.W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177. This is a very well-written overview of the new approaches to dealing with missing data. Joe Schaefer is one of the top statististicians doing research on Missing data techniques and John Graham runs the statistical consulting center at Penn State. Together they explain these new techniques in understandable ways.