I don’t need to tell you that missing data stinks. After getting stuck on a big missing data problem many years ago, I started studying in earnest what to do about it.
Missing Data is one of those topics that spans multiple stages. You’re going to encounter it even with simple statistics. But to really deal with it well, you’re going to need to use methods at Stage 3.
The Craft of Statistical Analysis Free Webinars
Answers to Questions from the Approaches to Missing Data Webinar
- Do Top Journals Require Reporting on Missing Data Techniques?
- What is the Difference between MAR and MCAR?
- Is Multiple Imputation Possible in the Context of Survival Analysis?
These questions were originally asked in a live webinar. We didn’t get through all the questions, so I’m answering many of them in this series.
Articles at The Analysis Factor
Approaches to Dealing with Missing Data
- Multiple Imputation: 5 Recent Findings that Change How to Use It
- When Listwise Deletion works for Missing Data
- Missing Data Mechanisms: A Primer
- Quiz Yourself about Missing Data
- Answers to the Missing Data Quiz
- 3 Ad-hoc Missing Data Approaches that You Should Never Use
- Multiple Imputation of Categorical Variables
- Missing Data: Criteria for Choosing an Effective Approach
- EM Imputation and Missing Data: Is Mean Imputation Really so Terrible?
- Seven Ways to Make up Data: Common Methods to Imputing Missing Data
- Mean Imputation
- The Second Problem with Mean Imputation
- Multiple Imputation Resources
- Multiple Imputation in a Nutshell
- Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood
- Computing Cronbach’s Alpha in SPSS with Missing Data
- New version released of Amelia II: A Program for Missing Data
- Averaging and Adding Variables with Missing Data in SPSS
Missing Data in the Context of Data Analysis
- The 13 Steps for Statistical Modeling in any Regression or ANOVA
- Five Advantages of Running Repeated Measures ANOVA as a Mixed Model
by Paul Allison
Very reader-friendly. One of “the little green Sage books.” This is an excellent overview that covers much of what a data analyst needs to know, and it is very accessible. This is the book to start with. And it is very reasonably priced.
Analysis of Incomplete Multivariate Data
by Joseph Schafer
This book is the basis for Joe’s series of multiple imputation programs in S-Plus. It is somewhat more readable than Little & Rubin (below).
Statistical Analysis with Missing Data, Second Edition
by Roderick Little & Donald Rubin
This is the Missing Data Bible. It can get pretty technical at times, but can be worth working through.
- Allison, P.D. (2000). Multiple Imputation for Missing Data: A Cautionary Tale. Sociological Methods and Research, 28, 301-309.
- Allison, P.D. (1987). Estimation of linear models with incomplete data. In C. Clogg (Ed.), Sociological Methodology. San Francisco: Jossey Bass, 71-103.
- Graham, J. W., & Hofer, S. M. (2000). Multiple imputation in multivariate research. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multiple-group data: Practical issues, applied approaches, and specific examples. Hillsdale, NJ: Erlbaum. This chapter is a very user-friendly description of the use of Joe Schafer’s NORM program, with an illustrative empirical example. (Also see Schafer & Olsen, below, for the same kind of information.)
- Graham, J. W., Hofer, S.M., Donaldson, S.I., MacKinnon, D.P., & Schafer, J.L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle, & S. West (Eds.), The science of prevention: methodological advances from alcohol and substance abuse research (pp. 325-366). Washington, D.C.: American Psychological Association. In the context of an empirical example, this chapter discusses and illustrates the pros and cons of four acceptable and readily available methods: (a) raw data maximum likelihood with Amos; (b) multiple imputation with NORM; (c) multiple imputation with EMCOV; and (d) EM algorithm (with EMCOV) and bootstrap. We show how the following “old” methods fall very short of desirable treatment of missing data: listwise deletion, pairwise deletion, and mean substitution.
- Graham, J.W. & Donaldson, S.I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of follow-up data. Journal of Applied Psychology, 78, 119-128.
- Horton, N. J. & Lipsitz, S.R. (2001). Multiple Imputation in Practice: Comparison of Software Packages for Regression Models with Missing Variables. The American Statistician, 55, 244-254.
- Horton, N.J. & Kleinman, K.P. (2007). Much Ado about Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models. The American Statistician, 61, 79-90.
- Muthén, B.O., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462.
- Schafer, J.L. & Graham, J.W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177. This is a very well-written overview of the new approaches to dealing with missing data. Joe Schafer is one of the top statisticians doing research on missing data techniques, and John Graham runs the statistical consulting center at Penn State. Together they explain these new techniques in understandable ways.
- Wothke, W. (2000). Longitudinal and multi-group modeling with missing data. (Adobe PDF format.) In T.D. Little, K.U. Schnabel, and J. Baumert (Eds.), Modeling longitudinal and multiple group data: Practical issues, applied approaches and specific examples. Mahwah, NJ: Lawrence Erlbaum Associates. (Reproduced with permission.)