Missing Data

I don’t need to tell you, missing data stinks. After getting stuck on a big problem with missing data many years ago, I started studying what to do about it in a big way.

Missing Data is one of those topics that spans multiple stages. You’re going to encounter it even with simple statistics. But to really deal with it well, you’re going to need to use methods at Stage 3.

The Craft of Statistical Analysis Free Webinars

Approaches to Missing Data: The Good, the Bad, and the Unthinkable

Answers to Questions from the Approaches to Missing Data Webinar

Do Top Journals Require Reporting on Missing Data Techniques?

What is the Difference between MAR and MCAR?

Is Multiple Imputation Possible in the Context of Survival Analysis?

These questions were originally asked in a live webinar. We didn’t get through all the questions, so I’m answering many of them in this series.

Articles at The Analysis Factor


Approaches to Dealing with Missing Data

Multiple Imputation: 5 Recent Findings that Change How to Use It

When Listwise Deletion works for Missing Data

Missing Data Mechanisms: A Primer

Quiz Yourself about Missing Data

Answers to the Missing Data Quiz

3 Ad-hoc Missing Data Approaches that You Should Never Use

Multiple Imputation of Categorical Variables

Missing Data: Criteria for Choosing an Effective Approach

EM Imputation and Missing Data: Is Mean Imputation Really so Terrible?

Seven Ways to Make up Data: Common Methods to Imputing Missing Data

Mean Imputation

The Second Problem with Mean Imputation

Multiple Imputation Resources

Multiple Imputation in a Nutshell

Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood


Computing Cronbach’s Alpha in SPSS with Missing Data

New version released of Amelia II: A Program for Missing Data

Averaging and Adding Variables with Missing Data in SPSS

Missing Data in the Context of Data Analysis

The 13 Steps for Statistical Modeling in any Regression or ANOVA

Five Advantages of Running Repeated Measures ANOVA as a Mixed Model


Missing Data

by Paul Allison

Very reader-friendly. One of “the little green Sage books.” This is an excellent overview that covers much of what a data analyst needs to know, and it is very accessible. This is the book to start with. It is also very reasonably priced.

Analysis of Incomplete Multivariate Data

by Joseph Schafer

This book is the basis for Joe’s series of multiple imputation programs in S-Plus. It is somewhat more readable than Little & Rubin (below).

Statistical Analysis with Missing Data, Second Edition

by Roderick Little & Donald Rubin

This is the Missing Data Bible. It can get pretty technical at times, but can be worth working through.

Journal Articles

  • Allison, P.D. (2000). Multiple Imputation for Missing Data: A Cautionary Tale. Sociological Methods and Research, 28, 301-309.
  • Allison, P.D. (1987). Estimation of linear models with incomplete data. In C. Clogg (Ed.), Sociological Methodology. San Francisco: Jossey Bass, 71-103.
  • Graham, J. W., & Hofer, S. M. (2000). Multiple imputation in multivariate research. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multiple-group data: Practical issues, applied approaches, and specific examples. Hillsdale, NJ: Erlbaum. This chapter is a very user-friendly description of the use of Joe Schafer’s NORM program, with an illustrative empirical example. (Also see Schafer & Olsen, below, for the same kind of information.)
  • Graham, J. W., Hofer, S.M., Donaldson, S.I., MacKinnon, D.P., & Schafer, J.L. (1997). Analysis with missing data in prevention research. In K. Bryant, M. Windle, & S. West (Eds.), The science of prevention: methodological advances from alcohol and substance abuse research (pp. 325-366). Washington, D.C.: American Psychological Association. In the context of an empirical example, this chapter discusses and illustrates the pros and cons of four acceptable and readily available methods: (a) raw data maximum likelihood with Amos; (b) multiple imputation with NORM; (c) multiple imputation with EMCOV; and (d) EM algorithm (with EMCOV) and bootstrap. We show how the “old” methods (listwise deletion, pairwise deletion, mean substitution) fall well short of desirable treatment of missing data.
  • Graham, J.W. & Donaldson, S.I. (1993). Evaluating interventions with differential attrition: The importance of nonresponse mechanisms and use of follow-up data. Journal of Applied Psychology, 78, 119-128.
  • Horton, N. J. & Lipsitz, S.R. (2001). Multiple Imputation in Practice: Comparison of Software Packages for Regression Models with Missing Variables. The American Statistician, 55, 244-254.
  • Horton, N.J. & Kleinman, K.P. (2007). Much Ado about Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models. The American Statistician, 61, 79-90.
  • Muthén, B.O., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462.
  • Wothke, W. (2000). Longitudinal and multi-group modeling with missing data. In T.D. Little, K.U. Schnabel, and J. Baumert (Eds.), Modeling longitudinal and multiple group data: Practical issues, applied approaches and specific examples. Mahwah, NJ: Lawrence Erlbaum Associates. (Reproduced with permission.)
  • Schafer, J.L. & Graham, J.W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177. This is a very well-written overview of the new approaches to dealing with missing data. Joe Schafer is one of the top statisticians doing research on missing data techniques, and John Graham runs the statistical consulting center at Penn State. Together they explain these new techniques in understandable ways.