The Analysis Factor


Two Recommended Solutions for Missing Data: Multiple Imputation and Maximum Likelihood

by Karen Grace-Martin

Two methods for dealing with missing data, vast improvements over traditional approaches, have become available in mainstream statistical software in the last few years.

Both of the methods discussed here require that the data are missing at random: once the observed data are taken into account, whether a value is missing is not related to the missing value itself. If this assumption holds, the resulting estimates (i.e., regression coefficients and standard errors) will be unbiased, with no loss of power from discarding incomplete cases.

The first method is Multiple Imputation (MI). Just like the old-fashioned imputation methods, Multiple Imputation fills in estimates for the missing data.  But to capture the uncertainty in those estimates, MI estimates the values multiple times. Because it uses an imputation method with error built in, the multiple estimates should be similar, but not identical.
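As a rough illustration of imputing "with error built in," the Python sketch below fills in missing values of one variable from a regression on another and adds random noise to each prediction. It is a simplified sketch and not what any of the packages named in this post actually do: the example data, the function names, and the single-predictor linear regression are all assumptions made for illustration. Full Multiple Imputation procedures also draw the regression parameters themselves at random for each data set, which is omitted here.

```python
import random
from statistics import mean

# Hypothetical data: (x, y) pairs, with y missing (None) for the last two cases.
example_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
                (5.0, None), (6.0, None)]

def fit_line(pairs):
    """Least-squares fit of y on x; returns intercept, slope, residual SD."""
    xbar = mean(x for x, _ in pairs)
    ybar = mean(y for _, y in pairs)
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid_ss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    resid_sd = (resid_ss / (len(pairs) - 2)) ** 0.5
    return intercept, slope, resid_sd

def impute_once(data, rng):
    """Return a copy of data with each missing y replaced by prediction + noise."""
    complete = [(x, y) for x, y in data if y is not None]
    intercept, slope, resid_sd = fit_line(complete)
    return [
        (x, y if y is not None else intercept + slope * x + rng.gauss(0, resid_sd))
        for x, y in data
    ]

def multiply_impute(data, m=5, seed=0):
    """Create m imputed data sets; observed values are left untouched."""
    rng = random.Random(seed)
    return [impute_once(data, rng) for _ in range(m)]

imputed_sets = multiply_impute(example_data, m=3, seed=1)
```

Each element of `imputed_sets` has the same observed values, while the filled-in values for the last two cases differ from one data set to the next.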

The result is multiple data sets with identical values for all of the non-missing values and slightly different values for the imputed values in each data set. The statistical analysis of interest, such as ANOVA or logistic regression, is performed separately on each data set, and the results are then combined. Because of the variation in the imputed values, there should also be variation in the parameter estimates, leading to appropriate estimates of standard errors and appropriate p-values.
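The combining step is usually done with what are commonly called Rubin's rules: the pooled estimate is the average of the per-data-set estimates, and its variance adds the average within-data-set variance to the between-data-set variance, inflated by a factor of 1 + 1/m. The Python sketch below shows that arithmetic for a single coefficient. The function name and the numbers are hypothetical, and the degrees-of-freedom adjustment used for p-values is left out.

```python
from math import sqrt
from statistics import mean, variance

def pool_estimates(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates  -- the parameter estimate from each imputed data set
    std_errors -- the matching standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)

# Hypothetical regression coefficient estimated on five imputed data sets.
coef, se = pool_estimates([0.52, 0.47, 0.55, 0.50, 0.49],
                          [0.11, 0.12, 0.11, 0.10, 0.12])
```

The between-imputation term is what makes the pooled standard error larger than the standard error from any single filled-in data set would suggest.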

Multiple Imputation is available in SAS, S-Plus, R, and now SPSS 17.0 (but you need the Missing Values Analysis add-on module).

The second method is to analyze the full, incomplete data set using maximum likelihood estimation. This method does not impute any data, but rather uses each case's available data to compute maximum likelihood estimates. The maximum likelihood estimate of a parameter is the value of the parameter under which the observed data are most likely.

When data are missing, we can factor the likelihood function. The likelihood is computed separately for those cases with complete data on some variables and those with complete data on all variables. These two likelihoods are then maximized together to find the estimates. Like multiple imputation, this method gives unbiased parameter estimates and standard errors. One advantage is that it does not require the careful selection of variables used to impute values that Multiple Imputation requires. It is, however, limited to linear models.
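To make the factoring concrete, here is a minimal Python sketch for the simplest case: two variables assumed to be bivariate normal, with x observed for everyone and y missing for some cases. Cases with both values contribute a bivariate normal term to the log-likelihood, and cases with only x contribute a univariate term. For this particular missing-data pattern the maximizing values have a closed form. The data and all function names are hypothetical, and this is not a description of how AMOS or any other package computes its estimates.

```python
from math import log, pi

# Hypothetical data: x always observed, y missing (None) for the last two cases.
example_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
                (5.0, None), (6.0, None)]

def observed_loglik(data, mu_x, mu_y, var_x, var_y, cov_xy):
    """Log-likelihood of the observed data under a bivariate normal model."""
    det = var_x * var_y - cov_xy ** 2
    total = 0.0
    for x, y in data:
        dx = x - mu_x
        if y is None:
            # Only x is observed: use its marginal normal density.
            total += -0.5 * (log(2 * pi * var_x) + dx * dx / var_x)
        else:
            # Both observed: use the joint bivariate normal density.
            dy = y - mu_y
            quad = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
            total += -0.5 * (2 * log(2 * pi) + log(det) + quad)
    return total

def factored_ml(data):
    """ML estimates using all cases: x from everyone, y given x from complete cases."""
    complete = [(x, y) for x, y in data if y is not None]
    n, n_c = len(data), len(complete)
    mu_x = sum(x for x, _ in data) / n
    var_x = sum((x - mu_x) ** 2 for x, _ in data) / n
    xbar_c = sum(x for x, _ in complete) / n_c
    ybar_c = sum(y for _, y in complete) / n_c
    slope = (sum((x - xbar_c) * (y - ybar_c) for x, y in complete)
             / sum((x - xbar_c) ** 2 for x, _ in complete))
    intercept = ybar_c - slope * xbar_c
    resid_var = sum((y - intercept - slope * x) ** 2 for x, y in complete) / n_c
    return (mu_x, intercept + slope * mu_x, var_x,
            resid_var + slope ** 2 * var_x, slope * var_x)

def complete_case_estimates(data):
    """Same parameters estimated after dropping every case with a missing y."""
    complete = [(x, y) for x, y in data if y is not None]
    n_c = len(complete)
    mu_x = sum(x for x, _ in complete) / n_c
    mu_y = sum(y for _, y in complete) / n_c
    var_x = sum((x - mu_x) ** 2 for x, _ in complete) / n_c
    var_y = sum((y - mu_y) ** 2 for _, y in complete) / n_c
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in complete) / n_c
    return mu_x, mu_y, var_x, var_y, cov_xy

ml = factored_ml(example_data)
cc = complete_case_estimates(example_data)
```

In this made-up data set the cases missing y have the largest x values, so the estimate of the mean of y in `ml` is pulled upward relative to the complete-case mean in `cc`, and `observed_loglik` is higher at `ml` than at `cc`.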

Analysis of the full, incomplete data set using maximum likelihood estimation is available in AMOS. AMOS is a structural equation modeling package, but it can run multiple linear regression models.  AMOS is easy to use and is now integrated into SPSS, but it will not produce residual plots, influence statistics, and other typical output from regression packages.



Comments

  1. Lau says

    August 25, 2017 at 12:21 pm

    Hi Karen,

    I have the same problem as LF. I’m doing an Exploratory Factor Analysis and just 27 of all 198 participants completed every item. So I did a multiple imputation. But now I’m struggling with how to run the factor analysis.

    For any advice I would be very very thankful!!

    • Karen Grace-Martin says

      September 11, 2018 at 1:24 pm

      Hi Lau,

      You don’t need Multiple Imputation for a factor analysis as factor analysis has no p-values, and that is where MI shines.

      I would do an EM estimate of the correlation matrix, then base the factor analysis on this matrix, rather than the raw data. This is something we show step-by-step in our Factor Analysis workshop, but it is a lot to explain. It’s easier in some software than others.

  2. hosein says

    April 26, 2016 at 1:45 pm

    Hello, I am working in the mineral exploration field. Is Cohen's maximum likelihood method for replacing censored (missing) data used for geochemical data now?

  3. Emily Stone says

    July 15, 2015 at 9:27 am

    Hello! I am doing Asymptotically distribution free estimation in AMOS due to a data set that is not normal and has ordinal data. I am trying to determine how to handle missing data with this type of estimation in AMOS. Can you do multiple imputation in AMOS? Thank you so much!

    • Karen says

      July 18, 2015 at 9:54 am

      Hi Emily,

      AMOS doesn’t do multiple imputation, but you don’t need it to. It does maximum likelihood. You might find this helpful, though it’s not exactly what you’re doing:
      How to Use Full Information Maximum Likelihood in AMOS to Analyze Regression Models with Missing Data

  4. LF says

    April 7, 2015 at 12:24 pm

    Thanks Karen. Any help to the above question about the difference in MPlus and AMOS is much appreciated.

    I am struggling with dealing with missing data and doing an Exploratory Factor Analysis with a complete dataset. I thought perhaps I could do Multiple Imputation in SPSS and do the EFA there but I don’t think it is one of the supported analyses for pooled data. Any suggestions how to use MI in an EFA in SPSS or do I have to switch to another software? Any help is much appreciated.

    Thank you.

  5. LF says

    March 10, 2015 at 7:37 am

    Hello Karen,
    In AMOS, when you use ML estimation with missing data, it says that the full sample is used. I’ve recently tried using MPlus, and when it runs there, it says it takes out of the analysis those cases that don’t have any data on those variables. If it’s the same estimation method for missing data in the two packages, then why would the results come out different? Is AMOS doing the same thing and just not telling us it’s based on part of the sample?

    Thank you.

    • Karen says

      March 23, 2015 at 12:28 pm

      Hi LF,

      I don’t know MPlus, so I’m not sure what it is doing. AMOS isn’t dropping cases for having some missing data. I would suggest looking into the defaults in MPlus. Perhaps you just need to change an option.

      Any Mplus users want to chime in?

  6. Patrick Onyeneho says

    January 18, 2015 at 3:19 am

    How do I implement the missing data add-on using the ML method in SPSS?

    Thank you

  7. Michael says

    December 25, 2014 at 8:25 pm

    Thanks Karen for the R free resource website..
    Hi Peng, If you are looking for some case studies in R with real world proven examples you can try for some free classes at http://my-classes.com/
    there are practice tests also available to self assess your knowledge.

  8. kaushal Chaudhary says

    March 12, 2014 at 2:14 pm

    Hi Karen,

    SAS also uses the ML (maximum likelihood) or REML (restricted maximum likelihood) method for parameter estimation. Does this mean it also imputes missing values in the data? So, if there are missing observations, we do not have to impute? Thanks for your clarification.

    • Karen says

      April 4, 2014 at 9:56 am

      Hi Kaushal,

      ML isn’t imputing. But yes, you can use SAS proc calis for missing predictors in a linear model or proc mixed for missing outcome values in a multilevel model.

  9. Dong says

    November 1, 2013 at 7:28 pm

    I am looking into how to run an MLE. Can SPSS 20 run an MLE in its easy-to-use pull-down menus or can this only be done via syntax? Thank you!

    • Karen says

      November 8, 2013 at 11:38 am

      It can. It is based on the analysis, however. What kind of model are you looking for?

  10. peng says

    January 30, 2010 at 1:49 am

    Hi friends,
    I am new to R. I would like to learn R-PLUS. Does anyone know where I can get free training for R-PLUS?

    Regards,
    Peng.

    • Karen says

      September 18, 2012 at 9:08 am

      Hi Peng,

      If you need free, I would suggest: http://www.ats.ucla.edu/stat/r/

      Karen


Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.