The Analysis Factor

A Note from Karen

Featured Article: Five Recent Findings that Change How to Use Multiple Imputation

Resource of the Month

What's New

About Us

You received this email because you subscribed to The Analysis Factor's list community. To change your subscription, see the link at end of this email. If your email is having trouble with the format, click here for a web version.

Please forward this to anyone you know who might benefit. If you received this from a friend, sign up for this ezine now!

Dear reader,

Angee, our tech guru, is out of town at a training event this week, leaving me to put this together. So if any formatting is funky or you just can't read anything, let me suggest you read this on our web site.

We just finished up the Interpreting (Even Tricky) Regression Coefficients workshop, and it was fun. We had a great group of researchers who asked really good questions. It was our biggest group yet, and it really added a dynamic energy to the workshop.

And we're already starting to plan for the next workshop, which begins the first week in May. It will be on missing data. It's a pretty hands-on workshop about how to implement various techniques in a variety of statistical software and how to decide which one fits your situation.

This month's article will update you on some recent findings about using multiple imputation, one of those missing data techniques. Some of these contradict what missing data statisticians were advising even 5 years ago. I know that a few years ago I did some things the way they now advise against. So I hope you find the update helpful.

Happy analyzing,
Karen

Missing data, and multiple imputation specifically, is one area of statistics that is changing rapidly. Research is still ongoing, and each year brings new findings on best practices and new techniques in software.

The downside for researchers is that some of the recommendations missing data statisticians were making even five years ago have changed.

Remember that multiple imputation, like any missing data technique, has three goals: unbiased parameter estimates in the final analysis (regression coefficients, group means, odds ratios, etc.); accurate standard errors of those parameter estimates, and therefore accurate p-values in the analysis; and adequate power to detect meaningful parameter values as statistically significant.

So here are a few updates that will help you achieve these goals.

1. Don’t round off imputations for dummy variables. Many common imputation techniques, like MCMC, assume the variables are normally distributed. The standard suggestion for imputing categorical variables was to dummy code them, impute them, then round the imputed values off to 0 or 1. Recent research, however, has found that rounding off imputed values leads to biased parameter estimates in the analysis model. You get better results by leaving the imputed values as they are, even at impossible values such as 0.3 or 1.2. It’s counter-intuitive, but it works better.

2. Don’t transform skewed variables. Likewise, when you transform a variable to meet normality assumptions before imputing, you change not only the distribution of that variable but also its relationships with the other variables you use to impute. Doing so can lead to imputing outliers, creating more bias than simply imputing the skewed variable as it is.

3. Use more imputations. The advice for years has been that 5-10 imputations are adequate. And while this is true for unbiasedness, with so few imputations your results can change noticeably if you run the multiple imputation more than once. Bodner (2008) recommends having as many imputations as the percentage of missing data. For example, if 30% of your data are missing, use 30 imputations. Since running more imputations takes more computer time but no more work for the data analyst, there’s no reason not to.

4. Create multiplicative terms before imputing. When the analysis model contains a multiplicative term, like an interaction term or a quadratic, create the multiplicative terms first, then impute them along with the other variables. Imputing first and then creating the multiplicative terms biases the regression coefficients of the multiplicative term (von Hippel, 2009).

5. Alternatives to multiple imputation aren’t usually better. Multiple imputation assumes the data are missing at random. For most tests, if an assumption is not met, there is a better alternative: a nonparametric test or a different type of model. This is often not true with missing data. Alternatives like listwise deletion (a.k.a. ignoring it) have more stringent assumptions; listwise deletion generally requires the data to be missing completely at random. So do nonignorable missing data techniques like Heckman’s selection models.
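To make points 1 through 5 a little more concrete, here is a small Python sketch that uses only the standard library and simulated data. It is not multiple imputation itself. The "imputations" for point 1 are toy draws from a normal distribution that ignore covariates, and every function name (dummy_rounding, log_transform_effect, suggested_imputations, add_product_column, listwise_mean_under_mar) is hypothetical. For point 3, I've assumed that "percentage of missing data" means the percentage of cases with any missing value, with a floor of 5 imputations; both choices are assumptions, not Bodner's exact rule.

```python
import math
import random
import statistics


def dummy_rounding(p=0.05, n=100_000, seed=1):
    """Point 1: round imputed 0/1 dummy values, or leave them as they are?

    Assumption: a toy imputation that draws each missing dummy value from
    N(p, p*(1-p)), ignoring covariates and parameter uncertainty.
    """
    rng = random.Random(seed)
    sd = math.sqrt(p * (1 - p))
    draws = [rng.gauss(p, sd) for _ in range(n)]
    unrounded = statistics.fmean(draws)  # keeps "impossible" values like -0.2 or 0.4
    rounded = statistics.fmean([1.0 if d >= 0.5 else 0.0 for d in draws])
    return unrounded, rounded


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def log_transform_effect(n=5_000, seed=2):
    """Point 2: a log transform changes y's linear relationship with x."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [2 * xi + rng.expovariate(1.0) for xi in x]  # linear in x, right-skewed noise
    log_y = [math.log(v) for v in y]
    return pearson(x, y), pearson(x, log_y)


def suggested_imputations(rows, floor=5):
    """Point 3: about as many imputations as the percentage of missing data.

    Assumption: "percentage of missing data" = percentage of rows with at
    least one None; the floor of 5 echoes the older 5-10 advice.
    """
    incomplete = sum(any(v is None for v in row) for row in rows)
    pct_missing = 100 * incomplete / len(rows)
    return max(floor, math.ceil(pct_missing))


def add_product_column(rows):
    """Point 4: build x*z BEFORE imputing; it is missing if x or z is missing.

    The returned (x, z, x*z) rows would then go to your imputation software,
    so the product is imputed as a variable in its own right.
    """
    return [(x, z, None if x is None or z is None else x * z) for x, z in rows]


def listwise_mean_under_mar(n=20_000, seed=3):
    """Point 5: listwise deletion biases the mean of y when data are MAR."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    # MAR: y is missing 80% of the time whenever the fully observed x is above 0.
    kept = [yi for xi, yi in zip(x, y) if not (xi > 0 and rng.random() < 0.8)]
    return statistics.fmean(y), statistics.fmean(kept)


if __name__ == "__main__":
    print("1. dummy mean (true 0.05): unrounded %.3f, rounded %.3f" % dummy_rounding())
    print("2. corr(x, y) %.3f vs corr(x, log y) %.3f" % log_transform_effect())
    print("3. imputations for 30% incomplete rows:",
          suggested_imputations([(1, None)] * 3 + [(1, 2)] * 7))
    print("4.", add_product_column([(2.0, 3.0), (None, 1.5)]))
    print("5. mean of y: all cases %.3f, complete cases %.3f" % listwise_mean_under_mar())
```

With these settings, the rounded dummy imputations understate the 5% proportion (at roughly 2%), while the unrounded ones stay close to it, and the complete-case mean of y is pulled well below its true value of 0. In real work you would use your statistical software's imputation routine rather than anything like this.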

References:

Allison, P. D. (2005). "Imputation of Categorical Variables with PROC MI." Presented at the 30th Meeting of SAS Users Group International, April 10–13, Philadelphia, PA.

Bodner, T. E. (2008). "What Improves with Increased Missing Data Imputations?" Structural Equation Modeling, 15(4), 651–675.

Graham, J. W. (2009). "Missing Data Analysis: Making It Work in the Real World." Annual Review of Psychology, 60, 549–576.

von Hippel, P. T. (2009). "How To Impute Squares, Interactions, and Other Transformed Variables." Sociological Methodology, 39.

Did you miss our webinar on missing data? It's a nice overview of the four basic approaches to missing data (and yes, ignoring it is one of them) and the things you need to know about why and where your data are missing to choose the best one. You can get a free video download at:

Approaches to Missing Data: The Good, the Bad, and the Unthinkable

The next Craft of Statistical Analysis Webinar:

Principal Components Analysis: This webinar will summarize what it is, when to use it, and how it differs from factor analysis, and will briefly demonstrate the 5 steps to conducting a Principal Components Analysis.

Get more information and register here.

The next Workshop:

Effectively Dealing with Missing Data without Biasing your Results: This six-hour workshop will explain modern techniques of multiple imputation and maximum likelihood: how they work, what their assumptions are, how to decide when to use each one, and, most importantly, how to do each step in statistical software.

Get more information and register here.

What is The Analysis Factor? The Analysis Factor is the difference between knowing about statistics and knowing how to use statistics. It acknowledges that statistical analysis is an applied skill, one that requires learning how to use statistical tools within the context of a researcher’s own data, and it supports that learning.

The Analysis Factor, the organization, offers statistical consulting, resources, and learning programs that empower researchers to become confident, able, and skilled statistical practitioners. Our aim is to make your journey of acquiring the applied skills of statistical analysis easier and more pleasant.

Karen Grace-Martin, the founder, spent seven years as a statistical consultant at Cornell University. While there, she learned that being a great statistical advisor is not only about having excellent statistical skills, but also about understanding the pressures and issues researchers face, about fabulous customer service, and about communicating technical ideas at a level each client understands.

You can learn more about Karen Grace-Martin and The Analysis Factor at analysisfactor.com.
