Is Multiple Imputation Possible in the Context of Survival Analysis?

May 27th, 2011 by

Sure.  One of the big advantages of multiple imputation is that you can use it for any analysis.

It’s one of the reasons big data libraries use it–no matter how researchers are using the data, the missing data is handled the same, and handled well.

I say this with two caveats. (more…)

The 3 Stages of Mastering Statistical Analysis

October 14th, 2009 by

Like any applied skill, mastering statistical analysis requires:

1. building a body of knowledge

2. adeptness with the tools of the trade (i.e., a software package)

3. practice applying the knowledge and using the tools in a realistic, meaningful context.

If you think of other high-level skills you’ve mastered in your life–teaching, survey design, programming, sailing, landscaping, anything–you’ll realize the same three requirements apply.

These three requirements need to be developed over time–over many years to attain mastery. And they need to be developed together. Having more background knowledge improves understanding of how the tools work, and helps the practice go better. Likewise, practice in a real context (not perfect textbook examples) makes the knowledge make more sense, and improves skills with the tools.

I don’t know if this is true of other applied skills, but from what I’ve seen over many years of working with researchers as they master statistical analysis, the journey seems to have 3 stages. Within each stage, developing all 3 requirements–knowledge, tools, and experience–to a level of mastery sets you up well for the next stage. (more…)

The Second Problem with Mean Imputation

October 2nd, 2008 by

A previous post discussed the first reason to not use mean imputation as a way of dealing with missing data–it does not preserve the relationships among variables.

A second reason is that any type of single imputation underestimates the error variation in any statistic that uses the imputed data. Because the imputations are themselves estimates, there is some error associated with them. But your statistical software doesn’t know that. It treats the imputed values as real data.

Ultimately, because your standard errors are too low, so are your p-values.  Now you’re making Type I errors without realizing it.
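To see how this plays out, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration: the simulated data (200 values from a normal distribution with mean 50 and SD 10), the 40% missing rate, values deleted completely at random, and the helper names `standard_error` and `mean_impute`. It compares the standard error of the mean from the observed cases alone with the one your software would report after mean imputation.

```python
import math
import random
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def mean_impute(values):
    """Replace each None with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical complete data set: 200 draws from Normal(mean=50, sd=10).
complete = [rng.gauss(50, 10) for _ in range(200)]

# Delete 40% of the values completely at random (an assumption of this sketch).
missing_idx = set(rng.sample(range(len(complete)), 80))
with_missing = [None if i in missing_idx else v for i, v in enumerate(complete)]

observed = [v for v in with_missing if v is not None]
imputed = mean_impute(with_missing)

# Based only on the 120 values that were really observed.
se_observed = standard_error(observed)
# What the software reports when it treats all 200 values as real data.
se_imputed = standard_error(imputed)

print(f"SE from observed cases only: {se_observed:.3f}")
print(f"SE after mean imputation:    {se_imputed:.3f}")
```

The imputed values add nothing to the spread of the data, yet they are counted in the sample size, so the second standard error is always the smaller of the two. With 120 of 200 values observed, it works out to roughly 60% of the observed-case standard error, even though no new information was added.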

A better approach? Multiple Imputation or Full Information Maximum Likelihood.

Multiple Imputation Resources

September 15th, 2008 by

Two excellent resources about multiple imputation and missing data:

Joe Schafer’s Multiple Imputation FAQ Page gives more detail about multiple imputation, including a list of references.

Paul Allison’s 2001 book Missing Data is the most readable book on the topic. It gives in-depth information on many good approaches to missing data, including multiple imputation. It is aimed at social science researchers, and best of all, it is very affordable (about $15).


Power and Sample Size Calculations

September 2nd, 2008 by

The best article I’ve read about how to calculate power and sample sizes is Russell V. Lenth’s “Some Practical Guidelines for Effective Sample Size Determination” in The American Statistician (full reference  below).  It is written for statistical consultants who assist researchers who need to make sample size estimates, so it is just a bit on the technical side.  Since it is so well written, however, I recommend that clients read it as well.  Whether you are working with a statistician who will do the calculations for you, or you are doing them yourself, the article details the information you need to gather and what you need to understand about the calculations.

Researchers who are at a university that subscribes to JSTOR can read the article for free here (for all others it is $14).

Dr. Lenth’s Power and Sample-size web page has an applet that calculates power and sample size, more information about calculating power and sample sizes, and a version of the paper:

Lenth, R. V. (2001), “Some Practical Guidelines for Effective Sample Size Determination,” The American Statistician, 55, 187-193.

This recommendation is listed in the Statistically Speaking membership program Resource Library.