You’ve probably experienced this before. You’ve done a statistical analysis, you’ve figured out all the steps, you finally get results and are able to interpret them. But the statistical results just look…wrong. Backwards, or even impossible—theoretically or logically.
This happened a few times recently to a couple of my consulting clients, and once to me. So I know that feeling of panic well. There are so many possible causes of incorrect results, but there are a few steps you can take that will help you figure out which one you’ve got and how (and whether) to correct it.
Errors in Data Coding and Entry
In both of my clients’ cases, the problem was that they had coded missing data with an impossible and extreme value, like 99. But they failed to define that code as missing in SPSS. So SPSS treated every 99 as a real data point and included it in the calculations, which threw off the results.
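The fix is a one-line declaration in syntax. Here is a minimal sketch, assuming a hypothetical variable named income and 99 as the missing-data code:

* Tell SPSS that 99 is a user-missing value, not a real data point.
* (The variable name and the code value are hypothetical.)
MISSING VALUES income (99).

Once the value is declared as user-missing, SPSS leaves those cases out of the calculations instead of treating 99 as a legitimate data point.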
I received a question about controlling for inflated Type I error through Bonferroni corrections in nonparametric tests. Here’s the specific question and my quick answer:
My colleague is applying a nonparametric test (Kruskal-Wallis) to check for differences between groups. There are 12 groups, and the test showed a significant difference among them. However, checking which pairs differ is tedious, and I’m not sure whether there is a comparable post-hoc test in the nonparametric approach. Are there any resources available?
My answer:
A Bonferroni correction is the only option I’m aware of when applying nonparametric statistics. Or, really, for any test other than ANOVA.
A Bonferroni correction is actually very simple. Just take the number of comparisons you want to make, then multiply each p-value by that number. If an adjusted p-value comes out greater than 1, set it to 1.0.
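For instance, with 12 groups there are 12 × 11 / 2 = 66 pairwise comparisons, so a raw p-value of .01 becomes an adjusted p-value of .66. Here is a minimal sketch of the adjustment in SPSS syntax, assuming a hypothetical dataset with one row per comparison and the raw p-values stored in a variable named p_value:

* Bonferroni adjustment: multiply each raw p-value by the number of comparisons.
* With 12 groups, the number of pairwise comparisons is 12*11/2 = 66.
* MIN() caps any adjusted value that comes out above 1 at 1.
COMPUTE p_bonferroni = MIN(p_value * 66, 1).
EXECUTE.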
Missing data, and multiple imputation specifically, is one area of statistics that is changing rapidly. Research is ongoing, and each year brings new findings on best practices and new techniques in software.
The downside for researchers is that some of the recommendations missing data statisticians were making even five years ago have changed.
Remember that there are three goals of multiple imputation, or any missing data technique, starting with unbiased parameter estimates in the final analysis.
One of the biggest questions I get is about the difference between mediators and moderators, and how both differ from control variables.
I recently found a fabulous free video tutorial on the difference between mediators, moderators, and suppressor variables, by Jeremy Taylor at Stats Make Me Cry. The witty example is about the different types of variables–talent, practice, etc.–that explain the relationship between having a guitar and making lots of $$.
Have you ever needed to do some major data management in SPSS and ended up with a syntax program that’s pages long? The kind you couldn’t realistically do through the menus, because you’d tear your hair out with frustration and it would take you four weeks just to create the new variables.
I hope you’ve gotten started using Syntax, which not only gives you a record of how you’ve recoded and created all those new variables and exactly which options you chose in the data analysis you’ve done.
But once you get started, you start to realize that some things feel a little clunky. You have to run the same descriptive analysis on 47 different variables. And while cutting and pasting is a heck of a lot easier than doing that in the menus, you wonder if there isn’t a better way.
There is.
SPSS syntax actually has a number of ways to increase programming efficiency, including macros, loops, and DO REPEAT.
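As a small illustration (the variable names here are hypothetical), DO REPEAT applies the same transformation to a whole list of variables, and the TO keyword lets one DESCRIPTIVES command cover many variables at once:

* Recode a 99 missing-data code to system-missing across several variables.
DO REPEAT v = q1 q2 q3 q4 q5.
  RECODE v (99 = SYSMIS).
END REPEAT.
EXECUTE.

* One command runs descriptives on every variable from q1 through q5 in file order.
DESCRIPTIVES VARIABLES=q1 TO q5.

Macros (DEFINE / !ENDDEFINE) take this further, letting you wrap a whole block of syntax and rerun it with different arguments.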
I admit I haven’t used this stuff a lot, but I’m increasingly seeing just how useful it can be. I’m much better trained in doing these kinds of things in SAS, so I admit I have been known to just import data into SAS to run manipulations.
But I just came across a great resource on doing sophisticated SPSS syntax programming, and it looks like some fabulous bedtime reading. (Seriously).
And the best part is you can download it (or order it, if you’d like a copy to take to bed) from the author’s website, Raynald’s SPSS Tools, itself a great source of info on mastering SPSS.
So once you’ve gotten into the habit of hitting Paste instead of Okay, and gotten a bit used to SPSS syntax, and you’re ready to step your skills up a notch, this looks like a fabulous book.
[Edit]: As per Jon Peck in the comments below, the most recent version is now available at www.ibm.com/developerworks/spssdevcentral under Books and Articles.
Want to learn more? If you’re just getting started with data analysis in SPSS, or would like a thorough refresher, please join us in our online workshop Introduction to Data Analysis in SPSS.
1. For a general overview of modeling count variables, you can get free access to the video recording of one of my webinars in The Craft of Statistical Analysis series (a bare-bones SPSS sketch of these models follows this list):
Poisson and Negative Binomial for Count Outcomes
2. One of my favorite books on Categorical Data Analysis is:
Long, J. Scott. (1997). Regression Models for Categorical and Limited Dependent Variables. Sage Publications.
It’s moderately technical, but written with social science researchers in mind. It’s so well written, it’s worth it. It has a section specifically about Zero Inflated Poisson and Zero Inflated Negative Binomial regression models.
3. Slightly less technical, but most useful only if you use Stata, is Regression Models for Categorical Dependent Variables Using Stata, by J. Scott Long and Jeremy Freese.
4. UCLA’s ATS Statistical Software Consulting Group has some nice examples of Zero-Inflated Poisson and other models in various software packages.