Just yesterday I got a call from a researcher who was reviewing a paper. She didn’t think the authors had run their model correctly, but wanted to make sure. The authors had run the same logistic regression model separately for each sex because they expected that the effects of the predictors were different for men and women.
On the surface, there is nothing wrong with this approach. It’s completely legitimate to consider men and women as two separate populations and to model each one separately.
As often happens, the problem was not in the statistics, but in what they were trying to conclude from them. The authors went on to compare the two models, and specifically to compare the coefficients for the same predictors across the two models.
Uh-oh. Can’t do that.
If you’re just describing the values of the coefficients, fine. But if you want to compare the coefficients AND draw conclusions about their differences, you need a p-value for the difference.
Luckily, this is easy to get. Simply include an interaction term between Sex (male/female) and any predictor whose coefficient you want to compare. If you want to compare all of them because you believe that all predictors have different effects for men and women, then include an interaction term between sex and each predictor. If you have 6 predictors, that means 6 interaction terms.
In such a model, if Sex is a dummy variable (and it should be), two things happen:
1. The coefficient for each predictor becomes the coefficient for that variable ONLY for the reference group.
2. The interaction term between Sex and each predictor represents the DIFFERENCE in the coefficients between the reference group and the comparison group. If you want the coefficient for the comparison group, add the coefficient for the predictor alone to the coefficient for that predictor's interaction with Sex.
The beauty of this approach is that the p-value for each interaction term gives you a significance test for the difference in those coefficients.
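Here is a minimal sketch of this approach in Python with statsmodels, using simulated data (the variable names and data are hypothetical, not from the paper under review). The interaction term's coefficient is the difference in slopes between the two sexes, and its p-value tests that difference directly:

```python
# Hypothetical sketch: testing whether a predictor's effect differs by sex
# via a Sex x predictor interaction in a single logistic regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "sex": rng.integers(0, 2, n),   # dummy: 0 = reference group, 1 = comparison group
    "x": rng.normal(size=n),        # a continuous predictor
})
# Simulate an outcome where x's slope truly differs by sex (0.5 vs 1.5)
logit = -0.2 + (0.5 + 1.0 * df["sex"]) * df["x"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# "y ~ x * sex" expands to x + sex + x:sex
model = smf.logit("y ~ x * sex", data=df).fit(disp=0)
print(model.params)             # 'x' is the slope for the reference group only
print(model.pvalues["x:sex"])   # significance test for the DIFFERENCE in slopes
# Slope for the comparison group = params['x'] + params['x:sex']
```

Note that fitting one model with interactions, rather than two separate models, is exactly what makes the significance test for the difference possible.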
In many research fields, a common practice is to categorize continuous predictor variables so they can be used in an ANOVA. This is often done with a median split, which divides the sample into two categories: the “high” values above the median and the “low” values below it.
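A median split is trivial to compute, which is part of its appeal. A quick sketch with pandas on made-up data shows both how it works and what it discards:

```python
# Hypothetical sketch of a median split: values above the median become
# "high", values at or below the median become "low".
import pandas as pd

scores = pd.Series([3, 7, 8, 12, 15, 21, 22, 30])
median = scores.median()  # 13.5 for this sample
split = pd.cut(scores,
               bins=[-float("inf"), median, float("inf")],
               labels=["low", "high"])
print(split.value_counts())
# Note what's lost: 15 and 30 are now treated as identical ("high"),
# while 12 and 15 land in opposite categories despite being close.
```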
Reasons Not to Categorize a Continuous Predictor
There are many reasons why this isn’t such a good idea: (more…)
My 8 year-old son got a Rubik’s cube in his Christmas stocking this year.
I had gotten one as a birthday present when I was about 10. It was at the height of the craze and I was so excited.
I distinctly remember bursting into tears when I discovered that my little sister had sneaked a chance to play with it and messed it up the very day I got it. I knew I would soon mess it up to an unsolvable point myself, but I was still relishing the fun of creating patterns in the 9 squares, then getting it back to 6 sides of single-colored perfection. (I loved patterns even then.) (more…)
A new version of Amelia II, a free package for multiple imputation, was released today. Amelia II is available in two versions. One is part of R; the other, AmeliaView, is a GUI that requires no knowledge of the R programming language. Both use the same underlying algorithms, and both require having R installed.
At the Amelia II website, you can download Amelia II (did I mention it’s free?!), download R, get the very useful User’s Guide, join the Amelia listserv, and get information about multiple imputation.
If you want to learn more about multiple imputation:
I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.
But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily. Essentially, this means you need to be (more…)
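One reason the switch can feel easy: if you keep the original continuous variable and only derive the categorical version from it, you can move between the two representations at will. A hypothetical sketch in pandas (column names and cut points are invented for illustration):

```python
# Hypothetical sketch: keep the continuous predictor intact and derive a
# categorical version from it on demand, so switching back is trivial.
import pandas as pd

df = pd.DataFrame({"age": [23, 31, 38, 44, 52, 67, 71]})

# Derived categorical version; the continuous column stays untouched
df["age_group"] = pd.cut(df["age"],
                         bins=[0, 30, 50, 120],
                         labels=["<30", "30-50", ">50"])
print(df)
# To analyze age as continuous again, just use df["age"] directly.
```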
Spending the summer writing a research grant proposal? Stuck on how to write up the statistics section?
An excellent handbook that outlines how to prepare the statistical content for grant proposals is “Statistics Guide for Research Grant Applicants.” Sections include “Describing the Study Design”, “Sample Size Calculations”, and “Describing the Statistical Methods,” among others.
The navigation for the guide is not obvious: it sits in the left margin menu, among other menus, toward the bottom. You have to scroll down from the top of the page to see it.
The authors, JM Bland, BK Butland, JL Peacock, J Poloniecki, F Reid, P Sedgwick, are statisticians at St. George’s Hospital Medical School, London.