You’ve probably experienced this before. You’ve done a statistical analysis, you’ve figured out all the steps, you finally get results and are able to interpret them. But they just look…wrong. Backwards, or even impossible—theoretically or logically.
This happened a few times recently to a couple of my consulting clients, and once to me. So I know that feeling of panic well. There are so many possible causes of incorrect results, but there are a few steps you can take that will help you figure out which one you’ve got and how (and whether) to correct it.
Errors in Data Coding and Entry
In both of my clients’ cases, the problem was that they had coded missing data with an impossible and extreme value, like 99, but failed to define that code as missing in SPSS. So SPSS treated every 99 as a real data point, which badly distorted their results.
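To see how much damage an undeclared missing-value code can do, here is a minimal sketch in plain Python with made-up survey responses (the data and the 1–7 scale are hypothetical, purely for illustration):

```python
from statistics import mean

# Hypothetical responses on a 1-7 scale; 99 is the "missing" code
responses = [4, 5, 99, 3, 6, 99, 5, 4]

# Treating 99 as real data, the way SPSS does when the code
# is never declared as missing:
naive_mean = mean(responses)            # 28.125 -- impossible on a 1-7 scale

# Declaring the code as missing and excluding it:
valid = [r for r in responses if r != 99]
correct_mean = mean(valid)              # 4.5

print(naive_mean, correct_mean)
```

Two stray 99s are enough to push the mean far outside the scale’s range, which is exactly the kind of impossible-looking result that triggers the panic described above.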
Just yesterday I got a call from a researcher who was reviewing a paper. She didn’t think the authors had run their model correctly, but wanted to make sure. The authors had run the same logistic regression model separately for each sex because they expected that the effects of the predictors were different for men and women.
On the surface, there is nothing wrong with this approach. It’s completely legitimate to consider men and women as two separate populations and to model each one separately.
As often happens, the problem was not in the statistics, but in what they were trying to conclude from them. The authors went on to compare the two models, and specifically to compare the coefficients for the same predictors across the two models.
Uh-oh. Can’t do that.
If you’re just describing the values of the coefficients, fine. But if you want to compare the coefficients AND draw conclusions about their differences, you need a p-value for the difference.
Luckily, this is easy to get. Simply include an interaction term between Sex (male/female) and any predictor whose coefficient you want to compare. If you want to compare all of them because you believe that all predictors have different effects for men and women, then include an interaction term between sex and each predictor. If you have 6 predictors, that means 6 interaction terms.
In such a model, if Sex is a dummy variable (and it should be), two things happen:
1. The coefficient for each predictor becomes the coefficient for that variable ONLY for the reference group.
2. The interaction term between Sex and each predictor represents the DIFFERENCE in the coefficients between the reference group and the comparison group. If you want the coefficient for the comparison group, you add the coefficient for the predictor alone to that predictor’s interaction with Sex.
The beauty of this approach is that the p-value for each interaction term gives you a significance test for the difference in those coefficients.
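The algebra behind this can be checked numerically. The sketch below uses plain Python and made-up data for one continuous predictor: because a 0/1 Sex dummy fully interacted with X lets the model Y = b0 + b1*X + b2*Sex + b3*X*Sex fit each group’s own intercept and slope, b1 equals the reference group’s slope and b3 equals the difference between the two groups’ slopes. (The p-value for b3 would come from your software’s fitted-model output; it is not computed here.)

```python
def ols_slope(x, y):
    """Simple least-squares slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) \
           / sum((a - mx) ** 2 for a in x)

# Hypothetical data: X and Y for men (Sex = 0, reference) and women (Sex = 1)
x_men,   y_men   = [1, 2, 3, 4], [2.0, 2.9, 4.1, 5.0]
x_women, y_women = [1, 2, 3, 4], [1.0, 2.5, 4.0, 5.5]

slope_m = ols_slope(x_men, y_men)      # slope fit separately for men
slope_w = ols_slope(x_women, y_women)  # slope fit separately for women

b1 = slope_m            # coefficient of X: the reference group's slope
b3 = slope_w - slope_m  # interaction coefficient: difference in slopes
print(b1, b3)
```

Running the separate-groups models and the single interacted model give the same group-specific slopes; the interacted model just adds the significance test for their difference.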
The way to follow up on a significant two-way interaction between two categorical variables is to test the simple effects. Most of the time the simple effects tests give a very clear picture of the interaction. Every so often, however, you get a significant interaction but no significant simple effects. This is not a logical impossibility: the two tests address different, though related, hypotheses.
Assume your two independent variables are A and B, each with two levels, giving four cell means: m11, m12, m21, m22 (the first subscript is the level of A, the second the level of B). The interaction tests the null hypothesis that the simple effect of B is the same at both levels of A: (m11 - m12) = (m21 - m22). The simple effects tests ask whether each difference is zero on its own: m11 - m12 = 0 and m21 - m22 = 0.
If you have a crossover interaction, m11 - m12 can be slightly positive while m21 - m22 is slightly negative. Neither difference is significantly different from 0, yet the two differences are significantly different from each other.
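A made-up set of cell means shows the arithmetic of a crossover pattern (the values are hypothetical; significance would of course depend on the standard errors, which are not modeled here):

```python
# Hypothetical cell means with a crossover: the simple effect of B
# is slightly positive at the first level of A, slightly negative
# at the second.
means = {("A1", "B1"): 10.2, ("A1", "B2"): 10.0,
         ("A2", "B1"): 9.8,  ("A2", "B2"): 10.0}

simple_effect_at_A1 = means[("A1", "B1")] - means[("A1", "B2")]  # +0.2
simple_effect_at_A2 = means[("A2", "B1")] - means[("A2", "B2")]  # -0.2

# The interaction contrast is the difference of the two simple effects:
interaction = simple_effect_at_A1 - simple_effect_at_A2          # +0.4
print(simple_effect_at_A1, simple_effect_at_A2, interaction)
```

Each simple effect is small and could easily fail to reach significance against its own standard error, while the interaction contrast is twice as large, which is how a significant interaction can coexist with two non-significant simple effects.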
And for many research questions it is highly useful to know whether the difference in means under one condition equals the difference in means under the other. That may not be the hypothesis you care most about, but in many studies all the interesting effects are in the interactions.
A Linear Regression Model with an interaction between two predictors (X1 and X2) has the form:
Y = B0 + B1*X1 + B2*X2 + B3*X1*X2.
It doesn’t really matter if X1 and X2 are categorical or continuous, but let’s assume they are continuous for simplicity.
One important concept is that B1 and B2 are not main effects, the way they would be if the model contained no interaction term. Instead, each is a conditional effect: the effect of that predictor when the other predictor equals zero.
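A quick sketch with hypothetical coefficient values makes the conditional-effect point concrete: in Y = B0 + B1*X1 + B2*X2 + B3*X1*X2, the change in Y for a one-unit increase in X1 is B1 + B3*X2, so B1 describes the effect of X1 only where X2 = 0.

```python
# Hypothetical coefficients, chosen only for illustration
B0, B1, B2, B3 = 1.0, 2.0, 0.5, -0.8

def predict(x1, x2):
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def effect_of_x1(x2):
    """Change in Y for a one-unit increase in X1, at a given X2."""
    return predict(1, x2) - predict(0, x2)

print(effect_of_x1(0))  # equals B1 (2.0) only when X2 = 0
print(effect_of_x1(2))  # equals B1 + B3*2 (0.4) when X2 = 2
```

The "effect of X1" is a moving target that depends on where you evaluate X2, which is why calling B1 a main effect in an interaction model is misleading.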
There are two reasons to center predictor variables in any type of regression analysis–linear, logistic, multilevel, etc.
1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables (the ones that were multiplied).
2. To make interpretation of parameter estimates easier.
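Reason 1 is easy to verify numerically. This small pure-Python check (with an arbitrary predictor running 1 through 10) shows how strongly a raw predictor correlates with its own square, and how centering removes that correlation for a symmetric predictor:

```python
from statistics import mean, stdev

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (stdev(a) * stdev(b))

x = list(range(1, 11))          # hypothetical predictor, values 1..10
x_sq = [v * v for v in x]       # polynomial term built from raw X

xc = [v - mean(x) for v in x]   # centered copy of X
xc_sq = [v * v for v in xc]     # polynomial term built from centered X

print(corr(x, x_sq))    # ~0.97: raw X and X^2 are nearly collinear
print(corr(xc, xc_sq))  # ~0: centered X is uncorrelated with its square
```

For a predictor distributed symmetrically about its mean, the correlation between the centered variable and its square drops to essentially zero, which is what tames the multicollinearity between a polynomial (or interaction) term and its components.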
I was recently asked: when is centering NOT a good idea?
It seems many researchers need to learn multilevel and mixed models, and I have to say, it’s not so easy on your own.
I too went to graduate school before these models were taught in classes. We did learn mixed models in the context of split-plot designs, but things have progressed quite a bit since then. So I too have had to learn them without the benefit of a class or teacher, and I feel your pain. But I’ve struggled through and learned a lot along the way.