Last week I had the pleasure of teaching a webinar on Interpreting Regression Coefficients. We walked through the output of a somewhat tricky regression model—it included two dummy-coded categorical variables, a covariate, and a few interactions.
As always seems to happen, our audience asked an amazing number of great questions. (Seriously, I’ve had multiple guest instructors compliment me on our audience and their thoughtful questions.)
We had so many that although I spent about 40 minutes answering questions, we still only got through half! Since my voice was starting to go out at that point, I announced I would follow up here to answer the unanswered questions.
If you were not on the webinar, these will make a lot more sense if you watch the recording first and grab the slides handout. Watch Interpreting Linear Regression Coefficients: A Walk through Output here.
I will sometimes refer you to a slide number in the answer.
Q1: Am getting confused. Is 0 female now or is 1 female?!
A number of people asked about this. And it gets very confusing because of the way SPSS reports it.
In the original variable, Male=0 and Female=1. (Slide 7)
When SPSS’s General Linear Model procedure dummy codes this variable (which it does because I specified it as categorical), it automatically treats the category of my Gender_N variable that sorts last (numerically or alphabetically) as the reference group and gives it a value of 0. Here that is Female.
So when we look at the regression output table (Slide 14), you can see that it calls the variable “Gender_N=0”. That is a different variable than Gender_N. This new variable, Gender_N=0, has a value of 1 for Males and 0 for Females. When the output gives (Gender_N=0) a coefficient, we know SPSS used this new internal coding rather than the original coding of Gender_N.
So when I interpret the Gender_N=0 variable, I interpret that as Males, compared to Females.
It is very confusing until you get used to it, but it’s worthwhile to pay attention to what your software is doing. In my experience, not every procedure does it the same way, even within the same software.
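To make the recoding concrete, here is a minimal Python sketch of how an indicator like “Gender_N=0” relates to the original variable. It is only an illustration of the idea, not SPSS’s actual internals; the function name and the five sample values are made up.

```python
# Original coding from Slide 7: Male = 0, Female = 1.
gender_n = [0, 1, 1, 0, 1]  # hypothetical sample of five students


def indicator_for(values, level):
    """Return a 1/0 indicator that is 1 wherever the value equals `level`."""
    return [1 if v == level else 0 for v in values]


# The last category (Female = 1) becomes the reference group, so the only
# gender dummy in the model is the indicator "Gender_N=0", which is 1 for Males.
gender_n_eq_0 = indicator_for(gender_n, 0)

print(gender_n)       # [0, 1, 1, 0, 1]
print(gender_n_eq_0)  # [1, 0, 0, 1, 0]
```

Every 0 in the original variable becomes a 1 in the new one, which is why the coefficient labeled Gender_N=0 describes Males relative to Females.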
Q2: Are all the coefficients included on the left statistically significant? What if they weren’t? Would you still do the same algebra?
No, they’re not, at least not at α=.05. In fact only a few are. The p-values are available on Slide 13 if you want to check them out.
Even so, yes, you will do the algebra the same way. A non-significant coefficient may not be significantly different from 0, but that doesn’t mean it actually = 0. If you leave them out of the equation as you do the algebra, you will be setting them =0 and it can throw everything off.
The only thing I would do differently is to come to different conclusions about these coefficients, but the math is the same either way.
Q3: How I can calculate in percentage the individual contribution of group, gender and age?
A: I think what you’re looking for is an eta-square statistic, which will tell you the percent of variance in Y that is accounted for by each predictor in the model.
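As a rough sketch of the arithmetic, eta-square is an effect’s sum of squares divided by the corrected total sum of squares, and partial eta-square divides by the effect plus error instead. The sums of squares below are hypothetical placeholders, not the values from the webinar output; you would read the real ones off the ANOVA table.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variance in Y attributed to one effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Effect variance relative to effect-plus-error variance."""
    return ss_effect / (ss_effect + ss_error)


# HYPOTHETICAL sums of squares for illustration only.
ss_effects = {"Group": 2.0, "Gender": 6.0, "Age": 12.0}
ss_error = 60.0
ss_total = 80.0  # corrected total

for name, ss in ss_effects.items():
    pct = 100 * eta_squared(ss, ss_total)
    partial_pct = 100 * partial_eta_squared(ss, ss_error)
    print(f"{name}: eta-square {pct:.1f}%, partial eta-square {partial_pct:.1f}%")
```

Software that reports an effect size in the ANOVA table may give the partial version, so it is worth checking which one you have before describing it as a percent of total variance.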
Q4: Can we go back to interpreting the gender difference in group 3? We didn’t get to see the graph of that one…
A: (Start on Slide 35). It’s true that there isn’t a graph, but I can tell you that it looks pretty much like the one on Slide 27.
On Slide 27 we see the regression lines for Male and Female students in the control group. The difference in their intercepts is -.843 and the difference in their slopes is -.425.
What we see in Slide 35 is that the Group 3 Male students have an intercept .07 lower than Group 4 Male students. The same is true for Female students. Group 3 is .07 lower.
But the difference in intercepts between Males and Females in Group 3 is exactly the same as the difference in intercepts between Males and Females in Group 4—just what we saw in Slide 27. That difference is -.843, the coefficient for males (Gender_N=0).
So we can interpret that -.843 not just as the difference in mean satisfaction between male and female students in the control group, but in each group.
You’ll see the same pattern for the slopes. In both Group 3 and Group 4 (and any other group), the difference in slopes is -.425.
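If you want to check that algebra yourself, here is a small Python sketch. It assumes the model has Age*Gender and Age*Group interactions but no Gender*Group interaction, which is what the answer above implies. The values -.843, -.425, -.07, and .108 come from the output discussed in this post; the intercept and the Group 3 slope difference are hypothetical fillers, and the function name is my own.

```python
# Coefficients quoted in this post, plus clearly marked hypothetical fillers.
B = {
    "intercept": 4.0,       # HYPOTHETICAL: females, Group 4, at the mean age
    "age": 0.108,           # Age slope for the reference cell (SPSS output)
    "male": -0.843,         # Gender_N=0: male vs. female intercept difference
    "male_x_age": -0.425,   # male vs. female slope difference
    "group3": -0.07,        # Group 3 vs. Group 4 intercept difference
    "group3_x_age": 0.25,   # HYPOTHETICAL: Group 3 vs. Group 4 slope difference
}


def predict(age_c, male, group3):
    """Predicted satisfaction at centered age `age_c` for 0/1 indicators."""
    return (B["intercept"] + B["age"] * age_c
            + male * (B["male"] + B["male_x_age"] * age_c)
            + group3 * (B["group3"] + B["group3_x_age"] * age_c))


for group3 in (0, 1):
    gap_at_mean = predict(0, 1, group3) - predict(0, 0, group3)
    gap_a_year_later = predict(1, 1, group3) - predict(1, 0, group3)
    slope_gap = gap_a_year_later - gap_at_mean
    print(group3, round(gap_at_mean, 3), round(slope_gap, 3))
```

Both rows show the same intercept gap (-0.843) and the same slope gap (-0.425), because the group terms cancel when you subtract the female line from the male line within a group.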
Q5: In a randomized trial you would mainly be interested to conclude on the effect of group(s), right? What is then the difference between adjusting for age and effect of age?
It’s just a difference in which coefficients you focus on.
What we know here is we have an interaction between age and group. I’ve been talking about the effect of age—the slope.
But we do also get from the table the effect of group at one specific age: the mean. These group effects are the differences in the mean satisfaction for each of the treatment groups compared to the control (at the mean age).
Because of the interaction, there is not just one mean difference across groups. Those mean differences change, depending on the age. You can use the marginal means to pick specific ages at which you want to make those comparisons. Or you can center Age at different values in order to make the regression coefficients reflect those differences.
I consider the first option much simpler and you can read more about it in this article I wrote about Spotlight Analysis.
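Here is a minimal sketch of the arithmetic behind picking a specific age. It assumes Age was centered at its mean when the model was fit. The mean age and both coefficients are hypothetical values chosen for illustration, and the function name is my own.

```python
def group_effect_at(age, mean_age, b_group, b_group_x_age):
    """Treatment-minus-control difference in predicted Y at a given age.

    Assumes Age was centered at `mean_age`, so `b_group` is the difference
    at the mean age and `b_group_x_age` is how much that difference changes
    for each additional year of age.
    """
    return b_group + b_group_x_age * (age - mean_age)


# HYPOTHETICAL values for one treatment group; not taken from the slides.
mean_age = 30.0
b_group, b_group_x_age = -0.07, 0.25

for age in (20, 30, 40):
    effect = group_effect_at(age, mean_age, b_group, b_group_x_age)
    print(age, round(effect, 2))
```

Re-centering Age at, say, 20 and refitting would make the group coefficient itself equal the age-20 difference, with its own standard error and p-value, which this arithmetic alone does not give you.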
Q6: How do you interpret the R2s for each of the lines in the graphs?
See Slide 27. Honestly, I wouldn’t read much into those, as the graph is ignoring the other effects in the model.
That said, those R2s are from the simple regression models between Age and Satisfaction for each sex. So if we fit a simple regression just for Males, we’d say that 19.4% of the variance in satisfaction scores could be explained by Age. And for Females, 0.4% of the variance in satisfaction scores could be explained by Age (so basically none).
Q7: What would be the test for the difference among the 4 slopes or intercepts?
That test is found in the ANOVA table on Slide 12. The F statistic for Group is .014. This tests whether the four intercepts have any differences. (Conclusion: no evidence they do, p=.998).
The F statistic for Group*Age tests whether the four slopes are the same. F=3.795, p=.011. So I would say we have evidence they are different. We don’t know which ones are different until we look at the coefficients.
Q8: Should we always center?
Generally, yes. See this article: When NOT to Center a Predictor Variable in Regression
Q9: Can changing the reference level give us a significant effect not observed before?
Yes and no. It won’t change the overall F tests that indicate if there are any differences among groups (See Q7, above).
But sometimes an F test indicates there is at least one difference somewhere, but you don’t see it in the coefficients. For example, our coefficients are only comparing each treatment condition to the control. If the only real difference is between two treatment groups, and neither group’s mean is far enough from the control’s to be significant, then changing the reference group could lead to a significant result that you didn’t see before.
But be careful here. This usually happens when either the control’s mean is between the two treatment means or the control group just has a much smaller sample size than the treatment groups do. You do not want to be switching around reference groups just searching for a significant effect if it’s not a scientifically important comparison. And if it’s about sample size, keep that in mind and think carefully about the difference between statistical significance and scientifically meaningful effect sizes.
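A stripped-down sketch can show what does and does not change. It uses a model with only one categorical predictor (no covariate or interactions, so much simpler than the webinar model), where least squares gives the reference group’s mean as the intercept and each other group’s mean minus the reference mean as its coefficient. The scores and group names are hypothetical.

```python
from statistics import mean

# HYPOTHETICAL satisfaction scores. The control mean sits between the two
# treatment means, which is the situation described above.
scores = {
    "treatment_a": [3.0, 3.5, 4.0],
    "control": [4.0, 4.5, 5.0],
    "treatment_b": [5.0, 5.5, 6.0],
}


def dummy_coefficients(data, reference):
    """Intercept and dummy coefficients for a one-predictor dummy-coded model."""
    ref_mean = mean(data[reference])
    coefs = {g: mean(y) - ref_mean for g, y in data.items() if g != reference}
    return ref_mean, coefs


def fitted_means(intercept, coefs, reference):
    """Rebuild each group's fitted mean from the intercept and coefficients."""
    fitted = {reference: intercept}
    fitted.update({g: intercept + b for g, b in coefs.items()})
    return fitted


for ref in ("control", "treatment_a"):
    b0, coefs = dummy_coefficients(scores, ref)
    print(ref, b0, coefs, fitted_means(b0, coefs, ref))
```

With the control as reference, the two coefficients are -1.0 and 1.0. With treatment_a as reference, the treatment_b coefficient becomes 2.0, a larger comparison that the first coding never tested directly. The fitted group means are identical either way, which is why the overall F test does not change.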
Q10: What is the significance of where the male and female lines cross?
A: See Slide 27. Not much at all. That’s just the age at which the mean satisfaction scores are equal for men and women. I don’t think that is really important in this study, but it could certainly be of interest in other studies.
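If you did want that age, it can be worked out from the model coefficients. The sketch below uses the -.843 intercept difference and -.425 slope difference from Q4, and assumes Age is centered at its mean. The function name is my own, and because the graph on Slide 27 comes from separate simple regressions, its crossing point may not match this number exactly.

```python
def crossing_age_centered(intercept_diff, slope_diff):
    """Centered age at which two regression lines meet.

    The gap between the lines at centered age a is
    intercept_diff + slope_diff * a, which equals zero at
    a = -intercept_diff / slope_diff.
    """
    if slope_diff == 0:
        raise ValueError("Parallel lines never cross.")
    return -intercept_diff / slope_diff


a = crossing_age_centered(-0.843, -0.425)
print(round(a, 2))  # -1.98, i.e. about two years below the mean age
```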
Q11: This data set does not contain repeated measurements over time but usually in a randomized trial you would have at least two measurements (pre-post) and often more. If you add measurements over time, would you still center age or just adjust for age and focus on the effect of time?
A: Yes, I would still center Age. Even if this isn’t a key predictor, it’s still useful to be able to interpret its results and it makes the intercept more meaningful.
Q12: When we say “interactions”, does it mean there is correlation between different predictor factors?
A. No. See The Difference Between Interaction and Association. And in case this is a little confusing, substitute the term Correlation for Association.
Q13: If you included month with year, would slope be different? Since this included age as discrete, if we used 25.5 for 25 years and 6 months?
If we measured age at a more granular level, then no, the slope wouldn’t change. It might change slightly at a level of rounding, but that’s it. If you changed the unit of measurement to months, however, so that 25 years 6 months = 306 months, then yes the slope would change as the units change.
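The unit change is easy to verify on a toy example. The ages and satisfaction scores below are hypothetical, and the helper is a plain simple-regression slope, not the full model from the webinar.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# HYPOTHETICAL data: ages in years (months as fractions) and satisfaction.
age_years = [22.0, 25.5, 31.25, 40.0, 47.5]
satisfaction = [4.1, 3.9, 3.6, 3.0, 2.7]
age_months = [a * 12 for a in age_years]  # 25.5 years becomes 306 months

slope_years = ols_slope(age_years, satisfaction)
slope_months = ols_slope(age_months, satisfaction)

# The per-month slope is the per-year slope divided by 12.
print(slope_years, slope_months * 12)
```

The two printed numbers agree up to floating-point rounding: the relationship is the same, only the size of “one unit of Age” has changed.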
Q14: Two of the males have low scores. Would you worry about them as outliers?
A. No. It’s true they might be outliers, and in a full analysis I would run some influence statistics to see how much they’re changing results. That could lead me to investigate whether they were actually errors. If they weren’t, I wouldn’t remove them. I might do something like a quantile regression instead, but unless they’re really problematic, I leave outliers in.
Q15: I’m a bit confused in relation to your answer to the last question (can we conclude that in males satisfaction is significantly influenced by age, but not females) isn’t it the p of the interaction you have to assess (not just age itself)?
A: Yes, absolutely. Before you make any true conclusions, you will want to look at p-values. I was just describing what the relationships look like.
I also wouldn’t use “influenced” in this study. For all we know, it’s not really about Age. Maybe it’s really about some other variable that happens to be correlated with Age, like previous experience with online learning.
Q16: The different reference group definitions (between R and SPSS) seem to give different significance values. Is that because they are testing different hypotheses? (e.g. “Is group 1 different from the reference group?”)
A: Yes. Because they’re using different reference groups, we have different hypothesis tests and therefore different p-values.
Q17: Is the figure on Slide 33 for males or females?
Q18: In Slide 33, why do you choose group 4 as your standard [reference group]? How do we choose if we don’t have control or intervention group?
In this example, yes, it was easy to choose the reference group because there was a control. If there isn’t, there may or may not be a clear best reference group. See this article: Strategies for Choosing the Reference Category in Dummy Coding
Q19: How can I interpret the coefficients for gender in a regression model (assuming gender was one of the predictor variables rather than control)?
A: It doesn’t really matter whether gender is a control or a key predictor. In both cases, it’s an observed predictor variable, meaning you didn’t randomly assign people to these groups, you just observed which group they were already in.
So whether this observed variable is central to your research question or just a control variable, you’re still going to interpret its coefficient as the difference in means for males vs. females. If it’s central, you’ll probably discuss this difference a lot in your discussion and if it’s a control, you won’t.
Q20: What does the probability for the intercept mean?
I assume you mean the p-value. It is testing the null hypothesis that the intercept = 0. As you see on slide 17, the intercept is the mean satisfaction for females in the control group at the mean age.
That’s not really a hypothesis test we’re usually interested in, whether one specific group’s mean = 0. But it could be in some studies.
Q21: So first you do an ANOVA analysis, and then a GLM?
Ah, no. This is one of those confusing naming things in statistics. I just ran one general linear model. Every linear model, whether we call it regression or ANOVA, will give you an ANOVA table. It’s the table of Sums of Squares, Mean Squares, and F tests.
Some regression procedures don’t give much info on that table. They just give you these statistics for the model as a whole and the error term. But just because they’re only printing out a few for you doesn’t mean that each predictor in the model doesn’t have its own SS, MS, and F.
So the GLM procedure prints out the full ANOVA table by default, then optionally (I had to ask for it) prints out the regression coefficients table.
You can get this regression coefficients table if you’re running an ANOVA model as well. We just usually don’t because it’s not very helpful.
Q22: Is it generally true that the higher the effect size the higher the likelihood of it being significant?
Yes, given the same sample size, standard deviation, and alpha level. See: 5 Ways to Increase Power in a Study.
Q23: Since none of the linear group terms were significant, would the model be improved by keeping ONLY the interaction terms for the groups?
No. It’s true that the linear group terms were small, but they weren’t exactly 0. If you take them out, you are forcing them to equal 0. That would force each of those four regression lines to go through the same point right at the mean age. Sure, they crossed close to that age, but it wasn’t exactly at that age. We get a more accurate regression line if we let those things be estimated by the data, not forcing them to equal 0.
Q24: On Slide 12 the estimate for the results in R is different for the main effect of age (-0.4471 p = 0.001949) compared with the value in SPSS (.108 p = 0.389 ). How come? I understand that the interaction terms’ intercepts can differ between R and SPSS due to the reference group chosen, but why should these individual terms have different coefficients?
A: It’s because of the interactions. Because Age is involved in two interactions (Age*Gender and Age*Group) the slope of Age is different for each Gender and each Group. If you’re comparing all other lines to the purple line on Slide 33, you’ll get a different slope difference than if you compare all lines to the green line.
Q25: Also, the slopes in the graph on Slide 33 don’t seem to match the data. The slopes for green and beige are of different signs but about the same magnitude (.272 and -.287) but look massively different in steepness. Blue has a coeff of -0.022 which should be not very steep but it’s steeper than purple at 0.108.
Yes, the graph and the equations are slightly different. The intercepts and slopes in the model equations are just for females. The graph is combining males and females together. So they’re in the same order, but they’re not exactly the same.