Understanding Interactions Between Categorical and Continuous Variables in Linear Regression

by Jeff Meyer, MPA, MBA

We’ve looked at the interaction effect between two categorical variables. Now let’s make things a little more interesting, shall we?

What if our predictors of interest, say, are a categorical and a continuous variable? How do we interpret the interaction between the two?

Well, you’re in luck. Read on.

We’ll keep working with our trusty 2014 General Social Survey data set. But this time let’s examine the impact of job prestige level (a continuous variable) and gender (a categorical, dummy coded variable) as our two predictors. Here, gender is called “male” and is coded 1 for males and 0 for females.

[Table: regression output for the model with job prestige and male as predictors, no interaction term]

We can see that, holding job prestige constant, men make approximately $16,285 more than women on average.

What we want to know is, does the gender difference in income differ based on the job prestige level? An interaction term will tell us.

[Table: regression output for the model with the male-by-job-prestige interaction term added]

So you’re probably wondering, how do we interpret these coefficients?

The constant tells us that women with an average prestige level job will earn $25,874. Men with an average prestige level job would expect to earn $15,716 more than women, on average $41,590.

But this gender gap of $15,716 isn’t the same for every prestige level of job. The interaction is statistically significant at a level of 0.0001.

For every 1-unit increase in job prestige score, a woman should expect to earn an additional $709, while a man should expect to earn an additional $1,312 ($709 + $603, the prestige coefficient plus the interaction coefficient).
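Written out, the interpretation above corresponds to one fitted equation that splits into a separate line for each gender. This is a sketch built only from the dollar figures quoted in the text; it assumes "prestige" is the job prestige score centered at its mean, which is what lets the constant refer to an average prestige job.

```latex
\begin{align*}
\widehat{\text{income}} &= 25{,}874 + 15{,}716\,\text{male} + 709\,\text{prestige} + 603\,(\text{male} \times \text{prestige}) \\
\text{women } (\text{male} = 0):\quad \widehat{\text{income}} &= 25{,}874 + 709\,\text{prestige} \\
\text{men } (\text{male} = 1):\quad \widehat{\text{income}} &= (25{,}874 + 15{,}716) + (709 + 603)\,\text{prestige} \\
&= 41{,}590 + 1{,}312\,\text{prestige}
\end{align*}
```

The coefficient on male shifts the intercept, and the interaction coefficient shifts the slope.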

In other words, the increase in salary for having a higher prestige job differs for men and women.

Another way to say the same thing is that the difference in salary between men and women widens as job prestige score increases.
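To see the gap widen, here is a minimal Python sketch that plugs the coefficients quoted above into the model and computes the predicted male-minus-female difference at a few prestige levels. The function names and the example scores (10 points below average, average, 10 points above) are hypothetical, and the sketch assumes job prestige is centered at its mean.

```python
# Coefficients as quoted in the article, in dollars.
# Assumption: job prestige is mean-centered, so 0 means an average-prestige job.
CONSTANT = 25_874         # women in an average-prestige job
MALE = 15_716             # gender gap at average prestige
PRESTIGE = 709            # slope for women
MALE_X_PRESTIGE = 603     # additional slope for men


def predicted_income(male: int, prestige_ctr: float) -> float:
    """Predicted income for a man (male=1) or woman (male=0)."""
    return (CONSTANT
            + MALE * male
            + PRESTIGE * prestige_ctr
            + MALE_X_PRESTIGE * male * prestige_ctr)


def gender_gap(prestige_ctr: float) -> float:
    """Predicted male-minus-female income difference at one prestige score."""
    return predicted_income(1, prestige_ctr) - predicted_income(0, prestige_ctr)


# Hypothetical centered prestige scores, chosen only for illustration.
for score in (-10, 0, 10):
    print(f"prestige_ctr={score:+d}: gap = ${gender_gap(score):,.0f}")
```

Run as written, this prints gaps of $9,686, $15,716 and $21,746. The gap changes by $603 for each prestige point, which is exactly the interaction coefficient.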

Cue graph:

[Graph: predicted income by job prestige score, with separate lines for men and women]

So if someone tells you that men make X amount more than women, keep in mind that the difference in income depends (in part) upon the caliber of the job. The more prestigious the job, the greater the gap, as the graph shows.

Moral of the story: When there is a statistically significant interaction between a categorical and continuous variable, the rate of increase (or the slope) for each group within the categorical variable is different.
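One way to check this is to fit a separate line to each group: in a model with a full interaction, the interaction coefficient should equal the difference between the two group slopes. The sketch below uses synthetic, noise-free data generated from the coefficients quoted in this post (not the actual GSS records), and `fit_line` is a hypothetical helper written for this illustration.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data built from the article's coefficients, for illustration only.
# The prestige values are hypothetical centered scores.
prestige_ctr = [-20, -10, 0, 10, 20]
women = [25_874 + 709 * p for p in prestige_ctr]
men = [25_874 + 15_716 + (709 + 603) * p for p in prestige_ctr]

w_intercept, w_slope = fit_line(prestige_ctr, women)
m_intercept, m_slope = fit_line(prestige_ctr, men)

print(f"women: intercept={w_intercept:,.0f}, slope={w_slope:,.0f}")
print(f"men:   intercept={m_intercept:,.0f}, slope={m_slope:,.0f}")
print(f"difference in slopes     = {m_slope - w_slope:,.0f}")
print(f"difference in intercepts = {m_intercept - w_intercept:,.0f}")
```

The difference in slopes matches the interaction coefficient and the difference in intercepts matches the coefficient on male. With real data the separate fits would give the same point estimates as the interaction model, though the standard errors can differ.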

9 comments

David

Hi,
How did you make that Cue graph?

David

Good explanation. If male is not significant before adding the interaction but is significant after adding it, how should that be interpreted in your example?

Jeff Meyer

Hi David,

In this example we are using the interaction to determine whether the slope for job prestige is different between males and females. If the data is fairly balanced between males and females I wouldn’t expect the parameter estimate for “males” to be different by much between the two models.

If we had an interaction between 2 categorical variables then the results could be very different because male would represent something different in the two models. For example, if the two categories were gender and marital status, in the non-interaction model the coefficient for “male” represents the difference between males and females. In the interaction model, male represents the difference between males and females for the base category of marital status. In that case “male” is a different parameter in each of the two models.

Jeff

David

Hi Jeff, thanks for your reply. My last question: if we have one categorical variable with three groups (male = reference, female, transgender) and one continuous predictor (prestige level), and after adding the interaction terms the p-values for female and for female*prestige become significant, how do we explain it? Does it mean something is wrong with the analysis?

Jeff Meyer

It most likely means that there is less unexplained variance in the model by including the interaction and as a result the simple effect of female has become significant. Most importantly, you should check the effect size to see how your predictors have changed when adding the interaction.

Ahmad Elbadry

Thanks Jeff for your explanation, but I think you made a mistake. You say:
“The constant tells us that women with an average prestige level job will earn $25,874. Men with an average prestige level job would expect to earn $15,716 more than women, on average $41,590.”
But these values are not for an average prestige level job; they are for a job with a prestige level of zero. Your statement becomes true only when the job prestige level variable is transformed such that it is centered around its mean; i.e., the mean becomes zero.

Jeff Meyer

Hi Ahmad,

Thanks for your comments and attention to detail. I failed to point out that I had centered job prestige score. If you look at the 2nd table you will see that I used the extension “_ctr”. I typically add this whenever I center a continuous variable. In the first table it is cut off and is not obvious. If the variable was not centered your statement is definitely accurate.

Silvia Stroe

Hi,

one question: how do you calculate the interaction term in this case? Do you just multiply the categorical variable gender (0 or 1) with the continuous variable job prestige level?
In this case the interaction term will take either the value 0 (for females) or the value of the job prestige level (for males).
Is it ok to construct the interaction term just by multiplying the two variables like that?
I am asking because I have a very similar case and I didn’t know how to go about it. Thanks so much for answering,

Jeff Meyer

Hi Silvia,

Yes, you can create an interaction by generating a new variable that is the product of the dummy variable and the continuous variable. But it is easier to let the software do it in your model. In SPSS, in the UNIANOVA command, you would add a new predictor such as job_prestige*gender. If you are using Stata, mark the continuous variable with the c. prefix, as in c.job_prestige#i.gender, because Stata otherwise treats variables in an interaction as categorical. In R I believe it would be job_prestige:gender. Be sure to also include the two individual predictors of the interaction, job_prestige and gender, in the model.

Jeff
