Just yesterday I got a call from a researcher who was reviewing a paper. She didn’t think the authors had run their model correctly, but wanted to make sure. The authors had run the same logistic regression model separately for each sex because they expected that the effects of the predictors were different for men and women.

On the surface, there is nothing wrong with this approach. It’s completely legitimate to consider men and women as two separate populations and to model each one separately.

As often happens, the problem was not in the statistics, but in what they were trying to conclude from them. The authors went on to compare the two models, and specifically to compare the coefficients for the same predictors across the two models.

Uh-oh. Can’t do that.

If you’re just describing the values of the coefficients, fine. But if you want to compare the coefficients AND draw conclusions about their differences, *you need a p-value for the difference*.

Luckily, this is easy to get. Simply include an interaction term between Sex (male/female) and any predictor whose coefficient you want to compare. If you want to compare all of them because you believe that all predictors have different effects for men and women, then include an interaction term between sex and each predictor. If you have 6 predictors, that means 6 interaction terms.

In such a model, if Sex is a dummy variable (and it should be), two things happen:

1. The coefficient for each predictor becomes the coefficient for that variable ONLY for the reference group.

2. The interaction term between sex and each predictor represents the DIFFERENCE in the coefficients between the reference group and the comparison group. If you want to know the coefficient for the comparison group, you have to add the coefficients for the predictor alone and that predictor’s interaction with Sex.

The beauty of this approach is that the p-value for each interaction term gives you a significance test for the difference in those coefficients.
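To see the coefficient arithmetic in action, here is a minimal sketch in Python with numpy. It uses simulated data and an ordinary linear model rather than a logistic one, purely to keep it self-contained; the logic of the coefficients is the same. A pooled model with a sex-by-predictor interaction reproduces the two separate per-group slopes exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
sex = np.repeat([0, 1], n)   # 0 = reference group, 1 = comparison group
x = rng.normal(size=2 * n)
# true slopes: 1.0 for the reference group, 1.5 for the comparison group
y = 2.0 + np.where(sex == 0, 1.0, 1.5) * x + 0.1 * rng.normal(size=2 * n)

# pooled model with an interaction term: y ~ 1 + x + sex + sex:x
X = np.column_stack([np.ones(2 * n), x, sex, sex * x])
b = np.linalg.lstsq(X, y, rcond=None)[0]  # [intercept, b_x, b_sex, b_inter]

# separate simple regressions, one per group (np.polyfit returns slope first)
b_ref = np.polyfit(x[sex == 0], y[sex == 0], 1)[0]
b_cmp = np.polyfit(x[sex == 1], y[sex == 1], 1)[0]

print(b[1], b_ref)         # main-effect coefficient = reference-group slope
print(b[1] + b[3], b_cmp)  # main effect + interaction = comparison-group slope
```

The interaction coefficient `b[3]` is exactly the difference between the two slopes, which is why its p-value tests that difference.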


Dear Sir,

Thanks for the nice explanation; it helped me a lot in my research work.

Dear Karen,

I am currently analyzing a data set which is grouped (i.e., dummy-coded) into non-athletes (0) and athletes (1). My research question is whether they differ regarding the relationships between different (namely 8) self-reported pain coping strategies and pain-related disability. I use the approach of including eight interaction terms in the model. Obviously, the beta weights for the reference group are different when I include the interaction terms from what I see when I run the regression for the reference group alone. Does this not lead to interpretation issues? Also, I got insecure when choosing the regression method. I used stepwise when running the models separately, and different predictors for each model remain in the model. However, when I chose the stepwise method in the model with the interaction terms, none of the interaction terms remains in the model. Which method do you recommend for this kind of approach?

Kind regards,

Hannah

Hello Karen,

I examine whether the association between management changes (a dummy variable equal to one if there is a change in management, zero otherwise) and earnings management activities differs between two regimes (a dummy variable equal to one for the new regime, zero otherwise). As shown, I have two dummy variables, and I do not know whether it is appropriate to interact them in order to examine the difference.

Hey,

I wanted to know: is there any way, other than the use of an interaction term, to compare the coefficients of two models?

I want to highlight that for comparisons of logit and probit coefficients across groups, just a p-value is not enough, since there are substantial issues pertaining to such comparisons. Please refer to this great document by Richard Williams: http://www3.nd.edu/~rwilliam/stats3/RW_ESRA2013.pdf

Good luck 🙂

If the models were multinomial logistic regressions, you could compare two or more groups using a post-estimation command in Stata called suest. Suest stands for seemingly unrelated estimation and enables a researcher to establish whether the coefficients from two or more models are the same or not. Prior to this, the regression models have to be stored first, using the command est store model1, model2, etc.

Hello Karen,

I’m analyzing 2 subsamples for my Master Thesis. One subsample for the period before the recent financial crisis and the other period is defined as the period during the financial crisis. I have 6 independent variables. I want to test if the coefficients of these independent variables significantly differ from each other or not for the 2 subsamples. I’m using SPSS, but I have no idea what test or function to use. Could you help me out?

Thanks in advance

Hello Karen,

I am currently running a regression of health on age, age squared, income and education, and testing whether there is a gender difference. So I ran a regression of health on age, age*male, age squared, age squared*male, income, income*male, education and education*male. However, the p-values of the interaction terms show that they are insignificant. I wonder, did I do anything wrong in my regression? (P.S. I am using EViews and male is a dummy variable.) Thank you very much!

I’m analyzing data, and I’ve split the data into male/female and am running separate regression models as you describe. So now you are saying I need to include, for each predictor variable, an additional interaction variable of itself and sex? Doesn’t it matter that I have split my dataset? Won’t that mean that all the female interaction variables will essentially be sex*predictor, where the interaction (‘sex’) will be female every time, and never male? So how will it make any difference?

Oops, I think I asked the same question as Vanda above. (But if you can spot a relevant difference, let me know.)

Hi Karen. Thanks for this — it’s been really helpful in thinking about my own situation. I have a slightly different problem in that I want to test the relative influence of the *same* set of predictors in predicting two different data vectors. Essentially what I’d like to know is whether the weights on those identical predictors are different for the two data vectors. I know how I’d set up the data to make your approach work, but I’m not totally sure it’s still the right approach. Any insight?

Thanks in advance!

Hi Matt,

Yep, it’s the same question. 🙂

To do this with an interaction, you need to stack the two data sets and use an indicator variable for which set of investors. Interact that indicator variable with all other predictors.
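The stacking step can be sketched like this in Python with numpy (the array values are hypothetical, purely for illustration):

```python
import numpy as np

# hypothetical predictor/outcome pairs for two sets of investors
x_a, y_a = np.array([1.0, 2.0, 3.0]), np.array([1.1, 2.0, 3.2])
x_b, y_b = np.array([1.0, 2.0, 3.0]), np.array([0.9, 2.4, 3.8])

# stack the two data sets and add a 0/1 indicator for which set each row is from
x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
group = np.concatenate([np.zeros_like(x_a), np.ones_like(x_b)])

# design matrix with the indicator interacted with the predictor
X = np.column_stack([np.ones_like(x), x, group, group * x])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]
# coefs[3] is the difference in slopes between the two sets
```

With more predictors, you would add one indicator-by-predictor column per predictor, exactly as in the post.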

Hi Karen,

Thank you, these posts are very helpful! I’ve been reading up on this topic a lot but there is one nagging question that I can’t seem to find an answer to anywhere:

Q. The coefficients for the predictor variables and their significance level are for the reference group. For the interaction terms, the coefficient is how much DIFFERENT the comparison group is from the reference, and the p-value indicates if that DIFFERENCE is significant. How in these models can we know whether the effect of the predictor is significant for the comparison group??

E.g., if the predictor X1 has B = 0.5 and the interaction term X1*Z has B = -0.4, then for the comparison group the B of X1 is 0.5 – 0.4 = 0.1. The p-value on the interaction term tells us whether that 0.1 is significantly different from 0.5, but how can we determine whether it is actually a significant predictor for the comparison group?

Hi Bob,

So you want to know if the comparison group’s slope is significantly different from 0?

Two ways to do it.

1. Switch your coding of 0 and 1 to reverse which is the reference group.

2. Depending on which software you’re using, put the interaction term into the model before the individual terms. In both SPSS GLM and SAS proc glm, this will change the meaning of the coefficients and you’ll get a slope coefficient for each group. It’s hard to explain, but try it and you’ll see (if you’re using one of these–I don’t know if it works in other software).
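Option 1 can be sketched numerically in Python with numpy (simulated data with made-up slopes): flipping the 0/1 coding turns the comparison group’s slope into the main-effect coefficient, so the model reports its standard error and p-value directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
z = np.repeat([0, 1], n)      # original coding: 0 = reference group
x = rng.normal(size=2 * n)
y = 0.5 * x - 0.4 * z * x + 0.1 * rng.normal(size=2 * n)

def fit(indicator):
    """OLS fit of y ~ 1 + x + indicator + indicator:x."""
    X = np.column_stack([np.ones_like(x), x, indicator, indicator * x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_orig = fit(z)       # b_orig[1] is the slope for the z == 0 group
b_flip = fit(1 - z)   # b_flip[1] is the slope for the z == 1 group

print(b_orig[1] + b_orig[3])  # comparison-group slope, assembled by hand
print(b_flip[1])              # the same slope, now a main effect
```

The two fits are the same model in different parameterizations, so the assembled sum and the flipped main effect agree exactly.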

Hi Karen

Have you found any definite answer to this query of Bob’s? I have also used interaction terms in my model in exactly the same way, and I am still confused about calculating the significance of the new value for the beta coefficient (as 0.1 is in Bob’s case). It is simple to calculate a p-value for correlation coefficients, but what about regression coefficients? Any help will be highly appreciated.

Kind Regards

Maria

Hi Karen,

Thank you for the helpful article!

I want to test the different effects of temperature on mortality between two cities. I already built two separate regression models, one for each city, and one single regression model with a dummy variable (cityA=1, cityB=0). However, how can I compare the effect of temperature if I use the single model? There is only one coefficient for temperature.

A big thanks in advance!

Hi Alva,

If you include an interaction term between city and temperature, you’ll get another coefficient for it.

Here is an article I wrote about it: https://www.theanalysisfactor.com/interpreting-interactions-in-regression/

And if you want all the step-by-step detail, I would recommend my Interpreting (Even Tricky) Regression Coefficients Workshop. It walks you through interactions. http://theanalysisinstitute.com/on-demand-workshop-irc/. We now have it available on demand.

Hi Karen,

Thank you for the information!

But I am a little bit confused about the interpretation of the interaction term in the first link.

I already built a regression model with dummy variable and interaction term like this:

Mortality=B0+B1*T+B2*City+B3*City*T (cityA=1,cityB=0, T means temperature)

In SPSS, the coefficient of “city” is not significant, but the coefficients of “T” and the interaction are significant. Can I interpret it as follows:

The dummy variable is not significant, which means there is no significant difference in mortality between city A and city B. But the influence of temperature on mortality is significantly different between city A and city B.

Thank you for your help!

Hi Karen,

Thanks for the post. Would you have a reference, either a book or an article, on why running separate analyses and using the CIs of the coefficients to establish a significant difference is not as good as introducing an interaction term in the model?

Thanks

Hmm, I’d have to look for that. Honestly, I may be remembering that from a personal communication. I would look in a regression book that includes interactions, such as Aiken & West or Kutner et al.

I want to know: do SPSS, R, and Stata give the same results for Poisson regression data analysis?

Please answer me.

Hi Abid,

They should. There may be differences in defaults across programs, but if you choose the same options, they should. Have you tried this and gotten different results?

Hi Karen,

Applied to a Cox model to demonstrate differences in several biomarkers between the sexes, what would the “reference group and the comparison group” in your example mean? Men or women, depending on the coding as a dummy (0 or 1)?

Do you recommend a single model including sex as a covariate or do you refer to separate models for men and women, including all the biomarkers and interaction terms (biomarker*sex) in the separate models, respectively?

Thanks for your help!

Hi Sven,

Yes, exactly. The reference group is whichever one is coded 0.

Running a single model is more efficient: the residual variance is smaller. Also, it gives you a p-value for the difference in coefficients.

If neither of those is important in your situation, then running separate models can make interpretation a lot easier.

Karen

Can’t you also compute the z score using the two coefficients and standard errors? You would lose some statistical power in forgoing the interaction term and using two models; however, sometimes I think this is easier to interpret.

Also (and maybe someone can clarify this for me), it is sometimes unclear to me what the reference category is for an interaction term. For instance, if my interaction term were composed of race*religion, where race is a dummy composed of (1 = white and 0 = nonwhite) and religion is a dummy where (1 = Protestant and 0 = all others [assuming this was recoded from a categorical variable with multiple categories of religion]), what would the ref category be? Aren’t there three instances in which the interaction term would take on a zero (i.e., nonwhite(0)*all others(0); nonwhite(0)*Protestant(1); and white(1)*all others(0))? So would this be analyzing the interaction of being white (1) and Protestant (1) against all other possible categories? Sorry if this is confusing… Thanks

Hi Jen,

There is an econometric test for comparing two coefficients that does essentially that. I think it’s called a Chow test. The interaction is more efficient, as you mention.

The ref category for that interaction is the one where variables=0: nonwhite/all others. I think where you’re getting confused is that the interaction terms are not comparing the other three groups to this one, as happens with a main effect.

A good exercise is to write out the regression equation, with coefficients for each of the four categories (combinations of race and religion). You’ll get a better idea of which comparison each coefficient is measuring.

I did an example of this in one of my webinars, Interpreting Linear Regression Coefficients: A Walk Through Output. You may want to check that out. It’s a free download.

Karen
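The z test mentioned at the start of this exchange is easy to compute by hand once you have the two coefficients and their standard errors from the separately fitted models; as noted above, it is less powerful than the interaction approach. A minimal Python sketch (the input numbers are hypothetical):

```python
import math

def z_test_two_coefs(b1, se1, b2, se2):
    """Two-sided z test for the difference between coefficients
    estimated in two separately fitted models (independent samples)."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * P(Z > |z|)
    return z, p

# hypothetical coefficients and standard errors from two separate models
z, p = z_test_two_coefs(0.5, 0.1, 0.1, 0.1)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up inputs, the difference of 0.4 against a combined standard error of about 0.14 is clearly significant.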

Hi Karen,

Thanks for your helpful article. I’m really fine with it in case of only two subgroups (i.e. sex). Unfortunately I’ve 4 subgroups in my analysis (using SAS) and I would like to perform ‘pairwise tests’ like ‘is there any difference in slopes between subgroup1 and subgroup 2’. How can I complete the PROC GLM (using CONTRAST or ESTIMATE?) to do this correctly?

Thank you!

Hi Anja,

If you use the SOLUTION option on the MODEL statement, you’ll get the parameter estimates. The three coefficients you get for the interaction term will compare the slopes of all three comparison groups to a single reference group.

There isn’t a specific contrast I know of that will give you the other comparisons of slopes (as opposed to means). I would just switch the reference group by recoding the group variable.

Karen

Hello Karen:

I want to investigate whether there are differences in the determinants of investment between different types of investors. Basically, I want to run the same regression for each type of investor and test whether their investment is determined by the same variables (no differences in the determinants by type of investor) or not (there are variables that are more relevant to some investors than to others). Since I’m considering 7 types of investors and several determinants of investment, is it feasible to include interaction terms? In this case I have to include 6 interactions for each predictor, right? Is the Wald test an alternative, as suggested by Anna? Thank you.

Hi Vanda,

It may be. I’m not familiar with a Wald test in that context. I know there are tests to do such things, but they’re all less precise than an interaction, in which you don’t need to approximate anything–you just estimate it directly. Interactions can be difficult to interpret, though, in regression models.

But if you run your model as a GLM, rather than a regression (which is easy to do in SAS or SPSS), you can include one interaction with investor type for each of the other predictors. I don’t know if Stata has a similar procedure, but I suspect it does.

Karen

You can also do a Wald test – a post-estimation command in Stata – that saves the coefficients from the last model you ran and compares them to the coefficients in the next model to determine whether they are statistically significantly different from each other. If you want to run subgroup analyses instead of including a group × IV interaction term, this is a good option!