I recently received this great question:
Hi Karen, ive purchased a lot of your material and read a lot of your pdf documents w.r.t. regression and interaction terms. Its, now, my general understanding that interaction for two or more categorical variables is best done with effects coding, and interactions cont v. categorical variables is usually handled via dummy coding. Further, i may mess this up a little but hopefully you’ll get my point and more importantly my question, i understand that
1) given a fitted line Y = b0 + b1 x1 + b2 x2 + b3 x1*x2, the interpretation for b3 is the diff of the effect of x1 on Y, when x2 changes one unit, if x1 and x2 are cont. ( also interpretation can be reversed in terms of x1 and x2).
2) given a fitted line Y = b0 + b1 x1 + b2 x2 + b3 x1*x2, the interpretation for b3 is it affects Y by changing the slope (whereas b2 affects the intercept), if x1 is cont. and x2 is cat 0,1, namely dummy coded. Thus, b1 is not a main effect for x1.
But what was not clear to me is what happens when x1 and x2 are effect coded cat variables, let’s say (-1,1) and the fitted line is Y = b0 + b1 x1 + b2 x2 + b3 x1*x2, the interpretation for b3 is ???? this was only hinted at in the outlines .. I’d like to know how to interpret the b3 in this case, and I’d like to know why in this case of interaction, b1 is still considered a main effect for say x1? Thanks for any clarifications!
Any help would be greatly appreciated,
Great question. First, yes, everything you state in points 1 and 2 is correct. I would add, though, that effect-coded coefficients are not all that easy to interpret by themselves. They do give you p-values for true main effects, and that’s useful, but to interpret the results you usually need to compare the means themselves, for example with multiple comparison tests.
To answer your questions:
It all comes down to what the 0 point represents in dummy coding and in effect coding.
In dummy coding, we’ve set the value of 0 to one of the categories, so:
1. the coefficients reflect actual cell mean differences, and have meaningful interpretations as such
2. when there is an interaction, the value of b1, e.g., is the effect of X1 when X2 = 0. Since X2 = 0 for one category of X2, b1 is not a main effect (an overall effect of X1 across all values of X2). It’s a simple effect (also called a conditional effect): an effect of X1 at a single value of X2.
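Here is a minimal sketch of those two points in Python. The cell means are numbers I made up for illustration (they are not the example from this post), and the names `means`, `predict`, and `x1_main_effect` are my own. With a saturated 2x2 model, the dummy-coded coefficients can be written directly as cell mean differences:

```python
# Hypothetical cell means for a 2x2 design (made-up numbers).
# Keys are (x1, x2) with dummy codes 0/1.
means = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 12.0, (1, 1): 22.0}

# Saturated model: Y = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0 = means[(0, 0)]                          # mean of the reference cell
b1 = means[(1, 0)] - means[(0, 0)]          # effect of x1 when x2 = 0
b2 = means[(0, 1)] - means[(0, 0)]          # effect of x2 when x1 = 0
b3 = (means[(1, 1)] - means[(0, 1)]) - b1   # change in the x1 effect when x2 goes 0 -> 1

def predict(x1, x2):
    """Predicted cell mean from the dummy-coded coefficients."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# The four coefficients reproduce all four cell means.
for (x1, x2), m in means.items():
    assert predict(x1, x2) == m

# b1 is the x1 effect only at x2 = 0; at x2 = 1 it is b1 + b3.
x1_effect_at_x2_0 = b1
x1_effect_at_x2_1 = b1 + b3
# The overall (main) effect of x1 averages the two, so it differs from b1.
x1_main_effect = (x1_effect_at_x2_0 + x1_effect_at_x2_1) / 2

print(b0, b1, b2, b3)
print(x1_effect_at_x2_0, x1_effect_at_x2_1, x1_main_effect)
```

With these made-up means, b1 is 4 while the effect of x1 averaged over both categories of x2 is 7, which is why b1 can’t be read as a main effect under dummy coding.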
In effect coding, we’ve set the value of 0 to be halfway between the two categories, so:
1. the coefficients are deviations from the grand mean rather than differences between cell means, so they do not have particularly meaningful interpretations on their own. You can use the p-value to tell that there is a significant interaction, but the only way to interpret that interaction in a meaningful way is to actually look at the means. That’s why we usually just look at F statistics and means in ANOVA, which uses effect coding. The coefficients themselves aren’t very interpretable.
2. when there is an interaction, the value of b1, e.g., is the effect of X1 when X2 = 0. Since X2 = 0 falls midway between the two categories of X2 (at its mean, when the groups are balanced), b1 is a main effect. It’s the effect of X1 at the mean value of X2.
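To make b1 and b3 concrete under effect coding, here is a rough sketch in Python. The cell means are made up for illustration, and I’m assuming a balanced 2x2 design with (-1, 1) codes. Under that assumption the coded columns are orthogonal, so each least-squares coefficient is just the average of (code × cell mean):

```python
# Hypothetical cell means for a balanced 2x2 design (made-up numbers).
# Keys are (x1, x2) with effect codes -1/+1.
means = {(-1, -1): 10.0, (1, -1): 14.0, (-1, 1): 12.0, (1, 1): 22.0}
n = len(means)

# Saturated model: Y = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0 = sum(means.values()) / n                                  # grand mean
b1 = sum(x1 * m for (x1, x2), m in means.items()) / n         # x1 main effect
b2 = sum(x2 * m for (x1, x2), m in means.items()) / n         # x2 main effect
b3 = sum(x1 * x2 * m for (x1, x2), m in means.items()) / n    # interaction

# b1 is half the difference between the two x1 marginal means
# (each averaged over both categories of x2).
x1_hi = (means[(1, -1)] + means[(1, 1)]) / 2
x1_lo = (means[(-1, -1)] + means[(-1, 1)]) / 2
assert b1 == (x1_hi - x1_lo) / 2

# b3 is how far the x1 slope at each category of x2 sits from b1:
slope_x1_at_x2_pos = b1 + b3   # x1 slope when x2 = +1
slope_x1_at_x2_neg = b1 - b3   # x1 slope when x2 = -1

print(b0, b1, b2, b3)
print(slope_x1_at_x2_pos, slope_x1_at_x2_neg)
```

Because x1 moves two coded units going from -1 to +1, the difference between the x1 marginal means is 2 × b1, and the difference between the two simple effects of x1 (in mean units) is 4 × b3. That scaling is part of why I’d rather look at the means than at the coefficients.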
Below, I’ve inserted an example. I’ve displayed the means of each group and the coefficients when the variables are dummy and effect coded.
In dummy coding, the intercept is one of the four means: the mean of the cell where both Poverty and Gender = 0. The various coefficients are differences from that mean. Grab a calculator and see if you can figure out how to get each mean.
In effect coding, the intercept is the grand mean, the point right in the middle of the graph. The coefficients are differences from that point to the points marked with X on the graph. Once again, see if you can figure out what each coefficient is telling you.
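If you’d rather check your calculator work with a few lines of code, here is one way to sketch it in Python. The coefficients below are hypothetical values I picked for illustration (not the ones from the example in this post), and `cell_means` is a helper name I made up. It plugs each combination of codes into Y = b0 + b1 x1 + b2 x2 + b3 x1*x2:

```python
def cell_means(coefs, levels):
    """Return the predicted mean for every (x1, x2) cell.

    coefs  -- (b0, b1, b2, b3) from Y = b0 + b1*x1 + b2*x2 + b3*x1*x2
    levels -- the two codes used for each variable, e.g. (0, 1) or (-1, 1)
    """
    b0, b1, b2, b3 = coefs
    return {
        (x1, x2): b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2
        for x1 in levels
        for x2 in levels
    }

# Hypothetical coefficients describing the same four cell means twice:
dummy_coefs = (10.0, 4.0, 2.0, 6.0)     # assumed dummy-coded (0/1) fit
effect_coefs = (14.5, 3.5, 2.5, 1.5)    # assumed effect-coded (-1/+1) fit

dummy_means = cell_means(dummy_coefs, levels=(0, 1))
effect_means = cell_means(effect_coefs, levels=(-1, 1))

for cell, m in dummy_means.items():
    print("dummy ", cell, m)
for cell, m in effect_means.items():
    print("effect", cell, m)

# Same cells in the same order, so the two codings should agree.
assert list(dummy_means.values()) == list(effect_means.values())
```

The two sets of coefficients look nothing alike, yet they describe the same four means. The coding changes what each coefficient is measured from, not the fitted means.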