Clarifications on Interpreting Interactions in Regression

In a previous post, Interpreting Interactions in Regression, I said the following:

In our example, once we add the interaction term, our model looks like:

Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun

Adding the interaction term changed the values of B1 and B2 (the coefficients of Bacteria and Sun). The effect of Bacteria on Height is now 4.2 + 3.2*Sun. For plants in partial sun, Sun = 0, so the effect of Bacteria is 4.2 + 3.2*0 = 4.2. So for two plants in partial sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 4.2 cm taller than a plant with fewer bacteria.

For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4. So for two plants in full sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 7.4 cm taller than a plant with fewer bacteria.

But I just received the following question about this explanation.  I thought I’d respond here, in case I’m confusing other people as well.

The question was:

I was confused on how to interpret the interaction results. According to the post “For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4.” I do not understand why the “sun” coefficient is not included, such that the effect of bacteria in full sun would be 9 + 4.2 + 3.2*1. Thanks for your help.

And here’s my answer:

Excellent question.  First of all, you would need to include the 9 (the coefficient for full sun) to calculate the predicted, or mean, height for plants in full sun at any specific value of Bacteria that you decided to plug in.

Because Sun is dummy-coded, that 9 (Sun’s coefficient) represents the difference in mean plant heights for plants in full sun compared to those in partial sun ONLY when Bacteria=0.
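A small sketch makes this concrete. It hard-codes the coefficients from the fitted equation (Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun). The helper names `predicted_height` and `sun_gap` are hypothetical, and Bacteria is measured in units of 1000 bacteria/ml, as in the original post.

```python
# Fitted model: Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
# Sun is dummy-coded: 0 = partial sun, 1 = full sun.
# Bacteria is in units of 1000 bacteria/ml; Height is in cm.


def predicted_height(bacteria: float, sun: int) -> float:
    """Predicted mean height (cm); the 9 only enters when sun = 1."""
    return 35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun


def sun_gap(bacteria: float) -> float:
    """Full-sun minus partial-sun predicted height at one Bacteria value."""
    return predicted_height(bacteria, 1) - predicted_height(bacteria, 0)


for b in [0, 1, 2, 5]:
    print(f"Bacteria={b}: partial={predicted_height(b, 0):.1f}, "
          f"full={predicted_height(b, 1):.1f}, gap={sun_gap(b):.1f}")
```

The 9 is needed for every full-sun prediction, but the gap between full and partial sun equals 9 only at Bacteria = 0. At any other value it is 9 + 3.2*Bacteria (12.2 cm at Bacteria = 1, for example).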

But to know the effect of Bacteria levels on plant height, you don’t need to know the differences in means.  The effect of a predictor variable, X,  in a regression model is how much Y differs, on average, for a one-unit difference in X.

In this example, it’s the increase (or decrease) in plant height for each one-unit (1000 bacteria/ml) difference in soil bacteria count.

That’s the slope in a simple linear regression.

The interaction is telling you that this increase is not the same for plants in full and partial sun.
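With the same fitted coefficients, the increase in each sun condition is the Bacteria coefficient plus the interaction coefficient times Sun. A minimal sketch (the function name `bacteria_effect` is hypothetical):

```python
# Coefficients from: Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
B_BACTERIA = 4.2      # Bacteria slope when Sun = 0 (partial sun)
B_INTERACTION = 3.2   # how much the Bacteria slope changes when Sun = 1 (full sun)


def bacteria_effect(sun: int) -> float:
    """Expected height difference (cm) per 1000 bacteria/ml, given the sun condition."""
    return B_BACTERIA + B_INTERACTION * sun


for sun, label in [(0, "partial sun"), (1, "full sun")]:
    print(f"{label}: {bacteria_effect(sun):.1f} cm per 1000 bacteria/ml")
```

The Sun coefficient (9) never appears here: it shifts the full-sun line up, but it does not change that line's slope.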

So the coefficient of Bacteria on its own is not enough to tell you the effect of Bacteria on plant height. The coefficient of Bacteria is not an overall slope for Bacteria. It is the slope for Bacteria only among plants in partial sun (Sun = 0).

That’s because it’s not a constant effect. There are two different slopes (effects of Bacteria on height): one for full sun and one for partial sun.
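One way to convince yourself that there really are two slopes is to simulate data from the fitted equation, fit the interaction model, and compare it with a separate simple regression in each sun group. This is a sketch on simulated data only: the sample size, noise level, and seed are arbitrary assumptions, and it uses NumPy's least-squares solver rather than any particular statistics package.

```python
import numpy as np

# Simulated (hypothetical) data: 50 plants per sun condition, generated from
# Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun plus normal noise.
rng = np.random.default_rng(seed=1)
n = 50
sun = np.repeat([0, 1], n)                    # 0 = partial sun, 1 = full sun
bacteria = rng.uniform(0, 10, size=2 * n)     # units of 1000 bacteria/ml
height = (35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun
          + rng.normal(0, 2, size=2 * n))

# Interaction model: columns are intercept, Bacteria, Sun, Bacteria*Sun.
X = np.column_stack([np.ones(2 * n), bacteria, sun, bacteria * sun])
b0, b1, b2, b3 = np.linalg.lstsq(X, height, rcond=None)[0]


def group_slope(mask):
    """Slope from a simple regression of height on bacteria within one group."""
    return np.polyfit(bacteria[mask], height[mask], deg=1)[0]


slope_partial = group_slope(sun == 0)
slope_full = group_slope(sun == 1)

print(f"partial sun: b1 = {b1:.3f}, separate slope = {slope_partial:.3f}")
print(f"full sun: b1 + b3 = {b1 + b3:.3f}, separate slope = {slope_full:.3f}")
```

Because the model contains both Sun and Bacteria*Sun, it is equivalent to fitting a separate line in each group. The Bacteria coefficient therefore matches the partial-sun slope, and Bacteria plus the interaction matches the full-sun slope, up to rounding. The estimates will not be exactly 4.2 and 7.4 because of the simulated noise.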

 



Comments

  1. Bul says

    Hi all, I need some clarification about the significance of the coefficients. Do we require that all coefficients are significant? I mean the coefficient of Bacteria and the coefficient of the interaction of Bacteria with Sun. If only the interaction coefficient is significant, do we consider only its value for interpretation (without adding the coefficient of Bacteria)? Thank you for clarifying.

  2. Seren says

    If I am unsure whether the model requires an interaction term, and I add it in anyway, could this cause the model to be incorrect? In other words, is it a safe bet to always add in an interaction term? Thank you

  3. A says

    The effect of Bacteria on Height can be interpreted as “how much Height differs for a one-unit difference in Bacteria”

    Rewrite original formula:
    Height_1 = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
    = 35 + 9*Sun + (4.2 + 3.2*Sun)*Bacteria

    Assume Bacteria increases by 1:
    Height_2 = 35 + 9*Sun + (4.2 + 3.2*Sun)* (Bacteria + 1)
    = 35 + 9*Sun + (4.2 + 3.2*Sun)*Bacteria + (4.2 + 3.2*Sun)
    = Height_1 + (4.2 + 3.2*Sun)

    Therefore, Height_2 – Height_1 = 4.2 + 3.2*Sun

    So the effect is 4.2 + 3.2*Sun
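    The algebra above can also be checked numerically: for any starting Bacteria value, raising Bacteria by 1 while holding Sun fixed changes the predicted Height by 4.2 + 3.2*Sun, because the 35 and 9*Sun terms cancel. A minimal sketch with hypothetical function names `height` and `one_unit_change`:

```python
def height(bacteria: float, sun: int) -> float:
    """Height_1 from the comment: 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun."""
    return 35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun


def one_unit_change(bacteria: float, sun: int) -> float:
    """Height_2 - Height_1 when Bacteria increases by 1, holding Sun fixed."""
    return height(bacteria + 1, sun) - height(bacteria, sun)


for sun in (0, 1):
    diffs = [round(one_unit_change(b, sun), 10) for b in (0, 2.5, 7, 20)]
    print(f"Sun={sun}: {diffs}")  # same value at every starting Bacteria
```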

    • Karen says

      Yes, exactly. That’s the whole idea of the interaction: the effect of bacteria on height is not the same for plants in partial sun as it is for plants in full sun.

  4. ben says

    would it be correct to say that the effect of bacteria AND sun is 9+4.2+3.2*1? or what else would that sum mean?
    thanks for any input,
    ben

  5. Hendrik says

    What if, instead of Bacteria, we had another dummy coded variable that interacted with Sun: Would this argument still hold?

