In a previous post, **Interpreting Interactions in Regression**, I said the following:

In our example, once we add the interaction term, our model looks like:

Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun

Adding the interaction term changed the values of B1 and B2. The effect of Bacteria on Height is now 4.2 + 3.2*Sun. For plants in partial sun, Sun = 0, so the effect of Bacteria is 4.2 + 3.2*0 = 4.2. So for two plants in partial sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 4.2 cm taller than a plant with fewer bacteria.

For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4. So for two plants in full sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 7.4 cm taller than a plant with fewer bacteria.

But I just received the following question about this explanation. I thought I’d respond here, in case I’m confusing other people as well.

The question was:

I was confused on how to interpret the interaction results. According to the post “For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4.” I do not understand why the “sun” coefficient is not included, such that the effect of bacteria in full sun would be 9 + 4.2 + 3.2*1. Thanks for your help.

And here’s my answer:

Excellent question. First of all, you would need to include the 9 (the coefficient for full sun) to calculate the predicted, or mean, height for plants in full sun at any specific value of Bacteria that you decided to plug in.

Because Sun is **dummy-coded**, that 9 (Sun’s coefficient) represents the difference in mean plant heights for plants in full sun compared to those in partial sun ONLY when Bacteria=0.
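To make that concrete, here is a minimal sketch that plugs values into the fitted equation from the example. The function name `predicted_height` is only for illustration, and Bacteria is assumed to be in units of 1000 bacteria/ml, as in the original post.

```python
def predicted_height(bacteria, sun):
    """Predicted mean height (cm) from the example model.

    bacteria: soil bacteria in units of 1000/ml (assumed scale)
    sun: 0 = partial sun, 1 = full sun (dummy coding)
    """
    return 35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun

# At Bacteria = 0, the full-sun vs. partial-sun gap is just Sun's coefficient.
gap_at_0 = predicted_height(0, 1) - predicted_height(0, 0)

# At Bacteria = 2, the gap also picks up the interaction term: 9 + 3.2 * 2.
gap_at_2 = predicted_height(2, 1) - predicted_height(2, 0)

print(gap_at_0, gap_at_2)
```

The gap between the two sun conditions equals 9 only at Bacteria = 0; at any other value of Bacteria it also includes 3.2 times that value.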

But to know the *effect* of Bacteria levels on plant height, you don’t need to know the differences in means. The effect of a predictor variable, X, in a regression model is how much Y differs, on average, for a one-unit difference in X.

In this example, it’s the increase (or decrease) in plant height for each incremental difference in soil bacteria count.

That’s the slope in a simple linear regression.

The interaction is telling you that this increase is not the same for plants in full and partial sun.

So the coefficient of Bacteria on its own is not enough to tell you the effect of Bacteria on plant height. The coefficient of Bacteria is not an overall slope for Bacteria.

That's because the effect is not constant. There are two different slopes (effects of Bacteria on height): 7.4 for full sun and 4.2 for partial sun.
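One way to see the two slopes is to compute the change in predicted height for a one-unit increase in Bacteria at each value of Sun. This is a small sketch using the example's coefficients; the helper names `predicted_height` and `bacteria_slope` are made up for illustration.

```python
def predicted_height(bacteria, sun):
    """Predicted mean height (cm); sun is 0 (partial) or 1 (full)."""
    return 35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun

def bacteria_slope(sun, bacteria=1.0):
    """Change in predicted height for a one-unit increase in Bacteria."""
    return predicted_height(bacteria + 1, sun) - predicted_height(bacteria, sun)

slope_partial = bacteria_slope(sun=0)  # 4.2 + 3.2 * 0
slope_full = bacteria_slope(sun=1)     # 4.2 + 3.2 * 1

print(round(slope_partial, 1), round(slope_full, 1))
```

Sun's coefficient of 9 appears in both predictions being subtracted, so it cancels. That is why it does not show up in the effect of Bacteria.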

### Related Posts

- Using Marginal Means to Explain an Interaction to a Non-Statistical Audience
- Interpreting Lower Order Coefficients When the Model Contains an Interaction
- Your Questions Answered from the Interpreting Regression Coefficients Webinar
- Understanding Interactions Between Categorical and Continuous Variables in Linear Regression


If I am unsure whether the model requires an interaction term and I add it in anyway, could this cause the model to be incorrect? In other words, is it a safe bet to always add in an interaction term? Thank you

The effect of Bacteria on Height can be interpreted as “how much Height differs for a one-unit difference in Bacteria”

Rewrite original formula:

Height_1 = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun

= 35 + 9*Sun + (4.2 + 3.2*Sun)*Bacteria

Now assume Bacteria increases by 1:

Height_2 = 35 + 9*Sun + (4.2 + 3.2*Sun)* (Bacteria + 1)

= 35 + 9*Sun + (4.2 + 3.2*Sun)*Bacteria + (4.2 + 3.2*Sun)

= Height_1 + (4.2 + 3.2*Sun)

Therefore, Height_2 – Height_1 = 4.2 + 3.2*Sun

So the effect is 4.2 + 3.2*Sun

Yes, exactly. That's the whole idea of the interaction: the effect of bacteria on height is not the same in partial sun as it is in full sun.

Would it be correct to say that the effect of bacteria AND sun is 9+4.2+3.2*1? Or what else would that sum mean?

thanks for any input,

ben

What if, instead of Bacteria, we had another dummy coded variable that interacted with Sun: Would this argument still hold?
