Adding interaction terms to a regression model can greatly expand understanding of the relationships among the variables in the model and allows more hypotheses to be tested.
The example from Interpreting Regression Coefficients was a model of the height of a shrub (Height) based on the amount of bacteria in the soil (Bacteria) and whether the shrub is located in partial or full sun (Sun). Height is measured in cm, Bacteria is measured in thousand per ml of soil, and Sun = 0 if the plant is in partial sun and Sun = 1 if the plant is in full sun. The regression equation was estimated as follows:
Height = 42 + 2.3*Bacteria + 11*Sun
It would be useful to add an interaction term to the model if we wanted to test the hypothesis that the relationship between the amount of bacteria in the soil on the height of the shrub was different in full sun than in partial sun.
One possibility is that in full sun, plants with more bacteria in the soil tend to be taller, whereas in partial sun, plants with more bacteria in the soil are shorter. Another possibility is that plants with more bacteria in the soil tend to be taller in both full and partial sun, but that the relationship is much more dramatic in full than in partial sun.
The presence of a significant interaction indicates that the effect of one predictor variable on the response variable is different at different values of the other predictor variable. It is tested by adding a term to the model in which the two predictor variables are multiplied. The regression equation will look like this:
Height = B0 + B1*Bacteria + B2*Sun + B3*Bacteria*Sun
Adding an interaction term to a model drastically changes the interpretation of all of the coefficients. If there were no interaction term, B1 would be interpreted as the unique effect of Bacteria on Height. But the interaction means that the effect of Bacteria on Height is different for different values of Sun. So the unique effect of Bacteria on Height is not limited to B1, but also depends on the values of B3 and Sun. The unique effect of Bacteria is represented by everything that is multiplied by Bacteria in the model: B1 + B3*Sun. B1 is now interpreted as the unique effect of Bacteria on Height only when Sun = 0.
In our example, once we add the interaction term, our model looks like:
Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
Adding the interaction term changed the values of B1 and B2. The effect of Bacteria on Height is now 4.2 + 3.2*Sun. For plants in partial sun, Sun = 0, so the effect of Bacteria is 4.2 + 3.2*0 = 4.2. So for two plants in partial sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 4.2 cm taller than a plant with less bacteria.
For plants in full sun, however, the effect of Bacteria is 4.2 + 3.2*1 = 7.4. So for two plants in full sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 7.4 cm taller than a plant with less bacteria.
Because of the interaction, the effect of having more bacteria in the soil is different if a plant is in full or partial sun. Another way of saying this is that the slopes of the regression lines between height and bacteria count are different for the different categories of sun. B3 indicates how different those slopes are.
Interpreting B2 is more difficult. B2 is the effect of Sun when Bacteria = 0. Since Bacteria is a continuous variable, it is unlikely that it equals 0 often, if ever, so B2 can be virtually meaningless by itself. Instead, it is more useful to understand the effect of Sun, but again, this can be difficult. The effect of Sun is B2 + B3*Bacteria, which is different at every one of the infinite values of Bacteria. For that reason, often the only way to get an intuitive understanding of the effect of Sun is to plug a few values of Bacteria into the equation to see how Height, the response variable, changes.
- Clarifications on Interpreting Interactions in Regression
- Interpreting Lower Order Coefficients When the Model Contains an Interaction
- The Impact of Removing the Constant from a Regression Model: The Categorical Case
- Understanding Interaction Between Dummy Coded Categorical Variables in Linear Regression