Interpreting Lower Order Coefficients When the Model Contains an Interaction

A linear regression model with an interaction between two predictors (X₁ and X₂) has the form:

Y = B₀ + B₁X₁ + B₂X₂ + B₃X₁*X₂.

It doesn’t really matter if X₁ and X₂ are categorical or continuous, but let’s assume they are continuous for simplicity.

One important concept is that B₁ and B₂ are not main effects, the way they would be if there were no interaction term. Rather, they are conditional effects.

Main Effects and Conditional Effects

A main effect is the overall effect of X₁ across all values of X₂. That overall effect is the difference in the mean of Y for each one-unit change in X₁.

If there were no interaction term in the model, B₁ would be a main effect, and that is how regression coefficients are generally interpreted.

But B₁ is not that when there is an interaction in the model. It is the effect of X₁ conditional on X₂ = 0.

For all values of X₂ other than zero, the effect of X₁ is B₁ + B₃X₂.
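To make the conditional effect concrete, here is a minimal sketch. The coefficient values and the function names (predict, conditional_slope) are made up for illustration; they don't come from any real data set.

```python
def predict(b0, b1, b2, b3, x1, x2):
    """Predicted Y from a model with an X1*X2 interaction."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def conditional_slope(b1, b3, x2):
    """Effect of a one-unit change in X1 at a given value of X2."""
    return b1 + b3 * x2


# Hypothetical coefficients, chosen only for illustration.
B0, B1, B2, B3 = 10, 2, 3, 0.5

for x2 in (0, 1, 4):
    # Change in predicted Y when X1 goes from 0 to 1, holding X2 fixed.
    change = predict(B0, B1, B2, B3, 1, x2) - predict(B0, B1, B2, B3, 0, x2)
    print(x2, conditional_slope(B1, B3, x2), change)
# 0 2.0 2.0
# 1 2.5 2.5
# 4 4.0 4.0
```

Only at X₂ = 0 does the effect of X₁ equal B₁ itself.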

The biggest practical implication is that when you add an interaction term to a model, B₁ and B₂ can change drastically (even if B₃ is not significant), because by definition B₁ and B₂ are measuring a different effect than they were in a model without the interaction term.

But they aren’t labeled differently in the output. You have to know how to interpret those effects.

So don’t panic if B₁ suddenly isn’t significant. It’s measuring something else altogether.
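Here is a small sketch of that shift, using made-up, noiseless data on a balanced grid and a tiny exact least-squares solver. The helper ols and all of the numbers are assumptions for illustration, not output from real software. The same Y is fit with and without the interaction term.

```python
from fractions import Fraction
from itertools import product


def ols(columns, y):
    """Least-squares coefficients from the normal equations, in exact arithmetic."""
    k = len(columns)
    cols = [[Fraction(v) for v in col] for col in columns]
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with a search for a nonzero pivot.
    for c in range(k):
        pivot = next(r for r in range(c, k) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        scale = a[c][c]
        a[c] = [v / scale for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [row[k] for row in a]


# Noiseless toy data: X1 and X2 each take the values 0..4 (all made up).
grid = list(product(range(5), range(5)))
x1 = [u for u, v in grid]
x2 = [v for u, v in grid]
x1x2 = [u * v for u, v in grid]
ones = [1] * len(grid)
y = [10 + 2 * u + 3 * v + 1 * u * v for u, v in grid]

without_interaction = ols([ones, x1, x2], y)
with_interaction = ols([ones, x1, x2, x1x2], y)

print([str(b) for b in without_interaction])  # ['6', '4', '5']
print([str(b) for b in with_interaction])     # ['10', '2', '3', '1']
```

In this toy example B₁ drops from 4 to 2 once the interaction is added. The 4 is the effect of X₁ at the mean of X₂ (2 + 1 × 2), while the 2 is the effect of X₁ when X₂ = 0. Neither number is wrong; they answer different questions.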

So B₁, in the presence of an interaction, is the effect of X₁ only when X₂ = 0.

If X₂ never equals 0 in the data set, then B₁ has no practical meaning. None. It describes the effect of X₁ at a value of X₂ that the data never reach.

Centering to Improve Interpretation

This is a good reason to center X₂. If X₂ is centered at its mean, then B₁ is the effect of X₁ when X₂ is at its mean. Much more interpretable.

Even better is to center X₂ at some meaningful value even if it’s not its mean. For example, if X₂ is Age of children, perhaps the sample mean is 6.2 years. But 5 is the age when most children begin school, so centering Age at 5 might be more meaningful, depending on the topic being studied.
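Centering doesn't change the fitted model, only what the lower order coefficients refer to. The sketch below shows this with hypothetical coefficients (not estimates from any real study) and a made-up helper, recenter, that rewrites the coefficients after Age is replaced by Age minus 5.

```python
def predict(b, x1, x2):
    """Predicted Y for coefficients b = (B0, B1, B2, B3)."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def recenter(b, c):
    """Coefficients of the same fitted model once X2 is replaced by X2 - c."""
    b0, b1, b2, b3 = b
    return (b0 + b2 * c, b1 + b3 * c, b2, b3)


original = (10, 2, 3, 1)          # hypothetical B0, B1, B2, B3, Age uncentered
centered = recenter(original, 5)  # Age centered at 5
print(centered)  # (25, 7, 3, 1)

# The two parameterizations give identical predictions.
for x1, age in [(0, 5), (1, 5), (3, 8)]:
    assert predict(original, x1, age) == predict(centered, x1, age - 5)
```

After centering, B₁ is 7, which is the effect of X₁ for a 5-year-old (2 + 1 × 5). B₃ is unchanged.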

If X₂ is categorical, the same approach applies, but with a different implication. If X₂ is dummy coded 0/1, B₁ is the effect of X₁ only for the reference group.

The effect of X₁ for the comparison group is B₁ + B₃. To see why, plug in 0 for X₂ for the reference group and write out the regression equation. Then plug in 1 for X₂ for the comparison group. Do the algebra.
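Written out, the algebra looks like this (same coefficients as the model equation above, just in LaTeX notation):

```latex
\begin{align*}
X_2 = 0:\quad Y &= B_0 + B_1 X_1 + B_2(0) + B_3 X_1(0) = B_0 + B_1 X_1 \\
X_2 = 1:\quad Y &= B_0 + B_1 X_1 + B_2(1) + B_3 X_1(1) = (B_0 + B_2) + (B_1 + B_3) X_1
\end{align*}
```

So the slope for X₁ is B₁ in the reference group and B₁ + B₃ in the comparison group, and B₂ is the difference between the two groups' intercepts, that is, the group difference when X₁ = 0.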

Comments

Steph says

April 6, 2024 at 11:03 am

Hi,
Thank you for this helpful article. I want to clarify, without the presence of an interaction, is B1 the main effect of X1 across *all values* of X2, or is B1 the effect after holding all other predictors (X2, X3, X4, etc etc) constant? So for example with the presence of an interaction, if B1 for X1 is 10, and my X2 values range from 0-10, is B1 the effect when my X2 is any value it contains (0-10)? Or, is B1 the effect only when my X2 is 0?

- Karen Grace-Martin says
  
  May 3, 2024 at 12:04 pm
  
  Hi Steph,
  
  B1 is the effect of X1 after holding all other predictors constant, but it doesn’t have to be held at 0. It can be at any value of X2 if there are no interactions.
  
George says

December 7, 2022 at 4:02 am

Hi Karen,

I understand the point you want to make but I do believe that further clarifications are required. Multiple regression models provide the unique effect of each predictor, controlling for the others (setting them to zero). So, anyway, the individual effect, say, B1 is a partial slope – i.e., conditional on setting the other predictors to zero.

I think there is a lot of misunderstanding in interpreting B1 effects between:

(1) y = B0+B1*X1
(2) y = B0+B1*X1+B2*X2

Terminology is a *pain* but it does make a difference here.

Best

- Karen Grace-Martin says
  
  December 21, 2022 at 3:08 pm
  
  Hi George,
  
  Oh, absolutely, each coefficient is the unique effect after holding others constant.
  
  It’s actually not necessary to hold others constant at 0 to control for them. Any constant will do. There are certainly advantages of choosing 0 for the other variables, but it doesn’t affect the interpretation of a variable’s coefficient unless those others are involved in an interaction with the variable of interest. That was a lot of my point.
  
Tongming Kang says

August 14, 2020 at 10:51 am

Hi Karen,

Thanks for this article! It really helped me to gain a better understanding in interpreting interaction in a model.

And I want to ask you some questions. Let’s say I have a multivariable linear model containing two independent variables. Is it OK to center both variables before introducing the interaction term? And when it comes to interpretation, is it always recommended to center predictors? (Or under which conditions would it be good to center predictors?)

Thanks for your help!

sahar says

July 25, 2020 at 7:10 am

Hi Karen,
It was excellent. Your explanation was simple, practical and suitable.

Bernd says

July 23, 2020 at 9:28 am

I made this observation when I compared the outcomes of a mixed-model analysis with two fixed effects, one categorical (A) and one continuous (B), the latter entered the model as covariate. The fixed effect stats for A were quite different in two models with or without the interaction with B in the way you described it. In fact, after using a z-transform of B (in SPSS a z-transform of a variable can be requested via the Descriptives command), the meaning of the main effect of A was established.

Thanks for the explanation!
Bernd

Alexander Seidel says

July 23, 2020 at 9:10 am

For categorical predictors it is indeed important how you center them. Setting them to -1 and 1 (deviation coding) compares the level at 1 to the mean of the predictor, while setting them to -0.5 and 0.5 (simple coding) compares the level at 0.5 to the level at -0.5. It also changes the meaning of your parameter estimate (from the distance between a level and the center of the predictor to the distance between levels).

For more predictor levels this becomes more complicated to code with simple coding, but google helps.

Thet says

May 11, 2019 at 5:57 am

Hi Karen,

Thanks a lot for your posts. They are extremely valuable to my thesis.
I am now having a situation with the confidence intervals of the coefficients in a model with a statistical interaction term.

Using the example in your post, how can I know the confidence interval of each effect?
1. When X2=0, Y = (B1+B3*0)*X1 = B1*X1
Can I simply use the confidence interval of B1 generated/calculated by the software output?

2. When X2=1, Y= (B1+B3*1)*X1
How can I calculate the confidence interval of that (B1+B3)?

Or is it nonsense to calculate confidence intervals of each coefficient in such models with interaction?

With Much Thanks,
Thet

- Karen Grace-Martin says
  
  May 31, 2019 at 11:40 am
  
  Hi Thet,
  
  It’s more common to report the confidence intervals for each parameter estimate, which is what the software generates. You can’t use the confidence interval of B1 for B1+B3.
  
Elaine says

March 6, 2016 at 5:01 am

Omer, it’s not a silly question. Depending on your software, you command it to show the basic info of X2. For Stata, it’s sum X2. I forget for SAS, but it’s something similar.

The sum command will show the mean value for X2 in your data. Say it’s 6.2. You must now generate a new variable for X2. In Stata, you would say
gen X2_c = X2 - 6.2 (where X2_c can be whatever name you like)

Your data is now centered. I believe to center at a different value you would just subtract that value from X2, but that seems too simple. I’m a beginner, too.

- Karen Grace-Martin says
  
  October 26, 2018 at 4:56 pm
  
  Hi Elaine,
  
  No, you got it–it really is that simple. For whatever value you want to center on, just subtract that value from X2.
  
Charles Lao says

May 29, 2015 at 11:22 am

Nice post. However, I think the condition only applies if your design matrix is an offset from a reference model. For an over-parameterized model or a sigma-restricted model, B1 will be your main effect.

Omer says

April 27, 2015 at 4:48 am

Hi Karen,

This may be a silly question but how do you center X2?

- Soutik says
  
  August 9, 2017 at 9:06 pm
  
  Hi Omer,
  
  You can center X2 by standardizing it, i.e. (X2-mean(X2))/sd(X2)
  