Why ANOVA is Really a Linear Regression, Despite the Difference in Notation

When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.

And I couldn’t figure it out.

The model notation is different.

The output looks different.

The vocabulary is different.

The focus of what we’re testing is completely different. How can they be the same model?

Why you need to know this

Before I answer that, let me mention why it’s so important to see how they’re the same. It’s true that if you never did any data analysis beyond linear models, you could always run ANOVAs as ANOVAs and regressions as regressions.

That fundamental sameness becomes really important to wrap your head around, though, once you move beyond linear models into extensions of them, like logistic regression and linear mixed models.

Although software lets you run ANOVAs and linear regressions in separate procedures, it doesn’t do that for these other types of models.

So being able to translate from one to the other in the simpler linear case is a boon to anyone who moves on to these more complicated models.

Back to how ANOVA and linear regression are the same model

In this article, we’re going to focus on notation, as that is the most fundamental part.

Here is the typical regression model with two predictors. Let’s assume one predictor is a treatment variable (treatment vs. control) and the other is some other kind of grouping variable, like whether subjects are bilingual or monolingual:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

where:

$Y_i$ = Response for individual i

$\beta_0$ = Intercept, the mean of Y when all Xs = 0

$\beta_j$ = Coefficient of $X_j$, the jth predictor

$X_{ji}$ = jth predictor for individual i

$\varepsilon_i$ = Residual for individual i

And here is the typical ANOVA model with two predictors:

$$Y_{ijk} = \mu + \alpha_j + \beta_k + \varepsilon_{ijk}$$

where:

$Y_{ijk}$ = Response for individual i, who is in treatment j and group k

$\mu$ = the grand mean

$\alpha_j$ = effect of treatment j

$\beta_k$ = effect of group k

$\varepsilon_{ijk}$ = Residual for individual i, who is in treatment j and group k

Why the models are the same, despite differences in notation

In the ANOVA model, the predictors are often called factors or grouping variables, but just like in the regression case, we can call them the more generic “predictors.” For simplicity, there is no interaction, though that would be simple to add to both models.

Y, the response variable, is on the left hand side of both equations. The subscript i indicates that each case has an individual value of Y.

Likewise, both models have an error term, denoted by ε. This too has an i subscript because there is one value per case.

Between the response and the error term, both models have three terms, and these don’t look the same. At all.

In the linear regression model, we use Xs to indicate the value of the predictor variables. This is super flexible — if X is numerical, we just plug in the numerical values of X. If X is categorical, we simply indicate which group someone was in with coded values of X1.

The simplest and most common would have a 1 for the treatment group and a 0 for the control group.

β1, the coefficient of X1, measures the effect of the treatment on Y.
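To see what that 1/0 coding does, here is a minimal Python sketch. The scores are made up for illustration, and simple_ols is a hypothetical helper written for this example, not a function from any library. It fits the least-squares line to a 0/1 treatment indicator. With this coding, the intercept should come out as the control group’s mean and the slope as the difference between the treatment and control means.

```python
from statistics import mean

# Hypothetical scores, made up for illustration only.
control = [10.0, 12.0, 11.0, 13.0]
treatment = [15.0, 14.0, 16.0, 17.0]

# Dummy coding: X = 0 for the control group, X = 1 for the treatment group.
x = [0.0] * len(control) + [1.0] * len(treatment)
y = control + treatment


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = simple_ols(x, y)

# Compare the regression coefficients with the plain group means.
print("intercept b0:       ", b0)
print("control mean:       ", mean(control))
print("slope b1:           ", b1)
print("treatment - control:", mean(treatment) - mean(control))
```

So under dummy coding, the coefficient on X1 is the treatment effect measured as a difference from the control group.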

The subscripts

Because ANOVA assumes that all the predictors are categorical (aka factors or grouping variables), those predictors have a limited number of values. These are values like: treatment vs. control group. Two values.

Because of the limited number of values, the ANOVA model uses subscripts to indicate if someone is in the treatment or control group. Subscript j would have values of 1 for the treatment group and 2 for the control.

α measures the effect of the treatment on Y.

Even though these X values aren’t written directly into the ANOVA model, they exist. Your software is actually creating those X values for you when you indicate that X is categorical. Usually ANOVA uses effect coding (1 and -1) for these X values rather than dummy coding (1 and 0).
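Here is one way to write those hidden X values out. This is a sketch under two assumptions that go with effect coding: each factor has two levels, and the effects within each factor sum to zero. The names X1 and X2 for the coded variables are my labels, chosen to line up with the regression model.

```latex
% Assumed sum-to-zero constraints (they accompany effect coding):
%   \alpha_1 + \alpha_2 = 0, \qquad \beta_1 + \beta_2 = 0
\begin{align*}
X_{1i} &=
  \begin{cases}
    +1 & \text{if individual } i \text{ is in treatment level } j = 1 \\
    -1 & \text{if individual } i \text{ is in treatment level } j = 2
  \end{cases} \\
X_{2i} &=
  \begin{cases}
    +1 & \text{if individual } i \text{ is in group level } k = 1 \\
    -1 & \text{if individual } i \text{ is in group level } k = 2
  \end{cases} \\
Y_{ijk} &= \mu + \alpha_j + \beta_k + \varepsilon_{ijk} \\
        &= \mu + \alpha_1 X_{1i} + \beta_1 X_{2i} + \varepsilon_{ijk}
\end{align*}
```

The last line works because α2 = -α1, so αj is just α1 multiplied by +1 or -1, and likewise for the group effect. That last line has the same shape as the regression equation: the regression intercept plays the role of μ, the regression coefficient on X1 plays the role of α1, and the regression coefficient on X2 plays the role of the ANOVA model’s β1. Watch the symbol clash here: β means a group effect in the ANOVA model but any coefficient in the regression model.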

One other term that looks different, but is essentially the same thing: the constant. In the regression model, this is called the intercept and denoted β0 and in the ANOVA model, this is called the grand mean and denoted μ.
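One caution on “essentially the same thing”: what the constant equals depends on how the Xs are coded. The sketch below uses made-up, balanced data (the same number of cases in each group) and a hypothetical simple_ols helper to fit the same one-factor model twice, once with dummy coding and once with effect coding. The fitted values should be identical, but the constant should be the control mean under dummy coding and the grand mean under effect coding.

```python
from statistics import mean

# Hypothetical, balanced data, made up for illustration only.
control = [10.0, 12.0, 11.0, 13.0]
treatment = [15.0, 14.0, 16.0, 17.0]
y = control + treatment


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


dummy = [0.0] * len(control) + [1.0] * len(treatment)    # control = 0, treatment = 1
effect = [-1.0] * len(control) + [1.0] * len(treatment)  # control = -1, treatment = +1

b0_dummy, b1_dummy = simple_ols(dummy, y)
b0_effect, b1_effect = simple_ols(effect, y)

# Same model, so each case gets the same prediction under either coding.
fitted_dummy = [b0_dummy + b1_dummy * xi for xi in dummy]
fitted_effect = [b0_effect + b1_effect * xi for xi in effect]

grand_mean = mean(y)  # equals the mean of the group means because the design is balanced

print("dummy coding:  constant =", b0_dummy, " control mean =", mean(control))
print("effect coding: constant =", b0_effect, " grand mean   =", grand_mean)
print("same fitted values:",
      all(abs(a - b) < 1e-9 for a, b in zip(fitted_dummy, fitted_effect)))
```

The model is the same either way. Only the reference point that the constant describes has moved.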

There are, of course, other differences in the workflow of conducting regressions and ANOVAs, including how we code the Xs, how that coding changes the mean around which we’re measuring the effects of those Xs, and which output we tend to focus on.

So both models have the same Y, the same constant, and the same residuals. What’s different is just the way we express the Xs and their effects. Write the ANOVA model a slightly different way, and it really is a linear regression.
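To check that claim with numbers, here is a small Python sketch of the two-predictor case. Everything in it is an assumption made for illustration: the data are invented, the design is balanced (two cases per cell), and ols is a hypothetical helper that solves the normal equations directly rather than a call into any statistics package. It estimates the ANOVA terms from the marginal means, then fits a regression on effect-coded Xs, and the two sets of numbers should agree.

```python
from statistics import mean

# Hypothetical balanced 2x2 data: (treatment j, group k, response).
# j: 1 = treatment, 2 = control; k: 1 = bilingual, 2 = monolingual.
data = [
    (1, 1, 20.0), (1, 1, 22.0),
    (1, 2, 16.0), (1, 2, 18.0),
    (2, 1, 14.0), (2, 1, 16.0),
    (2, 2, 10.0), (2, 2, 12.0),
]

# ANOVA view: the grand mean, plus deviations of the marginal means from it.
mu = mean(resp for _, _, resp in data)
alpha_1 = mean(resp for j, _, resp in data if j == 1) - mu
beta_1 = mean(resp for _, k, resp in data if k == 1) - mu

# Regression view: a column of 1s for the intercept, then effect-coded Xs
# (+1 for level 1, -1 for level 2 of each factor).
X = [[1.0, 1.0 if j == 1 else -1.0, 1.0 if k == 1 else -1.0] for j, k, _ in data]
y = [resp for _, _, resp in data]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][p] for r in range(p)]


b0, b1, b2 = ols(X, y)

print("grand mean mu:", mu, " regression b0:", b0)
print("alpha_1:      ", alpha_1, " regression b1:", b1)
print("beta_1:       ", beta_1, " regression b2:", b2)
```

With unbalanced data the marginal-means shortcut in the ANOVA half would no longer match, which is one more reason to think in terms of the regression form.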

 


Comments

  1. Jacob Wobbrock says

    Great post! I think there are maybe two typos here. “Subscript j would have values of 1 for the treatment group and 1 for the control.” I think it should be 0 for the control, no?

    And, “Even those these X…” seems like it should be “Even though these X…”

    • Karen Grace-Martin says

      Thanks Jacob! I fixed both.

      Actually, the subscript j values would themselves be 1 and 2 to denote groups 1 and 2. But the coding of the actual assumed X values would be 1 and -1. I added a sentence as well to make this clear.

