When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained *why*.

And I couldn’t figure it out.

The model notation is different.

The output looks different.

The vocabulary is different.

The focus of what we’re testing is completely different. How can they be the same model?

**Why you need to know this**

Before I answer that, let me mention *why* it’s so important to see how they’re the same. It’s true that if you never did any data analysis beyond linear models, you could always run ANOVAs as ANOVAs and regressions as regressions.

The fundamental sameness of these models is really important to wrap your head around, though, once you move beyond linear models into extensions of them, like logistic regression and linear mixed models.

Although software lets you run ANOVAs and linear regressions in separate procedures, it doesn’t offer that separation for these other types of models.

So being able to translate from one to the other in the simpler linear case is a boon to anyone who moves on to these more complicated models.

**Back to how ANOVA and linear regression are the same model**

In this article, we’re going to focus on notation, as that is the most fundamental part.

Here is the typical regression model with two predictors. Let’s assume one predictor is a treatment variable (treatment vs. control) and the other is some other kind of grouping variable, like whether subjects are bilingual or monolingual:

### Y_{i} = β_{0} + β_{1}X_{1i} + β_{2}X_{2i} + ε_{i}

where:

Y_{i} = Response for individual i

β_{0} = Intercept—the mean of Y when all Xs=0

β_{j} = Coefficient of X_{j}, the jth predictor

X_{ji} = jth predictor for individual i

ε_{i} = Residual for individual i

And here is the typical ANOVA model with two predictors:

### Y_{ijk} = μ + α_{j} + β_{k} + ε_{ijk}

Y_{ijk} = Response for individual i, who is in treatment j and group k

μ = the grand mean

ε_{ijk} = Residual for individual i who is in treatment j and group k

α_{j} = effect of treatment j

β_{k} = effect of group k

In the ANOVA model, the predictors are often called factors or grouping variables, but just like in the regression case, we *can* call them the more generic “predictors.” For simplicity, there is no interaction, though that would be simple to add to both models.
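To make the ANOVA notation concrete, here is a minimal sketch that splits each response into μ + α_{j} + β_{k} + ε_{ijk}. The scores are made up for illustration, and the design is assumed to be balanced (equal cell sizes) with no interaction, which is what lets us estimate each effect as a simple difference between a marginal mean and the grand mean.

```python
from statistics import mean

# Hypothetical balanced data: two scores per (treatment j, group k) cell.
# j: 1 = treatment, 2 = control; k: 1 = bilingual, 2 = monolingual.
data = {
    (1, 1): [14.0, 16.0],
    (1, 2): [10.0, 12.0],
    (2, 1): [9.0, 11.0],
    (2, 2): [5.0, 7.0],
}

all_y = [y for ys in data.values() for y in ys]
mu = mean(all_y)  # grand mean

# alpha_j: mean of everyone at treatment level j, minus the grand mean.
alpha = {
    j: mean(y for (jj, _), ys in data.items() if jj == j for y in ys) - mu
    for j in (1, 2)
}
# beta_k: mean of everyone in group k, minus the grand mean.
beta = {
    k: mean(y for (_, kk), ys in data.items() if kk == k for y in ys) - mu
    for k in (1, 2)
}

# epsilon_ijk: what is left of each response after mu + alpha_j + beta_k.
residuals = {
    (j, k): [y - (mu + alpha[j] + beta[k]) for y in ys]
    for (j, k), ys in data.items()
}

print(mu, alpha, beta)
```

With these numbers the grand mean is 10.5, the treatment effects are +2.5 and -2.5, and the group effects are +2.0 and -2.0. Notice that the effects within each factor sum to zero.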

Y, the response variable, is on the left hand side of both equations. The subscript i indicates that each case has an individual value of Y.

Likewise, both models have an error term, denoted by ε. This too has an i subscript because there is one value per case.

In between these on both models are three terms and these don’t look the same. At all.

In the linear regression model, we use Xs to indicate the value of the predictor variables. This is super flexible: if X is numerical, we just plug in the numerical values of X. If X is categorical, we simply indicate which group someone was in with coded values of X.

The simplest and most common would have a 1 for the treatment group and a 0 for the control group.

β_{1}, the coefficient of this treatment variable, measures the effect of the treatment on Y.
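Here is a small sketch of what that 1/0 coding does in practice. To keep it short, it uses only the treatment predictor (not the second grouping variable), made-up scores, and a hand-written least-squares helper called `simple_ols`, which is a name invented for this example.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical responses for six subjects.
labels = ["treatment"] * 3 + ["control"] * 3
y = [13.0, 15.0, 14.0, 9.0, 10.0, 11.0]

# Dummy coding: 1 for the treatment group, 0 for the control group.
x = [1 if lab == "treatment" else 0 for lab in labels]

b0, b1 = simple_ols(x, y)
control_mean = mean(yi for yi, xi in zip(y, x) if xi == 0)
treatment_mean = mean(yi for yi, xi in zip(y, x) if xi == 1)

print(b0, b1)                        # 10.0 4.0
print(control_mean, treatment_mean)  # 10.0 14.0
```

The intercept comes out as the control group’s mean (the mean of Y when X=0), and the coefficient is the treatment mean minus the control mean.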

Because ANOVA assumes that all the predictors are categorical (aka factors or grouping variables), those predictors have a limited number of values. These are values like: treatment vs. control group. Two values.

Because of the limited number of values, the ANOVA model uses subscripts to indicate if someone is in the treatment or control group. Subscript j would have values of 1 for the treatment group and 2 for the control.

α_{j} measures the effect of the treatment on Y.

Even though these X values aren’t written directly into the ANOVA model, they exist. Your software is actually creating those X values for you when you indicate that X is categorical. Usually ANOVA uses effect coding (1 and -1) for these X values rather than dummy coding (1 and 0).
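To see those hidden X values at work, here is a minimal sketch with one factor and made-up scores. It fits the regression twice, once with effect coding and once with dummy coding, and compares the results with the ANOVA quantities computed straight from the group means. The helper `simple_ols` is invented for this example, and the groups are assumed to be the same size.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical responses: j = 1 is treatment, j = 2 is control.
j = [1, 1, 1, 2, 2, 2]
y = [13.0, 15.0, 14.0, 9.0, 10.0, 11.0]

# ANOVA quantities from the means (balanced design assumed).
mu = mean(y)
alpha = {g: mean(yi for yi, ji in zip(y, j) if ji == g) - mu for g in (1, 2)}

# Effect coding: X = 1 for treatment, -1 for control.
x_effect = [1 if ji == 1 else -1 for ji in j]
e0, e1 = simple_ols(x_effect, y)

# Dummy coding: X = 1 for treatment, 0 for control.
x_dummy = [1 if ji == 1 else 0 for ji in j]
d0, d1 = simple_ols(x_dummy, y)

# Both codings predict the same value for every subject.
fitted_effect = [e0 + e1 * xi for xi in x_effect]
fitted_dummy = [d0 + d1 * xi for xi in x_dummy]

print(mu, alpha)  # 12.0 {1: 2.0, 2: -2.0}
print(e0, e1)     # 12.0 2.0
print(d0, d1)     # 10.0 4.0
```

Under effect coding, the regression intercept equals μ and the coefficient equals α_{1} (with α_{2} as its negative). Dummy coding gives different coefficients but identical fitted values, so it is the same model written two ways. One caveat: with unequal group sizes, the effect-coded intercept is the unweighted mean of the group means, which is no longer the mean of all observations.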

One other term that looks different, but is essentially the same thing: the constant. In the regression model, this is called the intercept and denoted β_{0}, and in the ANOVA model, this is called the grand mean and denoted μ.

There are, of course, other differences in how we generally analyze regressions and ANOVAs, including how we code the Xs, how that coding changes the mean around which we’re measuring the effects of those Xs, and which output we tend to focus on. We’ll explore some of those in later articles.

Jacob Wobbrock says

Great post! I think there are maybe two typos here. “Subscript j would have values of 1 for the treatment group and 1 for the control.” I think it should be 0 for the control, no?

And, “Even those these X…” seems like it should be “Even though these X…”

Karen Grace-Martin says

Thanks Jacob! I fixed both.

Actually, the subscript j values would themselves be 1 and 2 to denote groups 1 and 2. But the coding of the actual assumed X values would be 1 and -1. I added a sentence as well to make this clear.