When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.
And I couldn’t figure it out.
The model notation is different.
The output looks different.
The vocabulary is different.
The focus of what we’re testing is completely different. How can they be the same model?
Why you need to know this
Before I answer that, let me mention why it’s so important to see how they’re the same. It’s true that if you never did any data analysis beyond linear models, you could always run ANOVAs as ANOVAs and regressions as regressions.
The fundamental sameness of these models is really important to wrap your head around, though, once you move beyond linear models into extensions of them, like logistic regression and linear mixed models.
Although software lets you run ANOVAs and linear regressions in separate procedures, it doesn’t offer that choice for these other types of models.
So being able to translate from one to the other in the simpler linear case is a boon to anyone who moves on to these more complicated models.
Back to how ANOVA and linear regression are the same model
In this article, we’re going to focus on notation, as that is the most fundamental part.
Here is the typical regression model with two predictors. Let’s assume one predictor is a treatment variable (treatment vs. control) and the other is some other kind of grouping variable, like whether subjects are bilingual or monolingual:
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + εᵢ
Yᵢ = Response for individual i
β₀ = Intercept, the mean of Y when all Xs = 0
βⱼ = Coefficient of Xⱼ, the jth predictor
Xⱼᵢ = jth predictor for individual i
εᵢ = Residual for individual i
And here is the typical ANOVA model with two predictors:
Yᵢⱼₖ = μ + αⱼ + βₖ + εᵢⱼₖ
Yᵢⱼₖ = Response for individual i, who is in treatment j and group k
μ = the grand mean
αⱼ = effect of treatment j
βₖ = effect of group k
εᵢⱼₖ = Residual for individual i, who is in treatment j and group k
In the ANOVA model, the predictors are often called factors or grouping variables, but just like in the regression case, we can call them the more generic “predictors.” For simplicity, there is no interaction, though that would be simple to add to both models.
Y, the response variable, is on the left-hand side of both equations. The subscript i indicates that each case has its own value of Y.
Likewise, both models have an error term, denoted by ε. This too has an i subscript because there is one value per case.
In between, both models have three terms, and these don’t look the same. At all.
In the linear regression model, we use Xs to indicate the value of the predictor variables. This is super flexible: if X is numerical, we just plug in the numerical values of X. If X is categorical, we simply indicate which group someone was in with coded values of X₁.
The simplest and most common coding would have a 1 for the treatment group and a 0 for the control group.
β₁ then measures the effect of the treatment on Y.
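To make the dummy-coding idea concrete, here is a minimal sketch in Python. The data and variable names are made up for illustration and do not come from any real study. It uses only one predictor (the treatment), so the least-squares fit has a simple closed form.

```python
from statistics import mean

# Hypothetical data: group labels and outcome scores (invented for illustration).
groups = ["treatment", "treatment", "treatment", "control", "control", "control"]
y = [12.0, 14.0, 13.0, 9.0, 10.0, 11.0]

# Dummy coding: X1 = 1 for the treatment group, 0 for the control group.
x1 = [1 if g == "treatment" else 0 for g in groups]

# Closed-form least squares for a single predictor: slope = Sxy / Sxx.
x_bar, y_bar = mean(x1), mean(y)
sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(x1, y))
sxx = sum((x - x_bar) ** 2 for x in x1)
b1 = sxy / sxx            # coefficient of X1
b0 = y_bar - b1 * x_bar   # intercept

control_mean = mean(v for g, v in zip(groups, y) if g == "control")
treatment_mean = mean(v for g, v in zip(groups, y) if g == "treatment")

print(b0, control_mean)                    # intercept next to the control-group mean
print(b1, treatment_mean - control_mean)   # slope next to the difference in group means
```

With a single 0/1 predictor, the two numbers on each printed line agree: the intercept is the control group's mean (the mean of Y when X₁ = 0), and the coefficient is the difference between the treatment and control means.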
Because ANOVA assumes that all the predictors are categorical (aka factors or grouping variables), those predictors have a limited number of values. These are values like: treatment vs. control group. Two values.
Because of the limited number of values, the ANOVA model uses subscripts to indicate if someone is in the treatment or control group. Subscript j would have values of 1 for the treatment group and 2 for the control.
α measures the effect of the treatment on Y.
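Here is a rough sketch of what the ANOVA terms look like as numbers. It assumes made-up, balanced data (two cases per cell) and the common sum-to-zero convention, in which each effect is the deviation of a marginal mean from the grand mean. The data, the helper name marginal_mean, and that convention are my assumptions for illustration; with unbalanced data the calculation is not this simple.

```python
from statistics import mean

# Hypothetical balanced data: two responses per (treatment j, group k) cell.
# j: 1 = treatment, 2 = control; k: 1 = bilingual, 2 = monolingual.
data = {
    (1, 1): [15.0, 17.0],
    (1, 2): [12.0, 14.0],
    (2, 1): [11.0, 13.0],
    (2, 2): [8.0, 10.0],
}

def marginal_mean(data, position, level):
    """Mean of all responses whose cell index at `position` equals `level`."""
    return mean(v for cell, values in data.items()
                if cell[position] == level for v in values)

# mu: the grand mean of every response.
mu = mean(v for values in data.values() for v in values)

# alpha_j and beta_k: deviations of the marginal means from the grand mean.
alpha = {j: marginal_mean(data, 0, j) - mu for j in (1, 2)}
beta = {k: marginal_mean(data, 1, k) - mu for k in (1, 2)}

# Model-implied value for each cell: mu + alpha_j + beta_k.
fitted = {(j, k): mu + alpha[j] + beta[k] for (j, k) in data}

print("mu:", mu)
print("alpha:", alpha)
print("beta:", beta)
print("fitted:", fitted)
```

Notice that the subscripts j and k are doing the work of picking out which α and which β apply to a given case. No X values appear anywhere in the calculation.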
Even though these X values aren’t written directly into the ANOVA model, they exist. Your software is actually creating those X values for you when you indicate that X is categorical. Usually ANOVA uses effect coding (1 and -1) for these X values rather than dummy coding (1 and 0).
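A small sketch can show what that behind-the-scenes coding amounts to. The Python below builds the X column by hand from group labels, once with dummy coding and once with effect coding, and fits each with least squares. The data and the helper fit_line are hypothetical, and real software builds these columns in its own way, so treat this as an illustration of the idea rather than of any particular package.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope

# Hypothetical data (invented for illustration).
groups = ["treatment", "treatment", "treatment", "control", "control", "control"]
y = [12.0, 14.0, 13.0, 9.0, 10.0, 11.0]

# Two ways of turning the same labels into a numeric X column.
codings = {
    "dummy": [1 if g == "treatment" else 0 for g in groups],    # 1 and 0
    "effect": [1 if g == "treatment" else -1 for g in groups],  # 1 and -1
}

results = {}
for name, x in codings.items():
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    results[name] = (b0, b1, fitted)
    print(name, "intercept:", b0, "coefficient:", b1, "fitted:", fitted)
```

The intercept and coefficient differ between the two codings, but the fitted values for each case are the same. The coding changes how the parameters are expressed, not the model being fit.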
One other term looks different but is essentially the same thing: the constant. In the regression model, it is called the intercept and denoted β₀; in the ANOVA model, it is called the grand mean and denoted μ.
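To see the two constants line up numerically, here is a sketch that fits the two-predictor regression by solving the normal equations directly. Everything in it is an assumption for illustration: the balanced made-up data, the small ols helper, and the choice to show both codings. With effect coding on balanced data the intercept comes out equal to the grand mean; with dummy coding it is instead the predicted value for the case where both Xs are 0.

```python
from statistics import mean

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(p)]
    for i in range(p):
        # Partial pivoting: move the largest remaining entry in column i to row i.
        pivot = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(p):
            if r != i:
                factor = A[r][i] / A[i][i]
                A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical balanced data: (treatment label, language label, response).
cases = [
    ("treatment", "bilingual", 15.0), ("treatment", "bilingual", 17.0),
    ("treatment", "monolingual", 12.0), ("treatment", "monolingual", 14.0),
    ("control", "bilingual", 11.0), ("control", "bilingual", 13.0),
    ("control", "monolingual", 8.0), ("control", "monolingual", 10.0),
]
y = [resp for _, _, resp in cases]
grand_mean = mean(y)

def design(low):
    """Rows of [1, X1, X2]; the second level of each factor is coded as `low`."""
    return [[1,
             1 if t == "treatment" else low,
             1 if g == "bilingual" else low]
            for t, g, _ in cases]

effect_fit = ols(design(-1), y)  # effect coding: 1 and -1
dummy_fit = ols(design(0), y)    # dummy coding: 1 and 0

print("grand mean:", grand_mean)
print("effect coding [b0, b1, b2]:", effect_fit)
print("dummy coding  [b0, b1, b2]:", dummy_fit)
```

Under effect coding, b0 plays the role of μ, and b1 and b2 play the roles of the treatment and group effects. Under dummy coding the same model is fit, but the constant is measured from a different reference point.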
There are, of course, other differences in how we generally analyze regressions and ANOVAs, including how we code the Xs, how that coding changes the mean around which we’re measuring the effects of those Xs, and which output we tend to focus on. We’ll explore some of those in later articles.