If your graduate statistical training was anything like mine, you learned ANOVA in one class and Linear Regression in another. My professors would often say things like “ANOVA is just a special case of Regression,” but give vague answers when pressed.

It was not until I started consulting that I realized how closely related ANOVA and regression are. They’re not only related, they’re **the same thing**. Not a quarter and a nickel, but different sides of the same coin.

So here is a very simple example that shows why. When someone showed me this, a light bulb went on, even though I already knew both ANOVA and multiple linear regression quite well (and already had my master’s in statistics!). I believe that understanding this little concept has been key to my understanding of the general linear model as a whole. Its applications are far-reaching.

Consider a model with a single categorical independent variable, employment category, with 3 categories: managerial, clerical, and custodial. The dependent variable is Previous Experience in months. (This data set is employment.sav, one of the data sets that come free with SPSS.)

We can run this as either an ANOVA or a regression. In the ANOVA, the categorical variable is effect coded, which means that each category’s mean is compared to the grand mean. In the regression, the categorical variable is dummy coded**, which means that each category’s intercept is compared to the reference group’s intercept. Since the intercept is defined as the mean value when all other predictors = 0, and there are no other predictors, the three intercepts are just means.
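
To make the two coding schemes concrete, here is a minimal sketch in Python. The numbers are toy values invented for illustration, not the SPSS employment data, and the groups are deliberately balanced so that the grand mean and the average of the three group means coincide. It only computes the comparisons each scheme reports; it does not fit a model.

```python
# Toy, balanced data invented for illustration (3 observations per group).
groups = {
    "manager": [60.0, 80.0, 100.0],
    "clerical": [70.0, 85.0, 100.0],
    "custodial": [280.0, 300.0, 320.0],
}

# Mean of each group, and the grand mean over all observations.
means = {g: sum(v) / len(v) for g, v in groups.items()}
all_values = [y for v in groups.values() for y in v]
grand_mean = sum(all_values) / len(all_values)

# Effect coding: each group is compared to the grand mean.
effect = {g: m - grand_mean for g, m in means.items()}

# Dummy coding: each non-reference group is compared to the reference group.
reference = "manager"
dummy = {g: m - means[reference] for g, m in means.items() if g != reference}

print("grand mean:", grand_mean)
print("effect-coded comparisons:", effect)
print("dummy-coded comparisons:", dummy)
```

The two sets of numbers differ only in what each group is measured against. The gap between any two groups is the same either way: here, `effect["clerical"] - effect["manager"]` equals `dummy["clerical"]`.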

In both analyses, Job Category has F = 69.192, with p < .001. Highly significant.

In the ANOVA, we find the means of the three groups are:

Clerical: 85.039

Custodial: 298.111

Manager: 77.619

In the Regression, we find these coefficients:

Intercept: 77.619

Clerical: 7.420

Custodial: 220.492

The intercept is simply the mean of the reference group, Managers. The coefficient for each of the other two groups is the difference between that group’s mean and the reference group’s mean.

You’ll notice, for example, that the regression coefficient for Clerical is the difference between the mean for Clerical, 85.039, and the Intercept, or mean for Manager (85.039 – 77.619 = 7.420). The same works for Custodial.
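
If you want to check the arithmetic yourself, this short Python sketch rebuilds the regression coefficients from the three ANOVA means reported above. The variable names are my own; only the three means come from the output.

```python
# Group means as reported in the ANOVA output (months of previous experience).
means = {"Manager": 77.619, "Clerical": 85.039, "Custodial": 298.111}
reference = "Manager"  # the group coded 0 on both dummy variables

# The intercept is the reference group's mean.
intercept = means[reference]

# Each coefficient is that group's mean minus the reference group's mean.
coefficients = {
    group: round(mean - intercept, 3)
    for group, mean in means.items()
    if group != reference
}

print(f"Intercept: {intercept:.3f}")
for group, coef in coefficients.items():
    print(f"{group}: {coef:.3f}")
```

The printed values match the regression table: 77.619 for the intercept, 7.420 for Clerical, and 220.492 for Custodial.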

So an ANOVA reports each mean and a p-value that says at least two are significantly different. A regression reports only one mean (as an intercept) and the differences between that one and all other means, but the p-values evaluate those specific comparisons.

It’s all the same model, the same information, but presented in different ways. Understand what the model tells you in each way, and you are empowered.

I suggest you try this little exercise with any data set, then add in a second categorical variable, first without, then with an interaction. Go through the means and the regression coefficients and see how they add up.

**The dummy coding creates two 1/0 variables: Clerical = 1 for the clerical category, 0 otherwise; Custodial = 1 for the custodial category, 0 otherwise. Observations in the Managerial category have a 0 value on both of these variables, which makes Managerial the reference group.
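
Here is a minimal sketch of that dummy coding in plain Python, followed by an ordinary least squares fit. The data are toy values invented for illustration (not the SPSS file), and `solve` and `group_mean` are small helpers written for this example rather than anything from a statistics package. The fit solves the normal equations directly, which is fine for a tiny example like this one.

```python
# Toy data, invented for illustration only (not the SPSS employment file).
jobs = ["manager", "manager", "manager",
        "clerical", "clerical", "clerical", "clerical",
        "custodial", "custodial"]
months = [60.0, 80.0, 100.0, 70.0, 90.0, 85.0, 95.0, 280.0, 320.0]

# Dummy coding: a constant column plus one 1/0 column per non-reference group.
# Managers get 0 on both dummy columns, so they are the reference group.
X = [[1.0, float(j == "clerical"), float(j == "custodial")] for j in jobs]


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def group_mean(job):
    """Plain arithmetic mean of the outcome within one job category."""
    values = [y for j, y in zip(jobs, months) if j == job]
    return sum(values) / len(values)


# Normal equations for least squares: (X'X) b = X'y.
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, months)) for i in range(k)]
intercept, b_clerical, b_custodial = solve(xtx, xty)

print("intercept:", intercept, "manager mean:", group_mean("manager"))
print("clerical coefficient:", b_clerical)
print("custodial coefficient:", b_custodial)
```

The fitted intercept lands on the manager mean, and each coefficient lands on the gap between that group's mean and the manager mean, even though the groups here are unequal in size.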
