# The General Linear Model, Analysis of Covariance, and How ANOVA and Linear Regression Really are the Same Model Wearing Different Clothes


Just recently, a client got some feedback from a committee member that the Analysis of Covariance (ANCOVA) model she ran did not meet all the assumptions.

Specifically, the assumption in question is that the covariate be uncorrelated with the independent variable.

This committee member is, in the strictest sense, correct. Analysis of Covariance was developed for experimental situations in which the independent variables are categorical and usually manipulated, not observed. The covariate–continuous and observed–is considered a nuisance variable. There are no research questions about how this covariate itself affects the dependent variable. The only hypothesis tests of interest are about the independent variables, controlling for the effects of the nuisance covariate.

A typical example would be to compare the math scores of students who were enrolled in three different learning programs at the end of the school year. The only research question would be about whether the math scores differed on average among the three programs. It would be useful to control for a covariate like IQ scores, but we are not really interested in the relationship between IQ and math scores.

But that’s really just one application of a linear model with one categorical and one continuous predictor. The research question of interest doesn’t have to be about the categorical predictor, and the covariate doesn’t have to be a nuisance variable.

A regression model with one continuous and one dummy variable is the same model (actually, you’d need two dummy variables to cover the three categories, but that’s another story).

The focus of that model may differ–perhaps the main research question is about the continuous predictor. But it’s the same model. And your software will run it the same way. YOU may focus on different parts of the output or select different options, but it’s the same model.
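To make "it’s the same model" concrete, here is a minimal sketch in plain Python. The data are made up purely for illustration (three programs, an IQ covariate, a math score), and the helper names `ols` and `fitted` are hypothetical. It fits the model twice, once with regression-style dummy coding and once with ANOVA-style group intercepts plus a common slope, and compares the results.

```python
# Illustrative, made-up data: 3 learning programs, IQ as the covariate.
program = [0, 0, 0, 1, 1, 1, 2, 2, 2]
iq = [95, 100, 110, 90, 105, 115, 100, 108, 120]
math_score = [61, 66, 70, 68, 75, 83, 72, 74, 85]


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def fitted(X, b):
    """Predicted values X b."""
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


# Regression-style coding: intercept, two dummies (program 0 is the reference), IQ.
X_reg = [[1.0, float(g == 1), float(g == 2), x] for g, x in zip(program, iq)]
# ANOVA-style coding: one intercept per program, plus a common IQ slope.
X_anc = [[float(g == 0), float(g == 1), float(g == 2), x]
         for g, x in zip(program, iq)]

b_reg = ols(X_reg, math_score)
b_anc = ols(X_anc, math_score)
fit_reg = fitted(X_reg, b_reg)
fit_anc = fitted(X_anc, b_anc)

max_gap = max(abs(a - b) for a, b in zip(fit_reg, fit_anc))
print("IQ slope (regression coding):", round(b_reg[3], 4))
print("IQ slope (group-intercept coding):", round(b_anc[3], 4))
print("Largest difference in fitted values:", max_gap)
```

The coefficients are labeled differently (a reference intercept plus differences, versus one intercept per group), but the IQ slope and the fitted values agree up to rounding error.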

And that’s where the model names can get in the way of understanding the relationships among your variables. The model itself doesn’t care if the categorical variable was manipulated. It doesn’t care if the categorical independent variable and the continuous covariate are mildly correlated.

If those ANCOVA assumptions aren’t met, it does not change the analysis at all. It only affects how parameter estimates are interpreted and the kinds of conclusions you can draw.

In fact, those assumptions really aren’t about the model. They’re about the design. It’s the design that affects the conclusions. It doesn’t matter if a covariate is a nuisance variable or an interesting phenomenon to the model. That’s a design issue.
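A small sketch of what "it only affects interpretation" can look like. The data below are constructed, not real: the outcome is built with no noise as y = 2 + 1·x + 3·group, and the two groups are deliberately given different covariate values, so the covariate and the group are correlated. The helper `ols` is a hypothetical least-squares routine written for this example.

```python
# Constructed, noise-free data: y = 2 + 1*x + 3*group.
# Group 1 has higher covariate values, so x and group are correlated.
group = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
y = [2.0 + 1.0 * xi + 3.0 * g for g, xi in zip(group, x)]


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def mean(values):
    return sum(values) / len(values)


# Raw difference in group means, ignoring the covariate.
raw_diff = (mean([yi for yi, g in zip(y, group) if g == 1])
            - mean([yi for yi, g in zip(y, group) if g == 0]))

# Group coefficient from the model that also includes the covariate.
b = ols([[1.0, float(g), xi] for g, xi in zip(group, x)], y)
adjusted_diff = b[1]

print("Raw difference in means:", round(raw_diff, 6))
print("Difference adjusted for x:", round(adjusted_diff, 6))
```

By construction the raw difference is 5 and the adjusted difference is 3. The model fits without complaint either way; what the correlation changes is what the group coefficient means, namely the difference between groups at the same value of the covariate rather than the overall difference.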

So what do you do instead of labeling models? Just call them a General Linear Model. It’s hard to think of regression and ANOVA as the same model because the equations look so different. But it turns out they aren’t so different.

If you look at the two models, first you may notice some similarities. Both are modeling Y, an outcome. Both have a “fixed” portion on the right with some parameters to estimate–this portion estimates the mean values of Y at the different values of X.

Both equations have a residual, which is the random part of the model: the variation in Y that is not explained by the Xs.

But wait a minute, Karen, are you nuts?–there are no Xs in the ANOVA model!

Actually, there are. They’re just implicit. Since the Xs are categorical, they have only a few values, to indicate which category a case is in. Those j and k subscripts? They’re really just indicating the values of X.
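The equations themselves are not reproduced in this text, so the following is only a sketch that assumes common textbook notation (the symbols are an assumption, not necessarily the ones in the original figures). It shows the two forms side by side, and then how a three-level factor indexed by j can be rewritten with two indicator Xs, using level 1 as the reference category.

```latex
\begin{align*}
\text{Regression:}\quad
  & Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \\
\text{Two-way ANOVA (main effects):}\quad
  & Y_{ijk} = \mu + \alpha_j + \gamma_k + \varepsilon_{ijk} \\[6pt]
\text{One factor, } j = 1, 2, 3:\quad
  & Y_{ij} = \mu + \alpha_j + \varepsilon_{ij} \\
\text{Indicator coding:}\quad
  & X_{1} = \begin{cases} 1 & \text{if } j = 2 \\ 0 & \text{otherwise} \end{cases}
    \qquad
    X_{2} = \begin{cases} 1 & \text{if } j = 3 \\ 0 & \text{otherwise} \end{cases} \\
\text{Same model with Xs:}\quad
  & Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \\
\text{where}\quad
  & \beta_0 = \mu + \alpha_1, \quad
    \beta_1 = \alpha_2 - \alpha_1, \quad
    \beta_2 = \alpha_3 - \alpha_1
\end{align*}
```

Under this coding, the subscript j and the pair (X1, X2) carry exactly the same information: which category a case is in.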

(And for the record, I think a couple Xs are a lot easier to keep track of than all those subscripts.  Ever have to calculate an ANOVA model by hand?  Just sayin’.)

So instead of trying to come up with the right label for a model, focus instead on understanding (and describing in your paper) the measurement scales of your variables, if and how much they’re related, and how that affects the conclusions.

Cornelia

Hi Karen! Thanks for clarifying.

I have one more question: What (if any) would be the difference between running ANCOVA and a dummy coded forced entry LM? Can I in fact do it both ways?

I am currently using a dataset with 4 categorical variables and 3 continuous ones. I would like to do hypothesis testing on my dataset. My trusted statistician told me yesterday that I should be doing F-tests on that one (would be ANCOVA, right?), so I am a little confused as I thought LR would be fine.

Thank you very much!

Karen

Hi Cornelia, I’m not sure what you mean by forced entry LM. But yes, you should get the same results from running an ANCOVA or a linear regression.

That said, some software has different defaults for things like which interactions get included if you run it through an ANCOVA procedure vs. a linear model procedure. But if the model is specified the same, you will get identical results. Either can give you F tests, but again, often linear regression *procedures* don’t print them all out.
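As a rough illustration of how an F test for a categorical predictor can come out of a regression-style fit, here is a minimal sketch. It assumes a model with no interaction terms and uses made-up data (three groups, one continuous covariate); `ols` and `sse` are hypothetical helpers. The F statistic compares the model with the group dummies to the model without them.

```python
# Illustrative, made-up data: three groups and one continuous covariate.
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [95, 100, 110, 90, 105, 115, 100, 108, 120]
y = [61, 66, 70, 68, 75, 83, 72, 74, 85]


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    b = ols(X, y)
    preds = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


# Full model: intercept, two group dummies, covariate.
X_full = [[1.0, float(g == 1), float(g == 2), xi] for g, xi in zip(group, x)]
# Reduced model: the same model with the group dummies dropped.
X_reduced = [[1.0, xi] for xi in x]

sse_full = sse(X_full, y)
sse_reduced = sse(X_reduced, y)

df_num = 2                       # number of dummies dropped
df_den = len(y) - len(X_full[0])  # n minus parameters in the full model
F = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)

print("F for group, adjusting for x:", round(F, 3), "on", df_num, "and", df_den, "df")
```

A regression procedure typically prints a t test for each dummy coefficient separately, while an ANOVA-style table prints one F test for the whole factor. Both come from the same fitted model.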

Hemanth

Hi Karen, could you please suggest a thesis topic for my master’s that involves working with R and SPSS? It would help me find a job after completing my master’s here in Europe.

Amber Ward

Love the title! So I now know I want to use ANCOVA. Just struggling to do a power analysis using G*Power. Do you know how I should work out “df numerator” and “number of groups” (does this refer to each time a measurement will be made)?

If it helps – the study design is an intervention and I want to compare treatment group and control group test scores after treatment both at 6 and 12 months, controlling for baseline test scores.

Thank you!

Lindsey

I have estimated a GLM and included one 3-level categorical variable and several continuous variables — one of which is hypothesized to interact with the categorical variable. I asked SPSS for the regression parameters. The continuous variable is significant as a predictor in the ANOVA results table but is not a significant predictor in the Regression parameter results.

Daniel

OK, so let’s say I knew there was a significant positive correlation between the number of kids a couple has and their happiness. Could I then say that the ANOVA would be significant even if I made the variable categorical (0, 1-2, 3+), such that I may have lost information in my groupings?

Karen

No, not necessarily. It may or may not.

Jennifer

Hi Karen,

Just wondering – does the covariate have to be continuous?

I am testing for group differences and want to use a MANOVA approach due to the DVs being meaningfully related.

Can I control for gender? (One of the groups has significantly more females than the other)

Karen

Hi Jennifer,

Many people mean a continuous variable when they say “covariate,” but not everyone. Yes, you can control for a categorical variable.

Joan Hendrikz

Hi Karen,
I really love the way you explain various stats concepts, using metaphors and analogies as well to aid the process of understanding. I have also been in a stats advisory role throughout my career and it is great to see people walk away happy and excited with a new understanding of something heretofore a mystery. I find metaphors and analogies a very powerful tool to this end. Cheers, Joan.

Karen

Thanks, Joan!

I agree. I call it “the click.” You can see when someone gets a concept that they were bewildered by. It’s especially rewarding when they believed they couldn’t learn it at all. 🙂

Karen

Tobias Musyoka

Karen you are a great statistician, and a good teacher too. I just wish that you taught me in class.

Congratulations.

Tobias

Karen

I do have to humbly admit, though, that you’re finding this helpful in part because it’s a review and it’s now in the context of your own research. You’ll really learn it now, and it is helpful to have good support at this stage (and that’s why I’m here), but I couldn’t teach so well if you didn’t have the background.

Karen

jaffer

I need a real example of regression, with an explanation of its output.

Karen

Hi Jaffer,

If you want a real example of regression output, in one of my very first webinars, I did just that. I literally went through the output of a model with both categorical and continuous predictors (and an interaction), and we went step-by-step through how to read the coefficients.

You can get a free download here: Interpreting Linear Regression Parameters: A Walk Through Output

Karen