The General Linear Model, Analysis of Covariance, and How ANOVA and Linear Regression Really are the Same Model Wearing Different Clothes

Just recently, a client got some feedback from a committee member that the Analysis of Covariance (ANCOVA) model she ran did not meet all the assumptions.

Specifically, the assumption in question is that the covariate has to be uncorrelated with the independent variable.

This committee member is, in the strictest sense of how analysis of covariance is used, correct.

And yet, they over-applied that assumption to an inappropriate situation.

ANCOVA for Experimental Data

Analysis of Covariance was developed for experimental situations and some of the assumptions and definitions of ANCOVA apply only to those experimental situations.

The key feature of those situations is that the independent variables are categorical and manipulated, not observed.

The covariate–continuous and observed–is considered a nuisance variable. There are no research questions about how this covariate itself affects or relates to the dependent variable.

The only hypothesis tests of interest are about the independent variables, controlling for the effects of the nuisance covariate.

A typical example is a study to compare the math scores of students who were enrolled in three different learning programs at the end of the school year.

The key independent variable here is the learning program. Students need to be randomly assigned to one of the three programs.

The only research question is about whether the math scores differed on average among the three programs. It is useful to control for a covariate like IQ scores, but we are not really interested in the relationship between IQ and math scores.

So in this example, in order to conclude that the learning program affected math scores, it is indeed important that IQ score, the covariate, is unrelated to which learning program the students were assigned to.

You could not make that causal interpretation if it turns out that the IQ scores were generally higher in one learning program than the others.

So this assumption of ANCOVA is very important in this specific type of study in which we are trying to make a specific type of inference.

ANCOVA for Other Data

But that’s really just one application of a linear model with one categorical and one continuous predictor. The research question of interest doesn’t have to be about the causal effect of the categorical predictor, and the covariate doesn’t have to be a nuisance variable.

A regression model with one continuous and one dummy-coded variable is the same model (actually, you’d need two dummy variables to cover the three categories, but that’s another story).

The focus of that model may differ–perhaps the main research question is about the continuous predictor.

But it’s the same mathematical model.

The software will run it the same way. YOU may focus on different parts of the output or select different options, but it’s the same model.
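
If you'd like to see that for yourself, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the scores are made up, and `solve`, `ols`, and `fitted` are hypothetical helper names written just for this example, not functions from any statistics package. It fits the same data twice, once with "regression" coding (an intercept plus two dummy variables) and once with "ANCOVA" coding (one adjusted intercept per program), and compares the results.

```python
# Hypothetical data: (learning program, IQ score, math score).
# All values are made up for illustration.
data = [
    ("A", 95, 61), ("A", 102, 66), ("A", 110, 72), ("A", 118, 75),
    ("B", 93, 65), ("B", 101, 71), ("B", 108, 74), ("B", 120, 83),
    ("C", 97, 70), ("C", 104, 77), ("C", 111, 80), ("C", 116, 86),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def fitted(X, b):
    """Predicted values for each row of the design matrix."""
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in X]


y = [score for _, _, score in data]

# "Regression" coding: intercept, two dummies (program A is the reference), IQ.
X_reg = [[1, int(g == "B"), int(g == "C"), iq] for g, iq, _ in data]
# "ANCOVA" coding: one indicator per program (adjusted intercepts), IQ.
X_anc = [[int(g == "A"), int(g == "B"), int(g == "C"), iq] for g, iq, _ in data]

b_reg = ols(X_reg, y)
b_anc = ols(X_anc, y)

# Same model, so the predictions should agree up to rounding error.
max_gap = max(abs(p - q) for p, q in zip(fitted(X_reg, b_reg), fitted(X_anc, b_anc)))
print("largest difference in fitted values:", max_gap)
print("dummy coefficient for program B:", b_reg[1])
print("B minus A adjusted intercepts:", b_anc[1] - b_anc[0])
print("IQ slope (regression coding, ANCOVA coding):", b_reg[3], b_anc[3])
```

The two codings should give the same fitted values and the same IQ slope, and each dummy coefficient should equal the difference between two adjusted intercepts. The parameters are labeled differently, but nothing about the fit changes.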

And that’s where the model names can get in the way of understanding the relationships among your variables. The model itself doesn’t care if the categorical variable was manipulated. It doesn’t care if the categorical independent variable and the continuous covariate are mildly correlated.

If those ANCOVA assumptions aren’t met, it does not change the analysis at all. It only affects how parameter estimates are interpreted and the kinds of conclusions you can draw.

In fact, those assumptions really aren’t about the model. They’re about the design. It’s the design that affects the conclusions. It doesn’t matter if a covariate is a nuisance variable or an interesting phenomenon to the model. That’s a design issue.

The General Linear Model

So what do you do instead of labeling models? Just call them a General Linear Model. It’s hard to think of regression and ANOVA as the same model because the equations look so different. But it turns out they aren’t so different after all.

[Figure: Regression and ANOVA model equations]

If you look at the two models, first you may notice some similarities.

  • Both are modeling Y, an outcome.
  • Both have a “fixed” portion on the right with some parameters to estimate–this portion estimates the mean values of Y at the different values of X.
  • Both equations have a residual, which is the random part of the model. It is the variation in Y that is not explained by the Xs.

But wait a minute, Karen, are you nuts?–there are no Xs in the ANOVA model!

Actually, there are. They’re just implicit.

Since the Xs are categorical, they have only a few values, to indicate which category a case is in. Those j and k subscripts? They’re really just indicating the values of X.
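
To make those implicit Xs concrete, here is a sketch for the simplest case I can think of: a one-way ANOVA with three groups. The notation is my assumption for illustration (standard textbook symbols, a single subscript j for the group, and group 3 chosen as the reference category), so it won't match every textbook or software manual.

```latex
% ANOVA form: the group is indicated by the subscript j
Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad j = 1, 2, 3

% Regression form: the group is indicated by two dummy variables,
% X_{1i} = 1 if case i is in group 1 (else 0),
% X_{2i} = 1 if case i is in group 2 (else 0)
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i

% The parameters map onto each other
\beta_0 = \mu + \alpha_3, \qquad
\beta_1 = \alpha_1 - \alpha_3, \qquad
\beta_2 = \alpha_2 - \alpha_3
```

Plug in the dummy values and you get the same three group means either way. For group 1, for example, the regression form gives β0 + β1 = μ + α1, which is exactly what the ANOVA form says.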

(And for the record, I think a couple Xs are a lot easier to keep track of than all those subscripts.  Ever have to calculate an ANOVA model by hand?  Just sayin’.)

So instead of trying to come up with the right label for a model, focus instead on understanding (and describing in your paper) the measurement scales of your variables, if and how much they’re related, and how that affects the conclusions.

In my client’s situation, it was not a problem that the continuous and the categorical variables were mildly correlated. The data were not experimental and she was not trying to draw causal conclusions about only the categorical predictor.

So she had to call this ANCOVA model a multiple regression.

 

Comments

  1. S Banerjee says

    What is the role of the coefficient of determination? In our experiment, post-test score is the DV, the group of students is subdivided into control units and experimental units, and one test's scores form the covariate data. The covariate is not linearly related to the dependent variable. We are facing a situation where the regression line slopes are significantly different, and one of the regression lines, corresponding to a treatment level, is parallel to the covariate axis with a coefficient of determination close to zero. What should be our course of action here? Can we use the difference in slopes to some advantage?

    • Karen Grace-Martin says

      Sure. You’ll want to use an interaction term to reflect the difference in slopes and it sounds like you may need something like a quadratic term to deal with the non-linear relationship between the dependent variable and the predictor.

      (And I fixed your typo as per your request)

  2. Franziska says

    Hi Karen,

    In my analysis, ANOVA (or better: its post hoc tests) and regression differ in significance. I only have dummy variables of one treatment (for the regression I insert four of the five in the estimation). I get the exact same effect sizes, so the mean difference in the post hoc test equals the beta of the regression, BUT the coefficient is only significant in the regression, not in the post hoc test. Can you please help me figure out why?

    Thanks and regards,
    Franziska

  3. Cornelia says

    Hi Karen! Thanks for clarifying.

    I have one more question: What (if any) would be the difference between running ANCOVA and a dummy coded forced entry LM? Can I in fact do it both ways?

    I am currently using a dataset with 4 categorical variables and 3 continuous ones. I would like to do hypothesis testing on my dataset. My trusted statistician told me yesterday that I should be doing F-tests on that one (would be ANCOVA, right?) – so I am a little confused as I thought LR would be fine.

    Thank you very much!

    • Karen says

      Hi Cornelia, I’m not sure what you mean by forced entry LM. But yes, you should get the same results from running an ANCOVA or a linear regression.

      That said, some software has different defaults for things like which interactions get included if you run it through an ANCOVA procedure vs. a linear model procedure. But if the model is specified the same, you will get identical results. Either can give you F tests, but again, often linear regression *procedures* don’t print them all out.

  4. Hemanth says

    Hi Karen, would you please suggest a thesis topic for my master's that involves working with R and SPSS?
    It would help me find a job after completing my master's here in Europe.

  5. Amber Ward says

    Love the title! So I now know I want to use ANCOVA. Just struggling to do a power analysis using GPower. Do you know how I should work out “df numerator” and “number of groups” (does this refer to each time a measurement will be made)?

    If it helps – the study design is an intervention and I want to compare treatment group and control group test scores after treatment both at 6 and 12 months, controlling for baseline test scores.

    Thank you!

  6. LIndsey says

    I have estimated a GLM and included one 3-level categorical variable and several continuous variables — one of which is hypothesized to interact with the categorical variable. I asked SPSS for the regression parameters. The continuous variable is significant as a predictor in the ANOVA results table but is not a significant predictor in the Regression parameter results.

  7. Daniel says

    Ok, so let’s say I knew there was a significant positive correlation between the number of kids a couple has and their happiness. Could I then say that the ANOVA would be significant even if I made the variable categorical (0, 1-2, 3+), such that I may have lost information in my groupings?

  8. Jennifer says

    Hi Karen,

    Just wondering – does the covariate have to be continuous?

    I am testing for group differences and want to use a MANOVA approach due to the DVs being meaningfully related.

    Can I control for gender? (One of the groups has significantly more females than the other)

    • Karen says

      Hi Jennifer,

      Many people mean a continuous variable when they say “covariate,” but not everyone. Yes, you can control for a categorical variable.

  9. Joan Hendrikz says

    Hi Karen,
    I really love the way you explain various stats concepts, using metaphors and analogies as well to aid the process of understanding. I have also been in a stats advisory role throughout my career and it is great to see people walk away happy and excited with a new understanding of something heretofore a mystery. I find metaphors and analogies a very powerful tool to this end. Cheers, Joan.

    • Karen says

      Thanks, Joan!

      I agree. I call it “the click.” You can see when someone gets a concept that they were bewildered by. It’s especially rewarding when they believed they couldn’t learn it at all. 🙂

      Karen

  10. Tobias Musyoka says

    Karen you are a great statistician, and a good teacher too. I just wish that you taught me in class.

    Congratulations.

    Tobias

    • Karen says

      Aw, shucks. Thanks, Tobias. I’m glad you find the site helpful.

      I do have to humbly admit, though, that you’re finding this helpful in part because it’s a review and it’s now in the context of your own research. You’ll really learn it now, and it is helpful to have good support at this stage (and that’s why I’m here), but I couldn’t teach so well if you didn’t have the background.

      Karen

