Just recently, a client got some feedback from a committee member that the Analysis of Covariance (ANCOVA) model she ran did not meet all the assumptions.
Specifically, the assumption in question is that the covariate be uncorrelated with the independent variable.
This committee member is, in the strictest sense, correct. Analysis of Covariance was developed for experimental situations in which the independent variables are categorical and usually manipulated, not observed. The covariate–continuous and observed–is considered a nuisance variable. There are no research questions about how this covariate itself affects the dependent variable. The only hypothesis tests of interest are about the independent variables, controlling for the effects of the nuisance covariate.
A typical example would be to compare the math scores of students who were enrolled in three different learning programs at the end of the school year. The only research question would be about whether the math scores differed on average among the three programs. It would be useful to control for a covariate like IQ scores, but we are not really interested in the relationship between IQ and math scores.
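To make the example concrete, here is a minimal sketch of that analysis with simulated, entirely hypothetical data (the column names `program`, `iq`, and `math` and all the numbers are made up for illustration), using the statsmodels formula interface:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(42)
n = 90
df = pd.DataFrame({
    "program": np.repeat(["A", "B", "C"], n // 3),  # three learning programs
    "iq": rng.normal(100, 15, n),                   # the covariate
})
# Simulate end-of-year math scores: program effects, an IQ effect, and noise
effects = df["program"].map({"A": 0, "B": 5, "C": 8})
df["math"] = 40 + effects + 0.3 * df["iq"] + rng.normal(0, 5, n)

# Classic ANCOVA framing: test the program effect, controlling for IQ
model = smf.ols("math ~ C(program) + iq", data=df).fit()
anova_table = anova_lm(model, typ=2)
print(anova_table)  # the only row of interest, in the ANCOVA framing, is C(program)
```

In the ANCOVA framing you would report only the `C(program)` row of that table; the `iq` row is there purely as a nuisance adjustment.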
But that’s really just one application of a linear model with one categorical and one continuous predictor. The research question of interest doesn’t have to be about the categorical predictor, and the covariate doesn’t have to be a nuisance variable.
A regression model with one continuous and one dummy variable is the same model (actually, you’d need two dummy variables to cover the three categories, but that’s another story).
The focus of that model may differ–perhaps the main research question is about the continuous predictor. But it’s the same model. And your software will run it the same way. YOU may focus on different parts of the output or select different options, but it’s the same model.
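You can verify this equivalence directly. In this sketch (again with hypothetical, simulated data), the "ANCOVA-style" specification with a categorical predictor and the regression specification with two hand-built dummy variables produce identical fits:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=60),
    "x": rng.normal(size=60),
})
df["y"] = 1.0 + 0.5 * df["x"] + df["group"].map({"a": 0, "b": 1, "c": 2}) \
          + rng.normal(size=60)

# "ANCOVA-style": one categorical predictor plus one continuous covariate
m1 = smf.ols("y ~ C(group) + x", data=df).fit()

# "Regression-style": code the three groups yourself with two dummy variables
df["d_b"] = (df["group"] == "b").astype(int)
df["d_c"] = (df["group"] == "c").astype(int)
m2 = smf.ols("y ~ d_b + d_c + x", data=df).fit()

# Same model, same fit: identical R-squared and identical fitted values
assert np.isclose(m1.rsquared, m2.rsquared)
assert np.allclose(m1.fittedvalues, m2.fittedvalues)
```

The only difference is who builds the dummy variables: you, or the software behind the scenes.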
And that’s where the model names can get in the way of understanding the relationships among your variables. The model itself doesn’t care if the categorical variable was manipulated. It doesn’t care if the categorical independent variable and the continuous covariate are mildly correlated.
If those ANCOVA assumptions aren’t met, the analysis itself doesn’t change at all. What changes is how the parameter estimates are interpreted and the kinds of conclusions you can draw.
In fact, those assumptions really aren’t about the model. They’re about the design. It’s the design that affects the conclusions. It doesn’t matter if a covariate is a nuisance variable or an interesting phenomenon to the model. That’s a design issue.
So what do you do instead of labeling models? Just call them General Linear Models. It’s hard to think of regression and ANOVA as the same model because the equations look so different. But it turns out the differences are only in the notation.
If you look at the two models side by side, the first thing you may notice is the similarities. Both are modeling Y, an outcome. Both have a “fixed” portion on the right with some parameters to estimate–this portion estimates the mean values of Y at the different values of X.
Both equations have a residual, which is the random part of the model–the variation in Y that is not explained by the Xs.
But wait a minute, Karen, are you nuts?–there are no Xs in the ANOVA model!
Actually, there are. They’re just implicit. Since the Xs are categorical, they have only a few values, to indicate which category a case is in. Those j and k subscripts? They’re really just indicating the values of X.
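In standard textbook notation (these are the usual forms, written out here for reference), a one-way ANOVA model and the equivalent regression model for three groups look like:

```latex
% One-way ANOVA, effects notation: case i in group j
Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}

% The same model as a regression, with two dummy variables
% X_{1i} and X_{2i} coding which of the three groups case i is in
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i
```

The subscript j in the first equation is doing exactly the job the dummy-variable Xs do in the second: marking group membership.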
(And for the record, I think a couple Xs are a lot easier to keep track of than all those subscripts. Ever have to calculate an ANOVA model by hand? Just sayin’.)
So instead of trying to come up with the right label for a model, focus instead on understanding (and describing in your paper) the measurement scales of your variables, if and how much they’re related, and how that affects the conclusions.