Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways. And these different ways of using the term have BIG implications for what your model means.
The most precise definition is its use in Analysis of Covariance, a type of General Linear Model in which the independent variables of interest are categorical, but you also need to control for an observed, continuous variable–the covariate.
In this context, the covariate is always continuous, always a control variable, and always observed (i.e. observations weren’t randomly assigned its values, you just measured what was there).
A simple example is a study looking at the effect of a training program on math ability. The independent variable is the training condition–whether participants received the math training or some irrelevant training. The dependent variable is their math score after receiving the training.
But even within each training group, there is going to be a lot of variation in people’s math ability. If you don’t control for that, it is just unexplained variation. Having a lot of unexplained variation makes it pretty tough to see the actual effect of the training–it gets lost in all the noise.
So if you use pretest math score as a covariate, you can control for where people started out. So you get a clearer picture of whether people do well on the final test due to the training or due to the math ability they had coming in.
Okay, great. Where’s the confusion?
Covariates as Continuous Predictor Variables
The confusion is that, really, the model doesn’t care that the covariate is something you don’t have a hypothesis about. Something you’re just controlling for. Mathematically, it’s the same model, and you run it the same way.
And so people who understand this often use the term covariate to mean ANY continuous predictor variable in your model, whether it’s just a control variable or the most important predictor in your hypothesis. And I’m guilty as charged. It’s a lot easier to say covariate than continuous predictor variable.
But SPSS does this too. You can run a linear regression model with only continuous predictor variables in SPSS GLM by putting them in the Covariate box. All the Covariate box does is define the predictor variable as continuous.
(SAS’s PROC GLM does the same thing, but it doesn’t specifically label them as Covariates. In PROC GLM, the assumption is all predictor variables are continuous. If they’re categorical, it’s up to you, the user, to specify them as such in the CLASS statement.)
Covariates as Control Variables
But the other part of the original ANCOVA definition is that a covariate is a control variable.
So sometimes people use the term Covariate to mean any control variable. Because really, you can covary out the effects of a categorical control variable just as easily.
In our little math training example, you may be unable to pretest the participants. Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?” It’s not as good of a control variable as a pretest score, but you can at least get at their previous math training.
You’d use it in the model in the exact same way you would the pretest score. You’d just have to define it as categorical.
Once again, there isn’t really a good term for categorical control variable, so people sometimes refer to it as the covariate.
So what is a covariate then?
It’s hard to say. There is no disputing the first definition, so it’s clear there.
I prefer to just be careful, in setting up hypotheses, running analyses, and in writing up results, to be clear about which variables I’m hypothesizing about and which ones I’m controlling for, and whether each variable is continuous or categorical.
The names that the variables, or the models, end up having, aren’t important as long as you’re clear about what you’re testing and how.