Confusing Statistical Terms #5: Covariate

by Karen Grace-Martin

Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts.

Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways.  And these different ways of using the term have BIG implications for what your model means.

The most precise definition is its use in Analysis of Covariance (ANCOVA), a type of General Linear Model in which the independent variables of interest are categorical, but you also need to control for an observed, continuous variable: the covariate.

In this context, the covariate is always continuous, always a control variable, and always observed (i.e., participants weren’t randomly assigned its values; you just measured what was there).

A simple example is a study looking at the effect of a training program on math ability. The independent variable is the training condition: whether participants received the math training or some irrelevant training. The dependent variable is their math score after receiving the training.

But even within each training group, there is going to be a lot of variation in people’s math ability. If you don’t control for that, it is just unexplained variation. Having a lot of unexplained variation makes it pretty tough to see the actual effect of the training; it gets lost in all the noise.

So if you use pretest math score as a covariate, you can control for where people started out. That gives you a clearer picture of whether people do well on the final test because of the training or because of the math ability they had coming in.
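To make the noise-reduction argument concrete, here is a minimal simulation sketch in Python. Everything in it is a hypothetical assumption, not data from any study: 200 participants alternately assigned to math or irrelevant training, a pretest score drawn from a normal distribution, and a posttest built as 10 + 3 * training + 0.8 * pretest + noise. It fits the model twice with a small hand-written least-squares helper (`ols`, written here for illustration, not a library function), once without and once with the pretest covariate, and compares the residual standard deviation, which is the unexplained variation the training effect has to be seen against.

```python
import math
import random


def ols(X, y):
    """Least-squares fit by solving the normal equations (X'X) b = X'y.

    Returns (coefficients, residual sum of squares). Illustration only;
    a real analysis would use a statistics package.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y]
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for row, yi in zip(X, y))
    return b, sse


random.seed(1)
n = 200
training = [i % 2 for i in range(n)]  # 1 = math training, 0 = irrelevant training
pretest = [random.gauss(50, 10) for _ in range(n)]
# Hypothetical data-generating process: training adds 3 points,
# and prior ability (pretest) carries over strongly to the posttest.
posttest = [10 + 3 * t + 0.8 * pre + random.gauss(0, 3)
            for t, pre in zip(training, pretest)]

b_anova, sse_anova = ols([[1.0, t] for t in training], posttest)
b_ancova, sse_ancova = ols([[1.0, t, pre] for t, pre in zip(training, pretest)],
                           posttest)

# Residual SD = sqrt(SSE / (n - number of coefficients))
sd_anova = math.sqrt(sse_anova / (n - 2))
sd_ancova = math.sqrt(sse_ancova / (n - 3))
print(f"training only:      effect {b_anova[1]:.2f}, residual SD {sd_anova:.2f}")
print(f"training + pretest: effect {b_ancova[1]:.2f}, residual SD {sd_ancova:.2f}")
```

In expectation, the residual SD is about 8.5 without the covariate (the square root of 0.8² × 10² + 3²) and about 3 with it (just the noise term), so the training effect, whose true value in this simulation is 3, is estimated far more precisely once the pretest is in the model.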

Okay, great.  Where’s the confusion?

Covariates as Continuous Predictor Variables

The confusion is that, really, the model doesn’t care that the covariate is something you don’t have a hypothesis about, something you’re just controlling for. Mathematically, it’s the same model, and you run it the same way.

And so people who understand this often use the term covariate to mean ANY continuous predictor variable in your model, whether it’s just a control variable or the most important predictor in your hypothesis.  And I’m guilty as charged.  It’s a lot easier to say covariate than continuous predictor variable.

But SPSS does this too.  You can run a linear regression model with only continuous predictor variables in SPSS GLM by putting them in the Covariate box.  All the Covariate box does is define the predictor variable as continuous.
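A rough sketch of what “define the predictor as continuous” means for the model: a covariate enters the design matrix as one numeric column with one slope, while a categorical predictor (a fixed factor, in SPSS terms) is expanded into 0/1 indicator columns, one per non-reference level. The function below (`design_matrix`, a hypothetical helper) uses the first sorted level as the reference; software packages differ in which level they choose, so treat this as an illustration of the idea, not of SPSS’s internal parameterization.

```python
def design_matrix(values, as_factor):
    """Return (column names, rows) for an intercept plus one predictor.

    as_factor=False: the predictor is one numeric column (a 'covariate').
    as_factor=True: each level except the reference gets a 0/1 indicator.
    """
    if not as_factor:
        return ["intercept", "x"], [[1.0, float(v)] for v in values]
    levels = sorted(set(values))
    others = levels[1:]  # levels[0] is the reference level
    names = ["intercept"] + [f"x={lvl}" for lvl in others]
    rows = [[1.0] + [1.0 if v == lvl else 0.0 for lvl in others] for v in values]
    return names, rows


pretest = [42, 55, 61, 55, 70]  # hypothetical pretest scores
print(design_matrix(pretest, as_factor=False)[0])  # ['intercept', 'x']
print(design_matrix(pretest, as_factor=True)[0])   # ['intercept', 'x=55', 'x=61', 'x=70']
```

As a covariate, the pretest costs one parameter (a slope). Treated as a factor, every distinct score would get its own mean, which is why the box you put a variable in changes what the model says.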

(SAS’s PROC GLM does the same thing, but it doesn’t specifically label them as Covariates. PROC GLM assumes all predictor variables are continuous; if any are categorical, it’s up to you, the user, to declare them as such in the CLASS statement.)

Covariates as Control Variables

But the other part of the original ANCOVA definition is that a covariate is a control variable.

So sometimes people use the term Covariate to mean any control variable, because you can covary out the effects of a categorical control variable just as easily.

In our little math training example, you may be unable to pretest the participants.  Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?”  It’s not as good of a control variable as a pretest score, but you can at least get at their previous math training.

You’d use it in the model in the exact same way you would the pretest score.  You’d just have to define it as categorical.
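As a rough illustration of why a yes/no answer is a weaker control than a pretest score, the sketch below simulates hypothetical data in which unobserved prior ability drives both the posttest and whether someone took Calculus (assumed here to be anyone whose ability plus some noise exceeds 50). The Calculus answer enters as a single 0/1 indicator column, in exactly the slot the pretest column would occupy. All numbers are invented for illustration, and `ols` is a small hand-written least-squares helper, not a library function.

```python
import math
import random


def ols(X, y):
    """Least-squares fit via the normal equations; returns (coefficients, SSE)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y]
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    b = [a[i][p] / a[i][i] for i in range(p)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for row, yi in zip(X, y))
    return b, sse


random.seed(2)
n = 200
training = [i % 2 for i in range(n)]                 # 1 = math training
ability = [random.gauss(50, 10) for _ in range(n)]    # unobserved prior ability
pretest = [a + random.gauss(0, 2) for a in ability]  # continuous measure of it
# Hypothetical yes/no answer: more able people are more likely to have taken Calculus
calculus = [1.0 if a + random.gauss(0, 5) > 50 else 0.0 for a in ability]
posttest = [10 + 3 * t + 0.8 * a + random.gauss(0, 3)
            for t, a in zip(training, ability)]

models = {
    "training only": [[1.0, t] for t in training],
    "training + calculus (0/1)": [[1.0, t, c] for t, c in zip(training, calculus)],
    "training + pretest": [[1.0, t, p] for t, p in zip(training, pretest)],
}
sse = {}
for name, X in models.items():
    b, sse[name] = ols(X, posttest)
    resid_sd = math.sqrt(sse[name] / (n - len(X[0])))
    print(f"{name:28s} training effect {b[1]:5.2f}  residual SD {resid_sd:.2f}")
```

Under these assumptions the 0/1 Calculus indicator soaks up part of the unexplained variation, but noticeably less than the pretest does, because a yes/no answer throws away most of the information about how much math ability someone brought in.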

Once again, there isn’t really a good term for categorical control variable, so people sometimes refer to it as the covariate.

So what is a covariate then?

It’s hard to say. There is no disputing the first definition, the one from ANCOVA, so the meaning is clear there.

I prefer to just be careful, in setting up hypotheses, running analyses, and in writing up results, to be clear about which variables I’m hypothesizing about and which ones I’m controlling for, and whether each variable is continuous or categorical.

The names the variables or models end up with aren’t important, as long as you’re clear about what you’re testing and how.

———————————————————————————-

Read other posts on Confusing Statistical Terms and/or check out our workshops and other resources related to ANOVA and ANCOVA.

Comments

Akinboboye joseph April 10, 2012 at 7:57 pm

I really enjoy this write-up. Kindly send me details on how to use SPSS. Thanks

Karen April 11, 2012 at 9:33 am

Hi Akinboboye,

I would suggest starting with the SPSS category link at the right.

If you need more help at the beginning level, I’d be happy to send you my book (I have a few extra copies) for the cost of shipping. Please email me directly.

If you want more help with the concepts discussed above, you really want the Running Regressions and ANCOVAs in SPSS GLM workshop. It walks you through the univariate GLM procedure step-by-step and shows where it’s the same and where it’s different from the regression procedure. You can get to that here: http://www.theanalysisinstitute.com/workshops/SPSS-GLM/index.html

It’s not running right now, but you can use our contact form to get access to it as a home study workshop.

Best,
Karen

Ben May 2, 2012 at 1:05 am

Very clear. Found the answer I was looking for. Thanks much! I’ll pass on word of your site.

Karen May 3, 2012 at 2:39 pm

Thanks, Ben! Glad it was helpful.

Karen

Nur Barizah July 19, 2012 at 9:26 am

Thanks a lot, Karen. Your notes were very helpful. I have been looking for the answers in tens of books for several months. Thank God, many of my uncertainties about the GLM command are answered today on your site. FYI, in my area of study (accountancy), the GLM command is almost nonexistent in the literature.

Karen July 19, 2012 at 10:43 am

Hi Nur,

I’m so glad.

Most of the time in the literature it will be called ANOVA, ANCOVA or linear regression. But they’re all the same model and all can be run in GLM.

Karen

FASASI R.A August 15, 2012 at 10:40 am

Please, what is the relationship between the partial eta squared of the covariate in an ANCOVA result and the partial eta squared of the treatment and moderator variables? What is the implication of the covariate (which could be the pretest) having a higher partial eta squared than the main treatment or moderator variable?

Karen September 11, 2012 at 4:54 pm

Hi Fasasi,

I don’t know–I’d need more information on the model. For example, is the covariate different from the moderator? Which interactions are being included?

Thanks,
Karen

Emily October 6, 2012 at 12:12 am

Thanks Karen! I’m so happy this website exists.
I found this page because I am stuck on something related but at a way lower level (I am no statistician). I’m running a repeated measures ANOVA in SPSS, using GLM. I need to control for a between-subjects categorical variable that might be adding noise to the data and washing out any effects of my factors. I can’t figure out if the right way to do this is to put it in as a between-subjects factor, or to pretend that it is a continuous variable and put it in the covariate box. Can you help?

Karen October 8, 2012 at 9:02 am

Hi Emily,

Yes. In fact, there is already an article here on that exact topic. It’s the same in all SPSS GLM procedures, whether you’re using univariate, repeated measures, etc. http://www.theanalysisfactor.com/spss-glm-choosing-fixed-factors-and-covariates/

Karen

Debra February 11, 2013 at 7:25 am

Hi Karen

I’m still a little confused on the same issue as Emily, despite reading the article you suggested. Although the suggested article seems to clearly spell out that any true categorical variable, including dichotomous variables, should be included in an ANOVA model as a fixed factor rather than a covariate, the article above states:

“In our little math training example, you may be unable to pretest the participants. Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?” It’s not as good of a control variable as a pretest score, but you can at least get at their previous math training…You’d use it in the model in the exact same way you would the pretest score. You’d just have to define it as categorical.”

It’s the last sentence that I get stuck on – because if you included a dichotomous variable in the model in the exact same way you would a continuous pre-test score, you would include it as a covariate, not as a fixed factor.

Sorry if this seems obvious, perhaps I’m getting caught up in the terminology too! Any clarification would be greatly appreciated, I’ve found your explanations to be more helpful than most!

Karen February 13, 2013 at 3:04 pm

Hi Debra,

SPSS’s definition of “Covariate” is “continuous predictor variable.” Its definition of “fixed factor” is “categorical predictor variable.” (There’s actually more to this when comparing fixed and random factors, but that’s a tangent here.)

“It’s the last sentence that I get stuck on – because if you included a dichotomous variable in the model in the exact same way you would a continuous pre-test score, you would include it as a covariate, not as a fixed factor. ”

I don’t mean define it the same way, I mean use it as a control variable in the same way. I’m trying to separate out the use of the variable in the model (as something to control for vs. something about whose effect you have a hypothesis) from the way it was measured and therefore needs to be defined (categorical vs. continuous).

So yes, it goes into Fixed Factors because it’s categorical.

And don’t apologize for getting confused with terminology; that’s my whole point. The imprecision of the terminology is what makes it so confusing! :)
Karen

Ayesha October 31, 2013 at 5:29 pm

Hello Karen

I was going through the discussion and had the same confusion. My study evaluates the effectiveness of a school-based program on preschoolers’ behavior problems. My intervention and control groups differ on class size (number of students in class, measured in categories) and on fathers’ education (also measured in categories). I need to see if the ANCOVA results remain significant after controlling for these baseline differences. The covariate option in SPSS should be a continuous measure, so how should I do the analysis with categorical variables? Please answer soon.

Alexa October 9, 2012 at 2:37 pm

What is the difference between a confound and a covariate in simple terms?

Karen October 23, 2012 at 3:36 pm

Alexa, that’s a really great question. A confound is a variable that is so closely associated with another variable (perfectly, or so nearly perfectly that you can’t distinguish them) that you can’t tell their effects apart.

In most areas of the US, for example, neighborhood and school attended overlap heavily because most kids from a neighborhood go to the same school. So you couldn’t separate the school effects on, say, grade 3 test scores from the neighborhood effects.

A covariate is a variable that affects the DV in addition to the IV. It doesn’t have to be correlated with the independent variable. If it isn’t, it may simply explain some of the otherwise unexplained variation in the DV.
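One way to see why a perfect confound can’t be untangled: if every child’s school indicator equals their neighborhood indicator, those two columns of the design matrix are identical, so X’X is singular and there is no unique way to split the effect between them. The toy sketch below uses made-up 0/1 indicators for five hypothetical children and checks this with a 3 × 3 determinant (helper names `xtx` and `det3` are illustrative).

```python
def xtx(X):
    """Cross-product matrix X'X for a list-of-rows design matrix."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical indicators: 1 = School A / Neighborhood A, 0 = otherwise
school = [1, 1, 0, 0, 1]
neighborhood_confounded = [1, 1, 0, 0, 1]  # every child attends the local school
neighborhood_mixed = [1, 0, 0, 0, 1]       # one child crosses over

for label, hood in [("confounded", neighborhood_confounded),
                    ("mixed", neighborhood_mixed)]:
    X = [[1, s, h] for s, h in zip(school, hood)]  # intercept, school, neighborhood
    print(label, det3(xtx(X)))
# confounded 0
# mixed 4
```

Breaking the overlap for even one child makes the determinant nonzero, although nearly collinear columns still leave the two effects very imprecisely estimated.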

Karen

norhawa March 21, 2013 at 9:58 pm

Hi Karen,

Thanks for the information. The way you write it is very clear and easy to understand. Thanks~

Karen April 2, 2013 at 5:43 pm

Thanks!

zahir March 13, 2014 at 9:15 pm

So can we say based on your answer that every confounding variable is a covariate but not every covariate is a confounding variable?

Zahir

Karen April 4, 2014 at 9:55 am

Depends on how you’re using them. :)

Joanne April 24, 2013 at 3:05 pm

Hi Karen,

I have a follow-up question please. I am looking at memory performance in young and older adults under two conditions: when a negative or a positive stereotype is activated. I’m using a mixed model to compare performance across time (before/after the intervention) across the two age groups and conditions.

I obtain a significant difference between the two age groups on levels of verbal IQ (NART scores) which is continuous, and hence a covariate (that exerts a significant effect).

The paper that I am basing my study on also obtains a difference between age groups over verbal IQ scores. They do not obtain a significant difference within age groups but between conditions, however, and so have not included it as a covariate. This seems wrong to me. Surely if a significant difference over a background variable occurs on one of your IVs you should include the covariate in the model, regardless of whether there’s a difference between groups on the second IV?

If you could clear this up for me I’d really appreciate it, as I am confused!

Thanks,

Joanne

Karen April 29, 2013 at 6:45 pm

Hi Joanne,

I am missing something. Does verbal IQ relate to the DV? (It seems it would but you don’t mention that). So in this paper, the two stereotype condition groups have different verbal IQ scores, but age groups didn’t? It really comes down to whether the potential covariate is related to the DV.

Anika Kunz May 31, 2013 at 5:01 am

Hi,

I’m running an ANOVA with repeated measures. As I can see from the correlation matrix, there are significant correlations between my possible confounding variables and my dependent variable, but here is my problem: if the possible covariate correlates with the dependent variable at time 2 but not at time 1, do I have to include it as a “normal” covariate when I perform the GLM?

Thanks for any help.
Anika

Karen June 6, 2013 at 5:30 pm

It would be a good idea to include a covariate*time interaction. That will allow the effect of the covariate to be different at time 1 and time 2.
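A minimal sketch of what that interaction adds, assuming the data are in long format with one row per person per time point: alongside the covariate itself, you include the product of the covariate and a time-2 indicator. The covariate’s slope at time 1 is then its main-effect coefficient, and at time 2 it is that coefficient plus the interaction coefficient. The helper names and coefficient values below are hypothetical, chosen only for illustration.

```python
def design_row(covariate, time):
    """Columns: intercept, time-2 indicator, covariate, covariate x time-2."""
    time2 = 1.0 if time == 2 else 0.0
    return [1.0, time2, covariate, covariate * time2]


def covariate_slope(coefs, time):
    """Change in the predicted DV per unit of covariate at the given time point."""
    b_cov, b_interaction = coefs[2], coefs[3]
    return b_cov + (b_interaction if time == 2 else 0.0)


coefs = [20.0, 5.0, 0.25, 0.5]  # hypothetical fitted coefficients
print(design_row(30.0, 1))      # [1.0, 0.0, 30.0, 0.0]
print(design_row(30.0, 2))      # [1.0, 1.0, 30.0, 30.0]
print(covariate_slope(coefs, 1), covariate_slope(coefs, 2))  # 0.25 0.75
```

Here the covariate barely matters at time 1 (slope 0.25) and matters more at time 2 (slope 0.75), which is the kind of pattern a correlation at time 2 but not at time 1 suggests.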

Kathy June 16, 2013 at 2:06 pm

Easy to understand definition of Covariate for ANCOVA. Thank you!

Alyssa July 11, 2013 at 7:50 pm

Hi Karen,
Thanks for this article! I just want to further clarify some of the discussion on dichotomous covariates/fixed factors. Specifically, I am running a chi-squared test with one dichotomous outcome and one dichotomous predictor, to see how well membership in the two-level predictor discriminates membership in the two-level outcome. Now, I want to ensure that gender (I think my covariate/fixed factor) does not modify my predictor’s ability to discriminate membership in my outcome variable. To answer this question I plan to run the model with both genders combined and separately for each gender. The answer is that the discriminative ability of my predictor does depend on gender to some extent.

So my question is – is gender a fixed factor here? is there a better word for it? Random factor?

Thank you so much!

Alyssa

Karen July 15, 2013 at 3:44 pm

Hi Alyssa,

Within the context of SPSS GLM, Gender is a fixed factor. Don’t make it random; that’s a whole other thing!

But if you’re doing a chi-square, Fixed Factor and covariate aren’t really issues. Just add it in as another variable.

rk October 15, 2013 at 4:29 pm

Hi,
I am new to this site and this has always confused me. How do you “control” for a variable? Could you please explain how “controlling” works? One way to control for a variable might be just taking random samples, i.e., if you want to control for age, then take a random sample from all age groups. Another example would be a “control group” (placebo in a drug experiment).
Am I correct in the above examples? Also, I assume there are other standard techniques, could you please clarify how they work?

Karen October 16, 2013 at 9:56 am

Hi RK,

That’s a big question. Or rather, a small question with a big answer. I will see if I can write a post (or two or 10) explaining it.

rk October 17, 2013 at 10:11 am

Thanks Karen! Looking forward to it!!

Karen November 8, 2013 at 11:36 am

Hi Ayesha,

If a control variable is simply a categorical variable, put it into “Fixed Factors” instead of Covariate in Univariate GLM. By default, SPSS will also add in an interaction term, but you can take that out in the Design dialog box.

FYI, if it’s helpful, we have a workshop available on demand that goes through all these details of SPSS GLM: http://theanalysisinstitute.com/spss-glm-ondemand-workshop/
