Confusing Statistical Terms #5: Covariate

by Karen Grace-Martin

Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts.

Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways. And these different ways of using the term have BIG implications for what your model means.

The most precise definition is its use in Analysis of Covariance, a type of General Linear Model in which the independent variables of interest are categorical, but you also need to adjust for the effect of an observed, continuous variable–the covariate.

In this context, the covariate is always continuous, never the key independent variable, and always observed (i.e. observations weren’t randomly assigned its values, you just measured what was there).

A simple example is a study looking at the effect of a training program on math ability. The independent variable is the training condition–whether participants received the math training or some irrelevant training. The dependent variable is their math score after receiving the training.

But even within each training group, there is going to be a lot of variation in people’s math ability. If you don’t adjust for that, it is just unexplained variation. Having a lot of unexplained variation makes it pretty tough to see the actual effect of the training–it gets lost in all the noise.

If you use pretest math score as a covariate, you can adjust for where people started out. That gives you a clearer picture of whether people do well on the final test because of the training or because of the math ability they had coming in.
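To make "adjusting for where people started out" concrete, here is a minimal sketch in Python. The scores are made up for illustration, and the function name adjusted_difference is hypothetical, not something from any statistics package. It uses the classic ANCOVA adjustment for two groups and one covariate: estimate the pooled within-group slope of post-test on pretest, then subtract the part of the raw group difference that the pretest gap predicts.

```python
from statistics import mean

def adjusted_difference(y, x, group):
    """Return (raw, adjusted, slope) for a two-group comparison of y.

    raw      : difference in mean y, group 1 minus group 0
    slope    : pooled within-group slope of y on the covariate x
    adjusted : raw difference minus slope * (difference in mean x)
    """
    sxy = 0.0
    sxx = 0.0
    means = {}
    for label in (0, 1):
        idx = [i for i, g in enumerate(group) if g == label]
        xbar = mean(x[i] for i in idx)
        ybar = mean(y[i] for i in idx)
        means[label] = (xbar, ybar)
        # Accumulate within-group cross-products and squared deviations.
        sxy += sum((x[i] - xbar) * (y[i] - ybar) for i in idx)
        sxx += sum((x[i] - xbar) ** 2 for i in idx)
    slope = sxy / sxx
    raw = means[1][1] - means[0][1]
    adjusted = raw - slope * (means[1][0] - means[0][0])
    return raw, adjusted, slope

# Made-up data: 0 = irrelevant training, 1 = math training.
trained = [0, 0, 0, 1, 1, 1]
pretest = [40, 50, 60, 50, 60, 70]
posttest = [45, 55, 65, 65, 75, 85]

raw, adjusted, slope = adjusted_difference(posttest, pretest, trained)
print(raw, adjusted, slope)
```

In this toy data the trained group happened to start 10 points higher on the pretest, so half of the raw 20-point gap disappears once pretest is adjusted for. The adjusted difference computed this way is the same number a regression of post-test on group and pretest would report as the group coefficient.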

Okay, great. Where’s the confusion?

Covariates as Continuous Predictor Variables

The confusion is that, really, the model doesn’t care that the covariate is something you don’t have a hypothesis about. Something you’re just adjusting for. Mathematically, it’s the same model, and you run it the same way.
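One way to see this is to write the model down. The notation here is an assumption for illustration, not from any particular textbook: $Y_i$ is the post-test score, $G_i$ is 1 for the math training group and 0 otherwise, and $X_i$ is the pretest score.

```latex
Y_i = \beta_0 + \beta_1 G_i + \beta_2 X_i + \varepsilon_i
```

Nothing in the equation marks $X_i$ as "just a control." Both $\beta_1$ and $\beta_2$ are estimated the same way; which one you call the effect of interest and which one you call the covariate is a statement about your hypothesis, not about the model.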

And so people who understand this often use the term covariate to mean ANY continuous predictor variable in your model, whether it’s just a control variable or the most important predictor in your hypothesis. And I’m guilty as charged. It’s a lot easier to say covariate than continuous predictor variable.

But SPSS does this too. You can run a linear regression model with only continuous predictor variables in SPSS GLM by putting them in the Covariate box. All the Covariate box does is define the predictor variable as continuous.

(SAS’s PROC GLM does the same thing, but it doesn’t specifically label them as Covariates. In PROC GLM, the assumption is all predictor variables are continuous. If they’re categorical, it’s up to you, the user, to specify them as such in the CLASS statement.)

Covariates as Control Variables

But the other part of the original ANCOVA definition is that a covariate is a control variable.

So sometimes people use the term Covariate to mean any control variable. Because really, you can covary out the effects of a categorical control variable just as easily.

In our little math training example, you may be unable to pretest the participants. Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?” It’s not as good a control variable as a pretest score, but you can at least get at their previous math training.

You’d use it in the model in the exact same way you would the pretest score. You’d just have to define it as categorical.
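What "define it as categorical" amounts to under the hood is usually dummy (indicator) coding. Here is a minimal sketch in Python, with a hypothetical helper named dummy_code and made-up answers; statistical software does the equivalent step for you when you declare a variable as a factor or list it in a CLASS statement.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 indicator columns.

    One column is created per level other than the reference level;
    the reference level is the row pattern where every column is 0.
    """
    levels = sorted(set(values) - {reference})
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
    }

# Made-up answers to "Did you take Calculus?"
took_calculus = ["yes", "no", "no", "yes", "no"]
indicators = dummy_code(took_calculus, reference="no")
print(indicators)  # {'yes': [1, 0, 0, 1, 0]}

# A control variable with three levels needs two indicator columns.
math_level = ["low", "mid", "high", "mid"]
print(dummy_code(math_level, reference="low"))
```

Each indicator column then enters the model as a predictor in the same way the pretest score would.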

Once again, there isn’t really a good term for categorical control variable, so people sometimes refer to it as the covariate.

So what is a covariate then?

It’s hard to say. There is no disputing the first definition, the continuous, observed control variable of ANCOVA, so the term is clear in that context.

I prefer to just be careful, in setting up hypotheses, running analyses, and in writing up results, to be clear about which variables I’m hypothesizing about and which ones I’m controlling for, and whether each variable is continuous or categorical.

The names that the variables or the models end up having aren’t important, as long as you’re clear about what you’re testing and how.

———————————————————————————-


Comments

Precious says

May 23, 2022 at 10:53 am

Hello everyone,

In my study, I have plant (plant population) density as factor, is it necessary to include the same plant density (number of plants at harvest) as covariate? I have been advised to do so and I urgently need your help on this.

Thanks

Lisa says

May 14, 2020 at 3:36 am

This really helped me a lot – thanks!

Jade says

September 14, 2019 at 2:53 am

Hi
Thank you for your article here. I’ve read a paper which said “including gender as a covariate”, but I learned the covariate must be continuous in ANCOVA, so it really baffled me until I found your explanation here!

But there is still a question, if the categorical covariate has an interaction with the IV, how can i report it? Report the main effect of IV hierarchically? Or if there is an interaction ,the variate can not be taken as a covariate?

- Brad says
  
  November 29, 2021 at 9:20 pm
  
  Your definition of a covariate in ANCOVA is completely at odds with that given in Whitlock and Schluter (2020). The Analysis of Biological Data, 3rd ed. They specify that the covariate is categorical, and the main effect (factor/explanatory variable) is numerical (pretty much the opposite of what you state). And by the way, a numerical variable does not have to be “continuous”; it may be discrete (e.g, counts, which are integers). So you’ve even confused the term continuous with numerical.
  
  - Karen Grace-Martin says
    
    November 30, 2021 at 3:39 pm
    
    Hi Brad,
    
    Very interesting. I don’t have that book. Yes, it’s absolutely true that many authors use “covariate” to mean different things. But I’ve never, ever, heard of a factor in an ANCOVA being numerical. Look in any other book on ANOVA. That is so backward from usual usage I wonder if it’s a typo. I’d have to see the exact wording to really comment. That said, Factor is also another confusing term, in that it means something entirely different in the context of Factor Analysis, where it is continuous.
    
    And you’re absolutely correct that numerical variables don’t have to be continuous. The difference is very important for dependent/outcome/response variables, since it affects the type of model you use. It’s not a big difference for predictors, since if you fit a line to a discrete predictor, you’re technically treating it as continuous.
    
Geoff says

April 14, 2019 at 7:22 pm

Hi Karen,

Brilliant post, that really unlogged a lot for me.

I was just wondering if you had information on what the actual math looked like for adding a covariate into a regression? Just as a basic example. i.e. How do you ‘adjust for the effect of said covariate’?

Sorry if I’m getting my terms mixed up.

Jan Stautemas says

January 31, 2019 at 6:50 am

I am confused by your example?
Why do you use pre math scores as a covariate and not as timepoint 1 in a repeated measures design?
Within variables: time (pre post), dependent variable (training 1 or two), no covariate.
Thank you in advance
Jan

- Karen Grace-Martin says
  
  March 4, 2019 at 11:17 am
  
  Hi Jan,
  
  It depends on the research question. See this: https://www.theanalysisfactor.com/pre-post-data-repeated-measures/
  
Zione Kathleen Esic says

December 6, 2018 at 8:43 pm

Hi Karen,
Our study’s respondents are the left behind emerging adults. The IV is the psychological distress they are undergoing and it includes academic distress. Is it right to ask if they are in college level? Though it won’t be ideal to use it because not all emerging adults are college students but mostly are. It would also help determine the academic distress which I said earlier. Thank you!

Armindo Marques says

November 30, 2018 at 8:34 am

Hi Karen,

I really need your help. I don’t know what test (and I cant find any source of information about it) should I perform in this circumstances:

One IV (e.g. Gender)
Two DV (continuous)
One or Two Covariates (both ordinal).

What to do if the Covariates are ordinal?

- Karen Grace-Martin says
  
  November 30, 2018 at 11:52 am
  
  Hi Armindo,
  I can’t give advice without really digging into the details of things like your research questions and the roles of these variables.
  
  But I can comment on a couple things. Ordinal predictors usually need to be treated as categorical: https://www.theanalysisfactor.com/pros-and-cons-of-treating-ordinal-variables-as-nominal-or-continuous/
  
  Once you’ve got more than one DV, you’re into multivariate statistics. So some version of a MANOVA or Multivariate linear model. https://www.theanalysisfactor.com/multiple-regression-model-univariate-or-multivariate-glm/
  
Tasha says

August 22, 2018 at 3:56 pm

Hi Karen,

What should you do if you have dramatically different sample sizes across levels of a categorical variable you are including as a covariate? For example, if you are controlling for gender with 4 categories (i.e., man, woman, prefer to self-describe, prefer not to say), is there a citation that supports either collapsing the last gender categories or even excluding them from the analysis because of their extremely small sample size to avoid skewing your results? Even with collapsing, however, you could still run into the same sample size problem. Is there a general consensus about how to handle this type of issue?

- Karen Grace-Martin says
  
  October 12, 2018 at 11:28 am
  
  Hi Tasha,
  
  I don’t know there is a consensus, other than to be thoughtful about it and consider the pros and cons of different approaches. You may want to read this: https://www.theanalysisfactor.com/when-unequal-sample-sizes-are-and-are-not-a-problem-in-anova/
  
Zoe says

August 31, 2017 at 12:26 pm

Hi Karen,

I want to run a report where I think I will need an ANCOVA for the analysis.

I want to test 2 types of clinical outcomes for one rehab programme to see how they compare in picking up changes in pain and function. I have asked participants to fill out the 2 outcomes pre and post intervention (2 times points only).

I have the data and was thinking to run simple t-tests to show that the intervention has been successful in reducing pain and increasing function for each of the outcomes used.

Then, I want to ensure that both outcomes have picked up change successfully. In order to do this I want to run an ANCOVA, using the baseline pain scores as the covariate. This is in order to show that those with higher pain to start were perhaps less likely to end up with lowest pain scores post. Does this make sense? Here the IV would be the outcome used (either A or B) and the DV would be the pain score recorded post programme.

I would hope that the p value would be insignificant from the ANCOVA printout, showing that both outcomes (A and B) had no difference in means; however, I am getting a little confused about how I would report on the data from the covariate?

Verda says

June 19, 2017 at 7:12 am

Hi!

I want to do an analysis, I think it has to be an ANCOVA, but my independent variables are probably/hopefully not independent of each other (which is an assumption of the ANCOVA, right?).
I want to analyse: first independent variable: condition (4) and second independent variable: the difference score of two measurements points.

But the difference score should be dependent of the conditions (at least I hope so)
My dependent variable is the recognition score, which should not be dependent of the conditions and the difference score

So my questions is if I can use an ANCOVA because my two independent variables are linked to each other…

Thank you very much in advance!!!

Verda Simsek

shane says

May 17, 2017 at 1:58 pm

Hi Karen,
I just want to ask if the covariates will have their own F value in the ANOVA. I was running a linear mixed model ANOVA with 3 fixed factors and 1 random factor but with an extra 2 covariates. But on the ANOVA output, the covariates are not there… should their significance be shown in SPSS too?

- Andrew says
  
  April 10, 2018 at 2:07 pm
  
  Hi Shane,
  
  I’m not sure when your reply was posted, but I figured I would reply. If too late to help you, at least others may benefit:
  
  The primary purpose of a covariate is to illustrate an effect above an observed effect that goes beyond your manipulation. Therefore, there is already an assumption that your covariates are significantly correlated with the dependent variable. If this were not the case, there would be no point of putting them into your analysis. Basically, SPSS has no need to tell you of your significance of the covariate because you should already know that. Instead, what you should do to observe the ANCOVA at work is to analyze your model with and without the 2 covariates. What you should see is that you had more power without the covariates. However, in exchange for less power, you provide evidence that your effect extends beyond known potential confounds.
  
  Best of luck!!
  
  - Karen says
    
    April 11, 2018 at 10:25 am
    
    Actually, I’m surprised that SPSS didn’t include those covariates in the ANOVA table. Yes, they should be there and yes, you need to test them.
    
    I disagree that covariates should only be there to look for effects beyond a manipulation. That may be true in experimental studies that actually have a manipulation. But many good models have both categorical predictors and continuous ones. Whether that categorical predictor is a manipulation only affects the interpretation, not the model.
    
mulia says

May 5, 2017 at 1:54 am

Hi Karen,

Thanks for this. It’s very clear. It’s very helpful for me to communicate with my economist collaborator who keeps insisting on using regression instead of ANCOVA.

I must choose if I will analyze my data using regression or ANCOVA. The main problem is, if I use ancova then I will have to use all variables as covariate and use no fixed factor at all.

So my problem is to decide if a sensory data (ranking of respondents liking 1-4, like most to dislike most) can be used as fixed factor. Or should I dummy code them and put under covariate instead of fixed factor?

Please helpppp.

Best regards
Mulia

Dmytro Voloshyn says

March 18, 2017 at 6:23 pm

Thanks for this awesome article. I was analysing a paper (https://eric.ed.gov/?id=EJ850774) that used the term “covariate” in its abstract and it was really hard to get an idea of what the authors think a covariate is. But now, after reading your explanations, it is much clearer!

joy says

February 21, 2017 at 5:26 am

hi karen,
I would like to ask how to compare scores of anxiety for two means between independent variables [group 1 VS group 2].
with no experimental design or manipulation.

however, i want to prove that the difference in mean scores is accounted for even after the introduction of a control variable [which is a continuous score on a Life Stress Scale].

please advise on how i should do this using SPSS. thank you so much 🙂

Ashwani Marwah says

November 25, 2016 at 5:19 am

Hi Karen,
I want to do covariate analysis and my response variable is Binomial (Responder vs Non Responder) but the covariates are continuous variables. So which stat tool would you suggest for a robust analysis?

Thanks@Ashwani

John Chalisque Allsup says

September 27, 2016 at 2:43 am

One other issue with covariates arises when actually using the results in a subsequent decision process. Variables controlled for are akin to what is lost when partially differentiating. It is important to understand the need to put this information back when making decisions. By this, if we take the math training example you used above, within the population studied, there may be subsets who do well with a particular method and those who don’t. If you simply control for this variation with pre-test scores, you effectively average the variation away in your analysis. When this is done, it is critically important to understand that you are seeing an averaged picture (and that many potentially important pieces of information are absent). I loved how Good and Hardin saw fit at the beginning of the first chapter of their Common Errors in Statistics, to state unequivocally that statistics should never be used as the sole basis for making decisions. Thus ‘missing or discarded information’ is in part, to me, why. (Essentially you need either strong uniformity conditions on a population, or explicit inspection, before you can be confident a statistical result applies to a particular instance.)

- Andrew says
  
  April 10, 2018 at 2:18 pm
  
  I’m not sure if you’re mistakenly generalizing one concept to another. A successful ANCOVA will discard unwanted information. For example, let’s say a cognitive task is known to have a gender effect. I could use gender as a discrete covariate (either in ANCOVA or a multiple regression with gender as a grouping variable) to show an effect exists beyond gender. Essentially, I would be saying gender does not matter for my decision.
  
  The topic you are talking about (I believe) is how many statistics deal with averages. In doing so, we need to keep in mind that individual differences exist. While this is true of covariates as well, I think that while a single measure is never enough for determining an effect, a covariate can help someone decide if something should not be a factor.
  
- Karen says
  
  April 11, 2018 at 10:30 am
  
  Agree. Much of what you’re describing sounds like situations where there is an interaction involved.
  
  But yes, there is a fundamental concept in the decision making literature that statistics apply to groups, not individuals, and that what is best for the average may not be best for any given individual or situation.
  
Ishrat Islam says

August 25, 2015 at 1:40 pm

Hi Karen,
I have 45 participants who received an exercise intervention. I am observing the change in their physical performance pre-post and after 8 months of the intervention. I have found group differences by using RM ANOVA. Now I need to control some covariates/ confounding variables (categorical variables) like age group, marital status, education level etc. How shall I do it? I am using SPSS version 22. Please suggest the most suitable tool to use. Please also comment if I could get that tool in the following options:

Option 1: While defining the independent variable in the process of RM ANOVA, there are two more spaces available to put data on ‘between subject factor’ and ‘covariate’. Shall I put all 8 covariates under the space labelled ‘covariates’? If I can do that,
Q1. Will that actually control these variables all together?
Q2. Will it lose power in doing so?
Q3. How shall I then term the tool of analysis? RM ANOVA with covariates or RM ANOVA or any other term?
Q4. Shall I use them as time*cov for all the covariate separately?

Option 2: Shall I use mixed method ANOVA by putting one particular covariate in the ‘between subject factor’ and see if it has any effect?
Q.1 Shall I look at the between subject effect in the output file to see the impact? Or need to look at the effect size or both?
Q2. Shall I need to put other 7 covariates under the space labelled ‘covariates’ while considering one covariate as a between subject factor?
Q3. How shall I then term the tool of analysis?

Looking forward to hearing from you.

Many thanks in advance.

Claudia says

July 5, 2015 at 8:01 am

Hi, Karen,

I really like your explanation about the term. Enjoyed reading it. Thanks a lot. It certainly helps to stop the arguments between my students and me.

Best,
Claudia

C says

May 31, 2015 at 3:57 pm

Hi Karen,
Thanks for the helpful article. I’m trying to decide which variables to include as control variables in my regression model. Aside from theoretical reasons, I have examined my correlation matrix. I have a variable that correlates with my IV, but not with my DV…Am I correct in assuming that this variable should NOT be controlled for in the model (since it is unrelated to the DV)?

Thanks for your help!

- Andrew says
  
  April 10, 2018 at 2:22 pm
  
  Hi C,
  
  Not sure if it is too late to help, but generally, you are correct. Covariates are variables that vary with the DV, and can be an alternative explanation for the effect of your IV. If no correlation exists with the DV, there is no need to control for it. However, keep in mind that for an ANCOVA your IV needs to be a discrete variable, and the DV needs to be continuous. Linear regression would not properly diagnose a relationship between these two.
  
Cianny says

May 22, 2015 at 12:28 pm

Karen,

Thanks so much for this clarification.
So, in other words, if I want to control for a categorical variable, I still run ANOVA, and if I want to control for a continuous variable, I run ANCOVA. *phew*

- Karen says
  
  June 3, 2016 at 9:15 am
  
  Yes, exactly.
  
Walid says

May 10, 2015 at 2:27 pm

Hi Karen, thanks a lot for the site. It’s really great.
Currently, I’m designing an experiment on stress treatment of plants with single (heat or drought) or combined (heat+drought) fixed effect IVs to measure one or more response DVs. Each stress treatment will be applied with several levels at (combined with) 3 time points. Measurement(s) are from different experimental units and not from the same individual plant. I’m thinking using either 2-way ANOVA or MANOVA depends on the no. of DV to be measured. However, I’m a little bit confused about the time point factor. I would expect that the measurements from control plants will not be significantly changed by time (as there is no stress), but under (e.g. heat treatment) I would expect to find a significant change in the main effect factor between the 3 time points. Shall I consider the time point as a covariance factor in this case, and use, for example, one-way ANOVA instead of two-way to measure one DV under one heat. Do you recommend any specific analysis or model different from above.
I would appreciate a lot your advice and help

sarah says

March 14, 2015 at 9:28 pm

Hi Karen,
I need some help with choosing the statistical analysis.
Details of my study.
all participants will complete surveys measuring self compassion, whether their psychological needs are met, how high on non attachment scale they are and other trait scales.

then, Half will be given some induction training on how to meditate. half will not.

after that, everyone will be tested to see how mindful and meditative they were using some breath counting measures.

i want to see relationship between the outcome and the survey responses and the training received.

for eg. someone whose needs are met and who received training did well on being mindful.
what about someone whose needs are met but was part of the group that did not receive training, what if he does well or what if he doesn’t.

is the survey responses the covariates? How would i write up the research question?

would the test be correlation or regression? ANCOVA?

My original hypothesis was needs met led to being good at meditation. But now the control group has been included, I am confused.

Please help.
Thanks in advance

Wael Hussein says

January 8, 2015 at 2:46 pm

Just wanted to say thank you for the site and the easy to follow text. Great job!

Isabel says

December 5, 2014 at 11:07 am

Hi Karen!

First of all, thank you for your site because it has helped me a lot of times 😉

I have a doubt in the statistical analysis of my study. I am studying stress on mothers and fathers (two independent samples). An important variable in my study is the number of children (usually mothers or fathers with more children feel more stress) and my subjects have between 1, 2 or 3 children. However because I am only looking for gender (mothers vs fathers) differences in stress I want to control this variable (number of children) in my sample. Thus, I decided to apply a chi-square analysis to see if my sample of mothers differed from my sample of fathers in the number of children. The test is non significant so i assumed that my two independent samples do not differ in number of children and that number of children is not a covariate when comparing the mothers and fathers of my sample. Is this correct?

Thanks a lot*

Abbie A says

December 1, 2014 at 11:20 pm

Very helpful. Thanks

Lee Fountain says

November 25, 2014 at 2:16 pm

Hi Karen,

Maybe you could clarify something for me. I have a model that consists of 3 independent variables and one dependent variable. However, in my research I identify two other concepts that act as mediators (social exchange and perceived organizational support). Would these concepts be considered as covariates?

Thanks

- Karen says
  
  November 30, 2014 at 11:56 am
  
  Hi Lee,
  
  Mediators are a little different than covariates. See this: https://www.theanalysisfactor.com/five-common-relationships-among-three-variables-in-a-statistical-model/
  
Keely says

July 30, 2014 at 1:12 am

Hi Karen,

I found your review of covariates really helpful. My research involves exploring the impact of anxiety on communication. I’m using a transmission chain methodology where one person reads a story and reproduces that for the next person in the chain, who reproduces it for the next person in the chain and so on until 4 people have read and reproduced the story. I used a mood induction procedure, which is my between subjects factor. Typically these studies classify the ‘generations’ in the transmission chain as a within subjects factor as the output of person 2-4 is dependent on what they received from the previous person (as if you are taken measurements at different points in time). I am measuring the number of positive and negative statements produced by each person in their reproductions. So what I have done is a 2x4x2 ANOVA.

What I also want to do is explore the impact of trait anxiety, which I measured prior to testing and is a continuous variable. Is it possible to enter trait anxiety as a covariate in an ANOVA/ANCOVA to determine if trait anxiety was related to/or differentially impacted performance under each condition? The hypothesis is that high trait anxiety participants, under a negative mood induction would show a different pattern of results to low trait anxious.

I can perform a median split and enter it as a between subjects factor but this will result in a small number of observations, particularly once I try to look at more than one generation and I’m worried about the loss of power.

I would really appreciate your advice.

All the best,
Keely

- Patty says
  
  May 26, 2016 at 6:18 am
  
  I’m having the same problem 🙁
  
Per says

June 12, 2014 at 7:45 am

Dear Karen,

i was wondering if a covariate is the same as the mediator in a repeated measurement model? I am using it that way.
But maybe there is another way to test a mediation?

Thanks in advance for your help

Liz says

April 22, 2014 at 3:46 pm

Hi Karen,

What does the phrase “covary out the effects” mean? Your article has been very helpful in helping to clear up confusion with terms!! But I’m still at a loss as to why this phrase was used in the context of a test evaluation. In discussing potential changes to a control group (outside the bounds of the test), we were told they could “covary out the effects” of a change.

- Karen says
  
  May 7, 2014 at 10:59 am
  
  Hi Liz,
  
  Great question, and one that will take another article to describe. 🙂 I’ll add that to my list of future articles to write.
  
rk says

October 15, 2013 at 4:29 pm

Hi,
I am new to this site and this has always confused me. How do you “control” for a variable? Could you please explain, how “controlling “works? One of the ways to control a variable might be just taking random samples, i.e., if you want to control for age, then take a rs from all age groups. Another example would be like a “control group” (placebo in a drug experiment).
Am I correct in the above examples? Also, I assume there are other standard techniques, could you please clarify how they work?

- Karen says
  
  October 16, 2013 at 9:56 am
  
  Hi RK,
  
  That’s a big question. Or rather, a small question with a big answer. I will see if I can write a post (or two or 10) explaining it.
  
  - rk says
    
    October 17, 2013 at 10:11 am
    
    Thanks Karen! Looking forward to it!!
    
Alyssa says

July 11, 2013 at 7:50 pm

Hi Karen,
Thanks for this article! I just want to further clarify some of the discussion on dichotomous covariates/fixed factors. Specifically, I am running a chi squared with one dichotomous outcome and one dichotomous predictor, to see how well group membership of the two-level predictor discriminates group membership of the two-level outcome. Now, I want to ensure that gender (I think my covariate/fixed factor) does not modify the relationship of my predictor’s ability to discriminate membership of my outcome variable. To answer this question I plan to run the model where both genders are combined and separately for each gender. The answer is that discriminate ability of my predictor does depend on gender to some extent.

So my question is – is gender a fixed factor here? is there a better word for it? Random factor?

Thank you so much!

Alyssa

- Karen says
  
  July 15, 2013 at 3:44 pm
  
  Hi Alyssa,
  
  Within the context of SPSS GLM, Gender is a fixed factor. Don’t make it random–that’s a whole other thing!
  
  But if you’re doing a chi-square, Fixed Factor and covariate aren’t really issues. Just add it in as another variable.
  
Kathy says

June 16, 2013 at 2:06 pm

Easy to understand definition of Covariate for ANCOVA. Thank you!

Anika Kunz says

May 31, 2013 at 5:01 am

Hi,

I’m running an ANOVA with repeated measures. As I can see from the correlation matrix there are significant correlations between my possible confounding variables and my dependent variable but here is my problem: if the possible covariate is correlating with the dep. variable at time2 but not at time1, do I have to include it as a “normal” covariate when I perform the GLM?

Thanks for any help.
Anika

- Karen says
  
  June 6, 2013 at 5:30 pm
  
  It would be a good idea to include a covariate*time interaction. That will allow the effect of the covariate to be different at time 1 and time 2.
  
Joanne says

April 24, 2013 at 3:05 pm

Hi Karen,

I have a follow-up question please. I am looking at memory performance in young and older adults under two conditions: when a negative or a positive stereotype is activated. I’m using a mixed model to compare performance across time (before/after the intervention) across the two age groups and conditions.

I obtain a significant difference between the two age groups on levels of verbal IQ (NART scores) which is continuous, and hence a covariate (that exerts a significant effect).

The paper that I am basing my study on also obtains a difference between age groups over verbal IQ scores. They do not obtain a significant difference within age groups but between conditions, however, and so have not included it as a covariate. This seems wrong to me. Surely if a significant difference over a background variable occurs on one of your IVs you should include the covariate in the model, regardless of whether there’s a difference between groups on the second IV?

If you could clear this up for me I’d really appreciate it, as I am confused!

Thanks,

Joanne

Reply
- Karen says
  
  April 29, 2013 at 6:45 pm
  
  Hi Joanne,
  
  I am missing something. Does verbal IQ relate to the DV? (It seems it would but you don’t mention that). So in this paper, the two stereotype condition groups have different verbal IQ scores, but age groups didn’t? It really comes down to whether the potential covariate is related to the DV.
  
  Reply
norhawa says

March 21, 2013 at 9:58 pm

Hi Karen,

Thanks for the information. The way you write is very clear and easy to understand. Thanks~

Reply
- Karen says
  
  April 2, 2013 at 5:43 pm
  
  Thanks!
  
  Reply
- zahir says
  
  March 13, 2014 at 9:15 pm
  
  So can we say based on your answer that every confounding variable is a covariate but not every covariate is a confounding variable?
  
  Zahir
  
  Reply
  - Karen says
    
    April 4, 2014 at 9:55 am
    
    Depends on how you’re using them. 🙂
    
    Reply
Alexa says

October 9, 2012 at 2:37 pm

What is the difference between a confound and a covariate in simple terms?

Reply
- Karen says
  
  October 23, 2012 at 3:36 pm
  
  Alexa, that’s a really great question. A confound is a variable that is so closely associated with another variable (perfectly, or so nearly perfectly that you can’t distinguish them) that you can’t tell their effects apart.
  
  In most areas of the US, for example, neighborhood and school attended overlap so much because most kids from a neighborhood all go to the same school. So you couldn’t separate out the school effects on say, grade 3 test scores, from the neighborhood effects.
  
  A covariate is a variable that affects the DV in addition to the IV. It doesn’t have to be correlated with the independent variable. If not, it may just explain some of the otherwise unexplained variation in the DV.
  
  Karen
  
  Reply
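The point about a covariate explaining otherwise unexplained variation can be sketched with the math training example. The numbers below are invented for illustration, and the calculation uses the classic one-covariate ANCOVA formulas (a pooled within-group slope) rather than any particular software's output.

```python
def mean(v):
    return sum(v) / len(v)

def within_ss(x, y):
    """Sum of cross-products of deviations from the group means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

# Hypothetical pretest/posttest scores for the control and training groups.
pre_control = [40.0, 50.0, 60.0, 70.0]
post_control = [43.0, 51.0, 62.0, 72.0]
pre_trained = [44.0, 52.0, 58.0, 70.0]
post_trained = [50.0, 61.0, 65.0, 76.0]

groups = [(pre_control, post_control), (pre_trained, post_trained)]

# Unexplained variation when only the training condition is in the model.
ss_error_anova = sum(within_ss(post, post) for _, post in groups)

# Pooled within-group slope of posttest on pretest (the covariate effect).
sxy = sum(within_ss(pre, post) for pre, post in groups)
sxx = sum(within_ss(pre, pre) for pre, _ in groups)
b = sxy / sxx

# Unexplained variation once the covariate is also in the model.
ss_error_ancova = ss_error_anova - b * sxy

# Training effect adjusted for where people started out.
raw_diff = mean(post_trained) - mean(post_control)
adjusted_diff = raw_diff - b * (mean(pre_trained) - mean(pre_control))

print(f"error SS without covariate: {ss_error_anova:.2f}")
print(f"error SS with covariate:    {ss_error_ancova:.2f}")
print(f"raw difference: {raw_diff:.2f}, adjusted difference: {adjusted_diff:.2f}")
```

Here the covariate is only weakly related to the training condition, so its main job is to soak up variation in the DV that would otherwise count as noise.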
Emily says

October 6, 2012 at 12:12 am

Thanks Karen! I’m so happy this website exists.
I found this page because I am stuck on something related but at a way lower level (I am no statistician). I’m running a repeated measures ANOVA in SPSS, using GLM. I need to control for a between-subjects categorical variable that might be adding noise to the data and washing out any effects of my factors. I can’t figure out if the right way to do this is to put it in as a between-subjects factor, or pretend that it is a continuous variable and put it in the covariate box. Can you help?

Reply
- Karen says
  
  October 8, 2012 at 9:02 am
  
  Hi Emily,
  
  Yes. In fact, there is already an article here on that exact topic. It’s the same in all SPSS GLM procedures, whether you’re using univariate, repeated measures, etc. https://www.theanalysisfactor.com/spss-glm-choosing-fixed-factors-and-covariates/
  
  Karen
  
  Reply
  - Debra says
    
    February 11, 2013 at 7:25 am
    
    Hi Karen
    
    I’m still a little confused on the same issue as Emily, despite reading the article you suggested. Although the suggested article seems to clearly spell out that any true categorical variable, including dichotomous variables, should be included in an ANOVA model as a fixed factor rather than a covariate, the article above states
    
    “In our little math training example, you may be unable to pretest the participants. Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?” It’s not as good of a control variable as a pretest score, but you can at least get at their previous math training…You’d use it in the model in the exact same way you would the pretest score. You’d just have to define it as categorical.”
    
    It’s the last sentence that I get stuck on – because if you included a dichotomous variable in the model in the exact same way you would a continuous pre-test score, you would include it as a covariate, not as a fixed factor.
    
    Sorry if this seems obvious, perhaps I’m getting caught up in the terminology too! Any clarification would be greatly appreciated, I’ve found your explanations to be more helpful than most!
    
    Reply
    - Karen says
      
      February 13, 2013 at 3:04 pm
      
      Hi Debra,
      
      SPSS’s definition of “Covariate” is “continuous predictor variable.” Its definition of “Fixed Factor” is “categorical predictor variable.” (There’s actually more to this in comparing fixed and random factors, but that’s a tangent here.)
      
      “It’s the last sentence that I get stuck on – because if you included a dichotomous variable in the model in the exact same way you would a continuous pre-test score, you would include it as a covariate, not as a fixed factor. ”
      
      I don’t mean define it the same way, I mean use it as a control variable in the same way. I’m trying to separate out the use of the variable in the model (as something to control for vs. something about whose effect you have a hypothesis) from the way it was measured and therefore needs to be defined (categorical vs. continuous).
      
      So yes, it goes into Fixed Factors because it’s categorical.
      
      And don’t apologize for getting confused with terminology–that’s my whole point. The imprecision of the terminology is what makes it so confusing! 🙂
      Karen
      
      Reply
      - Ayesha says
        
        October 31, 2013 at 5:29 pm
        
        Hello Karen
        
        I was going through the discussion and had the same confusion. My study evaluates the effectiveness of a school-based program on preschoolers’ behavior problems. My intervention and control groups differ on the strength of students in class (measured in categories) and on fathers’ education (also measured in categories). I need to see if the ANCOVA results remain significant after controlling for these baseline differences. The covariate option in SPSS should be a continuous measure, so how should I do the analysis with a categorical variable? Please answer soon.
      - Karen says
        
        November 8, 2013 at 11:36 am
        
        Hi Ayesha,
        
        If a control variable is simply a categorical variable, put it into “Fixed Factors” instead of Covariate in Univariate GLM. By default, SPSS will also add in an interaction term, but you can take that out in the Design dialog box.
        
        FYI, if it’s helpful, we have a workshop available on demand that goes through all these details of SPSS GLM: http://theanalysisinstitute.com/spss-glm-ondemand-workshop/
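
Why a categorical control variable belongs in Fixed Factors rather than Covariate can be seen in a small sketch. The data are invented, and the category codes (1, 2, 3) are hypothetical labels for something like father's education. Treating the codes as a covariate fits one straight line through them; treating them as a factor gives each category its own mean, which is what dummy coding does.

```python
def mean(v):
    return sum(v) / len(v)

# Hypothetical outcome scores by category code (1, 2, 3).
scores = {1: [10.0, 12.0], 2: [20.0, 22.0], 3: [14.0, 16.0]}

codes = [c for c, ys in scores.items() for _ in ys]
y = [v for ys in scores.values() for v in ys]

# Treated as a covariate: one straight line through the category codes.
mc, my = mean(codes), mean(y)
b = sum((c - mc) * (v - my) for c, v in zip(codes, y)) / sum((c - mc) ** 2 for c in codes)
a = my - b * mc
sse_covariate = sum((v - (a + b * c)) ** 2 for c, v in zip(codes, y))

# Treated as a fixed factor: each category gets its own mean.
group_means = {c: mean(ys) for c, ys in scores.items()}
sse_factor = sum((v - group_means[c]) ** 2 for c, v in zip(codes, y))

print(f"error SS as covariate:    {sse_covariate:.2f}")
print(f"error SS as fixed factor: {sse_factor:.2f}")
```

Because the middle category has the highest scores in this made-up example, the straight line misses badly, while the factor version leaves only the within-category scatter.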
FASASI R.A says

August 15, 2012 at 10:40 am

Please, what is the relationship between the covariate’s partial eta squared in the ANCOVA results and the partial eta squared of the treatment and moderator variables? What does it imply if the covariate’s value (the covariate may be the pretest) is higher than that of the main treatment or moderator variable?

Reply
- Karen says
  
  September 11, 2012 at 4:54 pm
  
  Hi Fasasi,
  
  I don’t know–I’d need more information on the model. For example, is the covariate different from the moderator? Which interactions are being included?
  
  Thanks,
  Karen
  
  Reply
Nur Barizah says

July 19, 2012 at 9:26 am

Thanks a lot, Karen. Your notes were very helpful. I have been looking for the answers in tens of books for several months. Thank God, many of my uncertainties about the GLM command were answered today on your site. FYI, in my area of study (accountancy), the GLM command is almost nonexistent in the literature.

Reply
- Karen says
  
  July 19, 2012 at 10:43 am
  
  Hi Nur,
  
  I’m so glad.
  
  Most of the time in the literature it will be called ANOVA, ANCOVA or linear regression. But they’re all the same model and all can be run in GLM.
  
  Karen
  
  Reply
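One way to see that these are all the same model: with two groups, the "ANOVA" comparison of means and the "regression" on a 0/1 dummy variable give the same numbers. This is a minimal sketch with invented scores, using only hand-computed least squares.

```python
def mean(v):
    return sum(v) / len(v)

# Hypothetical math scores for two training conditions.
control = [61.0, 64.0, 58.0, 65.0]
trained = [70.0, 66.0, 73.0, 71.0]

# "ANOVA" view: compare the two group means.
mean_difference = mean(trained) - mean(control)

# "Regression" view: regress score on a 0/1 dummy for training condition.
x = [0.0] * len(control) + [1.0] * len(trained)
y = control + trained
mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

print(f"difference in means: {mean_difference:.2f}")
print(f"regression slope:    {slope:.2f}")
print(f"regression intercept (control mean): {intercept:.2f}")
```

The slope on the dummy is the difference between the group means, and the intercept is the mean of the group coded 0. Adding a continuous predictor to that same equation is what turns it into an ANCOVA.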
Ben says

May 2, 2012 at 1:05 am

Very clear. Found the answer I was looking for. Thanks much! I’ll pass on word of your site.

Reply
- Karen says
  
  May 3, 2012 at 2:39 pm
  
  Thanks, Ben! Glad it was helpful.
  
  Karen
  
  Reply
Karen says

April 11, 2012 at 9:33 am

Hi Akinboboye,

I would suggest starting with the SPSS category link at the right.

If you need more help at the beginning level, I’d be happy to send you my book (I have a few extra copies) for the cost of shipping. Please email me directly.

If you want more help with the concepts discussed above, you really want the Running Regressions and ANCOVAs in SPSS GLM workshop. It walks you through the univariate GLM procedure step-by-step and shows where it’s the same and where it’s different from the regression procedure. You can get to that here: http://www.theanalysisinstitute.com/workshops/SPSS-GLM/index.html

It’s not running right now, but you can use our contact form to get access to it as a home study workshop.

Best,
Karen

Reply
Akinboboye joseph says

April 10, 2012 at 7:57 pm

I really enjoyed this write-up. Kindly send me details on how to use SPSS. Thanks

Reply
