Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts.

Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways. And these different ways of using the term have BIG implications for what your model means.

The most precise definition is its use in Analysis of Covariance, a type of General Linear Model in which the independent variables of interest are categorical, but you also need to adjust for the effect of an observed, continuous variable–the covariate.

In this context, the covariate is always continuous, never the key independent variable, and always observed (i.e. observations weren’t randomly assigned its values, you just measured what was there).

A simple example is a study looking at the effect of a training program on math ability. The independent variable is the training condition–whether participants received the math training or some irrelevant training. The dependent variable is their math score after receiving the training.

But even within each training group, there is going to be a lot of variation in people’s math ability. If you don’t adjust for that, it is just unexplained variation. Having a lot of unexplained variation makes it pretty tough to see the actual effect of the training–it gets lost in all the noise.

So if you use pretest math score as a covariate, you can adjust for where people started out. That gives you a clearer picture of whether people do well on the final test because of the training or because of the math ability they had coming in.
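Here is a minimal sketch of that adjustment in plain Python, in case seeing the arithmetic helps. Everything in it is invented for illustration: the ten scores, the helper names `fit_ols` and `sum_sq_resid`, and the "true" training effect of 5 points. It fits the model twice, once without and once with the pretest score, and compares how much variation is left unexplained.

```python
def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def sum_sq_resid(rows, y, coefs):
    """Sum of squared residuals: the variation the model leaves unexplained."""
    return sum(
        (yi - sum(c * x for c, x in zip(coefs, r))) ** 2
        for r, yi in zip(rows, y)
    )


# Invented data: 5 people per group, same spread of pretest scores in each.
pretest = [40, 50, 60, 70, 80] * 2
trained = [0] * 5 + [1] * 5  # 0 = irrelevant training, 1 = math training
noise = [1, -1, 0, -1, 1, -1, 1, 0, 1, -1]
posttest = [10 + 0.8 * p + 5 * t + e for p, t, e in zip(pretest, trained, noise)]

# Model 1: posttest ~ intercept + training (no covariate).
x_unadj = [[1, t] for t in trained]
b_unadj = fit_ols(x_unadj, posttest)

# Model 2: posttest ~ intercept + training + pretest (pretest is the covariate).
x_adj = [[1, t, p] for t, p in zip(trained, pretest)]
b_adj = fit_ols(x_adj, posttest)

sse_unadj = sum_sq_resid(x_unadj, posttest, b_unadj)
sse_adj = sum_sq_resid(x_adj, posttest, b_adj)

print(f"training effect, no covariate:   {b_unadj[1]:.2f}")
print(f"training effect, with covariate: {b_adj[1]:.2f}")
print(f"unexplained variation, no covariate:   {sse_unadj:.1f}")
print(f"unexplained variation, with covariate: {sse_adj:.1f}")
```

In this made-up data set the pretest scores are balanced across the two groups, so the estimated training effect is 5 points either way. What changes is the unexplained variation, which drops from 1288 to 8 once the pretest is in the model. Real data won't be this tidy, but that drop in noise is the reason for adding the covariate.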

Okay, great. Where’s the confusion?

### Covariates as Continuous Predictor Variables

The confusion is that, really, the model doesn’t care that the covariate is something you don’t have a hypothesis about. Something you’re just adjusting for. Mathematically, it’s the same model, and you run it the same way.
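For anyone who wants to see it written out, one common way to write the model for the math training example is sketched here. The symbols are just notation I'm assuming for illustration, not the output of any particular software.

```latex
y_{ij} = \mu + \alpha_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij}
```

Here $y_{ij}$ is the posttest score for person $j$ in training group $i$, $\mu$ is the overall mean, $\alpha_i$ is the effect of training group $i$, $x_{ij}$ is that person's pretest score, $\bar{x}$ is the mean pretest score, $\beta$ is the slope for the pretest, and $\varepsilon_{ij}$ is the error. Nothing in the equation marks $x$ as a control variable rather than the predictor you care about. That distinction lives in your hypothesis, not in the model.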

And so people who understand this often use the term covariate to mean ANY continuous predictor variable in your model, whether it’s just a control variable or the most important predictor in your hypothesis. And I’m guilty as charged. It’s a lot easier to say covariate than continuous predictor variable.

But SPSS does this too. You can run a linear regression model with only continuous predictor variables in SPSS GLM by putting them in the Covariate box. All the Covariate box does is define the predictor variable as continuous.

(SAS’s PROC GLM does the same thing, but it doesn’t specifically label them as Covariates. In PROC GLM, the assumption is all predictor variables are continuous. If they’re categorical, it’s up to you, the user, to specify them as such in the CLASS statement.)

### Covariates as Control Variables

But the other part of the original ANCOVA definition is that a covariate is a control variable.

So sometimes people use the term Covariate to mean any control variable. Because really, you can covary out the effects of a categorical control variable just as easily.

In our little math training example, you may be unable to pretest the participants. Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?” It’s not as good of a control variable as a pretest score, but you can at least get at their previous math training.

You’d use it in the model in the exact same way you would the pretest score. You’d just have to define it as categorical.
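As a rough sketch of what "define it as categorical" can look like in practice, the example below dummy codes the yes/no Calculus answer as 1/0 and puts it in the model alongside the training condition. The eight scores, the effect sizes, and the helper name `fit_ols` are all invented for illustration.

```python
def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Invented data: answers to "Did you take Calculus?" instead of a pretest.
took_calculus = ["no", "no", "yes", "yes"] * 2
trained = [0] * 4 + [1] * 4  # 0 = irrelevant training, 1 = math training
noise = [1, -1, 1, -1] * 2

# Define the control variable as categorical by dummy coding: yes -> 1, no -> 0.
calc = [1 if answer == "yes" else 0 for answer in took_calculus]
posttest = [50 + 5 * t + 12 * c + e for t, c, e in zip(trained, calc, noise)]

# posttest ~ intercept + training + calculus indicator
x = [[1, t, c] for t, c in zip(trained, calc)]
intercept, training_effect, calculus_effect = fit_ols(x, posttest)

print(f"training effect, adjusted for Calculus: {training_effect:.2f}")
print(f"difference between Calculus takers and non-takers: {calculus_effect:.2f}")
```

The Calculus variable plays the same role the pretest score would have played, as a control. Only its coding differs. A control variable with more than two categories would need one indicator per category, minus one for the reference category.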

Once again, there isn’t really a good term for categorical control variable, so people sometimes refer to it as the covariate.

### So what is a covariate then?

It’s hard to say. There is no disputing the first definition, the one from ANCOVA, so it’s clear there. The ambiguity comes from the two looser usages.

I prefer to just be careful, in setting up hypotheses, running analyses, and in writing up results, to be clear about which variables I’m hypothesizing about and which ones I’m controlling for, and whether each variable is continuous or categorical.

The names that the variables, or the models, end up having, aren’t important as long as you’re clear about what you’re testing and how.

———————————————————————————-

Precious says

Hello everyone,

In my study, I have plant (plant population) density as factor, is it necessary to include the same plant density (number of plants at harvest) as covariate? I have been advised to do so and I urgently need your help on this.

Thanks

Lisa says

This really helped me a lot – thanks!

Jade says

Hi

Thank you for your article here. I’ve read a paper which said “including gender as a covariate”, but I learned the covariate must be continuous in ANCOVA, so it really baffled me until I found your explanation here!

But there is still a question, if the categorical covariate has an interaction with the IV, how can i report it? Report the main effect of IV hierarchically? Or if there is an interaction ,the variate can not be taken as a covariate?

Brad says

Your definition of a covariate in ANCOVA is completely at odds with that given in Whitlock and Schluter (2020). The Analysis of Biological Data, 3rd ed. They specify that the covariate is categorical, and the main effect (factor/explanatory variable) is numerical (pretty much the opposite of what you state). And by the way, a numerical variable does not have to be “continuous”; it may be discrete (e.g, counts, which are integers). So you’ve even confused the term continuous with numerical.

Karen Grace-Martin says

Hi Brad,

Very interesting. I don’t have that book. Yes, it’s absolutely true that many authors use “covariate” to mean different things. But I’ve never, ever, heard of a factor in an ANCOVA being numerical. Look in any other book on ANOVA. That is so backward from usual usage I wonder if it’s a typo. I’d have to see the exact wording to really comment. That said, Factor is another confusing term, in that it means something entirely different in the context of Factor Analysis, where it is continuous.

And you’re absolutely correct that numerical variables don’t have to be continuous. The difference is very important for dependent/outcome/response variables, since it affects the type of model you use. It’s not a big difference for predictors, since if you fit a line to a discrete predictor, you’re technically treating it as continuous.

Geoff says

Hi Karen,

Brilliant post, that really unlogged a lot for me.

I was just wondering if you had information on what the actual math looked like for adding a covariate into a regression? Just as a basic example. i.e. How do you ‘adjust for the effect of said covariate’?

Sorry if I’m getting my terms mixed up.

Jan Stautemas says

I am confused by your example?

Why do you use pre math scores as a covariate and not as timepoint 1 in a repeated measures design?

Within variables: time (pre post), dependent variable (training 1 or two), no covariate.

Thank you in advance

Jan

Karen Grace-Martin says

Hi Jan,

It depends on the research question. See this: https://www.theanalysisfactor.com/pre-post-data-repeated-measures/

Zione Kathleen Esic says

Hi Karen,

Our study’s respondents are the left behind emerging adults. The IV is the psychological distress they are undergoing and it includes academic distress. Is it right to ask if they are in college level? Though it won’t be ideal to use it because not all emerging adults are college students but mostly are. It would also help determine the academic distress which I said earlier. Thank you!

Armindo Marques says

Hi Karen,

I really need your help. I don’t know what test (and I cant find any source of information about it) should I perform in this circumstances:

One IV (e.g. Gender)

Two DV (continuous)

One or Two Covariates (both ordinal).

What to do if the Covariates are ordinal?

Karen Grace-Martin says

Hi Armindo,

I can’t give advice without really digging into the details of things like your research questions and the roles of these variables.

But I can comment on a couple things. Ordinal predictors usually need to be treated as categorical: https://www.theanalysisfactor.com/pros-and-cons-of-treating-ordinal-variables-as-nominal-or-continuous/

Once you’ve got more than one DV, you’re into multivariate statistics. So some version of a MANOVA or Multivariate linear model. https://www.theanalysisfactor.com/multiple-regression-model-univariate-or-multivariate-glm/

Tasha says

Hi Karen,

What should you do if you have dramatically different sample sizes across levels of a categorical variable you are including as a covariate? For example, if you are controlling for gender with 4 categories (i.e., man, woman, prefer to self-describe, prefer not to say), is there a citation that supports either collapsing the last gender categories or even excluding them from the analysis because of their extremely small sample size to avoid skewing your results? Even with collapsing, however, you could still run into the same sample size problem. Is there a general consensus about how to handle this type of issue?

Karen Grace-Martin says

Hi Tasha,

I don’t know there is a consensus, other than to be thoughtful about it and consider the pros and cons of different approaches. You may want to read this: https://www.theanalysisfactor.com/when-unequal-sample-sizes-are-and-are-not-a-problem-in-anova/

Zoe says

Hi Karen,

I want to run a report where I think I will need an ANCOVA for the analysis.

I want to test 2 types of clinical outcomes for one rehab programme to see how they compare in picking up changes in pain and function. I have asked participants to fill out the 2 outcomes pre and post intervention (2 times points only).

I have the data and was thinking to run simple t-tests to show that the intervention has been successful in reducing pain and increasing function for each of the outcomes used.

Then, I want to ensure that both outcomes have picked up change successfully. In order to do this I want to run an ANCOVA, using the baseline pain scores as the covariate. This is in order to show that those with higher pain to start were perhaps less likely to end up with lowest pain scores post. Does this make sense? Here the IV would be the outcome used (either A or B) and the DV would be the pain score recorded post programme.

I would hope that the p value would be insignificant from the ANCOVA printout, showing that both outcomes (A and B) had no difference in means, however I am getting a little confused about how I would report on the data from the covariate?

Verda says

Hi!

I want to do an analysis, I think it has to be an ANCOVA, but my independent variables are probably (hopefully) not independent of each other (which is an assumption of the ANCOVA, right?).

I want to analyse: first independent variable: condition (4) and second independent variable: the difference score of two measurements points.

But the difference score should be dependent of the conditions (at least I hope so)

My dependent variable is the recognition score, which should not be dependent of the conditions and the difference score

So my questions is if I can use an ANCOVA because my two independent variables are linked to each other…

Thank you very much in advance!!!

Verda Simsek

shane says

Hi Karen,

I just want to ask if the covariates will have their own F value in the ANOVA. I was running a linear mixed model ANOVA with 3 fixed factors and 1 random factor but with an extra 2 covariates. But on the ANOVA output, the covariates are not there…should their significance be shown in SPSS too?

Andrew says

Hi Shane,

I’m not sure when your reply was posted, but I figured I would reply. If too late to help you, at least others may benefit:

The primary purpose of a covariate is to illustrate an effect above an observed effect that goes beyond your manipulation. Therefore, there is already an assumption that your covariates are significantly correlated with the dependent variable. If this were not the case, there would be no point of putting them into your analysis. Basically, SPSS has no need to tell you of your significance of the covariate because you should already know that. Instead, what you should do to observe the ANCOVA at work is to analyze your model with and without the 2 covariates. What you should see is that you had more power without the covariates. However, in exchange for less power, you provide evidence that your effect extends beyond known potential confounds.

Best of luck!!

Karen says

Actually, I’m surprised that SPSS didn’t include those covariates in the ANOVA table. Yes, they should be there and yes, you need to test them.

I disagree that covariates should only be there to look for effects beyond a manipulation. That may be true in experimental studies that actually have a manipulation. But many good models have both categorical predictors and continuous ones. Whether that categorical predictor is a manipulation only affects the interpretation, not the model.

mulia says

Hi Karen,

Thanks for this. It’s very clear. It’s very helpful for me to communicate with my economist collaborator, who keeps insisting on using regression instead of ANCOVA.

I must choose if I will analyze my data using regression or ANCOVA. The main problem is, if I use ancova then I will have to use all variables as covariate and use no fixed factor at all.

So my problem is to decide if a sensory data (ranking of respondents liking 1-4, like most to dislike most) can be used as fixed factor. Or should I dummy code them and put under covariate instead of fixed factor?

Please helpppp.

Best regards

Mulia

Dmytro Voloshyn says

Thanks for this awesome article. I was analysing a paper (https://eric.ed.gov/?id=EJ850774) that used the term “covariate” in its abstract and it was really hard to get an idea of what the authors think a covariate is. But now, after reading your explanations, it is much more clear!

joy says

hi karen,

I would like to ask how to compare scores of anxiety for two means between independent variables [group 1 VS group 2].

with no experimental design or manipulation.

however, i want to prove that the difference in mean scores is accounted for even after the introduction of a control variable [which is a continuous score on a Life Stress Scale].

please advice on how i should do this? using SPSS. thank you so much 🙂

Ashwani Marwah says

Hi Karen,

I want to do covariate analysis and my response variable is Binomial (Responder vs Non Responder) but the covariate is a continuous variable. So which stat tool would you suggest to do the robust analysis?

Thanks@Ashwani

John Chalisque Allsup says

One other issue with covariates arises when actually using the results in a subsequent decision process. Variables controlled for are akin to what is lost when partially differentiating. It is important to understand the need to put this information back when making decisions. By this, if we take the math training example you used above, within the population studied, there may be subsets who do well with a particular method and those who don’t. If you simply control for this variation with pre-test scores, you effectively average the variation away in your analysis. When this is done, it is critically important to understand that you are seeing an averaged picture (and that many potentially important pieces of information are absent). I loved how Good and Hardin saw fit at the beginning of the first chapter of their Common Errors in Statistics, to state unequivocally that statistics should never be used as the sole basis for making decisions. Thus ‘missing or discarded information’ is in part, to me, why. (Essentially you need either strong uniformity conditions on a population, or explicit inspection, before you can be confident a statistical result applies to a particular instance.)

Andrew says

I’m not sure if you’re mistakenly generalizing one concept to another. A successful ANCOVA will discard unwanted information. For example, let’s say a cognitive task is known to have a gender effect. I could use gender as a discrete covariate (either in ANCOVA or a multiple regression with gender as a grouping variable) to show an effect exists beyond gender. Essentially, I would be saying gender does not matter for my decision.

The topic you are talking about (I believe) is how many statistics deal with averages. In doing so, we need to keep in mind that individual differences exist. While this is true of covariates as well, I think that while a single measure is never enough for determining an effect, a covariate can help someone decide if something should not be a factor.

Karen says

Agree. Much of what you’re describing sounds like situations where there is an interaction involved.

But yes, there is a fundamental concept in the decision making literature that statistics apply to groups, not individuals, and that what is best for the average may not be best for any given individual or situation.

Ishrat Islam says

Hi Karen,

I have 45 participants who received an exercise intervention. I am observing the change in their physical performance pre-post and after 8 months of the intervention. I have found group differences by using RM ANOVA. Now I need to control some covariates/ confounding variables (categorical variables) like age group, marital status, education level etc. How shall I do it? I am using SPSS version 22. Please suggest the most suitable tool to use. Please also comment if I could get that tool in the following options:

Option 1: While defining the independent variable in the process of RM ANOVA, there are two more spaces available to put data on ‘between subject factor’ and ‘covariate’. Shall I put all 8 covariates under the space labelled ‘covariates’? If I can do that,

Q1. Will that actually control these variables all together?

Q2. Will it lose power in doing so?

Q3. How shall I then term the tool of analysis? RM ANOVA with covariates or RM ANOVA or any other term?

Q4. Shall I use them as time*cov for all the covariate separately?

Option 2: Shall I use mixed method ANOVA by putting one particular covariate in the ‘between subject factor’ and see if it has any effect?

Q.1 Shall I look at the between subject effect in the output file to see the impact? Or need to look at the effect size or both?

Q2. Shall I need to put other 7 covariates under the space labelled ‘covariates’ while considering one covariate as a between subject factor?

Q3. How shall I then term the tool of analysis?

Looking forward to hearing from you.

Many thanks in advance.

Claudia says

Hi, Karen,

I really like your explanation about the term. Enjoyed reading it. Thanks a lot. It certainly helps to stop the arguments between my students and me.

Best,

Claudia

C says

Hi Karen,

Thanks for the helpful article. I’m trying to decide which variables to include as control variables in my regression model. Aside from theoretical reasons, I have examined my correlation matrix. I have a variable that correlates with my IV, but not with my DV…Am I correct in assuming that this variable should NOT be controlled for in the model (since it is unrelated to the DV)?

Thanks for your help!

Andrew says

Hi C,

Not sure if it is too late to help, but generally, you are correct. Covariates are variables that vary with the DV, and can be an alternative explanation for the effect of your IV. IF no correlation exists with the DV, there is no need to control for it. However, Keep in mind that for an ANCOVA your IV needs to be a discrete variable, and the DV needs to be continuous. Linear regression would not properly diagnose a relationship between these two.

Cianny says

Karen,

Thanks so much for this clarification.

So, in other words, if I want to control for a categorical variable, I still run ANOVA, and if I want to control for a continuous variable, I run ANCOVA. *phew*

Karen says

Yes, exactly.

Walid says

Hi Karen, thanks a lot for the site. It’s really great.

Currently, I’m designing an experiment on stress treatment of plants with single (heat or drought) or combined (heat+drought) fixed effect IVs to measure one or more response DVs. Each stress treatment will be applied with several levels at (combined with) 3 time points. Measurement(s) are from different experimental units and not from the same individual plant. I’m thinking using either 2-way ANOVA or MANOVA depends on the no. of DV to be measured. However, I’m a little bit confused about the time point factor. I would expect that the measurements from control plants will not be significantly changed by time (as there is no stress), but under (e.g. heat treatment) I would expect to find a significant change in the main effect factor between the 3 time points. Shall I consider the time point as a covariance factor in this case, and use, for example, one-way ANOVA instead of two-way to measure one DV under one heat. Do you recommend any specific analysis or model different from above.

I would appreciate a lot your advice and help

sarah says

Hi Karen,

I need some help with choosing the statistical analysis.

Details of my study.

all participants will complete surveys measuring self compassion, whether their psychological needs are met, how high on non attachment scale they are and other trait scales.

then, Half will be given some induction training on how to meditate. half will not.

after that, everyone will be tested to see how mindful and meditative they were using some breath counting measures.

i want to see relationship between the outcome and the survey responses and the training received.

for eg. someone whose needs are met and who received training did well on being mindful.

what about someone whose needs are met but was part of the group that did not receive training, what if he does well or what if he doesn’t.

is the survey responses the covariates? How would i write up the research question?

would the test be correlation or regression? ANCOVA?

My original hypothesis was needs met led to being good at meditation. But now the control group has been included, I am confused.

Please help.

Thanks in advance

Wael Hussein says

Just wanted to say thank you for the site and the easy to follow text. Great job!

Isabel says

Hi Karen!

First of all, thank you for your site because it has helped me a lot of times 😉

I have a doubt in the statistical analysis of my study. I am studying stress on mothers and fathers (two independent samples). An important variable in my study is the number of children (usually mothers or fathers with more children feel more stress) and my subjects have between 1, 2 or 3 children. However because I am only looking for gender (mothers vs fathers) differences in stress I want to control this variable (number of children) in my sample. Thus, I decided to apply a chi-square analysis to see if my sample of mothers differed from my sample of fathers in the number of children. The test is non significant so i assumed that my two independent samples do not differ in number of children and that number of children is not a covariate when comparing the mothers and fathers of my sample. Is this correct?

Thanks a lot*

Abbie A says

Very helpful. Thanks

Lee Fountain says

Hi Karen,

Maybe you could clarify something for me. I have a model that consists of 3 independent variables and one dependent variable. However, in my research I identify two other concepts that act as mediators (social exchange and perceived organizational support). Would these concepts be considered as covariates?

Thanks

Karen says

Hi Lee,

Mediators are a little different than covariates. See this: https://www.theanalysisfactor.com/five-common-relationships-among-three-variables-in-a-statistical-model/

Keely says

Hi Karen,

I found your review of covariates really helpful. My research involves exploring the impact of anxiety on communication. I’m using a transmission chain methodology where one person reads a story and reproduces that for the next person in the chain, who reproduces it for the next person in the chain and so on until 4 people have read and reproduced the story. I used a mood induction procedure, which is my between subjects factor. Typically these studies classify the ‘generations’ in the transmission chain as a within subjects factor as the output of person 2-4 is dependent on what they received from the previous person (as if you are taken measurements at different points in time). I am measuring the number of positive and negative statements produced by each person in their reproductions. So what I have done is a 2x4x2 ANOVA.

What I also want to do is explore the impact of trait anxiety, which I measured prior to testing and is a continuous variable. Is it possible to enter trait anxiety as a covariate in an ANOVA/ANCOVA to determine if trait anxiety was related to/or differentially impacted performance under each condition? The hypothesis is that high trait anxiety participants, under a negative mood induction would show a different pattern of results to low trait anxious.

I can perform a median split and enter it as a between subjects factor but this will result in a small number of observations, particularly once I try to look at more than one generation and I’m worried about the loss of power.

I would really appreciate your advice.

All the best,

Keely

Patty says

I’m having the same problem 🙁

Per says

Dear Karen,

i was wondering if a covariate is the same as the mediator in a repeated measurement model? I am using it that way.

But maybe there is another way to test a mediation?

Thanks in advance for your help

Liz says

Hi Karen,

What does the phrase “covary out the effects” mean? Your article has been very helpful in helping to clear up confusion with terms!! But I’m still at a loss as to why this phrase was used in the context of a test evaluation. In discussing potential changes to a control group (outside the bounds of the test), we were told they could “covary out the effects” of a change.

Karen says

Hi Liz,

Great question, and one that will take another article to describe. 🙂 I’ll add that to my list of future articles to write.

rk says

Hi,

I am new to this site and this has always confused me. How do you “control” for a variable? Could you please explain, how “controlling “works? One of the ways to control a variable might be just taking random samples, i.e., if you want to control for age, then take a rs from all age groups. Another example would be like a “control group” (placebo in a drug experiment).

Am I correct in the above examples? Also, I assume there are other standard techniques, could you please clarify how they work?

Karen says

Hi RK,

That’s a big question. Or rather, a small question with a big answer. I will see if I can write a post (or two or 10) explaining it.

rk says

Thanks Karen! Looking forward to it!!

Alyssa says

Hi Karen,

Thanks for this article! I just want to further clarify some of the discussion on dichotomous covariates/fixed factors. Specifically, I am running a chi squared with one dichotomous outcome and one dichotomous predictor, to see how well group membership of the two-level predictor discriminates group membership of the two-level outcome. Now, I want to ensure that gender (I think my covariate/fixed factor) does not modify the relationship of my predictor’s ability to discriminate membership of my outcome variable. To answer this question I plan to run the model where both genders are combined and separately for each gender. The answer is that the discriminative ability of my predictor does depend on gender to some extent.

So my question is – is gender a fixed factor here? is there a better word for it? Random factor?

Thank you so much!

Alyssa

Karen says

Hi Alyssa,

Within the context of SPSS GLM, Gender is a fixed factor. Don’t make it random–that’s a whole other thing!

But if you’re doing a chi-square, Fixed Factor and covariate aren’t really issues. Just add it in as another variable

Kathy says

Easy to understand definition of Covariate for ANCOVA. Thank you!

Anika Kunz says

Hi,

I’m running an ANOVA with repeated measures. As I can see from the correlation matrix, there are significant correlations between my possible confounding variables and my dependent variable, but here is my problem: if the possible covariate is correlating with the dependent variable at time2 but not at time1, do I have to include it as a “normal” covariate when I perform the GLM?

Thanks for any help.

Anika

Karen says

It would be a good idea to include a covariate*time interaction. That will allow the effect of the covariate to be different at time 1 and time 2.

Joanne says

Hi Karen,

I have a follow-up question please. I am looking at memory performance in young and older adults under two conditions: when a negative or a positive stereotype is activated. I’m using a mixed model to compare performance across time (before/after the intervention) across the two age groups and conditions.

I obtain a significant difference between the two age groups on levels of verbal IQ (NART scores) which is continuous, and hence a covariate (that exerts a significant effect).

The paper that I am basing my study on also obtains a difference between age groups over verbal IQ scores. They do not obtain a significant difference within age groups but between conditions, however, and so have not included it as a covariate. This seems wrong to me. Surely if a significant difference over a background variable occurs on one of your IVs you should include the covariate in the model, regardless of whether there’s a difference between groups on the second IV?

If you could clear this up for me I’d really appreciate it, as I am confused!

Thanks,

Joanne

Karen says

Hi Joanne,

I am missing something. Does verbal IQ relate to the DV? (It seems it would but you don’t mention that). So in this paper, the two stereotype condition groups have different verbal IQ scores, but age groups didn’t? It really comes down to whether the potential covariate is related to the DV.

norhawa says

Hi Karen,

Thanks for the information. The way you write it very clear and easy to understand. Thanks~

Karen says

Thanks!

zahir says

So can we say based on your answer that every confounding variable is a covariate but not every covariate is a confounding variable?

Zahir

Karen says

Depends on how you’re using them. 🙂

Alexa says

What is the difference between a confound and a covariate in simple terms?

Karen says

Alexa, that’s a really great question. A confound is a variable that is so closely associated with another variable (perfectly, or so nearly perfectly that you can’t distinguish them) that you can’t tell their effects apart.

In most areas of the US, for example, neighborhood and school attended overlap so much because most kids from a neighborhood all go to the same school. So you couldn’t separate out the school effects on say, grade 3 test scores, from the neighborhood effects.

A covariate is a variable that affects the DV in addition to the IV. It doesn’t have to be correlated with the independent variable. If not, it may just explain some of the otherwise unexplained variation in the DV.

Karen

Emily says

Thanks Karen! I’m so happy this website exists.

I found this page because I am stuck on something related but at a way lower level (I am no statistician). I’m running a repeated measures ANOVA in SPSS, using GLM. I need to control for a between-subjects categorical variable that might be adding noise to the data and washing out any effects of my factors. I can’t figure out if the right way to do this is to put it in as a between-subjects factor, or pretend that is a continuous variable and put it in the covariate box. Can you help?

Karen says

Hi Emily,

Yes. In fact, there is already an article here on that exact topic. It’s the same in all SPSS GLM procedures, whether you’re using univariate, repeated measures, etc. https://www.theanalysisfactor.com/spss-glm-choosing-fixed-factors-and-covariates/

Karen

Debra says

Hi Karen

I’m still a little confused on the same issue as Emily, despite reading the article you suggested. Although the suggested article seems to clearly spell out that any true categorical variable, including dichotomous variables, should be included in an ANOVA model as a fixed factor rather than a covariate, the article above states

“In our little math training example, you may be unable to pretest the participants. Maybe you can only get them for one session. But it’s quick and easy to ask them, even after the test, “Did you take Calculus?” It’s not as good of a control variable as a pretest score, but you can at least get at their previous math training…You’d use it in the model in the exact same way you would the pretest score. You’d just have to define it as categorical.”

It’s the last sentence that I get stuck on – because if you included a dichotomous variable in the model in the exact same way you would a continuous pre-test score, you would include it as a covariate, not as a fixed factor.

Sorry if this seems obvious, perhaps I’m getting caught up in the terminology too! Any clarification would be greatly appreciated, I’ve found your explanations to be more helpful than most!

Karen says

Hi Debra,

SPSS’s definition of “Covariate” is “continuous predictor variable.” Its definition of “fixed factor” is “categorical predictor variable.” (There’s actually more to this in comparing fixed and random factors, but that’s a tangent here.)

“It’s the last sentence that I get stuck on – because if you included a dichotomous variable in the model in the exact same way you would a continuous pre-test score, you would include it as a covariate, not as a fixed factor.”

I don’t mean *define* it the same way, I mean *use it as a control variable* in the same way. I’m trying to separate out the use of the variable in the model (as something to control for vs. something about whose effect you have a hypothesis) from the way it was measured and therefore needs to be defined (categorical vs. continuous).

So yes, it goes into Fixed Factors because it’s categorical.

And don’t apologize for getting confused with terminology–that’s my whole point. The imprecision of the terminology is what makes it so confusing! 🙂

Karen
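For readers who want to see why the distinction matters, here is a rough sketch in Python (not SPSS) of what changes when a categorical control variable is entered as a factor versus as a continuous covariate. The scores and the 1/2/3 coding are hypothetical, invented only for this illustration. As a factor, each category gets its own fitted mean; as a covariate, the model fits one straight line through the numeric codes, which assumes the categories are equally spaced.

```python
from statistics import mean

# Hypothetical outcome scores grouped by a 3-level categorical control
# variable that happens to be stored as the codes 1, 2, 3.
scores_by_level = {
    1: [10.0, 12.0, 11.0],
    2: [11.0, 13.0, 12.0],
    3: [20.0, 22.0, 21.0],
}

# Treated as a categorical factor: every level gets its own fitted mean.
factor_fit = {level: mean(ys) for level, ys in scores_by_level.items()}

# Treated as a continuous covariate: one straight line through the codes.
x = [level for level, ys in scores_by_level.items() for _ in ys]
y = [score for ys in scores_by_level.values() for score in ys]
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
covariate_fit = {level: intercept + slope * level for level in scores_by_level}

# Print level, fitted value as a factor, fitted value as a covariate.
for level in scores_by_level:
    print(level, round(factor_fit[level], 2), round(covariate_fit[level], 2))
```

In this made-up data the jump from level 2 to level 3 is much bigger than the jump from level 1 to level 2, so the straight line misses every category mean. With a variable that has only two categories there is just one gap to fit, so the two approaches give the same fitted values, but defining it as categorical is still the cleaner habit.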

Ayesha says

Hello Karen

I was going through the discussion and had the same confusion. My study evaluates the effectiveness of a school-based program on preschoolers’ behavior problems. My intervention and control groups differ on strength of students in class (measured in categories) and father’s education (also measured in categories). I need to see if the ANCOVA results remain significant after controlling for these baseline differences. The covariate option in SPSS should be a continuous measure, so how should I do the analysis with categorical variables? Please answer soon.

Karen says

Hi Ayesha,

If a control variable is simply a categorical variable, put it into “Fixed Factors” instead of Covariate in Univariate GLM. By default, SPSS will also add in an interaction term, but you can take that out in the Design dialog box.

fyi, if it’s helpful, we have a workshop available on demand that goes through all these details of SPSS GLM: http://theanalysisinstitute.com/spss-glm-ondemand-workshop/

FASASI R.A says

Please, what is the relationship between the covariate’s partial eta squared in an ANCOVA result and the partial eta squared of the treatment and moderator variables? What is the implication if that of the covariate (which can be the pretest) is higher than that of the main treatment or moderator variable?

Karen says

Hi Fasasi,

I don’t know–I’d need more information on the model. For example, is the covariate different from the moderator? Which interactions are being included?

Thanks,

Karen

Nur Barizah says

Thanks a lot, Karen. Your notes were very helpful. I have been looking for the answers in tens of books for several months. Thank God, many of my uncertainties about the GLM command were answered today on your site. FYI, in my area of study (accountancy), the GLM command is almost nonexistent in the literature.

Karen says

Hi Nur,

I’m so glad.

Most of the time in the literature it will be called ANOVA, ANCOVA or linear regression. But they’re all the same model and all can be run in GLM.

Karen
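A small illustration of "they’re all the same model", sketched in Python with made-up scores (the two groups and their values are hypothetical). Comparing two group means, as a one-way ANOVA does, and regressing the outcome on a 0/1 dummy variable are the same fit: the regression slope is the difference between the group means, and the intercept is the mean of the group coded 0.

```python
from statistics import mean

# Hypothetical test scores for two groups.
control = [62.0, 70.0, 68.0, 75.0, 65.0]
trained = [72.0, 80.0, 77.0, 85.0, 76.0]

# "ANOVA" view: compare the two group means directly.
diff_in_means = mean(trained) - mean(control)

# "Regression" view: code group as a 0/1 dummy and fit a least-squares line.
x = [0] * len(control) + [1] * len(trained)
y = control + trained
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

print("difference in group means:", diff_in_means)
print("regression slope on dummy:", slope)
print("regression intercept (control mean):", intercept)
```

Adding a continuous predictor to this regression is what turns it into an ANCOVA, which is why one GLM procedure can run all three.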

Ben says

Very clear. Found the answer I was looking for. Thanks much! I’ll pass on word of your site.

Karen says

Thanks, Ben! Glad it was helpful.

Karen

Karen says

Hi Akinboboye,

I would suggest starting with the SPSS category link at the right.

If you need more help at the beginning level, I’d be happy to send you my book (I have a few extra copies) for the cost of shipping. Please email me directly.

If you want more help with the concepts discussed above, you really want the Running Regressions and ANCOVAs in SPSS GLM workshop. It walks you through the univariate GLM procedure step-by-step and shows where it’s the same and where it’s different from the regression procedure. You can get to that here: http://www.theanalysisinstitute.com/workshops/SPSS-GLM/index.html

It’s not running right now, but you can use our contact form to get access to it as a home study workshop.

Best,

Karen

Akinboboye joseph says

I really enjoy this write-up. Kindly send me details on how to use SPSS. Thanks