Confusing Statistical Terms #1: The Many Names of Independent Variables

by Karen

Statistical models, such as general linear models (linear regression, ANOVA, mixed models) and generalized linear models (logistic, Poisson, proportional hazard regression, etc.), all have the same general form.  On the left side of the equation are one or more response variables, Y.  On the right hand side are one or more predictor variables, X, and their coefficients, B.  The variables on the right hand side can take many forms and are called by many names.
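To make the "same general form" concrete, here is a minimal Python sketch of the simplest case, an ordinary linear regression fit by least squares. It assumes NumPy is available; the function name fit_linear_model and the data are made up for illustration and are not from any particular statistics package.

```python
import numpy as np

def fit_linear_model(X, y):
    """Return least-squares coefficients for y = intercept + X @ slopes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Right hand side of the equation: a column of ones plus the predictors.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical, noise-free data: the response is exactly 2 + 3*x1 - 1*x2.
x = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
y = 2.0 + 3.0 * x[:, 0] - 1.0 * x[:, 1]

coef = fit_linear_model(x, y)  # intercept first, then one coefficient per predictor
```

The response sits on the left, and the predictors with their coefficients sit on the right. Generalized linear models keep the same right hand side but connect it to the response through a link function, which this sketch does not cover.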

There are subtle distinctions in the meanings of these names, but they are often used interchangeably. Even worse, statistical software packages use different names for similar concepts, even among their own procedures.  This quest for accuracy often creates confusion.  (It’s hard enough without switching the words!)

Here are some common terms that all refer to a variable in a model that is proposed to affect or predict another variable.

  • Independent Variable: It implies causality:  the independent variable affects the dependent variable.  Used predominantly in ANOVA, but often in regression as well.  It can be either continuous or categorical.
  • Predictor Variable: It does not imply causality.  A predictor variable is simply useful for predicting the value of the response variable.  Used predominantly in regression.  Predictor variables can be continuous or categorical.
  • Predictor: Same as Predictor Variable.
  • Covariate: A continuous predictor variable.  Used in both ANCOVA (analysis of covariance) and regression.  Some people use this to refer to all predictor variables in regression, but it really means continuous predictors.  Adding a covariate to ANOVA (analysis of variance) turns it into ANCOVA.
  • Factor:  A categorical predictor variable.  It may or may not indicate a cause/effect relationship with the response variable (this depends on the study design, not the analysis).  Independent variables in ANOVA are almost always called factors.  In regression, they are often referred to as indicator variables, categorical predictors, or dummy variables.  They are all the same thing in this context.
  • Grouping Variable: Same as a factor.  Used in SPSS in the independent samples t-test.
  • Fixed factor: A categorical independent variable in which the specific values of the categories are of direct interest and are often chosen by the experimenter.  Examples include experimental treatments or demographic categories, such as sex and race.  If you’re not doing a mixed model (and you should know if you are), all your factors are fixed factors.  For a more thorough explanation of fixed and random factors, see Specifying Fixed and Random Factors in Mixed or Multi-Level Models.
  • Dummy variable: A categorical variable that has been dummy coded.  Dummy coding (also called indicator coding) is usually used in regression models, but not ANOVA.  A dummy variable can have only two values: 0 and 1.  When a categorical variable has more than two values, it is recoded into multiple dummy variables.
  • Indicator variable: See dummy variable.
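The dummy coding described in the list above can be sketched in a few lines of plain Python. This is an illustration only: the function name dummy_code and the treatment data are hypothetical, and dropping one reference level (so that k categories give k-1 dummy variables) is the usual convention for a regression with an intercept rather than something stated above.

```python
def dummy_code(values, reference=None):
    """Recode a categorical variable into 0/1 dummy variables.

    Returns a dict that maps each non-reference level to a list of 0/1 values,
    one value per observation.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: default to the first sorted level
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in values")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical three-level treatment variable.
treatment = ["placebo", "low", "high", "low", "placebo"]
dummies = dummy_code(treatment, reference="placebo")
# dummies["high"] is [0, 0, 1, 0, 0]
# dummies["low"]  is [0, 1, 0, 1, 0]
```

Observations in the reference category ("placebo" here) are 0 on every dummy variable, so each coefficient is read as a difference from that category.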


{ 7 comments }

Davie Damiano April 26, 2015 at 6:11 am

Hi, what is another name for a variable?

ben morris March 11, 2014 at 11:45 pm

Thanks for this. When coding sports contests, I use “dummy variable” values of 1 for the first team, -1 for the opponent, and 0 for others. I’ve found that this reduces standard error relative to using 2 categorical variables, even though they are functionally equivalent.

Karen March 12, 2014 at 9:48 am

Hi Ben,

You’re right, that’s functionally equivalent, although it does change the interpretation of the coefficients. That’s generally called “effect coding” and it has real advantages, especially when there are interactions in the model.

Karen
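For readers who want to see what effect coding looks like, here is a minimal Python sketch of the usual textbook version for a single factor: each non-reference level gets its own column, and the reference level is coded -1 in every column. The function name effect_code and the data are hypothetical, and this does not try to reproduce the exact team-by-team layout described in the comment above.

```python
def effect_code(values, reference):
    """Recode a categorical variable into effect-coded columns (1, 0, -1)."""
    if reference not in values:
        raise ValueError(f"reference level {reference!r} not found in values")
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    columns = {}
    for level in levels:
        column = []
        for v in values:
            if v == level:
                column.append(1)    # observation belongs to this level
            elif v == reference:
                column.append(-1)   # reference level is -1 in every column
            else:
                column.append(0)    # some other level
        columns[level] = column
    return columns

# Hypothetical three-level factor with "c" as the reference level.
group = ["a", "b", "c", "a"]
coded = effect_code(group, reference="c")
# coded["a"] is [1, 0, -1, 1]
# coded["b"] is [0, 1, -1, 0]
```

With balanced data, this coding makes the intercept the grand mean and each coefficient a deviation from it, which is the change in interpretation mentioned in the reply.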

Anthony Gambino November 3, 2013 at 2:22 am

This is very helpful. However, there is one part that is slightly misleading. A dummy variable can take many more values than just 0 or 1. For example, simple contrast coding involves creating dummy variables such that, if you have k groups, you would make the observations in the group have a dummy variable value of (k-1)/k, and all the other observations have a dummy variable value of -1/k. Perhaps a better way of explaining the dummy variable would be to say that it is always one of two values (in the case of binary coding, either 1 or 0, and in the case of simple contrast coding, either (k-1)/k or -1/k). This is important because if your multiple regression model has an interaction term between two categorical variables, then coding with zeroes and ones does not work.
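The simple contrast coding described in this comment can be sketched as follows. This is only an illustration of the (k-1)/k and -1/k scheme as the commenter states it: the function name simple_contrast_code and the data are hypothetical, and leaving out a reference level (so that k groups give k-1 columns) is an assumption borrowed from how dummy coding is usually set up.

```python
from fractions import Fraction

def simple_contrast_code(values, reference):
    """Code each non-reference level as (k-1)/k for its own group, -1/k otherwise."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in values")
    k = len(levels)
    in_group = Fraction(k - 1, k)   # value for observations in the group
    out_group = Fraction(-1, k)     # value for all other observations
    return {
        level: [in_group if v == level else out_group for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical three-group variable, so the two values are 2/3 and -1/3.
group = ["a", "b", "c"]
contrasts = simple_contrast_code(group, reference="a")
# contrasts["b"] is [-1/3, 2/3, -1/3]
# contrasts["c"] is [-1/3, -1/3, 2/3]
```

Exact fractions are used so the two possible values in each column are easy to read; floats would work just as well in a real model.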

Karen November 8, 2013 at 11:41 am

Hi Anthony,

I’ve never heard of any contrast coding other than 0/1 being called dummy coding. There are many ways of coding contrasts other than this one.

And regression models with interaction terms between two categorical variables do work with dummy coding. You just have to know how to interpret it. You’re right, though, that the interpretation it gives may not be the best choice to answer your research question and another coding scheme may work better.
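As a small check on this point, the Python sketch below fits the same 2x2 interaction model twice, once with 0/1 dummy coding and once with -1/1 effect coding. It assumes NumPy is available, and the four cell means are made-up numbers. The fitted values agree exactly; only the meaning of the coefficients changes.

```python
import numpy as np

# Hypothetical cell means for a 2x2 design: (level of A, level of B, mean of Y).
cells = [(0, 0, 10.0), (0, 1, 12.0), (1, 0, 15.0), (1, 1, 23.0)]
a = np.array([c[0] for c in cells], dtype=float)
b = np.array([c[1] for c in cells], dtype=float)
y = np.array([c[2] for c in cells])

def fit(a_col, b_col, y):
    """Fit y = b0 + b1*A + b2*B + b3*A*B by least squares."""
    X = np.column_stack([np.ones(len(y)), a_col, b_col, a_col * b_col])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

# Dummy coding: each factor is 0 or 1.
dummy_coef, dummy_fit = fit(a, b, y)
# Effect coding for two-level factors: 0 becomes -1, 1 stays 1.
effect_coef, effect_fit = fit(2 * a - 1, 2 * b - 1, y)

# dummy_coef  is approximately [10, 5, 2, 6]
# effect_coef is approximately [15, 4, 2.5, 1.5]
```

Under dummy coding the intercept (10) is the mean of the reference cell, the A and B coefficients are simple effects at the reference level of the other factor, and the interaction (6) is the difference in differences. Under effect coding the intercept (15) is the grand mean of the four cells and the other coefficients are deviations around it.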

Lara March 14, 2013 at 5:36 am

Thank you so much for the information! Almost everything was pretty clear. However, I didn’t get the difference between a predictor variable and an independent variable.
As far as I understood, the predictor variables in the following equation would be X1 as well as X2.

Y = aX1 + cX2

But this does not mean they are independent of each other, right?
Then, is this correct? ‘All independent variables are predictors, but the opposite does not need to be true.’

Thanks!

Karen March 15, 2013 at 11:06 am

Hi Lara,

You’re right–independent variable doesn’t mean independent of each other. It’s just in relation to the dependent variable. The idea is that an independent variable couldn’t possibly be affected by the dependent variable. The direction is very clear.

The term Predictor variable hedges a little. Use it when you can’t be sure of the direction of causality.

Karen
