Statistical models such as general linear models (linear regression, ANOVA, mixed models) and generalized linear models (logistic, Poisson, proportional hazards regression, etc.) all have the same general form. On the left side of the equation are one or more response variables, Y. On the right side are one or more predictor variables, X, and their coefficients, B. The variables on the right side, X, can take many forms and are called by many names.
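
In matrix notation, that general form can be sketched as follows. The error term $\varepsilon$ and the link function $g$ are notation assumed here for illustration; the text above names only Y, X, and B.

$$
Y = XB + \varepsilon \qquad \text{(general linear model)}
$$

$$
g\big(\mathrm{E}[Y \mid X]\big) = XB \qquad \text{(generalized linear model, e.g. logistic or Poisson regression)}
$$

In both cases, each column of $X$ is one variable on the right side of the equation. The terms defined below are all names for those columns.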

There are subtle distinctions in the meanings of these names, but they are often used interchangeably. Even worse, statistical software packages use different names for similar concepts, even among their own procedures. This quest for accuracy often creates confusion. (It’s hard enough without switching the words!)

Here are some common terms that all refer to a variable in a model that is proposed to affect or predict another variable.

**Independent Variable**: It *implies* causality: the independent variable affects the dependent variable. Used predominantly in ANOVA, but often in regression as well. It can be either continuous or categorical.

**Predictor Variable**: It does not imply causality. A predictor variable is simply useful for predicting the value of the response variable. Used predominantly in regression. Predictor variables can be continuous or categorical.

**Predictor**: Same as Predictor Variable.

**Covariate**: A *continuous* predictor variable. Used in both ANCOVA (analysis of covariance) and regression. Some people use this to refer to all predictor variables in regression, but it really means continuous predictors. Adding a covariate to ANOVA (analysis of variance) turns it into ANCOVA.

**Factor**: A *categorical* predictor variable. It may or may not indicate a cause/effect relationship with the response variable (this depends on the study design, not the analysis). Independent variables in ANOVA are almost always called factors. In regression, they are often referred to as indicator variables, categorical predictors, or dummy variables. They are all the same thing in this context.

**Grouping Variable**: Same as a factor. Used in SPSS in the independent samples t-test.

**Fixed factor**: A categorical independent variable whose specific categories are themselves of interest, often chosen by the experimenter. Examples include experimental treatments or demographic categories, such as sex and race. If you’re not doing a mixed model (and you should know if you are), all your factors are fixed factors. For a more thorough explanation of fixed and random factors, see Specifying Fixed and Random Factors in Mixed or Multi-Level Models.

**Random factor**: A categorical independent variable whose categories are a random sample from a larger population of possible categories. Generally used in mixed modeling. Examples include subjects or random blocks. For a more thorough explanation of fixed and random factors, see Specifying Fixed and Random Factors in Mixed or Multi-Level Models.

**Dummy variable**: A categorical variable that has been dummy coded. Dummy coding (also called indicator coding) is usually used in regression models, but not ANOVA. A dummy variable can have only two values: 0 and 1. When a categorical variable has more than two values, it is recoded into multiple dummy variables.
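
As a minimal sketch of that recoding, here is one way to dummy code a three-category variable in plain Python. The function name `dummy_code`, the example data, and the choice to leave out one reference category (the usual convention when the model has an intercept) are assumptions made for illustration, not part of any particular package.

```python
def dummy_code(values, reference=None):
    """Recode a categorical variable into 0/1 dummy variables.

    Returns a dict mapping each non-reference category to a list of
    0/1 values, one per observation. The reference category gets no
    column of its own; it is the case where every dummy is 0.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level in sorted order
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not a category in the data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical data: one treatment label per observation.
treatment = ["placebo", "low", "high", "low", "placebo"]
dummies = dummy_code(treatment, reference="placebo")

for level, column in dummies.items():
    print(level, column)
# high [0, 0, 1, 0, 0]
# low [0, 1, 0, 1, 0]
```

Three categories become two dummy variables. An observation in the reference category ("placebo" here) is the one with 0 on both.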

**Indicator variable**: See dummy variable.

8 comments

Hi,

I want to ask a question about ordinal logistic regression models:

1. Estimating the parameters by hand is difficult.

2. Is artificial data for ordinal logistic regression available?

3. Which package can fit this kind of data?

with regards

thanks

Hi, what is another name for a variable?

Thanks for this. When coding sports contests, I use “dummy variable” values of 1 for the first team, -1 for the opponent, and 0 for others. I’ve found that this reduces standard error relative to using 2 categorical variables, even though they are functionally equivalent.

Hi Ben,

You’re right, that’s functionally equivalent, although it does change interpretation of the coefficients. That’s generally called “effect coding” and it has real advantages, especially when there are interactions in the model.

Karen
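
For readers following this exchange, the coding Ben describes might look like the sketch below: one row per contest and one column per team. The function name `contest_row`, the team names, and the list of games are hypothetical and only there to make the example run.

```python
def contest_row(teams, first, opponent):
    """Return one row of predictors for a single contest.

    Each team gets one column: 1 for the first team, -1 for its
    opponent, and 0 for every team not playing in this contest.
    """
    if first == opponent:
        raise ValueError("a team cannot play itself")
    row = []
    for team in teams:
        if team == first:
            row.append(1)
        elif team == opponent:
            row.append(-1)
        else:
            row.append(0)
    return row


teams = ["Ants", "Bees", "Cats"]  # hypothetical team names
games = [("Ants", "Bees"), ("Cats", "Ants")]  # (first team, opponent)

design = [contest_row(teams, first, opp) for first, opp in games]
for row in design:
    print(row)
# [1, -1, 0]
# [-1, 0, 1]
```

Each row sums to zero, so a team's coefficient is read relative to its opponent's rather than relative to a single reference category.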

This is very helpful; however, there is one part that is slightly misleading. A dummy variable can take many more values than just 0 or 1. For example, simple contrast coding involves creating dummy variables such that, if you have k groups, you would make the observations in the group have a dummy variable value of (k-1)/k, and all the other observations have a dummy variable value of -1/k. Perhaps a better way of explaining the dummy variable would be to say that it is always one of two values (in the case of binary coding, either 1 or 0, and in the case of simple contrast coding, either (k-1)/k or -1/k). This is important because if your multiple regression model has an interaction term between two categorical variables, then coding with zeroes and ones does not work.

Hi Anthony,

I’ve never heard of any contrast coding other than 0/1 being called dummy coding. There are many ways of coding contrasts other than this one.

And regression models with interaction terms between two categorical variables do work with dummy coding. You just have to know how to interpret it. You’re right, though, that the interpretation it gives may not be the best choice to answer your research question and another coding scheme may work better.
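
To make the two schemes in this exchange concrete, here is a small sketch of the simple contrast coding Anthony describes, using his values of (k-1)/k and -1/k. The function name `simple_contrast_code`, the example data, and the use of a reference group with no column of its own are assumptions for illustration.

```python
def simple_contrast_code(values, reference):
    """Code a categorical variable with simple contrast coding.

    With k groups, each non-reference group gets one column. An
    observation has the value (k - 1) / k in its own group's column
    and -1 / k in every other column. Reference-group observations
    therefore have -1 / k in all columns.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not a category in the data")
    k = len(levels)
    return {
        level: [(k - 1) / k if v == level else -1 / k for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical data with k = 4 groups, so the two values are 0.75 and -0.25.
group = ["a", "b", "c", "d", "b"]
coded = simple_contrast_code(group, reference="a")

for level, column in coded.items():
    print(level, column)
```

Each column still takes only two distinct values, as Anthony notes, but they are no longer 0 and 1. As Karen points out, changing the coding changes how the coefficients are interpreted.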

Thank you so much for the information! Almost everything was pretty clear. However, I didn’t get the difference between a predictor variable and an independent variable.

As far as I understood, the predictor variables in the following equation would be X1 as well as X2.

Y = aX1 + cX2

But this does not mean they are independent of each other, right?

Then, is this correct? ‘All independent variables are predictors, but the opposite does not need to be true.’

Thanks!

Hi Lara,

You’re right: independent variable doesn’t mean independent of each other. It’s just in relation to the dependent variable. The idea is that an independent variable couldn’t possibly be affected by the dependent variable. The direction is very clear.

The term predictor variable hedges a little. Use it when you can’t be sure of the direction of causality.

Karen
