You think a linear regression might be an appropriate statistical analysis for your data, but you’re not entirely sure. What should you check before running your model to find out?

*by Manolo Romero Escobar*

The General Linear Model (GLM) is a tool used to understand and analyse linear relationships among variables. It is an umbrella term for many techniques that are taught in most statistics courses: ANOVA, multiple regression, etc.

In its simplest form it describes the relationship between two variables, “y” (dependent variable, outcome, and so on) and “x” (independent variable, predictor, etc). These variables could be both categorical (how many?), both continuous (how much?) or one of each.
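Written out, that simplest case is usually expressed as a straight-line equation. The notation below is the conventional textbook form rather than anything specific to this post: the symbols $\beta_0$ (intercept), $\beta_1$ (slope) and $\varepsilon_i$ (error for observation $i$) are assumptions introduced for illustration.

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$

When “x” is categorical with two groups, $x_i$ is typically coded 0 or 1, so $\beta_1$ becomes the difference between the two group means.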

Moreover, there can be more than one variable on each side of the relationship. One convention is to use capital letters to refer to multiple variables. Thus **Y** would mean multiple dependent variables and **X** would mean multiple independent variables.

I would love to promise that the reason there is so much confusing terminology in statistics is NOT because statisticians like to laugh at hapless users of statistics as they try to figure out already confusing concepts. See my post on the different meanings of the term “level” in statistics. (There are other examples: how many different meanings does “beta” have in statistics? I can think of three off the top of my head. That will have to be another post.)

But today I’ll talk about the difference between multivariate and multiple, as they relate to regression.

A regression analysis with one dependent variable and 8 independent variables is NOT a multivariate regression. It’s a multiple regression. Multivariate analysis ALWAYS refers to the dependent variable.

So when you’re in SPSS, choose univariate GLM for this model, not multivariate.
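If it helps to see this outside of SPSS, here is a minimal sketch in Python with NumPy. Everything in it is made up for illustration: the simulated data, the variable names (`X`, `y`, `design`, `coef`) and the true coefficient values are assumptions, not part of any real analysis. The point is only the shape of things: eight predictors, but a single column for the response.

```python
import numpy as np

# Simulated data (an assumption for illustration): 200 cases, 8 predictors.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))

# ONE dependent variable, built from an intercept of 2.0 plus the predictors.
true_beta = np.arange(1, p + 1, dtype=float)
y = 2.0 + X @ true_beta + rng.normal(scale=0.1, size=n)

# Add a column of ones for the intercept, then fit by ordinary least squares.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# y is a single column, so this is a multiple (not multivariate) regression:
# one intercept plus eight slopes.
print(y.shape)     # (200,)
print(coef.shape)  # (9,)
```

However many predictors you add, `y` stays one-dimensional, and that is what keeps the model univariate.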

I know what you’re thinking: but what about multivariate analyses like cluster analysis and factor analysis, where there is no dependent variable, per se?

Well, I respond, it’s not really about dependency. It’s about which variable’s variance is being analyzed. A regression model is really about the dependent variable. We’re just using the predictors to model the mean and the variation in the dependent variable.

Note: this is actually a situation where the subtle differences in what we call that Y variable can help. Calling it the outcome or response variable, rather than dependent, is more applicable to something like factor analysis.

So when to choose multivariate GLM? When you’re jointly modeling the variation in multiple response variables.
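As a rough sketch of what “jointly modeling” means, here is the same kind of least-squares fit in Python with NumPy, but with two response variables at once. The data, the names (`Y`, `B_true`, `B_hat`, `resid_cov`) and the chosen correlation of 0.8 between the errors are all assumptions made up for this example. It is not how SPSS implements multivariate GLM, just an illustration of the idea.

```python
import numpy as np

# Simulated data (an assumption for illustration): 3 predictors, 2 responses.
rng = np.random.default_rng(1)
n, p, q = 200, 3, 2
X = rng.normal(size=(n, p))

# One column of true slopes per response variable.
B_true = np.array([[1.0, -1.0],
                   [0.5,  2.0],
                   [0.0,  3.0]])

# Errors that are correlated ACROSS the two responses.
E = 0.1 * rng.multivariate_normal(mean=[0.0, 0.0],
                                  cov=[[1.0, 0.8], [0.8, 1.0]],
                                  size=n)
Y = X @ B_true + E  # Y now has two columns: a multivariate response

# Fit both responses in one least-squares call.
design = np.column_stack([np.ones(n), X])
B_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)

# The residual covariance matrix describes how the responses vary together.
resid = Y - design @ B_hat
resid_cov = resid.T @ resid / (n - design.shape[1])

print(B_hat.shape)      # (4, 2): intercept + 3 slopes, for each response
print(resid_cov.shape)  # (2, 2)
```

With the same predictors for every response, the coefficients in each column of `B_hat` match what you would get from running two separate regressions. What the multivariate model adds is the off-diagonal part of `resid_cov`, the shared variation between the responses, which is what multivariate tests draw on.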

