Regression models are just a subset of the General Linear Model, so you can use GLM to analyze regressions. It is what I usually use. But in SPSS, the GLM and Regression procedures each have options that aren’t available in the other. How do you decide when to use GLM and when to use Regression?

GLM has these options that Regression doesn’t:

1. It will dummy code categorical variables for you. If you have only one or two binary categorical variables, this isn’t a huge advantage. But if you have several, and many of them are multi-category, this is a big advantage, both as a time saver, and for getting an overall p-value for the variable as a whole.

2. You can add in interactions. In Regression, you have to create each interaction as a separate variable. Once again, this can become very tedious, especially if those interactions contain dummy variables.
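To make the bookkeeping concrete, here is a minimal sketch in plain Python of what you would otherwise build by hand for Regression: one 0/1 column per non-reference category, plus one product column per dummy for the interaction. The variables (`region`, `age`), the data, and the helper names are all made up for illustration; none of this is SPSS syntax.

```python
# Hand-built dummy and interaction columns: the work Regression leaves to you
# and GLM does for you. All names and data below are hypothetical.

def dummy_code(values, reference):
    """Return {level: 0/1 column} for every level except the reference."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

def interaction(col_a, col_b):
    """An interaction column is the element-wise product of two columns."""
    return [a * b for a, b in zip(col_a, col_b)]

# A three-level categorical predictor and a continuous predictor.
region = ["north", "south", "west", "south", "north", "west"]
age = [23, 35, 41, 29, 52, 38]

# Three levels need two dummy columns; "west" is the all-zeros baseline.
dummies = dummy_code(region, reference="west")

# A categorical x continuous interaction needs one product column per dummy.
region_by_age = {
    level: interaction(column, age) for level, column in dummies.items()
}

for level in dummies:
    print(level, dummies[level], region_by_age[level])
```

Even this small case needs four extra columns, and the choice of reference category is yours to make and keep track of. That choice matters: once an interaction is in the model, the coefficient for `age` is the slope in the reference group only, so two codings of the same model can report different "main effects" for the continuous variable.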

Regression has these options that GLM doesn’t:

1. It automatically gives standardized regression coefficients.

2. It will do model selection procedures, such as stepwise regression, and hierarchical model building, which allows you to enter variables in blocks.

3. It will do multicollinearity diagnostics.

These are really an advantage when your model is exploratory in nature and contains only continuous variables. Of these three options, only the third is really useful when you are testing specific hypotheses that contain interactions and categorical predictors. Remember, you can’t use standardized coefficients on dummy variables anyway (well, SPSS will let you, but they don’t mean anything). And the stepwise procedures are only useful with truly exploratory analyses, and even then you need to be able to test the models on another data set.
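For readers who want to see what the first and third options compute, here is a rough sketch in plain Python. It assumes the simplest possible cases: a standardized coefficient for a single continuous predictor (the raw slope rescaled by the two standard deviations) and a variance inflation factor for a model with exactly two predictors, where VIF reduces to 1 / (1 - r²). The data and function names are invented for illustration, and this is not how SPSS implements its diagnostics internally.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def slope(x, y):
    """Unstandardized slope from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Rescale the raw slope to standard-deviation units: b * sd(x) / sd(y)."""
    return slope(x, y) * stdev(x) / stdev(y)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / tolerance."""
    tolerance = 1 - pearson_r(x1, x2) ** 2
    return 1 / tolerance

# Hypothetical data: two correlated continuous predictors and an outcome.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 4, 8, 7, 11]

beta = standardized_slope(x1, y)
vif = vif_two_predictors(x1, x2)
print("standardized slope:", round(beta, 3))
print("VIF:", round(vif, 3))
```

The rescaling by sd(x) is where dummy variables run into trouble: the standard deviation of a 0/1 column depends only on the group proportions, so "one standard deviation of group membership" is not an interpretable unit.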

So my approach is generally to use GLM for my regression analysis, then rerun the model in Regression if I see a reason to be concerned about multicollinearity.


Hi Karen,

I have a question about the similarity between the GLM procedure and regression in SPSS. When running my analyses, usually the results are the same across both methods (including with categorical x categorical interactions). However, things change once I add in categorical x continuous predictors. The interaction p-values remain the same across both GLM and regression but the main effects of the continuous variable begin to differ between the two. I wonder why this might be and which procedure should be used/results reported when they differ following the inclusion of categorical x continuous interactions.

Hi,

I’m having exactly the same issue.

I have a continuous DV, categorical IV, continuous IV, and I would like to include the interaction of IVs. The SPSS GLM and multiple regression procedures give different p-values for the continuous IV. The p-values for the categorical IV and the interaction term are the same across models. This discrepancy only occurs when the interaction term is included in the models; otherwise, the output of the two procedures matches.

I was able to make the GLM results match the multiple regression, but I don’t understand why. If I enter the variable that I computed for the interaction term of the regression as a covariate in the ANCOVA (and thus do not ask SPSS to calculate the interaction itself), the GLM output matches the regression output.

I’m really puzzled here. Why would providing the interaction term as a covariate in GLM work differently from asking SPSS to calculate the interaction via a custom model?

If you have a moment to answer, Karen, thank you! If not, thanks for a very helpful website, anyway.

Isn’t this part of the debate among the NFL Deflategate testimonies? Many are having trouble replicating the work of Exponent, the scientific analysis company. They said they used a linear mixed model analysis and not multivariable regression analysis. Is that case a good example of when to use which model?

Hi! I ran an interaction both in Regression and in univariate GLM.

The weird thing is that when I run it as a regression the interaction is significant, but when I run it as a GLM it is not. Is there a reason why this is happening?

Thanks

How can I run Multiple Linear Regression analysis with Analysis of Covariance using SPSS?

Hi Christopher,

We have a whole 3-hour workshop on this: Running Regressions and ANOVAs in SPSS GLM

Hi. I ran an ordered probit regression in SPSS to analyse the relationship between desirability ratings for different vaccine attributes. The goodness-of-fit test and the test of parallel lines have very low p-values. Does that mean my model isn’t appropriate? What does one do in such a case?

Hi Ramona,

Yes, that’s what it means, basically. You have to be careful, though, with using p-values to test assumptions as large samples will make even small departures from the null significant.

Even so, if you do conclude it isn’t appropriate, the best alternative is usually a generalized ordered regression.

This article is about logit, but you can do a probit link as well.

Karen,

I seem to remember from one of your discussions that GLM handles missing data on a case-wise basis better (probably not the best word) than the Logistic Regression option in SPSS. I have a data set with many variables evaluating predictors of the dependent variable, but almost all cases have at least one variable with missing data.

Am I remembering this correctly?

Hi Bob,

You’re close. Mixed models will retain whatever data is available for a cluster (in repeated measures a cluster is an individual unit). So if there are multiple responses for the same person, they don’t get dropped entirely.

The GLM I’m referring to here is the general linear model, which isn’t appropriate for binary outcomes and has the same default mechanism for missing data as logistic regression.

If predictors are missing, even mixed models are less likely to be helpful. You’ll probably need multiple imputation.

Karen

Hey there, that is very useful, thanks. This also means that you should get the same results out of Regression and GLM with the same variables, right? If you don’t, what might be the reason? I have significant results with GLM, but I can’t find significant results with a linear regression. I made the interaction terms myself (without standardization) and included them in the regression.

Please, Karen, I’m very new to this topic but I greatly need your help. I’m checking the effect of storage time (0, 2, 4 and 6 months) on a property (moisture content) of a sample given two separate treatments: boiling and no boiling. For boiling, the moisture values are 0.9, 1.3, 1.5, 1.8; for no boiling, they are 0.2, 0.4, 0.6, 2.2. I’ve got other properties to check, but please show me how to analyse this one. I would greatly appreciate your help. I intend to use SPSS, which I just got. You could send detailed screenshots to my email, maxillaboy@gmail.com

Please help me. I’m trying to see the effect of storage time in months (0, 2, 4 and 6) on the moisture content of a sample given two treatments (boiling and no boiling). Could you help me with how to go about it? The independent variables are month and treatment, and the dependent variable is moisture. The model will include the interaction, but I have no idea how to go about it.

I ran logistic regression using Regression and GLM and got different answers. Why is that?

In SPSS? You need GENLIN or Binary Logistic to run logistic regression.