Regression models are just a subset of the General Linear Model, so you can use GLM procedures to run regressions. That's what I usually do.
But in SPSS, each of the GLM and Regression procedures has options that aren’t available in the other. So how do you decide when to use GLM and when to use Regression?
GLM has these options that Regression doesn’t:
1. It will dummy code categorical variables for you. If you have only one or two binary categorical variables, this isn’t a huge advantage. But if you have several, and many of them are multi-category, this is a big advantage, both as a time saver, and for getting an overall p-value for the variable as a whole.
2. You can add in interactions. In Regression, you have to create each interaction as a separate variable. Once again, this can become very tedious, especially if those interactions contain dummy variables.
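As a sketch of both advantages in SPSS syntax, with placeholder names (`y` is the outcome, `group` a multi-category factor, `x` a continuous covariate):

```
* UNIANOVA is the syntax behind the GLM Univariate menu.
* It dummy codes 'group' and builds the interaction itself.
UNIANOVA y BY group WITH x
  /PRINT = PARAMETER
  /DESIGN = group x group*x.
```

The F table will include a single overall test for `group`, however many categories it has.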
Regression has these options that GLM doesn’t:
1. It automatically gives standardized regression coefficients.
2. It will do model selection procedures, such as stepwise regression, and hierarchical model building, which allows you to enter variables in blocks.
3. It will do multicollinearity diagnostics.
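Here is a sketch of those three features requested in REGRESSION syntax (variable names are placeholders):

```
* COEFF prints standardized betas (Beta column).
* COLLIN and TOL print collinearity diagnostics,
  including Tolerance and VIF.
* Each /METHOD subcommand is one block: the first block
  is entered together, the second is selected stepwise.
REGRESSION
  /STATISTICS = COEFF R ANOVA COLLIN TOL
  /DEPENDENT = y
  /METHOD = ENTER x1 x2
  /METHOD = STEPWISE x3 x4 x5.
```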
These are a real advantage when your model is exploratory in nature and contains only continuous variables. Of these three options, only the third is really useful when you are testing specific hypotheses that contain interactions and categorical predictors.
Remember, you can’t use standardized coefficients on dummy variables anyway (well, SPSS will let you, but they don’t mean anything). And the stepwise procedures are only useful with truly exploratory analyses, and even then you need to be able to test the models on another data set.
So my approach is to generally use GLM for my regression analysis, then rerun the model in regression if I see a reason to be concerned about multicollinearity.
Edited to add:
A number of commenters below are wondering why the results aren’t matching between SPSS’s GLM and Linear Regression.
They will match if:
- You’re comparing apples to apples. Both procedures will give you a table of F statistics and can give a table of regression coefficients along with p-values, but they are labeled differently, look different, and don’t all appear by default. Make sure you’re not trying to compare p-values from regression coefficients in one procedure to the p-values from the F table in the other. GLM doesn’t give you the regression coefficients by default; you have to ask for them via the Options button, where they’re called “Parameter Estimates.”
- When you dummy code your variables yourself in Regression, you’re matching GLM’s default coding. If you have them backwards, everything will look different. For more info, see:
Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups
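One way to check that you’re comparing apples to apples is to run the identical model in both procedures and compare GLM’s Parameter Estimates table with REGRESSION’s Coefficients table. A minimal sketch with placeholder names, assuming `y` and `x` are continuous and `d` is an already-created 0/1 dummy:

```
* Entering d as a covariate (WITH) sidesteps GLM's own
  recoding, so the coefficient tables should match exactly.
UNIANOVA y WITH x d
  /PRINT = PARAMETER
  /DESIGN = x d.
REGRESSION
  /DEPENDENT = y
  /METHOD = ENTER x d.
```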
Sorry if it is a silly question: why did you say “Make sure you’re not trying to compare p-values from regression coefficients in one to the p-values from the F table in the other”? Thank you very much in advance!
Karen Grace-Martin says
Because the regression coefficients and the F table are testing different null hypotheses.
Can I get a Cox & Snell R square and log likelihood in the GLM model method? Or how is variance in the model determined in the GLM?
Karen Grace-Martin says
By GLM, SPSS means GENERAL linear model. Cox & Snell R square and log likelihood are used for GENERALIZED linear models, which are different. See https://www.theanalysisfactor.com/confusing-statistical-term-7-glm/
I have a question about the similarity between the GLM procedure and regression in SPSS. When running my analyses, usually the results are the same across both methods (including with categorical x categorical interactions). However, things change once I add in categorical x continuous predictors. The interaction p-values remain the same across both GLM and regression but the main effects of the continuous variable begin to differ between the two. I wonder why this might be and which procedure should be used/results reported when they differ following the inclusion of categorical x continuous interactions.
I’m having exactly the same issue.
I have a continuous DV, categorical IV, continuous IV, and I would like to include the interaction of IVs. The SPSS GLM and multiple regression procedures give different p-values for the continuous IV. The p-values for the categorical IV and the interaction term are the same across models. This discrepancy only occurs when the interaction term is included in the models; otherwise, the output of the two procedures matches.
I was able to make the GLM results match the multiple regression, but I don’t understand why. If I enter the variable that I computed for the interaction term of the regression as a covariate in the ANCOVA (and thus do not ask SPSS to calculate the interaction itself), the GLM output matches the regression output.
I’m really puzzled here. Why would providing the interaction term as a covariate in GLM work differently from asking SPSS to calculate the interaction via a custom model?
If you have a moment to answer, Karen, thank you! If not, thanks for a very helpful website, anyway.
Hi Joshua & Jon,
I was having this same problem and I think I figured it out. If the binary categorical variable was coded with a dummy coding scheme (condition 1 = 0, condition 2 = 1), I had this problem. But if I used an effect coding scheme for the same data (condition 1 = -1, condition 2 = 1), then my GLM and my regression gave me the same p-values. Read more about it here: https://methodology.psu.edu/media/techreports/12-120.pdf
Karen Grace-Martin says
It’s not really about effect vs. dummy coding; it’s about the defaults in SPSS’s GLM procedure. Because GLM will automatically dummy code for you, it has to decide which group to make the “reference group.” It’s the one that comes last alphabetically.
I wrote about it here:
Karen Grace-Martin says
The regression coefficients table will match if your categorical IV is dummy coded according to the GLM default. It will make the value that is last alphabetically the reference group (code it as 0). If your categorical variable is already coded 0/1, that means it will recode the 0 as 1 and the 1 as 0.
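A sketch of that recoding in syntax, assuming a 0/1 variable named `group`: flipping the codes before running REGRESSION makes its dummy match GLM’s default reference group.

```
* GLM treats the last value (here 1) as the reference group.
* Flip the 0/1 codes so REGRESSION uses the same reference.
RECODE group (0=1) (1=0) INTO group_rev.
EXECUTE.
```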
Isn’t this part of the debate among the NFL Deflategate testimonies? Many are having trouble replicating the work of Exponent, the scientific analysis company. They said they used a linear mixed model analysis and not a multivariable regression analysis. Is that case a good example of when to use which model?
Hi! I ran an interaction both as a regression and as a univariate GLM.
The weird thing is that when I run it as a regression the interaction is significant, but when I run it as a GLM it is not. Is there a reason why this is happening?
Christopher Ibenegbu says
How can I run Multiple Linear Regression analysis with Analysis of Covariance using SPSS?
We have a whole 3-hour workshop on this: Running Regressions and ANOVAs in SPSS GLM
Hi. I ran an ordered probit regression in SPSS to analyse the relationship between desirability ratings for different vaccine attributes. The goodness-of-fit test and the test of parallel lines have very low p-values, meaning my model isn’t appropriate? What does one do in such a case?
Yes, that’s what it means, basically. You have to be careful, though, with using p-values to test assumptions as large samples will make even small departures from the null significant.
Even so, if you do conclude it isn’t appropriate, the best alternative is usually a generalized ordered regression.
This article is about logit, but you can do a probit link as well.
I seem to remember in one of your discussions that GLM handles missing data on a case-wise basis better (probably not the best word) than the logistic regression option in SPSS. I have a data set with many variables evaluating predictors of the dependent variable, but almost all cases have at least one variable with missing data.
Am I remembering this correctly?
You’re close. Mixed models will retain whatever data is available for a cluster (in repeated measures a cluster is an individual unit). So if there are multiple responses for the same person, they don’t get dropped entirely.
The GLM I’m referring to here is the general linear model, which isn’t appropriate for binary outcomes and has the same default mechanism for missing data as logistic regression.
If predictors are missing, even mixed models are less likely to be helpful. You’ll probably need multiple imputation.
Darja Gutnick says
Hey there, that is very useful, thanks. This also means that you should get the same results out of regression and GLM with the same variables, right? If you don’t, what might be the reason? I have significant results with the GLM, but I can’t find significant results with a linear regression. I made the interaction terms myself (without standardization) and included them in the regression.
Please, Karen, I’m very new to the subject topic but I greatly need your help. I’m checking the effect of storage time (0, 2, 4 and 6 months) on a property (moisture content) of a sample given two separate treatments: boiling and no boiling. For boiling, the moisture values are 0.9, 1.3, 1.5, 1.8; for no boiling, they are 0.2, 0.4, 0.6, 2.2. I’ve got other properties to check, but please show me how to analyse this one. I would greatly appreciate your help. I intend using SPSS, which I just got. You could send the detailed screen shots to my mail firstname.lastname@example.org
Please help me. I’m trying to see the effect of storage time in months (0, 2, 4 and 6) on the moisture content of a sample given two treatments (boiling and no boiling). Could you help me on how to go about it? The independent variables are month and treatment, while the dependent is moisture. It will include the interaction, but I have no idea how to go about it.
I ran logistic regression using Regression and GLM and got different answers. Why is that?
In SPSS? You need GENLIN or Binary Logistic to run logistic regression.