SPSS GLM or Regression? When to use each

Regression models are just a subset of the General Linear Model, so you can use GLM procedures to run regressions.  It is what I usually use.

But in SPSS, the GLM and Regression procedures each have options that the other lacks.  How do you decide when to use GLM and when to use Regression?

GLM has these options that Regression doesn’t:

1. It will dummy code categorical variables for you.  If you have only one or two binary categorical variables, this isn’t a huge advantage.  But if you have several, and many of them are multi-category, this is a big advantage, both as a time saver, and for getting an overall p-value for the variable as a whole.

2. You can add in interactions.  In Regression, you have to create each interaction as a separate variable.  Once again, this can become very tedious, especially if those interactions contain dummy variables.
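To make the tedium concrete, here is a rough Python sketch of what you end up doing by hand for Regression: building 0/1 dummy columns for a factor, then multiplying each one by a continuous predictor to get the interaction columns. The function names (`dummy_code`, `interaction_columns`) and the data are made up for illustration, and the default of treating the highest-sorted level as the reference is an assumption chosen to mirror GLM's behavior with fixed factors.

```python
def dummy_code(values, reference=None):
    """Return (column_names, rows) of 0/1 indicators, omitting the reference level.

    Assumption: when no reference is given, the highest-sorted level is used,
    mirroring how SPSS GLM treats a fixed factor.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[-1]
    kept = [lv for lv in levels if lv != reference]
    names = [f"d_{lv}" for lv in kept]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return names, rows


def interaction_columns(dummy_rows, covariate):
    """Multiply every dummy column by a continuous covariate, row by row."""
    return [[d * x for d in row] for row, x in zip(dummy_rows, covariate)]


# Hypothetical data: a three-level factor and a continuous predictor.
group = [1, 2, 3, 1, 2, 3]
age = [20.0, 35.0, 50.0, 25.0, 40.0, 55.0]

names, dummies = dummy_code(group)
products = interaction_columns(dummies, age)

print(names)     # one dummy per non-reference level
print(dummies)   # rows for group 3 (the reference) are all zeros
print(products)  # each dummy multiplied by age
```

A three-level factor already needs two dummies and two product terms. Every additional factor or interaction multiplies the bookkeeping, which is the work GLM does for you.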

Regression has these options that GLM doesn’t:

1. It automatically gives standardized regression coefficients.

2. It will do model selection procedures, such as stepwise regression, as well as hierarchical model building, which allows you to enter variables in blocks.

3. It will do multicollinearity diagnostics.

These are really an advantage when your model is exploratory in nature and contains only continuous variables.  Of these three options, only the third is really useful when you are testing specific hypotheses that contain interactions and categorical predictors.

Remember, you can’t use standardized coefficients on dummy variables anyway (well, SPSS will let you, but they don’t mean anything).  And the stepwise procedures are only useful with truly exploratory analyses, and even then you need to be able to test the models on another data set.
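One way to see the problem with standardized coefficients on dummies is a small Python sketch. It uses the usual conversion, beta = b × SD(x) / SD(y), and shows that the SD of a 0/1 variable depends only on how the sample splits between the two groups. The numbers and the helper name `standardized_beta` are hypothetical, and the outcome SD is held fixed purely for illustration.

```python
from statistics import stdev


def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized coefficient to a standardized one."""
    return b * sd_x / sd_y


# Two hypothetical dummy variables with different group splits.
balanced = [0] * 5 + [1] * 5      # 50/50 split
unbalanced = [0] * 9 + [1] * 1    # 90/10 split

b = 5.0      # same raw group difference in both cases (assumed)
sd_y = 10.0  # same outcome SD in both cases (assumed, for illustration)

beta_balanced = standardized_beta(b, stdev(balanced), sd_y)
beta_unbalanced = standardized_beta(b, stdev(unbalanced), sd_y)

print(beta_balanced, beta_unbalanced)
```

The raw difference between groups is identical, yet the standardized coefficient shrinks as the split becomes more lopsided. A "one standard deviation change" in a dummy variable doesn't correspond to anything a case can actually do.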

So my approach is to generally use GLM for my regression analysis, then rerun the model in regression if I see a reason to be concerned about multicollinearity.
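For a sense of what those multicollinearity diagnostics measure, here is a minimal Python sketch of tolerance and the variance inflation factor (VIF) for the simplest case of two predictors. With only two predictors, the R² from regressing one on the other is just their squared correlation, so tolerance = 1 − r² and VIF = 1 / tolerance. The data and function names are made up, and this is a simplification of what Regression reports for models with more predictors.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Hypothetical predictors that are fairly strongly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

tolerance, vif = tolerance_and_vif(x1, x2)
print(tolerance, vif)
```

The more one predictor can be reproduced from the others, the lower its tolerance and the higher its VIF.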

Edited to add:

A number of commenters below are wondering why the results aren’t matching between SPSS’s GLM and Linear Regression.

They will match if:

  1. You’re comparing apples to apples. Both procedures will give you a table of F statistics and can give a table of regression coefficients along with p-values, but they are labeled differently, look different, and don’t all appear by default. Make sure you’re not trying to compare p-values from regression coefficients in one procedure to the p-values from the F table in the other. GLM doesn’t give you the regression coefficients by default. You have to ask for them, and in GLM they’re called “Parameter Estimates” under the Options button.
  2. When you dummy code your variables yourself in Regression, you’re matching GLM’s default coding, which treats the highest-coded category of each factor as the reference group. If you have them backwards, everything will look different. For more info, see:

    Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups
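To illustrate the second point, here is a small Python sketch of how reversing the reference group changes the coefficients even though the model is the same. It fits a one-predictor least-squares line to made-up data under two codings of the same two-group factor. The helper `simple_ols` and the data are hypothetical, and the "GLM-style" coding assumes the highest-coded group is the reference.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical two-group data.
group = [1, 1, 1, 2, 2, 2]
y = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]

# GLM-style coding: group 2 (highest code) is the reference, so the dummy flags group 1.
d_glm = [1 if g == 1 else 0 for g in group]
# Reversed coding: group 1 is the reference, so the dummy flags group 2.
d_reversed = [1 if g == 2 else 0 for g in group]

glm_intercept, glm_slope = simple_ols(d_glm, y)
rev_intercept, rev_slope = simple_ols(d_reversed, y)

print(glm_intercept, glm_slope)   # intercept is the group 2 mean
print(rev_intercept, rev_slope)   # intercept is the group 1 mean, slope changes sign
```

Both fits give identical predicted values. Only the meaning of the intercept and the sign of the dummy coefficient change, which is why the two outputs can look completely different.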

 



Comments

  1. natalie says

    Sorry if it is a silly question: why did you say “Make sure you’re not trying to compare p-values from regression coefficients in one to the p-values from the F table in the other.”? Thank you very much in advance!

  2. Amanda says

    Can I get a Cox & Snell R square and log likelihood in the GLM model method? Or how is variance in the model determined in the GLM?

    Thanks!

  3. Joshua says

    Hi Karen,

    I have a question about the similarity between the GLM procedure and regression in SPSS. When running my analyses, usually the results are the same across both methods (including with categorical x categorical interactions). However, things change once I add in categorical x continuous predictors. The interaction p-values remain the same across both GLM and regression but the main effects of the continuous variable begin to differ between the two. I wonder why this might be and which procedure should be used/results reported when they differ following the inclusion of categorical x continuous interactions.

    • Jon says

      Hi,

      I’m having exactly the same issue.

      I have a continuous DV, categorical IV, continuous IV, and I would like to include the interaction of IVs. The SPSS GLM and multiple regression procedures give different p-values for the continuous IV. The p-values for the categorical IV and the interaction term are the same across models. This discrepancy only occurs when the interaction term is included in the models; otherwise, the output of the two procedures matches.

      I was able to make the GLM results match the multiple regression, but I don’t understand why. If I enter the variable that I computed for the interaction term of the regression as a covariate in the ANCOVA (and thus do not ask SPSS to calculate the interaction itself), the GLM output matches the regression output.

      I’m really puzzled here. Why would providing the interaction term as a covariate in GLM work differently from asking SPSS to calculate the interaction via a custom model?

      If you have a moment to answer, Karen, thank you! If not, thanks for a very helpful website, anyway.

  4. Tony says

    Isn’t this part of the debate among the NFL deflate gate testimonies? Many are having trouble replicating the work of Exponent, the scientific analysis company. They said they used a linear mixed model analysis and not multivariable regression analysis. Is that case a good example of when to use which model?

  5. Paulette says

    Hi! I ran an interaction both in regression mode and in univariate GLM.

    The weird thing is that when I run it as a regression the interaction is significant, but when I run it as a GLM it is not. Is there a reason why this is happening?

    Thanks

  6. Christopher Ibenegbu says

    How can I run Multiple Linear Regression analysis with Analysis of Covariance using SPSS?

  7. Ramona says

    Hi. I ran an Ordered Probit regression in SPSS to analyse the relationship between desirability ratings for different vaccine attributes. The goodness of fit and test of parallel lines have very low p-values, meaning my model isn’t appropriate? What does one do in such a case?

    • Karen says

      Hi Ramona,

      Yes, that’s what it means, basically. You have to be careful, though, with using p-values to test assumptions, as large samples will make even small departures from the null significant.

      Even so, if you do conclude it isn’t appropriate, the best alternative is usually a generalized ordered regression.

      This article is about logit, but you can do a probit link as well.

  8. Bob says

    Karen,
    I seem to remember from one of your discussions that GLM handles missing data on a case-wise basis better (probably not the best word) than the Logistic regression option in SPSS. I have a data set with many variables evaluating predictors on the dependent variable, but almost all cases have at least one variable with missing data.

    Am I remembering this correctly?

    • Karen says

      Hi Bob,

      You’re close. Mixed models will retain whatever data is available for a cluster (in repeated measures a cluster is an individual unit). So if there are multiple responses for the same person, they don’t get dropped entirely.

      The GLM I’m referring to here is the general linear model, which isn’t appropriate for binary outcomes and has the same default mechanism for missing data as logistic regression.

      If predictors are missing, even mixed models are less likely to be helpful. You’ll probably need multiple imputation.

      Karen

  9. Darja Gutnick says

    Hey there, that is very useful, thanks. This also means that you should get the same results out of regression and GLM with the same variables, right? If you don’t, what might be the reason? I have significant results with the GLM, but I can’t find significant results with a linear regression. I made the interaction terms myself (without standardization) and included them in the regression.

  10. Maxi says

    Pls Karen, i’m very new to the subject topic but i greatly need your help. I’m checking the effect of storage time (0,2, 4 and 6 months) on the property(moisture content) of a sample when given two separate treatments-boiling and no boiling. For boiling, the moisture values are 0.9, 1.3, 1.5, 1.8 for no boiling, it is 0.2, 0.4, 0.6, 2.2. Pls i’ve got other properties to check for but pls show me how to analyse this one. I would greatly appreciate your help. I intend using SPSS which i just got. You could send the detailed screen shots to my mail maxillaboy@gmail.com

  11. Maxi says

    Pls help me. I’m trying to see the effect of storage time in months(0,2,4 and 6) on moisture content of a sample given two treatment (Boiling and no boiling). Could you help me on how to go about it? The independent variables are the month and treatment. While the dependent is moisture. It will include the interaction but i have no idea on how to go about it.

