Regression models are just a subset of the General Linear Model, so you can use GLM procedures to run regressions. That's what I usually do.
But in SPSS, each of the GLM and Regression procedures has options that aren’t available in the other. So how do you decide when to use GLM and when to use Regression?
GLM has these options that Regression doesn’t:
1. It will dummy code categorical variables for you. If you have only one or two binary categorical variables, this isn’t a huge advantage. But if you have several, and many of them are multi-category, this is a big advantage, both as a time saver, and for getting an overall p-value for the variable as a whole.
2. You can add in interactions. In Regression, you have to create each interaction as a separate variable. Once again, this can become very tedious, especially if those interactions contain dummy variables.
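As a sketch of both advantages in SPSS syntax, with placeholder names (`y` is the outcome, `group` a multi-category factor, `x` a continuous covariate):

```
* UNIANOVA is the syntax behind the GLM Univariate menu.
* It dummy codes 'group' and builds the interaction itself.
UNIANOVA y BY group WITH x
  /PRINT = PARAMETER
  /DESIGN = group x group*x.
```

The F table will include a single overall test for `group`, however many categories it has.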
Regression has these options that GLM doesn’t:
1. It automatically gives standardized regression coefficients.
2. It will do model selection procedures, such as stepwise regression, and hierarchical model building, which allows you to enter variables in blocks.
3. It will do multicollinearity diagnostics.
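Here is a sketch of those three features requested in REGRESSION syntax (variable names are placeholders):

```
* COEFF prints standardized betas (Beta column).
* COLLIN and TOL print collinearity diagnostics,
  including Tolerance and VIF.
* Each /METHOD subcommand is one block: the first block
  is entered together, the second is selected stepwise.
REGRESSION
  /STATISTICS = COEFF R ANOVA COLLIN TOL
  /DEPENDENT = y
  /METHOD = ENTER x1 x2
  /METHOD = STEPWISE x3 x4 x5.
```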
These are a real advantage when your model is exploratory in nature and contains only continuous variables. Of these three options, only the third is really useful when you are testing specific hypotheses that contain interactions and categorical predictors.
Remember, you can’t use standardized coefficients on dummy variables anyway (well, SPSS will let you, but they don’t mean anything). And the stepwise procedures are only useful with truly exploratory analyses, and even then you need to be able to test the models on another data set.
So my approach is to generally use GLM for my regression analysis, then rerun the model in regression if I see a reason to be concerned about multicollinearity.
Edited to add:
A number of commenters below are wondering why the results aren’t matching between SPSS’s GLM and Linear Regression.
They will match if:
- You’re comparing apples to apples. Both procedures will give you a table of F statistics and can give a table of regression coefficients along with p-values, but they are labeled differently, look different, and don’t all appear by default. Make sure you’re not trying to compare p-values from regression coefficients in one procedure to the p-values from the F table in the other. GLM doesn’t give you the regression coefficients by default; you have to ask for them via the Options button, where they’re called “Parameter Estimates.”
- When you dummy code your variables yourself in Regression, you’re matching GLM’s default coding. If you have them backwards, everything will look different. For more info, see:
Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups
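One way to check that you’re comparing apples to apples is to run the identical model in both procedures and compare GLM’s Parameter Estimates table with REGRESSION’s Coefficients table. A minimal sketch with placeholder names, assuming `y` and `x` are continuous and `d` is an already-created 0/1 dummy:

```
* Entering d as a covariate (WITH) sidesteps GLM's own
  recoding, so the coefficient tables should match exactly.
UNIANOVA y WITH x d
  /PRINT = PARAMETER
  /DESIGN = x d.
REGRESSION
  /DEPENDENT = y
  /METHOD = ENTER x d.
```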
Sorry if it is a silly question: why did you say “Make sure you’re not trying to compare p-values from regression coefficients in one to the p-values from the F table in the other”? Thank you very much in advance!
Karen Grace-Martin says
Because the regression coefficients and the F table are testing different null hypotheses.
Can I get a Cox & Snell R square and log likelihood in the GLM model method? Or how is variance in the model determined in the GLM?
Karen Grace-Martin says
By GLM, SPSS means GENERAL linear model. Cox & Snell R square and log likelihood are used for GENERALIZED linear models, which are different. See https://www.theanalysisfactor.com/confusing-statistical-term-7-glm/
I have a question about the similarity between the GLM procedure and regression in SPSS. When running my analyses, usually the results are the same across both methods (including with categorical x categorical interactions). However, things change once I add in categorical x continuous predictors. The interaction p-values remain the same across both GLM and regression but the main effects of the continuous variable begin to differ between the two. I wonder why this might be and which procedure should be used/results reported when they differ following the inclusion of categorical x continuous interactions.
I’m having exactly the same issue.
I have a continuous DV, categorical IV, continuous IV, and I would like to include the interaction of IVs. The SPSS GLM and multiple regression procedures give different p-values for the continuous IV. The p-values for the categorical IV and the interaction term are the same across models. This discrepancy only occurs when the interaction term is included in the models; otherwise, the output of the two procedures matches.
I was able to make the GLM results match the multiple regression, but I don’t understand why. If I enter the variable that I computed for the interaction term of the regression as a covariate in the ANCOVA (and thus do not ask SPSS to calculate the interaction itself), the GLM output matches the regression output.
I’m really puzzled here. Why would providing the interaction term as a covariate in GLM work differently from asking SPSS to calculate the interaction via a custom model?
If you have a moment to answer, Karen, thank you! If not, thanks for a very helpful website, anyway.
Hi Joshua & Jon,
I was having this same problem and I think I figured it out. If the binary categorical variable was coded with a dummy coding scheme (condition 1 = 0, condition 2 = 1), I had this problem. But if I used an effect coding scheme for the same data (condition 1 = -1, condition 2 = 1), then my GLM and my regression gave me the same p-values. Read more about it here: https://methodology.psu.edu/media/techreports/12-120.pdf
Karen Grace-Martin says
It’s not really about effect vs. dummy coding; it’s about the defaults in SPSS’s GLM procedure. Because GLM will automatically dummy code for you, it has to decide which group to make the “reference group.” It’s the one that comes last alphabetically.
I wrote about it here:
Karen Grace-Martin says
The regression coefficients table will match if your categorical IV is dummy coded according to the GLM default. It will make the value that is last alphabetically the reference group (code it as 0). If your categorical variable is already coded 0/1, that means it will recode the 0 as 1 and the 1 as 0.
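A sketch of that recoding in syntax, assuming a 0/1 variable named `group`: flipping the codes before running REGRESSION makes its dummy match GLM’s default reference group.

```
* GLM treats the last value (here 1) as the reference group.
* Flip the 0/1 codes so REGRESSION uses the same reference.
RECODE group (0=1) (1=0) INTO group_rev.
EXECUTE.
```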
Isn’t this part of the debate among the NFL Deflategate testimonies? Many are having trouble replicating the work of Exponent, the scientific analysis company. They said they used a linear mixed model analysis and not a multivariable regression analysis. Is that case a good example of when to use which model?
Hi! I ran an interaction both as a regression and as a univariate GLM.
The weird thing is that when I run it as a regression the interaction is significant, but when I run it as a GLM it is not. Is there a reason why this is happening?
Christopher Ibenegbu says
How can I run Multiple Linear Regression analysis with Analysis of Covariance using SPSS?
We have a whole 3-hour workshop on this: Running Regressions and ANOVAs in SPSS GLM
Hi. I ran an ordered probit regression in SPSS to analyse the relationship between desirability ratings for different vaccine attributes. The goodness-of-fit test and the test of parallel lines have very low p-values, meaning my model isn’t appropriate? What does one do in such a case?
Yes, that’s what it means, basically. You have to be careful, though, with using p-values to test assumptions as large samples will make even small departures from the null significant.
Even so, if you do conclude it isn’t appropriate, the best alternative is usually a generalized ordered regression.
This article is about logit, but you can do a probit link as well.
I seem to remember in one of your discussions that GLM handles missing data on a case-wise basis better (probably not the best word) than the logistic regression option in SPSS. I have a data set with many variables evaluating predictors of the dependent variable, but almost all cases have at least one variable with missing data.
Am I remembering this correctly?
You’re close. Mixed models will retain whatever data is available for a cluster (in repeated measures a cluster is an individual unit). So if there are multiple responses for the same person, they don’t get dropped entirely.
The GLM I’m referring to here is the general linear model, which isn’t appropriate for binary outcomes and has the same default mechanism for missing data as logistic regression.
If predictors are missing, even mixed models are less likely to be helpful. You’ll probably need multiple imputation.
Darja Gutnick says
Hey there, that is very useful, thanks. This also means that you should get the same results out of regression and GLM with the same variables, right? If you don’t, what might be the reason? I have significant results with the GLM, but I can’t find significant results with a linear regression. I made the interaction terms myself (without standardization) and included them in the regression.
Please, Karen, I’m very new to the subject topic but I greatly need your help. I’m checking the effect of storage time (0, 2, 4 and 6 months) on a property (moisture content) of a sample given two separate treatments: boiling and no boiling. For boiling, the moisture values are 0.9, 1.3, 1.5, 1.8; for no boiling, they are 0.2, 0.4, 0.6, 2.2. I’ve got other properties to check, but please show me how to analyse this one. I would greatly appreciate your help. I intend using SPSS, which I just got. You could send the detailed screen shots to my mail firstname.lastname@example.org
Please help me. I’m trying to see the effect of storage time in months (0, 2, 4 and 6) on the moisture content of a sample given two treatments (boiling and no boiling). Could you help me on how to go about it? The independent variables are month and treatment, while the dependent is moisture. It will include the interaction, but I have no idea how to go about it.
I ran logistic regression using Regression and GLM and got different answers. Why is that?
In SPSS? You need GENLIN or Binary Logistic to run logistic regression.