Can I use SPSS MIXED models for (a) ordinal logistic regression, and (b) multinomial logistic regression?

Every once in a while I get emailed a question that I think others will find helpful. This is definitely one of them.

My answer:

No.

(And by the way, this is all true in SAS as well. I’ll include the SAS versions in parentheses).

You can think of SPSS Mixed (SAS proc mixed) as the clustered-data version of SPSS GLM (proc glm). They have a lot of similarities in both their syntax and the kinds of models they can run.

Any model you can run in GLM, you can run in Mixed (but not vice-versa).

But both require an outcome variable that is unbounded, continuous, and measured on an interval or ratio scale.

So logistic regression, along with other generalized linear models, is out.

But there is another option (or two, depending on which version of SPSS you have).

You can run a Generalized Estimating Equation (GEE) model for a repeated measures logistic regression using GEE, which is the GENLIN command with a REPEATED subcommand in SPSS syntax (proc genmod in SAS). It has a repeated statement and can fit models equivalent to those run in Mixed with a repeated statement.

These are called population averaged or marginal models in both procedures, because you’re fitting a single model to all clusters, but controlling for within-cluster correlation.

In contrast are true Mixed Models, which actually fit a variance parameter for random effects, usually random intercepts and slopes. Rather than just control for within-cluster similarity in responses, they model it. Mixed models are run in Mixed using the Random statement.

(One of the reasons this gets so confusing is that for some designs, you can get the *exact same* results with either type of model. But they’re taking different routes to the same destination).
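To make the population-averaged versus mixed distinction concrete for a logistic model, here is a minimal Python sketch (not SPSS or SAS output). It assumes a random-intercept logistic model with made-up values for the intercept, slope, and random-intercept standard deviation, and it averages the cluster-specific probabilities over the random intercept by brute-force numerical integration. All function and variable names are illustrative only.

```python
import math


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def marginal_prob(fixed_part, sd_intercept, n_points=2001, width=8.0):
    """Average P(y = 1) over a normal random intercept with the given SD.

    Uses a simple normalized grid sum over +/- width standard deviations.
    """
    if sd_intercept == 0:
        return logistic(fixed_part)
    lo = -width * sd_intercept
    step = (2.0 * width * sd_intercept) / (n_points - 1)
    total = 0.0
    weight_sum = 0.0
    for i in range(n_points):
        u = lo + i * step
        w = math.exp(-0.5 * (u / sd_intercept) ** 2)  # unnormalized normal density
        total += w * logistic(fixed_part + u)
        weight_sum += w
    return total / weight_sum


# Hypothetical cluster-specific (mixed model) parameters.
beta0, beta1 = -1.0, 1.0
sd_u = 2.0  # assumed SD of the random intercept

# Population-averaged probabilities at x = 0 and x = 1.
p0 = marginal_prob(beta0, sd_u)
p1 = marginal_prob(beta0 + beta1, sd_u)

# Slope a marginal (GEE-type) model would be targeting, on the log-odds scale.
marginal_slope = logit(p1) - logit(p0)

print("cluster-specific slope:", beta1)
print("population-averaged slope:", round(marginal_slope, 3))
```

With a nonzero random-intercept variance, the population-averaged slope comes out smaller in magnitude than the cluster-specific slope. With the variance set to zero, the two coincide.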

Mixed Models have a lot more flexibility than Population Averaged Models–you can, for example, run a 3-level mixed model, but Population Averaged Models are restricted to two levels.

To run a true Mixed Model for logistic regression, you need to run a Generalized Linear Mixed Model using the GLMM procedure (GENLINMIXED in syntax), which is only available as of version 19.

(In SAS, use proc glimmix).

If you want to learn more about Mixed Models, check out our webinar recording: Random Intercept and Random Slope Models. It’s free.


Hi, you wrote: “to run a true Mixed Model for logistic regression, you need to run a Generalized Linear Mixed Model using the GLMM procedure, which is only available as of version 19”. Well, I have this version, and I need to run a mixed model of logistic regression. However, I don’t really know what to do in the first window of “data structure”. I don’t have any repeated measure, just subject ID and one random effect, which was clinic location. I don’t think it was truly random, though, because I “handpicked” the clinics from the whole city to match my needs.

The problem is that a reviewer of my manuscript wants me to account for the clinics in my logistic regression models as a random effect.

Can you direct me somewhere I can get a “how to” explanation?

Thanks so much.

Hi Hilit,

First, make sure you really want to make clinic random. I’m not sure what “match my needs” implies, but if you actually want to compare them (Clinic A, with these characteristics, has a mean 3 points higher than Clinic F, with these characteristics), then make them a fixed effect. If the point is to control for multiple subjects at each clinic, then you do want it random.

Second, I don’t ever use the SPSS menus for mixed models. As you’ve noticed, they are completely unintuitive. I can’t imagine what they’re asking with “data structure”. It’s really easy to mis-specify a mixed model and not realize it, so you really need the control of using syntax.

Third, if all you need to do is add a random effect of clinic (which is what it looks like), that is the same as adding a random intercept for clinic. I recently did a webinar on random intercept and random slope models, and demonstrated using SPSS. It’s not exactly what you need help with, but it might clear some things up. I address there treating the clusters as a fixed or random effect, and show the syntax for specifying the random intercept. The syntax is slightly different in GLMM than in mixed, but it’s very similar.

You can get the recording at http://www.theanalysisfactor.com/learning/webinar21.html. It’s free.

Hi Karen,

I will look at the webinar as soon as I can (children and everything). Meanwhile, can you just tell me if what you suggest there can be done in a regular logistic regression analysis in SPSS (I mean not through the various GLMs but using the logistic regression command)?

Thanks

Hi Hilit,

If you treat clinic as a fixed effect, you can. Just enter clinic as a categorical predictor variable into your model. This will only work, though, if you have no predictors measured at the clinic level (e.g., clinic size), because they’ll be confounded with the clinic effect.

If you treat clinic as random, you will need the GLMM.

If you want more help, we can always set up a Quick Question consultation, but these models are too complex to advise much without seeing the data and research questions. The devil is definitely in the details.

Karen

So, what do you suggest I do?

Go see a consultant?

Yes, a good consultant could really clear things up in an hour or less (assuming there are no surprises). These are complicated models with many issues involved, and if you’re still learning, it would behoove you to get some guidance, if that’s an option.

Hi Karen,

Thanks for the information you have provided above. I want to run a binary logistic regression with one categorical predictor and one interval predictor, in addition to adding a random variable. Can I just confirm that this is not possible through SPSS 16, and I would need SPSS 19 to do this? If so, is there any information that you know of that shows you how to do the GLMM with a logistic DV in SPSS 19 without using syntax?

Any help would be appreciated!!

Hi Simon,

Confirmed–you need SPSS 19 to run that model.

I don’t know of any video resources on this. This procedure is pretty new in SPSS.

And I know you don’t want to use syntax, but I will say it really is a good idea to use syntax with models this complicated. It’s just too easy to make a mistake in the menus and not realize it….

Karen

Hi Karen,

Wondering if you can direct me to any information about how to run a GLMM with a logistic DV in SPSS using syntax?

Menus are extremely confusing.

Thanks

Farah

Hi Farah,

Hmm, it’s pretty new and I haven’t seen anything on GLMM beyond the manual. Have you looked at the Command Syntax Reference?

They should have some examples.

Unfortunately, the commands are similar to, but not the same as, those in MIXED.

I’ll try to put together a post that shows the same analysis in MIXED and GLMM.

Karen

Hi Karen,

I am also looking for SPSS command syntax and have the same problem. My dependent variable is categorical (with dichotomous and continuous predictors), and I need to run a multinomial logistic regression controlling for the random effects of site. Our data was collected at 3 different sites. Also, I saw your response about entering the site variable as a fixed effect predictor. I wonder if it will be OK to do that. I would appreciate any help.

Thank you.

Hi Bushra,

It should work as long as you don’t have any covariates measured at the site level. For example, if your sites are something like hospitals, and you had a predictor that was hospital size, it would be confounded with hospital if hospital was fixed. That’s one of the advantages of random factors.

If you don’t have any site-level covariates, then you’re golden.
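A toy illustration of that confounding, as a Python sketch with made-up hospitals and sizes: once each hospital has its own indicator (a fixed-effect dummy), hospital size can be rebuilt exactly from those indicators, so the model has no separate information for estimating a size effect.

```python
# Hypothetical rows: (hospital, hospital size). Size is constant within hospital.
rows = [("A", 120), ("A", 120), ("B", 300), ("B", 300), ("C", 45), ("C", 45)]

hospitals = sorted({hospital for hospital, _ in rows})
size_by_hospital = {hospital: size for hospital, size in rows}


def dummies(hospital):
    """One indicator column per hospital (the fixed-effect coding)."""
    return [1 if hospital == name else 0 for name in hospitals]


# Rebuild the size column as a weighted sum of the hospital indicators.
reconstructed = [
    sum(size_by_hospital[name] * d for name, d in zip(hospitals, dummies(hospital)))
    for hospital, _ in rows
]
observed = [size for _, size in rows]

# True means size adds nothing beyond the hospital indicators.
size_is_redundant = reconstructed == observed
print("size fully determined by hospital dummies:", size_is_redundant)
```

With hospital as a random factor there are no per-hospital indicators among the fixed effects, which is why a hospital-level predictor can then be estimated.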

Karen

Thank you so much Karen. I don’t have covariates measured at the site level. So I was able to use your approach.

Hello Karen!

I am a doctoral student from Germany running a study about mental health of 2 different groups of students. I have several diagnoses as dependent variables (dichotomous) and I am studying the risk factors (dichotomous and continuous predictors) and if there are differences between these two groups of students. I have 1 follow up where these diagnoses were assessed again and two new risk factors were included. My questions is: which of the 2 methods you suggested is better for my samples and study design? I am using SPSS 20 at the moment. And it would be great if you have some indications how to run this analysis because I am not able to find any SPSS tutorial.

Best regards,

Marcela

Hi Marcela,

With only two time points, run the GEE. It’s much simpler and you’ll get the same results.

I would start here for info on GEE: http://jeromyanglim.blogspot.com/2009/11/generalized-estimating-equations.html

Jeromy gives a nice list there. I’m familiar with some of the websites he links to, and they’re very understandable.

Best,

Karen

Since one can run a mixed model for logistic regression in SPSS as of version 19 (i.e., GLMM), why was your initial answer to the question “No” at the beginning of the thread ?

Hi Michael,

Maybe I misread it, but the initial question was whether you could use SPSS MIXED, and I took that to mean the MIXED procedure in SPSS.

You have to use GENLINMIXED to fit a generalized linear mixed model. MIXED only fits linear mixed models (which assume normality of residuals and have an identity link function).

Karen

Hi Karen

I found very interesting advice about generalized linear models using SPSS.

I am using SPSS 19 and would like to use a mixed model. I want to check the effect of 4 factors on seed viability. I checked normality and I cannot work with a normal distribution; I can only use a Poisson or binomial distribution, since I can treat my data either as counts or as binomial. I have a lot of zeros and also a lot of high values, so I have overdispersion. What is your suggestion for selecting a model? Is a GLMM with a Poisson distribution a good choice? Some other references recommend quasi-Poisson, zero-inflated ..

It would be great to have your ideas about using SPSS 19 for mixed models.

All the best

Mehdi

Hi Mehdi,

It sounds like indeed GLMM with a Poisson (or some model in that family) would be a good choice. You don’t mention why you need a mixed model–do you have randomized blocks or repeated measures?

One way to test if a negative binomial is necessary due to over-dispersion is to run the model with a negative binomial distribution and a log link, and allow the negative binomial parameter to be estimated from the data. If it’s close to 0, a Poisson is adequate (at 0 the negative binomial reduces to the Poisson). If it’s clearly larger than 0, you need a negative binomial.
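As a rough pre-check before fitting anything, the Python sketch below computes two moment-based summaries from raw counts: the variance-to-mean ratio (about 1 for Poisson data) and a method-of-moments estimate of k under the assumed parameterization Var = mean + k * mean^2 (so k = 0 corresponds to the Poisson). The counts are invented, predictors are ignored, and this is not the maximum likelihood estimate a fitted model would report.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Return (mean, sample variance, variance/mean ratio, moment estimate of k)."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    ratio = v / m
    # Solve v = m + k * m^2 for k; clamp at 0 because k cannot be negative.
    k = max(0.0, (v - m) / (m * m))
    return m, v, ratio, k


# Hypothetical counts with many zeros and a few large values.
overdispersed = [0, 0, 0, 0, 1, 0, 2, 0, 15, 0, 0, 22, 1, 0, 18, 0]
# Hypothetical counts whose variance does not exceed their mean.
well_behaved = [1, 2, 3, 2, 1, 3, 2, 2]

for label, counts in [("overdispersed", overdispersed), ("well behaved", well_behaved)]:
    m, v, ratio, k = dispersion_summary(counts)
    print(label, "mean:", round(m, 2), "variance:", round(v, 2),
          "ratio:", round(ratio, 2), "k:", round(k, 2))
```

A ratio well above 1 together with a k clearly above 0 is a hint that the Poisson variance assumption is too tight, though the decision should rest on the fitted model rather than on these raw summaries.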

These are tricky models, and if you aren’t up on them, you’ll want to do a lot of reading. I like the book by J. Scott Long, but there are others.

Karen

I wanted to know how to run in SPSS 19.0 an ordinal logistic regression when I have a mixed model. The ordinal response data are in the form: no response (1), minimal response (2), high response (3). I have two fixed predictors (location and treatment) and subjects that received both a treatment and a control (random effect?). I initially ran it under GLMM: data = ordinal; distribution multinomial and got output, but I am wondering whether I really want a multinomial; doesn’t this ignore the order effect of my data? Or should I be running this as a GEE with ordinal data and repeated measures? What is the difference?

Ian

Hi Ian,

You do want to include the ordering. The language used in SPSS GLMM is strange. I believe that as long as you specify ordinal, it is taking that into account.

You could run it either as a GEE or a mixed model, from what you’ve said about your design. This is the quick response to what is the difference: GEE is a marginal model, and GLMM is a true mixed model. I’ve started writing a newsletter article on it in response to your question, and that should come out next week. To get you started until then, this article explains marginal and mixed models in a linear context: The Repeated and Random Statements in Mixed Models for Repeated Measures

Karen

Hi Karen,

I’ve got clustered data (familial design), and I need to run a multinomial logistic regression. Somebody told me to use the Genmod Procedure but I’ve got this message : “The response variable as2 has 3 levels. A binary response must have two levels.” So, is the genmod Procedure really adequate to deal w/ multinomial regression ?

Thanks a lot.

Hi Epitaf,

It can. Did you specify the distribution as multinomial? I believe it’s in the model statement, after the slash, include link=logit dist=multinomial.

Karen

Hi Karen,

Thanks for your answer !

Actually, I forgot to say the most important thing (I think). I was talking about nominal multinomial regression, and it seems (but tell me if I’m wrong) that Genmod can only do a GEE fit for ordinal multinomial data with correlated residual structures.

” Only the cumulative logit, cumulative probit, and cumulative complementary log-log link functions are available for the multinomial distribution.”

Thank you.

ps : your blog should be mandatory for all epidemiology and statistics students. It’s a gift. Don’t you plan to summarize all your science and your tips (#StatWisdom ;-)) in a magic book ? Please, think about it !!!

Hi Epitaf,

You are right–I just looked it up. I’m surprised, but genmod can’t do multinomial. CATMOD can, but can’t do GEE.

I found this paper, which discusses the issue in detail (much of it mathematical detail). It looks like the best option is to use proc glimmix with the METHOD=MMPL option. See section 4.2.2.

http://www.oliverkuss.de/science/publications/Kuss_McLerran_Second_Revision_CPMB.pdf

Karen

Oh, and thanks. Glad you find it helpful. I had not thought of a magic book, and will have to think about how to do that.

Thank you for the link !

Hi Karen,

I have a simple data set with 1 within subjects factor (2 levels) and binary outcomes (0 or 1). As I understand, I should do a repeated measures logistic regression. When doing this in SPSS under GEE it does not run. It only works when I add a factor, but then it only estimates the effect of this added factor (corrected for my repeated measurement) and not that of my repeated measurement itself. Do you know if and how I can analyze my data using SPSS in a way that will show me an effect of my within subjects factor (rather than correcting for it)?

It would be great if you could help, thanks a lot!

Lisa

Hmm, it should test your within subjects factor if you’re including it in the model.

I’m not 100% sure from the way you describe it, but if you’re only interested in the one within-subjects factor, you may be able to get away with a McNemar test. Much simpler.
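For a sense of what a McNemar test does with paired binary outcomes, here is a minimal Python sketch of the exact (binomial) version. It assumes each subject has one 0/1 outcome under each of the two conditions; the data and function names are hypothetical. Only the discordant pairs, the subjects whose outcome changed, carry information about the within-subjects effect.

```python
from math import comb


def discordant_counts(first, second):
    """Count subjects who went 0 -> 1 (b) and 1 -> 0 (c) between conditions."""
    b = sum(1 for x, y in zip(first, second) if x == 0 and y == 1)
    c = sum(1 for x, y in zip(first, second) if x == 1 and y == 0)
    return b, c


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    Under the null hypothesis each discordant pair is equally likely to go
    either way, so min(b, c) is compared against a Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical outcomes for 12 subjects, each measured under both conditions.
condition_1 = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
condition_2 = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1]

b, c = discordant_counts(condition_1, condition_2)
p_value = mcnemar_exact_p(b, c)
print("0->1:", b, "1->0:", c, "exact p:", round(p_value, 4))
```

This only covers the simplest case of one two-level within-subjects factor and no other predictors; anything more calls for the GEE or mixed model approaches discussed above.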

Karen

Hi Karen,

I have probably missed something very obvious, but despite reading through the posts, I am struggling to add a random effect to my binary logistic regression model in SPSS. I have V21, but have never used syntax and would prefer to stick to menus where possible. My final model has only 3 categorical predictors, but my observations are clustered within 29 clinics (no predictor variables remotely associated with clinic) so I would like to assess this as a random effect but the menu and IBM help are of no use in informing how to do this. Any help would be gratefully received!

Hi Jo,

I find the GLMM in SPSS extremely unintuitive, so I’m sure you didn’t miss something obvious.

Seriously, this is a case where you want to use syntax.

Then in the

Jo, this would be the syntax for the random statement to create a random intercept for clinic:

/RANDOM USE_INTERCEPT=TRUE SUBJECTS=Clinic COVARIANCE_TYPE=VARIANCE_COMPONENTS

Karen

Hi Karen,

I have found your website and blog to be very helpful during graduate school and early employment (everything is explained so clearly!). I was hoping you may be able to help me with a bit of confusion I have had with the generalized linear mixed model in spss (version 22). I agree with you that the menus are unintuitive and I am looking into using syntax as an option (I have found some examples through your site). However, in the menus there are separate menus to input my fixed and random factors and a couple of sites I have read have stated that a factor cannot be listed in the random factor menu if I have not already placed it in the fixed menu. This has not made sense to me, and I have never found an explanation for this though, so was hesitant to believe if it was the best option (or I wasn’t sure if there was an aspect of the mixed model I was not completely understanding at this point).

When all factors are placed in the fixed menu (and appropriate ones also listed in the random menu, plus interaction terms [which were only in the random menu tab]) in the results output for the fixed effects I see all results from each individual factor and in the random effects output I only see results from the interaction terms (that include a random factor).

Thanks for any direction you can provide.

Sincerely, Kyle

Hi Kyle,

This is a great question, but one that would probably take me a half hour to explain in conversation. It is partially due to the way SPSS defines a random effect and partially due to the difference between a random factor and a random intercept. It’s not always intuitive. If you want to discuss it, I would recommend a consultation or joining our Data Analysis Brown Bag membership. It’s all in the details.

We actually discussed this a bit in my recent webinar: Random Intercept and Random Slope Models. You can download the recording for free. You may be beyond that in general, but we did talk in there about how SPSS defines these things. It’s all in the context of a linear model, but it applies in general to the GLMM case.

And this is why you need to abandon the menus.

Hi Karen, I really need your help.

I’m studying consumers’ choice of payment instrument in a store.

I have a lot of rows in my data set. One row is one transaction and has the following information:

Payment instrument (dependent variable), price of the transaction, discount, category of the product, geographic zone of the store.

I estimated a multinomial logit and I obtained good results. My problem is that before 2005 I had just 4 payment instrument alternatives. After 2005 I have 5 alternatives to choose from.

I think it is better to use a mixed logit, but I don’t know how to use it with multiple alternatives in SPSS.

What do you recommend?

Hi Lena,

I’m not sure there is a clear way to go here. What you essentially have is a DV that was measured two different ways. One option is to include some sort of interaction for before/after 2005. The trick, though, is that you’ll have to include all 5 options, and of course before 2005, that one option is going to have no observations, leaving you with a problem of complete separation. Depending on exactly what you’re trying to figure out from these data, you may have to model it separately before and after 2005.

I ran an experiment which looks at article production in English in different controlled conditions. There were two groups of subjects (Monolingual, Bilingual), four different conditions based on Definiteness and Specificity ([+definite, +specific],[-definite, +specific], etc.) with four test items per condition, and three possible response types (correct, omission, substitution). All data are categorical. We’re interested in how each variable influences the response, but would also like to compare the two groups.

I think it should be a multinomial logistic regression model, something like Response~Group*Definiteness*Specificity, but also accounting for repeated measures on subjects and subjects as a random effect. As far as I know, this isn’t possible in SPSS, and I’m quite new to R so I haven’t had much luck there. I guess I’m wondering 1) if I’m correct about how these data should be analyzed, 2) if it’s possible with existing software and 3) what my options are if it’s not possible.

You are correct on the approach.

I just checked, and indeed you can specify multinomial logistic regression in the GENLINMIXED procedure in SPSS, which can do repeated measures. I would suggest checking out the Command Syntax Reference manual. Very helpful.

Hi Karen,

I am analyzing some data but I am having difficulty deciding on the correct random structure. Every day, 2-4 focal follows were conducted, and for each focal follow I have data for two individuals (a male and a female). The female ID changes every follow (4 females in total). I expect correlations between focal follows conducted on the same day and between individuals within the same focal follow. In addition, individual ID is repeated through time. Do you have any suggestions? I am working in SPSS and I get confused when using data structure.

Kind regards,

R