Logistic Regression Models for Multinomial and Ordinal Variables

Multinomial Logistic Regression

The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model. It is used when the dependent variable has more than two nominal (unordered) categories.

Dummy coding of independent variables is quite common. In multinomial logistic regression the dependent variable is dummy coded into multiple 1/0 variables. There is a dummy variable for every category but one, so if there are M categories, there will be M-1 dummy variables. Each category’s dummy variable has a value of 1 for its category and 0 for all others. One category, the reference category, doesn’t need its own dummy variable because it is uniquely identified by all the other variables being 0.
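As a rough illustration of the dummy coding described above, here is a minimal Python sketch. The function name dummy_code and the commuting categories are made up for this example; statistical software does this step for you behind the scenes.

```python
def dummy_code(values, reference):
    """Return M-1 indicator (1/0) columns, one per non-reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not found in values")
    non_reference = [c for c in categories if c != reference]
    # Each column is 1 where the observation falls in that category, else 0
    return {c: [1 if v == c else 0 for v in values] for c in non_reference}


# Hypothetical outcome with M = 3 categories; "bus" is the reference category
outcome = ["bus", "car", "train", "car", "bus"]
dummies = dummy_code(outcome, reference="bus")

for category, column in dummies.items():
    print(category, column)
```

Observations in the reference category ("bus" here) are the ones with a 0 in every column.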

The multinomial logistic regression then estimates a separate binary logistic regression model for each of those dummy variables. The result is M-1 binary logistic regression models. Each one tells the effect of the predictors on the odds of being in that category rather than in the reference category. Each model has its own intercept and regression coefficients, so the predictors can affect each category differently.
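To make the "one equation per non-reference category" idea concrete, here is a small Python sketch that turns M-1 sets of coefficients into predicted probabilities for all M categories. The coefficient values, the single predictor x, and the function name multinomial_probs are invented for illustration; they are not estimates from any real data.

```python
import math


def multinomial_probs(x, params, reference):
    """params maps each non-reference category to (intercept, slope)."""
    # Each linear predictor is the log odds of that category vs. the reference
    log_odds = {cat: a + b * x for cat, (a, b) in params.items()}
    denom = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {cat: math.exp(v) / denom for cat, v in log_odds.items()}
    probs[reference] = 1.0 / denom  # the reference category has no equation
    return probs


# Made-up coefficients: each category has its own intercept and slope
params = {"car": (-1.0, 0.8), "train": (0.5, -0.3)}
probs = multinomial_probs(2.0, params, reference="bus")

for category, p in sorted(probs.items()):
    print(category, round(p, 3))
```

In this sketch the predictor raises the odds of "car" relative to "bus" but lowers the odds of "train" relative to "bus", which is the sense in which predictors can affect each category differently.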

Why not just run a series of binary regression models? You could, and people used to, before multinomial regression models were widely available in software. You will likely get similar results. But running them together means they are estimated simultaneously, which makes the parameter estimates more efficient: there is less overall unexplained error.

Ordinal Logistic Regression: The Proportional Odds Model

When the response categories are ordered, you could run a multinomial regression model. The disadvantage is that you are throwing away information about the ordering. An ordinal logistic regression model preserves that information, but it is slightly more involved.

In the Proportional Odds Model, the event being modeled is not having an outcome in a single category as is done in the binary and multinomial models. Rather, the event being modeled is having an outcome in a particular category or any previous category.

For example, for an ordered response variable with three categories, the possible events are defined as:

being in group 1
being in group 2 or 1
being in group 3, 2 or 1.

In the proportional odds model, each outcome has its own intercept but the same regression coefficients. This means:

1. the overall odds of any event can differ, but

2. the effect of the predictors on the odds of an event occurring is the same for every event, that is, at every cut point between categories. This is an assumption of the model that you need to check. It is often violated.
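Here is a minimal Python sketch of those two points. It assumes the plus-sign form of the model, logit P(Y <= j) = a_j + b*x, with made-up intercepts and a made-up slope. With three ordered categories only two intercepts are needed, because the last event (being in group 3, 2 or 1) always happens.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(x, intercepts, slope):
    """P(Y <= j) at each cut point j, assuming logit P(Y <= j) = a_j + b*x."""
    return [logistic(a + slope * x) for a in intercepts]


def odds(p):
    return p / (1.0 - p)


intercepts = [-1.0, 1.0]  # made-up values, one per cut point
slope = 0.5               # made-up value, shared by every cut point

cum_at_0 = cumulative_probs(0.0, intercepts, slope)
cum_at_1 = cumulative_probs(1.0, intercepts, slope)

# Odds ratio for a one-unit increase in x, computed at each cut point
odds_ratios = [odds(p1) / odds(p0) for p0, p1 in zip(cum_at_0, cum_at_1)]

print("baseline odds at x = 0:", [round(odds(p), 3) for p in cum_at_0])
print("odds ratios:", [round(r, 3) for r in odds_ratios])
```

The baseline odds differ between the two events because the intercepts differ, while the odds ratio for the predictor is the same at both cut points (it equals exp(slope)). That constant odds ratio is the proportional odds assumption.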

The model is written somewhat differently in SPSS than usual with a minus sign between the intercept and all the regression coefficients. This is a convention ensuring that for positive coefficients, increases in X values lead to an increase of probability in the higher-numbered response categories. In SAS, the sign is a plus, so increases in predictor values lead to an increase of probability in the lower-numbered response categories. Make sure you understand how the model is set up in your statistical package before interpreting results.
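The two sign conventions can be compared with a small Python sketch. The intercepts, slope, and the function name category_probs are made up for illustration, and the sign argument is just a device for switching between the minus-sign form, logit P(Y <= j) = a_j - b*x, and the plus-sign form, logit P(Y <= j) = a_j + b*x.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, intercepts, slope, sign):
    """Category probabilities when logit P(Y <= j) = a_j + sign * b * x."""
    cumulative = [logistic(a + sign * slope * x) for a in intercepts]
    cumulative.append(1.0)  # P(Y <= highest category) is always 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probs


intercepts = [-1.0, 1.0]  # made-up values for a 3-category outcome
slope = 0.5               # the same positive coefficient in both forms

minus_low = category_probs(0.0, intercepts, slope, sign=-1)
minus_high = category_probs(2.0, intercepts, slope, sign=-1)
plus_low = category_probs(0.0, intercepts, slope, sign=+1)
plus_high = category_probs(2.0, intercepts, slope, sign=+1)

print("minus form, x = 0 -> 2:", [round(p, 3) for p in minus_low],
      [round(p, 3) for p in minus_high])
print("plus form,  x = 0 -> 2:", [round(p, 3) for p in plus_low],
      [round(p, 3) for p in plus_high])
```

With the same positive coefficient, raising x moves probability toward the highest category under the minus-sign form and toward the lowest category under the plus-sign form, so the same printed coefficient can mean opposite things depending on the package.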

Binary, Ordinal, and Multinomial Logistic Regression for Categorical Outcomes

Get beyond the frustration of learning odds ratios, logit link functions, and proportional odds assumptions on your own. See the incredible usefulness of logistic regression and categorical data analysis in this one-hour training.

Comments

Kirsty says

November 19, 2021 at 6:52 am

Hello, I am conducting an analysis for school on R-Studio and I need some help!

I am investigating whether change in the yearly liberal democracy rating of a country (DV) (0 = no change, 1 = increase, 2 = decrease) has an effect on whether migration policy restrictiveness changes (IV) (0 = no change, 1 = increase in restrictiveness, 2 = decrease in policy restrictiveness). I have a dataset measuring this, and I also have a number of control variables.

I am assuming my DV is nominal as it cannot really be ordered, and would a multinomial regression be a good approach?
thanks

Reply
- Karen Grace-Martin says
  
  November 30, 2021 at 3:54 pm
  
  Those variables could both be considered ordinal if you recoded them. Increase, No Change, Decrease.
  
  But whether you should treat it as ordinal depends on your research question. It’s never wrong to use a multinomial model on ordinal data. You just lose any interpretation about ordering (so that’s only wrong if that’s part of your research question).
  
  Reply
martin palermo says

June 4, 2021 at 10:53 am

I am running a multinomial logistic regression in SPSS. Some of the literature I have read suggests putting all independent variables into the model first to see what is significant in the likelihood ratio test. Then remove those that aren’t significant and re-run the regression for the final output. Is this the proper way to do this and if so do you have any information on this process that I could read/cite? Thank you.

Reply
- Karen Grace-Martin says
  
  June 22, 2021 at 9:57 am
  
  Hi Martin,
  
  There isn’t a “proper” way to do model building, though many books and articles will tell you there is. What they really mean is this is the proper way to model build in order to make the specific kinds of inferences about the model and the relationships that we’re interested in making.
  
  Here is some info on model building and different approaches: https://www.theanalysisfactor.com/?s=model+building
  
  Reply
Joshua Anka says

March 11, 2020 at 5:08 pm

Hi, how do you determine the number of respondent(s) in logistic regression analysis output. In a printed output how do you determine the number of respondent(s) used?

Reply
- Karen Grace-Martin says
  
  April 17, 2020 at 2:47 pm
  
  It will completely depend on which software you’re using. Some are better at displaying this info than others.
  
  Reply
Tom says

April 24, 2019 at 6:12 am

Hi Karen!

I am running a multinomial logit regression with three possible outcomes as the dependent variable. Can you recommend some useful tests one should do when performing this kind of regression?

Kind regards,

Tom

Reply
Regina says

April 17, 2019 at 10:46 pm

Hi Karen,

I am running an ordinal logistic regression
My dependent variable has 4 levels (policy score from 0-3).
My independent variables are all scale.

When I run my model with no interaction terms, my IV (GDP and corruption) coefficients are positive and odds ratio > 1.

When I run my model with interaction terms, my IV coefficients become negative, odds ratio < 1, but the interaction (GDP*corruption) is positive.

Is there a reason why this is happening? Is it because when I add interactions, the main effects are comparing different reference groups?

Thanks so much.

Reply
Sarah says

February 5, 2019 at 7:31 am

Greetings Karen,

I am doing research on educational attainment. my dv has 7 categories which I collapsed into 4 1=GED 2=HS 3=ASSC 4=Ba or higher. I was not sure if I should use ordered or multinomial. While the education levels are ordered I am not sure the space in between each level and the convention in sociology.

Reply
- Karen Grace-Martin says
  
  March 4, 2019 at 11:07 am
  
  Hi Sarah,
  
  I don’t feel comfortable giving advice on what you should do without completely understanding the research context. That said, for a variable to be considered ordered, the spacing between categories does not need to be equal.
  
  Reply
Elise says

December 18, 2018 at 11:07 am

Hi,

I’m doing a multinomial analysis for school.
I put ‘education’ (on a scale from 1 = low education to 5 = high education) under ‘covariates’. Now I want to make an interaction term Gender*Education (with gender being a qualitative variable 0,1).
But now I’m not sure whether I need to put this interaction term under ‘factor’ or ‘covariates’?

Can you help me please?
Elise

Reply
- Karen Grace-Martin says
  
  March 4, 2019 at 11:49 am
  
  Hi Elise,
  
  I would need way more information, including which software you’re using and which specific procedure.
  
  Reply
Paa says

June 10, 2018 at 3:26 am

Is it appropriate to use binary logistic regression when you have polytomous nominal predictor variables, e.g. level of education, and a nominal dichotomous outcome, e.g. Yes and No?
Thanks
Paa Kwesi.

Reply
- Karen Grace-Martin says
  
  October 26, 2018 at 5:16 pm
  
  Yes.
  
  Reply
Ngo Quynh says

March 10, 2017 at 10:17 am

Hi everyone 🙂
I have some problems like that “There are 230 (66.7%) cells (i.e., dependent variable levels by subpopulations) with zero frequencies.
Unexpected singularities in the Hessian matrix are encountered. This indicates that either some predictor variables should be excluded or some categories should be merged.
The NOMREG procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.”
Could somebody help me ?
Thanks 🙂

Reply
- Vincent says
  
  December 20, 2017 at 11:41 am
  
  Ngo Quynh, all you need to do is either drop the observations where you have missing values (if you can justify it) or recode some of your variables so that they don’t have empty (or close to empty) categories. E.g. if you have a categorical variable that has 5 levels (5 different values it can assume), and there is sufficient data for 3 of its levels whereas 2 are empty or very sparse, you could create a new variable containing only the 3 levels for which there is sufficient data. Alternatively you could restrict your analysis to those observations where there are no missing values for specific variables. Or you could drop the problematic predictors altogether (which is also among the suggestions in your cited output), but this is not always necessary.
  
  Reply
Rakshit says

February 21, 2017 at 1:25 am

Hi All,

I have questionnaire data in which we need to find the expectations of the respondents based on the products they are using.
Suppose there are 9 products; a respondent might be using all of the products, or 1, 2, 3, or some other number.
For each product we have a Likert question with a scale from 1-5.
Thus there are 9 variables (one for each product with a rating of 1-5; if the respondent doesn’t use the product, the product variable is blank).
How should I interpret this information?
The data consist of 64 respondents, and each product variable might have only 10 ratings out of 64.

Can you help me get an answer, or tell me how I can use a statistical method to derive anything?

Reply
tilahu eshetu says

September 7, 2016 at 10:21 am

Ordered logistic regression                       Number of obs   =        248
                                                  LR chi2(5)      =      93.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -245.07184                       Pseudo R2       =     0.1597

-------------------------------------------------------------------------------
     servqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
     tangible |   .3610695   .2114817     1.71   0.088    -.0534271     .775566
 realiability |  -.2337004   .2429432    -0.96   0.336    -.7098603    .2424595
responsivness |   .5860527   .2810081     2.09   0.037      .035287    1.136818
    assurance |   .0150324   .2674593     0.06   0.955    -.5091783    .5392431
      emphaty |   1.282419   .2467206     5.20   0.000     .7988559    1.765983
--------------+----------------------------------------------------------------
        /cut1 |   2.717063   .9459124                      .8631089    4.571017
        /cut2 |   4.932767   .8850435                      3.198114    6.667421
        /cut3 |   6.254481   .9243937                      4.442703     8.06626
        /cut4 |   9.629553   1.039574                      7.592025    11.66708
-------------------------------------------------------------------------------

Reply
tilahu eshetu says

September 7, 2016 at 10:10 am

hi there how to interpret the above table specially cut points .

Reply
tilahu eshetu says

September 7, 2016 at 10:07 am

hi
I want to ask how to interpret these table …specially cut points on the issue the effect of service quality on customer satisfaction

Reply
adinew says

April 21, 2016 at 8:47 am

Would you mind showing me all the steps how to re-code the 5 scale Likert scale into dichotomous (0,1 or yes no )form?

Reply
- tilahu eshetu says
  
  September 7, 2016 at 10:11 am
  
  yes!
  
  Reply
NC says

March 13, 2016 at 5:41 pm

Hello,
I am trying to run an analysis to see whether area of distress varies based on age, point in cancer diagnosis, etc. I thought I would be running a multinomial logistic regression originally. However, some participants have just one area of distress (hence just one category) and some have multiple areas of distress at the same time. Thus, many participants qualify for multiple categories of the dependent variable. I am not sure what to do about this, as it seems to run counter to basic assumptions about categorical variables. Any ideas about what analysis to run in this situation, or how to code the responses? I am using SPSS. Any help is MUCH APPRECIATED.

Reply
Yohannes A says

February 24, 2016 at 4:19 am

I want to run ordinal logistic regression (OLR) in SPSS. My data include 3 predictor variables (all continuous) and my outcome variables are 6 (ordinal), although the composite is one. My dependent variable is narcissism, which has 6 dimensions or subscales (self-interest, manipulation, impulsivity, unawareness of others, pride and self-love).
• Is it possible to run OLR with each DV and then summarize the results?
• OR is there any other method for reducing the DV and running OLR?
• OR is there any other method than ordinal logistic regression for
analyzing this data?

Reply
Maurice says

September 30, 2015 at 11:36 pm

Hello, I wanted to know how to perform a generalized ordered logit model.

I have lots of data and a dependent variable which is a scale from 1-14 on security consciousness (each individual is given a score based on answers to previous questions) and then a bunch of categorical variables (age:18-21,22-25,etc.; income:<=$20,000,<=40,000,<=$60,000,etc.; education:Some high school, high school, some college, bachelor's). I want to know how much of a factor each (age, income, education) plays in security consciousness. Am I correct that I would need a generalized ordered logit model? And do you have any help in how to perform one using Excel or R?

Reply
Ashan says

May 30, 2015 at 3:24 pm

Hi
i am carrying out a research on interdependent decision making in the food context (i.e. the effect of other actors on the consumers’ choices). bothe ID and DV are nominal data.
the DV is consumers choices coded 1-5 (not exactly ordered), and IDs are the behavior of 4 other groups each coded (group one binary, group two from 1 to 3, and group three from 1to4).
what model you suggest to investigate the relationship between IDs and DV? Can I use Mix-multinominal logestic regression? if yes I appreciate if let me know any stepwise sourse on how to do it (preferably in R). Do I need to consider the random effect? Thanks

Reply
stanley says

April 16, 2015 at 11:42 am

Hi,
i am carrying out a research on the effect of risk management on construction project success. i have listed the risk management techniques as my independent variable and project success as my dependent variable.
i also designed a likert scale for respondents to rate the effect of each risk management technique on project success.
please which technique should i use to analyze the likert scale data. can i use SPSS.
THANK YOU

Reply
Seb says

January 25, 2015 at 5:51 am

Hi,

I have to predict in which of four quartiles a subject will end up. Those quartiles deal about their performance in a high jump test.
Q1 120 and < 130cm…
Is it more correct to use ordered logit regression or ordered probit regression.

Reply
- Karen says
  
  January 26, 2015 at 5:47 pm
  
  At least in a binary situation, you tend to get very, very similar results from probit and logit. I can’t imagine a reason why ordered responses would be any different. Neither is considered more correct, but there are definitely fields where one or the other is more common.
  
  Reply
Ronald says

December 17, 2013 at 12:16 am

Hi Karen
I am working on data whose independent variable is nominal with five categories and whose dependent variable is also nominal with ten categories. I have several control variables. I want to use multinomial logistic regression to predict the outcome variable, but the problem is that my control variables are too many and might strain my models for nothing. Which analysis can I use to determine those that I can fit in my MLR model?

Reply
- Karen says
  
  December 23, 2013 at 1:27 pm
  
  Hi Ronald,
  
  You’re right. That will probably be too much for nominal logistic regression. I would start with doing a series of chi-square tests to see which of those control variables have any effect on the outcome. If they don’t, then don’t include them in the overall model.
  
  Reply
Phillip Schnarrs says

August 14, 2013 at 6:23 pm

Hi,

Just a quick question. I am working on a data analysis that will be using demographic characteristics to understand sexual risk behaviors. The question asks about specific behaviors, to which there were 11 possible responses (so obviously nominal data). However, a new variable was constructed following Center for Disease Control guidelines concerning risk (0 = No Risk, 1 = Low Risk, 2 = Moderate Risk, and 3 = High Risk). Now I’m stuck because the original variable is nominal, but because the behaviors are listed as No Risk to High Risk it seems as if they could be ordinal. Additionally, I was thinking of possibly even reducing this to no risk versus possible risk of HIV transmission and running a logistic regression. Any advice would be great.

Reply
Ariel says

July 2, 2013 at 10:46 am

hi Karen,
How would you analyze the following: you have an experiment that consists of many trials (or competitions, e.g., a trivia quiz) between 2 individuals, and each individual gets a score of 1 if he wins (answers more questions than the loser) or 0 if he loses. No other data on the competition is recorded, only win or lose. The individuals come from 2 different groups (e.g., from 2 different schools), and for each individual you also have some other individual data (such as age, scores in English and history, and gender). Each individual participates in only one competition. We would like to have a statistical analysis that finds the predictors of a win. Naturally we can’t just use logistic regression on all the individuals where each individual is a case (or a data point) because the result of the competition between 2 individuals depends on the attributes of the 2 specific individuals involved. So we have to somehow account for the pairing of the individuals. We can account for it, for example, by adding a dummy variable of trial ID, or by including both individuals in one data point, i.e., including either the difference between their attributes or, even better, all attributes: english1, english2, history1, history2, etc. (all these are considered fixed effects). But when we do so we can’t also include the group parameter in the analysis (as a random effect), because obviously within each trial the win of one individual is the loss for the other, making the data points dependent and therefore violating the assumptions of the logistic regression, I guess.

So is there a way to analyze both the group and the other parameters at once, as predictors for a win? Or do we need to do it in separate analyses: regression for the fixed parameters, and an exact binomial test for the group parameter?
We would be thankful for any suggestions.

thanks!

Reply
- Karen says
  
  July 15, 2013 at 3:58 pm
  
  Ariel, It’s hard to say for sure, but it sounds like you need to use the contest as the unit of analysis. Use characteristics of School 1 participant and School 2 participant as the predictors, and define the outcome as “Did School 1 participant win?: Y/N.” That’s a tricky one.
  
  Reply
Saroni says

June 17, 2013 at 3:37 am

Hi Karen,
Kindly advice,
I have a problem where the DV (ordinal) is in Likert-type scale i.e.
(Most satisfied-1,Satisfied-2,Neither S nor DS-3, Dissatisfied-4, Most Dissatisfied-5)
and 2 sets of 7 IVs (Almost the same Scale but 1-5 scale) and a set of 5 IVs with (a scale of 1-6) both ordinal. Which is the best way to analyse this kind of problem? Do I need to treat the IVs as factors or covariates?
Thanks.

Reply
- Karen says
  
  July 1, 2013 at 1:14 pm
  
  Saroni, there is no exact answer. It depends on what assumptions you are willing to make and the kind of information you need. Start here: https://www.theanalysisfactor.com/the-distribution-of-independent-variables-in-regression-models-2/
  
  Reply
Karen says

January 29, 2013 at 5:27 pm

Hi Bereket,

That sounds like it’s eligible. I can’t tell you it’s the way to go without more info. Are those actual counts?

Karen

Reply
Bereket.M says

January 27, 2013 at 3:16 am

The value of the response variables are 0,1,2,3,4. The observation takes place in 253 primary school childrens. Can i use poisson regression models, or any count data model to model the data.

Reply
Kevin says

January 9, 2013 at 9:15 am

Hi Karen,
This is similar to Robert’s question back in March. My research is actually about language but might be easier to understand if I use the competition metaphor. I have lots of data about different competitors and the results of lots of three-competitor competitions. So, my IVs are three sets of competitor data, and my DV is the winning competitor. I want to build a model that will predict the winner. I understand that this will be a logistic regression, but what kind and how should I organise the data (for SPSS 19), given that my DV is also one of my IVs? A nudge in the right direction would be very much appreciated. Kevin
PS Your courses look very good!

Reply
- Karen says
  
  January 16, 2013 at 10:14 am
  
  Hi Kevin,
  
  This may be the kind of question that requires a consultation because the answer is in the details, which means I’d have to ask about a dozen questions to make sure I understand correctly. But I’ll try to give you a nudge.
  
  Your DV can’t be an IV as well. So that means you’re going to have to define your DV in such a way that it’s not the same as an IV. For example, the DV may be “Did this competitor win: Yes or No” and the IV is “Competitor ID: 1, 2, or 3.” That’s a little different than defining the DV as “Which competitor won: 1, 2, or 3.” Does that work?
  
  Karen
  
  Reply
  - Kevin says
    
    January 16, 2013 at 11:21 am
    
    Hi Karen,
    Thanks for the nudge – that definitely helps. A consultation might be the way forward, but I want to put together a more detailed plan if I go down that route, so that you would be verifying and improving it rather than telling me how to build it from scratch. I hope that makes sense. Thanks again – a very helpful nudge.
    Kevin
    
    Reply
Anna says

December 6, 2012 at 9:26 am

Hi
I have a categorical IV and the DV is order of opening 4 information boxes. Each participant has data like this:
infobox1: 1st
infobox2: 2nd
infobox3: 4th
infobox4: 3rd
Can I use linear regression/ANOVa for this or do I need to use chi-square/logistic regression because it is really an ordinal variable?
I’d really appreciate your advice.
Anna

Reply
- Karen says
  
  December 12, 2012 at 11:49 am
  
  Hi Anna,
  
  It really is an ordinal variable. Plus, when you ask people to rank things, it has the extra issue of dependence. If you know how people ranked the first three boxes, you know their answer to the fourth.
  
  Karen
  
  Reply
shymaa yassin says

November 13, 2012 at 7:23 am

i need to learn every thing about logistic mult more than two categories because i make my resarch thank you

Reply
- Karen says
  
  November 14, 2012 at 12:33 pm
  
  Hi Shymaa,
  
  You may want to start with my webinar on multinomial and ordinal models, but there are also great books out there that discuss it as well, including Long’s Regression Models for Categorical and Limited Dependent Variables.
  
  The webinar recording is free: https://www.theanalysisfactor.com/binary-ordinal-multinomial-logistic/
  
  Karen
  
  Reply
Chris says

July 22, 2012 at 5:17 pm

Hi Karen,

I am dealing with ordinal data for the first time and I am having a bit of a data dilemma. I am examining whether increased insight (awareness) of the fact that one has a mental illness (MI; measured continuously) predicts people’s increased perceptions of their own recovery from MI. Further, does experienced stigma (measured continuously) for having MI moderate this relationship.

Recovery (the DV) is measured using an empirically-validated ordinal 5-stage model which provides a score for each stage of recovery per participant (i.e., 5 scores per person). Hence, according to my hypothesis, high insight should be related to higher stages of recovery, and this relationship should change as experienced stigma changes.

Given that my DV is both ordinal and dependent (i.e., stages of recovery are divergently correlated the further apart they are conceptually; e.g., stage 5 has a stronger correlation with stage 4 than with stage 2) would you recommend a repeated-measures multinomial logistic regression?

Thanks,

Chris

Reply
Karen says

July 10, 2012 at 10:58 am

Hi Epitaf,

There are two ways–graph it and try non-linear terms to make sure they don’t fit better.

Karen

Reply
Epitaf_ says

June 7, 2012 at 1:25 pm

Hi,

How do we check the log linearity assumption for a quantitative predictor in a multinomial regression ?

Thanks.

Reply
robert says

March 6, 2012 at 8:49 pm

Can Ordinal Logistic Regression be applied to analyse the factor weightings that predictively rank folks, say, in a quiz competition? Factors could be those such as age, education, job, relevant hobbies, specialisations etc.

Ten people, say, will finish with 10 different scores in a particular competition. What confuses me is that I have seen examples where perhaps a 100 other competition results are grouped into one analysis run. There are 100 winners but each winner has different abilities which gets lost in the analysis. Or a person finishing last (rank 10) in one competition, say, may have come out top (rank 1) in another weaker competition.
I am thinking that each individual competition has to be normalised somehow so that it actually can be related sensibly to the other 99 competitions.

Any views?

Reply
- Karen says
  
  March 9, 2012 at 11:03 am
  
  Hi Robert,
  
  Let me see if I understand. Each person has 10 different scores, each on, say, a different topic? You use the raw scores to create rankings for each topic?
  
  I guess what I’m missing here is what are you trying to get from the analysis? How the predictors (age, education, etc) affect the overall ranking across all 10 topics? Or whether they affect the rankings on some topics but not others?
  
  Karen
  
  Reply
anon says

February 3, 2012 at 7:05 am

i have a question but the data/subject matter is very confidential so i would prefer a private email – is this at all possible? please let me know

Reply
- Karen says
  
  February 10, 2012 at 6:12 pm
  
  I can’t usually answer emails privately, but you’re welcome to set up a Quick Question consultation. That’s just what they’re for–when you are stuck on something and just need a quick bit of help. Just click on Consulting in the menus, and go to Quick Question.
  
  Karen
  
  Reply
Anton says

July 21, 2010 at 6:37 am

hi,

I am currently completing my dissertation which uses the Polity IV index of democracy, measured on a 0-10 scale with 0 being ‘no democracy’ and 10 being a ‘full democracy’. My independent variable on diamond abundance is measured on a continuous level.

With such a dependent variable, is it wise to use a simple logistic regression? and should i construct my 1-10 measure of democracy into a dummy variable?

Can anyone help me with this? i will be very grateful!

Anton

Reply
- Karen says
  
  August 6, 2010 at 10:13 am
  
  Hi Anton,
  
  Unless there is a cut point on your 11 point scale that is particularly meaningful, you probably don’t want to split it into a dummy variable. You will lose a lot of information that way.
  
  You basically have two choices: 1. treat it as a continuous variable, which sometimes is a reasonable assumption, and run a linear regression model. 2. treat it as ordinal (which it inherently is), and run an ordinal logistic regression.
  
  There’s a big debate on this, and both types of models have assumptions that may or may not be met here. A lot of people will make it sound like the OLS is clearly wrong here, but the ordinal regression also has assumptions that have to be met.
  
  This post has more information: Can Likert Scale Data ever be Continuous?
  
  Reply
Anonymous says

June 17, 2010 at 3:52 pm

Hi Every one ;

At the moment I am a PhD student in the field of natural resources Economics ,by the time I start to develop my research proposal entitled “The interaction of Poverty and Natural resources degradation” two things comes in mind;
The model needed is a two stage regression approach which poverty is continuous endogenous variable and natural resources degradation is categorically ordered qualitative variable.

Here comes where I am challenged and I need your kind help.
First of all the model has a mix of logit (probit application) for the ordinal variable(natural resources degradation ) and next an OLS endogenous variable (poverty measured in percapita expenditure)

I can have the data from the household survey and when I start to think how to fit the model by STATA I am confused which commands to use and how to deal with a mixture of such continuous and ordinal endogenous variable.

May you then be kind so send me some hints how to deal with this issue.

With Kind regards

Darish

Reply
- Karen says
  
  July 8, 2010 at 3:45 pm
  
  Hi Darish,
  
  Unfortunately, I’m not a big Stata expert. I’ve used it before, but I’m not up on it enough to give you suggestions.
  
  If anyone else can answer this or give Darish some hints, please do.
  
  Given the nature of your model, though, I would suggest getting a hold of Long & Freese’s book on Categorical data analysis using Stata. I don’t know if it covers two-stage modelling, but it may at least get you started, and I know Long’s other book (which is a bit more theoretical) does cover mixture models.
  
  Good luck,
  Karen
  
  Reply
Dale Dwyer says

November 19, 2009 at 1:34 pm

I have an interesting analysis that I’m not sure how to analyze correctly. I am using all nominal data as IVs (subject membership in one of two groups=IV1 and subject gender=IV2) to predict the likelihood of each subject making a choice to layoff five out of twenty-five people. Each subject, then, has a 1st, 2nd, 3rd, 4th, and 5th choice in order of who he would layoff. I have coded all 25 different choices for each subject as either 0 (subject didn’t choose that person) or 1 (subject chose that person). I have 136 subjects who completed the study.

I am clear about the between subjects analysis, but I’m not quite sure how to do the within subjects analysis. Any suggestions would be greatly appreciated!

Dale

Reply
- Karen says
  
  November 23, 2009 at 10:29 am
  
  Hi Dale,
  
  Just to be clear, you’re ignoring the order here, right? 1st-5th choice, and just coding the response as 1/0?
  
  I would look into a GEE analysis. It’s Generalized Estimating Equations, and it is an approach for repeated measures for generalized linear models.
  
  I have to say, though, this IS an interesting analysis, and there may be other approaches to get at your research questions. For example, if there are characteristics of each of the 25 people, you may want to include them. Or the order may be interesting. But GEE would be a great place to start, and it is available in all the major stat packages.
  
  -Karen
  
  Reply
  - tilahu eshetu says
    
    September 7, 2016 at 10:17 am
    
    hi karen
    
    the orders are 1.highly dissatisfied 2.satisfied 3.neutral satisfied and 5. highly satisfied.
    
    Many thanks
    Tilahun
    
    Reply
