Multinomial Logistic Regression
The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model. It is used when the dependent variable has more than two nominal (unordered) categories.
Dummy coding of independent variables is quite common. In multinomial logistic regression the dependent variable is dummy coded into multiple 1/0 variables: if there are M categories, there will be M-1 dummy variables, one for every category except the reference category. Each dummy variable has a value of 1 for its category and 0 for all others. The reference category doesn’t need its own dummy variable, because it is uniquely identified by all the other dummies being 0.
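As a concrete sketch, pandas can generate this coding directly. The small three-category response below is invented for illustration; with M = 3 categories, drop_first=True leaves the expected M-1 = 2 dummies:

```python
import pandas as pd

# Hypothetical three-category response; "a" becomes the reference
# category because drop_first=True drops its dummy column.
y = pd.Series(["a", "b", "c", "b", "a"])

# M = 3 categories -> M - 1 = 2 dummy columns ("b" and "c").
dummies = pd.get_dummies(y, drop_first=True)

# A row of all zeros uniquely identifies the reference category "a".
print(dummies)
```

Rows where the response was "a" have 0 in both dummy columns, which is exactly how the reference category is identified without a column of its own.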
The multinomial logistic regression then estimates a separate binary logistic regression model for each of those dummy variables. The result is M-1 binary logistic regression models. Each one tells the effect of the predictors on the probability of success in that category in comparison to the reference category. Each model has its own intercept and regression coefficients—the predictors can affect each category differently.
Why not just run a series of binary regression models? You could, and people did, before multinomial regression models were widely available in software. You will likely get similar results. But running them together means they are estimated simultaneously, which makes the parameter estimates more efficient: there is less overall unexplained error.
Ordinal Logistic Regression: The Proportional Odds Model
When the response categories are ordered, you could run a multinomial regression model. The disadvantage is that you are throwing away information about the ordering. An ordinal logistic regression model preserves that information, but it is slightly more involved.
In the Proportional Odds Model, the event being modeled is no longer an outcome in a single category, as it is in the binary and multinomial models. Rather, the event being modeled is an outcome in a particular category or any previous category.
For example, for an ordered response variable with three categories, the possible events are defined as:
- being in group 1
- being in group 2 or 1
- being in group 3, 2 or 1.
In the proportional odds model, each outcome has its own intercept but the same regression coefficients. This means:
1. the overall odds of any event can differ, but
2. the effect of the predictors on the odds of the event is the same for every category. This is an assumption of the model that you need to check, and it is often violated.
The model is written somewhat differently in SPSS than usual, with a minus sign between the intercept and all the regression coefficients. This convention ensures that for positive coefficients, increases in X values lead to an increase in the probability of the higher-numbered response categories. In SAS, the sign is a plus, so increases in predictor values lead to an increase in the probability of the lower-numbered response categories. Make sure you understand how the model is set up in your statistical package before interpreting results.