How to Interpret Odds Ratios when a Categorical Predictor Variable has More than Two Levels

by Karen Grace-Martin


One great thing about logistic regression, at least for those of us who are trying to learn how to use it, is that the predictor variables work exactly the same way as they do in linear regression.

Dummy coding, interactions, quadratic terms–they all work the same way.

Dummy Coding

In pretty much every regression procedure in every statistical software package, the default way to code categorical variables is with dummy coding.

All dummy coding means is recoding the original categorical variable into a set of binary variables that have values of one and zero.  You may find it helpful to think of these as yes/no variables for each category that indicate whether or not the original variable has that particular category value.

You need one variable for each category except one. The left-out category (often the last one), known as the reference category, has a value of zero (no) on all the other dummy variables, so including a variable with a value of 1 (yes) for that one is redundant.
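Here is a minimal sketch of that recoding in plain Python. The function name `dummy_code`, the `is_` column prefix, and the category labels are all made up for illustration; your software does this step for you.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, leaving out the reference category."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not one of {levels}")
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical categorical variable with three categories.
conditions = ["a", "b", "c", "a", "c"]
dummies = dummy_code(conditions, reference="c")

for name, column in dummies.items():
    print(name, column)
```

Three categories produce two columns. Rows in the reference category ("c" here) are the ones with a 0 in every column.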

(Incidentally, if you were ever wondering what true, absolute multicollinearity looks like, go ahead and run the model with a dummy variable for that last category.)
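If you want to see why without running a model, here is a small sketch using made-up category labels. When every category gets its own dummy variable, the dummies add up to 1 in every row, which is exactly the column of 1s the model already uses for the intercept.

```python
# Hypothetical categorical variable with three categories.
conditions = ["a", "b", "c", "a", "c"]
levels = sorted(set(conditions))

# One dummy per category, including the one that should have been left out.
full_dummies = {
    level: [1 if v == level else 0 for v in conditions]
    for level in levels
}

# The intercept is a column of 1s.
intercept = [1] * len(conditions)

# Add the dummy columns together, row by row.
row_sums = [
    sum(column[i] for column in full_dummies.values())
    for i in range(len(conditions))
]

print(row_sums == intercept)
```

Because one column can be rebuilt exactly from the others, the model has no unique solution for the coefficients.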

So in any regression model, the unstandardized coefficient for a dummy variable represents the difference in predicted values between that variable’s category and the reference category.

How Dummy Codes affect interpretation in Logistic Regression

In logistic regression, the odds ratio for a dummy variable is the ratio of the odds that Y=1 within that category of X to the odds that Y=1 within the reference category.
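To make that concrete, here is a rough sketch with two invented probabilities (0.6 and 0.4 are assumptions, not results from any study). It also shows the link to the regression output: the coefficient is a difference in log-odds, so exponentiating it gives back the odds ratio.

```python
import math


def odds(p):
    """Convert a probability that Y=1 into odds."""
    return p / (1 - p)


# Hypothetical probabilities that Y=1, chosen only for illustration.
p_category = 0.6   # within one category of X
p_reference = 0.4  # within the reference category

odds_ratio = odds(p_category) / odds(p_reference)

# The dummy variable's coefficient is the difference in log-odds.
coefficient = math.log(odds(p_category)) - math.log(odds(p_reference))

print(round(odds_ratio, 2), round(math.exp(coefficient), 2))
```

Both printed numbers are the same quantity reached two ways: directly from the odds, and by exponentiating the coefficient.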

For example, let’s say you have an experiment with six conditions and a binary outcome: did the subject answer correctly or not. You need to control for a number of covariates, so you can’t just use a chi-square test.

Because there are six conditions, you’ll need 5 dummy variables.  The first indicates whether the subject was in condition 1 or not; the second whether the subject was in condition 2 or not, etc.  We don’t need a dummy variable for condition 6, since everyone in condition 6 has a 0 (no) on all Condition 1-5 dummy variables.

So the odds ratio for condition 1 is the ratio of the odds of answering correctly in condition 1 to the odds of answering correctly in condition 6.  The odds ratio for condition 2 is the ratio of the odds of answering correctly in condition 2 to the odds in condition 6.
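As a sketch of the six-condition example, the counts below are invented for illustration. They also ignore the covariates, so these are unadjusted odds ratios; the ones from a fitted model with covariates would generally differ.

```python
# Hypothetical (correct, incorrect) counts per condition; not real data.
counts = {
    1: (30, 20),
    2: (25, 25),
    3: (35, 15),
    4: (20, 30),
    5: (40, 10),
    6: (20, 30),  # reference condition
}


def odds(correct, incorrect):
    """Odds of answering correctly, from raw counts."""
    return correct / incorrect


reference = 6
reference_odds = odds(*counts[reference])

# One odds ratio per dummy variable: each condition compared to condition 6.
odds_ratios = {
    condition: odds(*counts[condition]) / reference_odds
    for condition in counts
    if condition != reference
}

for condition, ratio in odds_ratios.items():
    print(f"condition {condition} vs condition {reference}: OR = {ratio:.2f}")
```

There are five odds ratios, one per dummy variable, and none for condition 6 itself. A condition with the same odds as condition 6 (condition 4 in these made-up counts) gets an odds ratio of 1.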

This way of coding works especially well if condition 6 is the control condition.