How to Interpret Odds Ratios when a Categorical Predictor Variable has More than Two Levels

One great thing about logistic regression, at least for those of us who are trying to learn how to use it, is that the predictor variables work exactly the same way as they do in linear regression.

Dummy coding, interactions, quadratic terms–they all work the same way.

Dummy Coding

In pretty much every regression procedure in every statistical software package, the default way to code categorical variables is with dummy coding.

All dummy coding means is recoding the original categorical variable into a set of binary variables that have values of one and zero. You may find it helpful to think of these as yes/no variables for each category that indicate whether or not the original variable has that particular category value.

You need one variable for each category except one. The remaining category, known as the reference category, has a value of zero (no) on all of the dummy variables, so including another variable with a value of 1 (yes) for that category is redundant.

(Incidentally, if you were ever wondering what true, absolute multicollinearity looks like, go ahead and run the model with a dummy variable for that last category).
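Here is a minimal sketch of what that recoding looks like, written in plain Python. The three-level predictor and its level names are made up for illustration, and most software does this step for you behind the scenes.

```python
# Hypothetical three-level predictor; "control" is chosen as the reference category.
levels = ["low", "high", "control"]
reference = "control"

def dummy_code(value, levels, reference):
    """Return one 0/1 indicator per non-reference level."""
    return [1 if value == level else 0 for level in levels if level != reference]

observations = ["low", "control", "high", "low"]
coded = [dummy_code(v, levels, reference) for v in observations]
for value, row in zip(observations, coded):
    print(value, row)
# low [1, 0]
# control [0, 0]
# high [0, 1]
# low [1, 0]

# If we add a dummy for the reference level too, every row sums to 1,
# which exactly duplicates the intercept column (a column of all ones).
full = [[1 if v == level else 0 for level in levels] for v in observations]
row_sums = [sum(row) for row in full]
print(row_sums)  # [1, 1, 1, 1]
```

Notice that the reference category is the row of all zeros, and that the full set of dummies carries no information beyond what the intercept already holds. That exact duplication is the perfect multicollinearity mentioned above.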

So in any regression model, the unstandardized coefficient for a dummy variable represents the difference in predicted values between that variable’s category and the reference category.
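To see this in the simplest linear case, here is a small sketch with made-up numbers and no other predictors in the model. With only an intercept and the dummy variables, the intercept is the reference category's mean and each dummy coefficient is that category's mean minus the reference mean. The check at the end confirms these values satisfy the least-squares conditions (the residuals are orthogonal to every column of the design matrix).

```python
# Made-up outcome values for a three-level predictor; "control" is the reference.
data = {
    "control": [10.0, 12.0, 14.0],
    "low": [15.0, 17.0],
    "high": [20.0, 22.0, 24.0],
}

def mean(values):
    return sum(values) / len(values)

intercept = mean(data["control"])        # reference category mean
b_low = mean(data["low"]) - intercept    # difference from the reference
b_high = mean(data["high"]) - intercept  # difference from the reference
print(intercept, b_low, b_high)  # 12.0 4.0 10.0

# Build (intercept column, low dummy, high dummy, residual) for every observation.
rows = []
for group, ys in data.items():
    for y in ys:
        d_low = 1 if group == "low" else 0
        d_high = 1 if group == "high" else 0
        predicted = intercept + b_low * d_low + b_high * d_high
        rows.append((1, d_low, d_high, y - predicted))

# Least squares requires the residuals to sum to zero against each column.
normal_eqs = [sum(r[k] * r[3] for r in rows) for k in range(3)]
print(all(abs(v) < 1e-12 for v in normal_eqs))  # True
```

Once other predictors are in the model, the coefficients become adjusted differences rather than raw differences in means, but the comparison is still always against the reference category.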

How Dummy Codes affect interpretation in Logistic Regression

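In logistic regression, the odds ratio for a dummy variable is the factor by which the odds that Y=1 within that category of X differ from the odds that Y=1 within the reference category. An odds ratio of 2, for example, means the odds in that category are twice the odds in the reference category.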
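A quick bit of arithmetic may help here. The sketch below assumes a hypothetical coefficient of 0.7 for one dummy variable and a hypothetical probability of 0.40 in the reference category (at some fixed values of any other predictors). Neither number comes from a real model. Exponentiating the coefficient gives the odds ratio, and multiplying the reference odds by that ratio gives the odds in the dummy's category.

```python
import math

# Hypothetical values, chosen only for illustration.
b = 0.7             # coefficient for one dummy variable, on the log-odds scale
p_reference = 0.40  # probability that Y=1 in the reference category

odds_ratio = math.exp(b)                          # odds ratio for this category
odds_reference = p_reference / (1 - p_reference)  # odds in the reference category
odds_category = odds_reference * odds_ratio       # odds in this dummy's category
p_category = odds_category / (1 + odds_category)  # back to a probability

print(round(odds_ratio, 3))  # 2.014
print(round(p_category, 3))  # 0.573
```

The odds roughly double, but the probability goes from 0.40 to about 0.57, not to 0.80. The odds ratio multiplies odds, not probabilities.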

For example, let’s say you have an experiment with six conditions and a binary outcome: did the subject answer correctly or not. You need to control for a number of covariates, so you can’t just use a chi-square test.

Because there are six conditions, you’ll need 5 dummy variables. The first indicates whether the subject was in condition 1 or not; the second whether the subject was in condition 2 or not, etc. We don’t need a dummy variable for condition 6, since everyone in condition 6 has a 0 (no) on all Condition 1-5 dummy variables.

So the odds ratio for condition 1 is a ratio of the odds of answering correctly in condition 1 compared to the odds of answering correctly in condition 6. The odds ratio for condition 2 is the ratio of the odds of answering correctly in condition 2 compared to condition 6.
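As a rough sketch of those comparisons, the code below computes unadjusted odds ratios directly from counts. The counts are invented, and because this ignores the covariates, the numbers would generally differ from the adjusted odds ratios a fitted logistic regression reports. The logic of "each condition versus condition 6" is the same, though.

```python
# Hypothetical (correct, incorrect) counts per condition; not real data.
counts = {
    1: (30, 10),
    2: (24, 16),
    3: (20, 20),
    4: (28, 12),
    5: (16, 24),
    6: (20, 20),  # reference (control) condition
}
reference = 6

def odds(correct, incorrect):
    """Odds of answering correctly: correct answers per incorrect answer."""
    return correct / incorrect

ref_odds = odds(*counts[reference])
odds_ratios = {c: odds(*counts[c]) / ref_odds for c in counts if c != reference}
for c, ratio in odds_ratios.items():
    print(f"condition {c} vs condition {reference}: OR = {ratio:.2f}")
# condition 1 vs condition 6: OR = 3.00
# condition 2 vs condition 6: OR = 1.50
# condition 3 vs condition 6: OR = 1.00
# condition 4 vs condition 6: OR = 2.33
# condition 5 vs condition 6: OR = 0.67

# Dividing two of these ratios gives the odds ratio between two
# non-reference conditions, because the reference odds cancel out.
or_1_vs_2 = odds_ratios[1] / odds_ratios[2]
print(or_1_vs_2)  # 2.0
```

That last division gives you the size of the comparison between conditions 1 and 2, but not a significance test for it. For that you would need to test the contrast directly or refit the model with a different reference category.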

This way of coding works especially well if condition 6 is the control condition.


Comments

Emile says

January 5, 2024 at 8:19 am

Hi Karen

I want to ask which value should be coded as 1. Is it the value that is expected to be associated with the outcome when you calculate the adjusted odds ratio? Could you also explain more about which value can be considered the reference?

Thanks

Cyrille says

April 3, 2021 at 12:41 pm

Hi,
thank you for those explanations.
In this case, the odds ratios refer to the odds compared to the reference category, but not to the other categories, right? Let’s say you have a logistic regression table with odds ratios, with 4 categories for a variable X (for example, rural, small town, medium-size city and agglomeration) and you look at their effect on Y (whatever Y is). If you take rural as the reference category, the odds ratios for the 3 other categories are compared to rural. So if you want to compare medium-size city and agglomeration, then you will need to change the reference category and use medium-size as the reference? You cannot deduce the odds for agglomeration compared to medium-size city in a regression where rural would be the reference category, right?
However, if the odds ratios get stronger each time you go to the next category (so in the table, in which rural is the reference category, you have an OR of 1.201 for small town, 1.356 for medium-size and 1.546 for agglomeration, and let’s say all are significant), can you say that the bigger the city is, the higher the odds that Y happens? And if instead of cities, the 4 categories are levels of education, in which the reference category is no diploma, then high school, then bachelor, then master, could you say: the more educated one is, the higher the odds that Y happens?

João says

February 24, 2020 at 12:01 pm

Hi, thank you for your explanation.

I would like to know how to interpret odds ratios for non-binary outcomes. Specifically, I have several Likert items regarding motivations which are measured on 5 points (strongly disagree to strongly agree). I’ve found a paper referring to these types of odds ratios as cumulative (for each higher increment, the odds increase by the odds ratio). How can I confirm this?

Thank you,
João Teixeira

- Karen Grace-Martin says
  
  April 17, 2020 at 2:51 pm
  
  Hi João,
  
  That would be an ordinal logistic model. We have a few resources on that. See: Opposite Results in Ordinal Logistic Regression—Solving a Statistical Mystery and Binary, Ordinal, and Multinomial Logistic Regression for Categorical Outcomes
  
anita.a says

June 13, 2019 at 10:26 am

Very informative and rather easy to understand. Though I have the same question as Dina: how to read the non-significant values?

dina says

August 7, 2017 at 5:46 am

Hi Karen,
I want to ask you about the situation where none of the dummy variables are significant. How do I interpret that? Can I say that all of them are not significant, or what should I say?
Thank you
