I received an e-mail from a researcher in Canada who asked about communicating logistic regression results to non-researchers. It was an important question, and there are a number of parts to it.
With the asker’s permission, I am going to address it here.
To give you the full context, she explained in a follow-up email that she is communicating to a clinical audience who will be using the results to make clinical decisions. They need to understand the size of an effect that an intervention will provide. She refers to an output I presented in my webinar on Probability, Odds, and Odds Ratios, which you can view free here.
I just went through the two lectures re: logistic regression and prob/odds/odds ratios. I completely understand everything and I have recently run some logistic and multinomial regressions. I have read many papers etc. but still struggle with the meaning of the ratio and putting it into language that nonstats folks will understand.
In your example, you had gender 0=fem, 1=male for failing with EXP(b) at 5.18. I understand males have 5.2xs odds of failing compared to odds of females failing, and that males odds are 418% higher than females odds of failing….but how do I make sense of that with a clear example. Can I say for every one female who fails, X number of males will fail? Or for every 1% of females who fail, X% of males will fail?
So could you provide me a way of explaining what 418% higher odds actually means – I think I am also trying to figure out how I would decide whether it is clinically meaningful. For example, if males have 5% higher odds for Y compared to females, how do I know 5% higher odds really matters?
First of all, you’re absolutely correct in the way you’re interpreting the odds ratio. You’re also correct in that for people who are not familiar with odds ratios, they’re going to need an intuitive way of understanding the results.
This is especially true when your audience is a clinical one who needs to make decisions based on your results. So you’re also absolutely correct that presenting a table full of odds ratios is not the way to go here.
To answer your first question, no. You cannot say for every one female who fails, X number of males will fail.
A Concrete Expression of Odds
You can, however, convey the odds ratios in a concrete way through an example.
So, for example, you could say that if the odds of a female failing are 1 to 2, the odds of a male failing are about five times as big, or about 5 to 2.
In other words, for every 10 females who fail, 20 pass all their classes.
But for every 10 males who fail, only 4 pass all their classes.
Most people can understand odds and odds ratios in those terms.
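If you want to check the arithmetic behind an example like this, here is a minimal Python sketch. It uses the Exp(B) of 5.18 from the output and the illustrative baseline of 1-to-2 odds for females; that baseline is an assumption made for the example, not an estimate from the data, and the helper name `odds_to_probability` is made up for illustration.

```python
def odds_to_probability(odds):
    """Convert odds (e.g. 0.5 for '1 to 2') into a probability."""
    return odds / (1.0 + odds)

odds_ratio = 5.18      # Exp(B) for gender from the example output
female_odds = 1 / 2    # ASSUMPTION: illustrative baseline, 1 fails for every 2 who pass

# The odds ratio multiplies the baseline odds.
male_odds = female_odds * odds_ratio

# "X% higher odds" is just (odds ratio - 1) * 100.
percent_higher = (odds_ratio - 1) * 100

female_prob = odds_to_probability(female_odds)
male_prob = odds_to_probability(male_odds)

print(f"Male odds of failing: {male_odds:.2f} to 1")
print(f"Male odds are {percent_higher:.0f}% higher than female odds")
print(f"P(fail | female) = {female_prob:.2f}")
print(f"P(fail | male)   = {male_prob:.2f}")
```

Note that the ratio of the two probabilities is not 5.18. The odds ratio applies to odds, which is why "for every one female who fails, X males fail" does not follow from it.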
This works extremely well for both categorical predictor variables, which interventions usually are (e.g., control vs. intervention), and continuous variables, like SAT Math score.
Another way to do it is the way you suggest second, which is to convert predicted odds to predicted probabilities of a female or male failing.
The one thing you have to be careful of, though, is the effect of a predictor on a probability of failing is not constant across all values of the predictor.
That is not a big deal for the categorical predictors, but it can be misleading for continuous ones.
For example, you can figure out the probability of failing at different SAT Math scores. SAT Math had an odds ratio of .97, so each additional point multiplies the odds of failing by .97. That 3% reduction in the odds is the same whether the SAT Math score is low (at 250), medium (at 500), or high (at 750). It works like a rate: the percentage change is constant, but how much the odds differ in absolute terms depends on the starting point.
The thing to keep in mind, though, is that most people will interpret those predicted probabilities as means, and think immediately in terms of the differences in those means. The differences in those probabilities are not the same at low, medium, and high values of SAT Math. They're bigger in the middle than at the ends because the relationship between SAT Math and the probability isn't linear; it's sigmoidal.
It’s easy to say that last fact isn’t important, but it’s why we’re running logistic regression in the first place.
So at the very least, show what the predicted probabilities are at many values of SAT math, and point out that increasing an SAT math score by 20 points has a very small effect for people whose scores are very low or very high, and a much larger effect for people whose scores are in the middle.
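Here is a rough Python sketch of that comparison. The odds ratio of .97 per point comes from the example; the intercept is hypothetical, chosen only so that the probability of failing is 0.5 at a score of 500, so the specific probabilities are illustrative rather than results from the actual model.

```python
import math

odds_ratio_per_point = 0.97              # Exp(B) for SAT Math from the example
slope = math.log(odds_ratio_per_point)   # change in log-odds per SAT Math point

# ASSUMPTION: hypothetical intercept, set so that P(fail) = 0.5 at a score of 500.
intercept = -slope * 500

def odds_fail(sat_math):
    """Predicted odds of failing at a given SAT Math score."""
    return math.exp(intercept + slope * sat_math)

def prob_fail(sat_math):
    """Predicted probability of failing at a given SAT Math score."""
    odds = odds_fail(sat_math)
    return odds / (1.0 + odds)

results = {}
for score in (250, 500, 750):
    p_before = prob_fail(score)
    p_after = prob_fail(score + 20)
    ratio = odds_fail(score + 20) / odds_fail(score)
    results[score] = (p_before, p_after, ratio)
    print(f"SAT Math {score} -> {score + 20}: "
          f"P(fail) {p_before:.4f} -> {p_after:.4f} "
          f"(change {p_after - p_before:+.4f}), odds multiplied by {ratio:.3f}")
```

The odds are multiplied by the same factor (.97 raised to the 20th power) at every starting score, but the change in probability is far larger near 500 than near 250 or 750.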
I recently came across this great article from Decision Science News that gets at this exact question within the context of medical risk: Some ideas on communicating risks to the general public.
It describes some graphical ways to show risk by making the frequencies that make up the probabilities explicit. While logistic regression results aren’t necessarily about risk, risk is inherently about likelihoods that some outcome will happen, so it applies quite well.
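As a very rough text-only sketch of that idea, the Python below turns two predicted probabilities into "about N in 100" statements and a simple row of symbols. The probabilities assume illustrative baseline odds of 1 to 2 for females and the odds ratio of 5.18 from the example; the function names are made up for illustration, and a real report would use a proper chart.

```python
def natural_frequency(probability, out_of=100):
    """Express a probability as a whole-number count out of a reference group."""
    return round(probability * out_of), out_of

def icon_row(probability, out_of=20, hit="X", miss="."):
    """Draw a row of symbols: 'hit' for those who fail, 'miss' for those who pass."""
    count = round(probability * out_of)
    return hit * count + miss * (out_of - count)

# ASSUMPTION: illustrative probabilities from female odds of 1 to 2
# and male odds of 5.18 times that (2.59 to 1).
groups = {
    "Females": 0.5 / 1.5,
    "Males": 2.59 / 3.59,
}

for label, p in groups.items():
    count, total = natural_frequency(p)
    print(f"{label:8s} {icon_row(p)}  about {count} in {total} fail")
```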
Clinically Meaningful Effects
Now what’s clinically meaningful is a whole different story. That can be difficult with any regression parameter in any regression model.
The odds ratio is an effect size you can use to choose a clinically meaningful cutoff, but you’re going to have to use your substantive knowledge of your variables and your field to decide how much of an effect makes a clinical difference in people’s lives.
Paul Murphy says
Does the ASC in a logistic regression have a meaning? Clearly we are looking at the odds of the estimated parameters, so is it correct to include an ASC? I have always been told to include it if it has a significance of 5% or less, but how does one interpret it, or do we just ignore it? Thanks
adejuwon samuel says
please I need your help on how to use logistic regression in nonresponse survey analysis
What do you want to know? Almost every survey has some nonresponse. Whether to use logistic regression depends on your objective and the type of variable.
I am a Public Administration student very new in logistic regression. My question is whether Logistic regression can be used to measure the Relationship/ influence of migrants on spatial planning?
Abubakar Abdullahi says
Your study is more of an effect test, therefore I suggest you use chi-square instead of logistic regression, because logistic regression is used when you have a binary dependent variable and a set of independent variables.