I recently received this email, which I thought was a great question, and one of wider interest…
I am an MPH student in biostatistics and I am curious about using regression for tests of associations in applied statistical analysis. Why is using regression, or logistic regression “better” than doing bivariate analysis such as Chi-square?
I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted for-controlled by- model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi-Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.
My professors don’t seem to be able to give me a simple justified answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.
Gee, thanks. I look forward to seeing you on the webinars.
As for your question, I’ve seen a number of different reasons why researchers choose regression over a simple chi-square test.
You’re right that there are many situations in which a sophisticated (and complicated) approach and a simple approach both work equally well, and all else being equal, simple is better.
Of course I can’t say why anyone uses any particular methodology in any particular study without seeing it, but I can guess at some reasons.
I’m sure there is a bias among researchers to go complicated because, even when journals say they want simple, the fancy stuff is so shiny and pretty and gets accepted more often. That is mainly because it communicates (on some level) that you understand sophisticated statistics and have checked out the control variables, so there’s no need for reviewers to object. And whether or not any of this is actually true, I’m sure people worry about it.
Including controls truly is important in many relationships. Simpson’s paradox, in which a relationship reverses itself without the proper controls, really does happen.
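To make the reversal concrete, here is a minimal sketch in Python. The counts are hypothetical, invented purely for illustration, and the names (counts, rate, pooled_rate, "exposed", "unexposed") are my own assumptions rather than anything from a real study.

```python
# Hypothetical counts, invented for illustration only.
# Each entry is (number with the outcome, group size).
counts = {
    "stratum_1": {"exposed": (8, 10), "unexposed": (70, 100)},
    "stratum_2": {"exposed": (30, 100), "unexposed": (2, 10)},
}


def rate(successes, total):
    """Proportion of a group that has the outcome."""
    return successes / total


def pooled_rate(group):
    """Outcome rate for a group when the strata are ignored (collapsed)."""
    successes = sum(counts[s][group][0] for s in counts)
    total = sum(counts[s][group][1] for s in counts)
    return successes / total


# Within each stratum (i.e., controlling for the stratifying variable)
for stratum, groups in counts.items():
    print(
        stratum,
        "exposed:", round(rate(*groups["exposed"]), 3),
        "unexposed:", round(rate(*groups["unexposed"]), 3),
    )

# Collapsed over strata (i.e., the bivariate analysis without the control)
print(
    "pooled",
    "exposed:", round(pooled_rate("exposed"), 3),
    "unexposed:", round(pooled_rate("unexposed"), 3),
)
```

With these made-up counts, the exposed group has the higher outcome rate within each stratum but the lower rate once the strata are pooled. A bivariate test on the pooled table would point in the opposite direction from an analysis that controls for the stratum.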
Now you could argue that logistic regression isn’t the best tool. If all the variables, predictors and outcomes, are categorical, a log-linear analysis is the best tool. A log-linear analysis is an extension of Chi-square.
That said, I personally have never found log-linear models intuitive to use or interpret. So, if given the choice, I will use logistic regression. My personal philosophy is that if two tools are both reasonable, and one is so obscure your audience won’t understand it, go with the easier one.
Which brings us back to chi-square. Why not just use the simplest of all?
A Chi-square test is really a descriptive test, akin to a correlation. It’s not a modeling technique, so there is no dependent variable. So the question is, do you want to describe the strength of a relationship or do you want to model the determinants of and predict the likelihood of an outcome?
So even in a very simple, bivariate model, if you want to explicitly define a dependent variable, and make predictions, a logistic regression is appropriate.
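Here is a rough sketch of that difference in Python, using a hypothetical 2x2 table whose counts I invented for illustration. It relies on two standard results: the shortcut formula for the Pearson chi-square statistic in a 2x2 table (without continuity correction), and the fact that a logistic regression with a single binary predictor has a closed-form fit in which the slope is the log odds ratio. The function and variable names are my own.

```python
import math

# Hypothetical 2x2 table, invented for illustration only.
# Rows: exposure (yes / no). Columns: outcome (yes / no).
a, b = 30, 70  # exposed:   outcome yes, outcome no
c, d = 15, 85  # unexposed: outcome yes, outcome no


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Chi-square treats the two variables symmetrically: swapping rows and
# columns (that is, swapping which variable is the "outcome") changes nothing.
chi2 = chi_square_2x2(a, b, c, d)
chi2_transposed = chi_square_2x2(a, c, b, d)

# Logistic regression of outcome on exposure (coded 1 = exposed, 0 = unexposed).
# With one binary predictor, the maximum likelihood fit reproduces the
# observed proportions, so the coefficients can be written down directly.
intercept = math.log(c / d)          # log odds of the outcome when unexposed
slope = math.log((a * d) / (b * c))  # log odds ratio, exposed vs. unexposed


def predict(exposed):
    """Predicted probability of the outcome for exposed = 0 or 1."""
    log_odds = intercept + slope * exposed
    return 1 / (1 + math.exp(-log_odds))


print("chi-square:", round(chi2, 3), "transposed:", round(chi2_transposed, 3))
print("odds ratio:", round(math.exp(slope), 3))
print("P(outcome | exposed):", round(predict(1), 3))
print("P(outcome | unexposed):", round(predict(0), 3))
```

The chi-square statistic only tells you whether the two variables are associated, and it comes out the same whichever one you call the outcome. The logistic regression commits to a dependent variable, and in return gives you an odds ratio and predicted probabilities for it.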