Chi-square test vs. Logistic Regression: Is a fancier test better?

by Karen

I recently received this email with what I thought was a great question, and one of wider interest…

Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of association in applied statistical analysis. Why is using regression, or logistic regression, “better” than doing a bivariate analysis such as Chi-square?

I read a lot of studies in graduate school, and it seems like half of them use Chi-square to test for association between variables, while the other half, who just seem to be trying to be fancy, conduct some complicated regression, adjusted-for, controlled-by model. But the end results seem to be the same. I have worked with some professionals who say simple is better and that using Chi-square is just fine, but I have worked with other professors who insist on building models. It also just seems so much simpler to do chi-square when you are doing primarily categorical analysis.

My professors don’t seem to be able to give me a simple, justified answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.

Thank you!

My response:

Gee, thanks.  I look forward to seeing you on the webinars.

As for your question, I’ve seen a number of different reasons.

You’re right that there are many situations in which a sophisticated (and complicated) approach and a simple approach both work equally well, and all else being equal, simple is better.

Of course I can’t say why anyone uses any particular methodology in any particular study without seeing it, but I can guess at some reasons.

I’m sure there is a bias among researchers toward going complicated, because even when journals say they want simple, the fancy stuff is so shiny and pretty and gets accepted more. That’s mainly because it communicates (on some level) that you understand sophisticated statistics and have checked out the control variables, so there’s no need for reviewers to object. Whether or not any of this is actually true, I’m sure people worry about it.

Including controls truly is important in many relationships.  Simpson’s paradox, in which a relationship reverses itself without the proper controls, really does happen.
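Here is a minimal sketch of what that reversal can look like, in Python, using made-up counts (hypothetical, chosen only to produce the paradox) for a treatment, a yes/no outcome, and a two-level control variable called severity. Within each severity level the treated group has higher odds of success, but the collapsed table says the opposite. The adjusted estimate here is a Mantel-Haenszel pooled odds ratio, a classic stratified method swapped in for simplicity; a logistic regression with severity as a covariate addresses the same question.

```python
# Hypothetical counts, one 2x2 table per level of the control variable "severity".
# Each tuple: (treated_success, treated_failure, control_success, control_failure)
strata = {
    "mild": (80, 20, 350, 150),
    "severe": (200, 300, 30, 70),
}

def odds_ratio(a, b, c, d):
    """Odds of success for treated divided by odds of success for controls."""
    return (a / b) / (c / d)

def collapse(tables):
    """Add the stratum tables cell by cell, i.e. ignore the control variable."""
    return tuple(sum(cells) for cells in zip(*tables))

def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata (adjusts for the stratifying variable)."""
    tables = list(tables)  # iterate twice safely, even if given a generator
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

for name, table in strata.items():
    print(f"{name} stratum OR: {odds_ratio(*table):.3f}")
print(f"crude (collapsed) OR: {odds_ratio(*collapse(strata.values())):.3f}")
print(f"adjusted (Mantel-Haenszel) OR: {mantel_haenszel_or(strata.values()):.3f}")
```

Running it should show odds ratios above 1 in both strata, a crude odds ratio below 1, and an adjusted odds ratio back above 1. A bivariate chi-square on the collapsed table alone would only ever see the misleading crude relationship.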

Now, you could argue that logistic regression isn’t the best tool. If all the variables, both predictors and outcomes, are categorical, a log-linear analysis is the best tool. A log-linear analysis is an extension of Chi-square.
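To see the connection, note that the Pearson chi-square test of independence compares the observed counts with the fitted counts of the simplest log-linear model, the independence model, whose fitted count for each cell is (row total × column total) / n. Log-linear software typically reports that model's likelihood-ratio statistic, G² (its deviance), which is usually close to Pearson's X² in reasonably large samples; a log-linear analysis extends this by adding more variables and interaction terms. A minimal Python sketch with a hypothetical 2×3 table:

```python
import math

def independence_fit(table):
    """Fitted counts under the log-linear independence model
    log(mu_ij) = lambda + lambda_i(row) + lambda_j(col); the MLE is row_i * col_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Pearson X^2: squared gaps between observed and fitted counts, scaled."""
    fitted = independence_fit(table)
    return sum((o - e) ** 2 / e
               for obs_row, fit_row in zip(table, fitted)
               for o, e in zip(obs_row, fit_row))

def likelihood_ratio_g2(table):
    """G^2, the deviance of the independence model; empty cells contribute 0."""
    fitted = independence_fit(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, fit_row in zip(table, fitted)
                   for o, e in zip(obs_row, fit_row) if o > 0)

def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts: 2 rows x 3 columns
table = [[20, 30, 50],
         [30, 30, 40]]
print(pearson_chi_square(table), likelihood_ratio_g2(table), degrees_of_freedom(table))
```

For this table Pearson's X² works out to exactly 28/9 (about 3.11) on 2 degrees of freedom, and G² lands close to it.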

That said, I personally have never found log-linear models intuitive to use or interpret. So, if given the choice, I will use logistic regression. My personal philosophy is that if two tools are both reasonable, and one is so opaque your audience won’t understand it, go with the easier one.

Which brings us back to chi-square.  Why not just use the simplest of all?

A Chi-square test is really a descriptive test, akin to a correlation.  It’s not a modeling technique, so there is no dependent variable.  So the question is, do you want to describe the strength of a relationship or do you want to model the determinants of and predict the likelihood of an outcome?
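The symmetry is easy to check: swap which variable is the rows and which is the columns, and neither the chi-square statistic nor Cramér's V (a common correlation-like measure of strength computed from it) changes. Nothing in the calculation marks either variable as the outcome. A minimal Python sketch with hypothetical counts:

```python
import math

def pearson_chi_square(table):
    """Pearson X^2 for a two-way table of counts (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

def cramers_v(table):
    """Strength of association between 0 and 1, derived from X^2."""
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    return math.sqrt(pearson_chi_square(table) / (n * (k - 1)))

def transpose(table):
    """Swap the roles of the row variable and the column variable."""
    return [list(col) for col in zip(*table)]

table = [[20, 30, 50],
         [30, 30, 40]]
print(pearson_chi_square(table), pearson_chi_square(transpose(table)))
print(cramers_v(table), cramers_v(transpose(table)))
```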

So even in a very simple, bivariate model, if you want to explicitly define a dependent variable, and make predictions, a logistic regression is appropriate.
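Here is a minimal sketch of that bivariate case in plain Python, with hypothetical data (in practice you would use a statistics package's logistic regression rather than this hand-rolled Newton-Raphson fit). With one binary predictor, the fitted slope is the log of the crosstab's odds ratio and the predicted probabilities reproduce the observed group proportions. So the model says the same thing as the table, but with an explicit dependent variable and a prediction equation you can extend with control variables.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient) and information matrix entries of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predict(b0, b1, x_new):
    """Predicted probability that the outcome is 1 at predictor value x_new."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_new)))

# Hypothetical data: 30 of 100 exposed (x=1) and 15 of 100 unexposed (x=0) have the outcome
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 15 + [0] * 85

b0, b1 = fit_logistic(x, y)
print("slope (log odds ratio):", b1)
print("odds ratio:", math.exp(b1))
print("P(outcome | exposed):", predict(b0, b1, 1))
print("P(outcome | unexposed):", predict(b0, b1, 0))
```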

If you’d like to learn more about what the results actually mean in logistic regression, you can download a free recording of my webinar: Understanding Probability, Odds, and Odds Ratios in Logistic Regression.

Katie April 2, 2012 at 3:31 pm

Hi, I was wondering if you could help. I am trying to find out how chi-square tests are different from log-linear analysis, and my search brought me here. All I know so far is that log-linear analysis is just an extension of chi-square and can be used for more variables!?

Karen April 2, 2012 at 3:46 pm

Hi Katie,

Yes, that’s true. A log-linear model with a single IV would give you results identical to a chi-square test. Log-linear models are basically built off of chi-square tests, but I honestly don’t remember the details of how they were derived well enough to explain it.

Karen

J-L May 2, 2012 at 8:30 pm

Hi there

This is probably a pretty basic question, but I’m looking at the relationship between 2 categorical (nominal) variables and I want to explicitly define the dependent variable. The problem is, the DV has 3 categories, so normal logistic regression wouldn’t work. My next thoughts were to do multinomial regression, but I only have one IV (with 5 categories) so that would also be inappropriate, right? Is this a situation where log linear analysis would work? Any help would be much appreciated.

Thanks in advance.

J-L

Karen May 3, 2012 at 2:35 pm

Aha, not basic at all.

It IS the exact situation for a log-linear analysis. You could also do a multinomial logistic regression if you dummy code the IV. You would get the same results, although the log-linear analysis would put them in a more interpretable form. It would be much like doing a linear regression with a single 5-category IV. It works, but it’s a little awkward.
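Dummy coding a single 5-category IV means creating four 0/1 indicator variables, one for each category other than a chosen reference category, which is coded as all zeros. With a 3-category DV, the multinomial model then estimates two logit equations (each non-reference outcome vs. the reference outcome), each with an intercept and four coefficients. A minimal Python sketch of the coding step; the category labels A through E are hypothetical placeholders, and the levels are taken from the observed values:

```python
def dummy_code(values, reference):
    """Turn one categorical variable into 0/1 indicator columns, one per
    non-reference level; rows in the reference level are all zeros."""
    levels = sorted(set(values))  # levels come from the observed data
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    non_ref = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in non_ref] for v in values]
    return non_ref, rows

# Hypothetical IV with five categories, A through E, using A as the reference
iv = ["A", "C", "E", "B", "D", "A"]
columns, X = dummy_code(iv, reference="A")
print(columns)  # ['B', 'C', 'D', 'E']
for value, row in zip(iv, X):
    print(value, row)
```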

Karen

J-L May 7, 2012 at 6:16 pm

Thanks very much, Karen. That helps a lot.
