The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Poisson or Negative Binomial? Using Count Model Diagnostics to Select a Model

by Jeff Meyer

How do you choose between Poisson and negative binomial models for discrete count outcomes?

One key criterion is the ratio of the variance to the mean after accounting for the effect of the predictors. A previous article discussed the concept of a variance that is larger than the model assumes: overdispersion.

(Underdispersion is also possible, but much less common.)

There are two ways to check for overdispersion:

  1. The Pearson Chi2 dispersion statistic

The Pearson Chi2 dispersion statistic (the Pearson Chi2 statistic divided by the residual degrees of freedom) for the model run in that article was 2.94. If the variance were equal to the mean, the dispersion statistic would equal one.

When the dispersion statistic is close to one, a Poisson model fits. If it is substantially larger than one, as with the 2.94 above, a negative binomial model fits better.
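As a rough illustration of how the dispersion statistic is calculated, the sketch below divides the Pearson Chi2 statistic by the residual degrees of freedom. It assumes you have already fit a Poisson model and extracted its fitted means. The function name, the toy counts, and the fitted means are hypothetical and are not taken from the models in this article.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson Chi2 divided by residual degrees of freedom for a Poisson fit."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    # Under a Poisson model the variance equals the mean, so each squared
    # raw residual is scaled by the fitted mean.
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / resid_df


# Hypothetical counts with an intercept-only fit, where every fitted mean
# is simply the sample mean (3.0).
counts = [0, 1, 1, 2, 3, 4, 10]
means = [3.0] * len(counts)
dispersion = pearson_dispersion(counts, means, n_params=1)
print(round(dispersion, 2))
```

With these made-up counts the statistic is about 3.78, well above one, which would point toward a negative binomial model.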

  2. Residual Plots

Plotting the standardized deviance residuals against the predicted counts is another way to determine which model, Poisson or negative binomial, is a better fit for the data.
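For readers who want to see what goes into a standardized deviance residual, here is a minimal sketch for the Poisson case only. It uses the usual Poisson unit deviance and divides by the square root of dispersion times one minus the leverage. The function names are hypothetical, and the leverage (the observation's hat value from the fitted model) is treated as an input you obtain from your software rather than something computed here. The negative binomial deviance has a different form and is not covered.

```python
import math


def poisson_deviance_residual(y, mu):
    """Signed square root of one observation's contribution to the Poisson deviance."""
    # The y * log(y / mu) term is taken as 0 when y == 0 (its limiting value).
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (log_term - (y - mu))
    # Guard against tiny negative values caused by floating-point rounding.
    root = math.sqrt(max(unit_deviance, 0.0))
    return math.copysign(root, y - mu)


def standardized_deviance_residual(y, mu, leverage, dispersion=1.0):
    """Deviance residual scaled by sqrt(dispersion * (1 - leverage))."""
    return poisson_deviance_residual(y, mu) / math.sqrt(dispersion * (1.0 - leverage))


# Hypothetical observation: 8 visits observed, 2 predicted, leverage 0.1.
print(standardized_deviance_residual(8, 2.0, leverage=0.1))
```

An observed count far above its predicted count, as in this made-up case, produces a large positive residual; an observed zero with a positive predicted count produces a negative one.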

Here is the plot from a Poisson model regressing the number of visits to the doctor in a two-week period on gender, income, and health status.

The series of waves in the graph is not an unusual structure when graphing count model residuals and predicted outcomes.

Our primary focus is on the scale of the y axis. A well-fitting model will have the majority of the points between negative 2 and positive 2. There should be few points below negative 3 or above positive 3.
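The same check can be made numerically instead of by eye. The sketch below is only an illustration: the function name and the residual values are hypothetical, and it assumes you have already obtained the standardized deviance residuals from your software.

```python
def summarize_residuals(residuals):
    """Summarize how standardized deviance residuals sit relative to +/-2 and +/-3."""
    n = len(residuals)
    if n == 0:
        raise ValueError("residuals must not be empty")
    return {
        # Share of points between negative 2 and positive 2.
        "share_within_2": sum(1 for r in residuals if abs(r) <= 2) / n,
        # Number of points below negative 3 or above positive 3.
        "count_beyond_3": sum(1 for r in residuals if abs(r) > 3),
        # Largest residual in absolute value (the scale of the y axis).
        "max_abs": max(abs(r) for r in residuals),
    }


# Hypothetical standardized deviance residuals, for illustration only.
resids = [-1.2, 0.4, 0.9, -0.3, 1.7, 2.6, -0.8, 3.4, 0.1, -1.9]
summary = summarize_residuals(resids)
print(summary)
```

Comparing these summaries for a Poisson fit and a negative binomial fit on the same predictors gives a quick numeric counterpart to comparing the two plots.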

Adding more predictors to the model can improve the plot, but the Poisson model is clearly a very poor fit for these data.

If we use the same predictors but fit a negative binomial model, the graph improves substantially.

Notice that the maximum value for the standardized deviance residual is now 4, compared to 8 for the Poisson model. The model still has room for improvement. That would require selecting better predictors of the outcome, if they are available.

Now let’s compare the graphs when the Pearson Chi2 dispersion statistic is closer to one. We will regress the count of rabbits per 400-square-yard plot on shrub coverage, density of shrubbery, and variety of shrubbery. The Pearson Chi2 dispersion statistic for this model is 1.15.

Using a Poisson model our graph looks like this:

Almost all of the residual points are now between negative 2 and positive 2.

Here is the graph of the negative binomial model using the same predictors:

The two graphs are nearly identical.

As you have seen, graphing the standardized deviance residuals against the predicted outcomes can help you verify which type of model is a better fit for your data.

Jeff Meyer is a statistical consultant with The Analysis Factor, a stats mentor for Statistically Speaking membership, and a workshop instructor. Read more about Jeff here.

Comments

  1. T.Tana says

    March 11, 2021 at 12:53 am

    I found this very useful. The discussion on graph interpretation was exactly the guidance I was looking for.
    My question is the same as the first comment.
    How do you calculate standardised deviance residuals?
    I am using R and I have googled for an answer on this with no success.
    • Jeff Meyer says

      March 11, 2021 at 6:24 pm

      Hi,

      Here is example code for a model:
      library(MASS) # provides glm.nb()
      model_nb <- glm.nb(medical_advice ~ sex + age_cen + income +
      hscore + nonpresc, data = adv, control = glm.control(maxit = 100))

      To generate the standardized deviance residuals:
      dev_std_nb <- rstandard(model_nb)
  2. raw says

    May 10, 2020 at 7:32 pm

    How do you calculate the standardized deviance residuals manually? I am trying to do this in Python.
  3. Giovanni says

    January 3, 2020 at 7:25 am

    Hi, thanks a lot for this interesting article. What are the commands for the residual plots?
    Thanks for your kind support
    • Jeff Meyer says

      January 3, 2020 at 1:13 pm

      I created the graphs using Stata. First, generate the standardized deviance residuals and the linear predictions.
      // this assumes the model was fit with glm (for example, family(nbinomial)), whose predict supports these options
      predict dev_nb, deviance standardized // standardized deviance residual
      predict xb_nb, xb // linear prediction

      The qnorm plot:
      qnorm dev_nb, title("negative binomial")

      Scatter plot:
      twoway (scatter dev_nb xb_nb)

      Jeff
      • komal says

        February 15, 2022 at 12:28 am

        Hi Jeff,
        Thanks a lot for this.
        These commands are not working in my Stata.
        Could you please explain?
  4. Kelsey says

    February 26, 2019 at 11:33 am

    Hello:
    I have a question regarding this as well: how would you test for overdispersion in GAMMs? I have data that I believe is zero-inflated, but I want to check whether a negative binomial or a Poisson model would be better.

    I’m very new to these types of statistics and have found myself very confused.

    Thank you!
    • Jeff Meyer says

      February 26, 2019 at 5:57 pm

      Hi Kelsey,

      It’s my view that “zero inflated” is a misleading term. It does not simply mean that you have a lot of zeros in your outcome variable. Zero inflated means you have observations in your data set that can only be zeros. For example, let’s suppose your outcome is “number of fish caught” by people on the pier. There might be people on the pier who aren’t trying to catch fish; they might be there to enjoy the smell of the salt in the air. People not fishing would be considered zero inflated and should be accounted for differently.

      Regarding how to determine whether you should use a negative binomial model or a Poisson model, my first choice is to use a negative binomial model. If the data are not overdispersed, the negative binomial model will most likely not converge. If it doesn’t converge, I would then use a Poisson model.

      Jeff
  5. Benjamin says

    October 9, 2018 at 1:38 am

    Hello:
    I had a question. On this page you stated that a well-fitting model would have the majority of its residual points between 2 and -2. I am wondering why this is the case. Any help you could provide would be fantastic.
    • Jeff Meyer says

      October 10, 2018 at 9:21 am

      Hi Benjamin,

      Our goal for a well-fitting model is a normal distribution of our residuals. In a standard normal distribution (think of the bell curve), approximately 95% of the observations are between -2 and +2. We would like to see that pattern with our standardized residuals. In the case of count models with discrete outcomes, we use the standardized deviance residuals because they have been modified to reflect the distribution that we would have with a continuous outcome. Standardized Pearson residuals should not be used.

      Jeff

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.