How do you choose between Poisson and negative binomial models for discrete count outcomes?
One key criterion is how the variance compares with the mean once the effects of the predictors are accounted for. A previous article discussed overdispersion: a variance that is larger than the model assumes.
(Underdispersion is also possible, but it is much, much less common.)
There are two ways to check for overdispersion:
- The Pearson Chi2 dispersion statistic
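The Pearson Chi2 dispersion statistic is the Pearson chi-square divided by the residual degrees of freedom. If the variance equals the mean, as the Poisson model assumes, the statistic will be close to one. For the model run in that article it was 2.94, so the variance is roughly three times what a Poisson model assumes.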
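As a rough illustration of how this statistic is computed, here is a minimal sketch in plain Python. The function name `pearson_dispersion` and the small data set are hypothetical and are not taken from the models in this article. It assumes you already have the observed counts and the fitted means from a Poisson model.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom (Poisson fit)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    # Under a Poisson model the variance equals the mean, so each squared
    # residual is scaled by the fitted mean.
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / resid_df


# Hypothetical counts and fitted means, for illustration only.
observed = [0, 1, 0, 4, 2, 9]
fitted = [1.0, 1.5, 0.5, 3.0, 2.5, 4.0]
dispersion = pearson_dispersion(observed, fitted, n_params=2)
print(round(dispersion, 2))
```

A value well above one suggests overdispersion. Most statistical packages report this quantity directly, so the sketch is meant only to show what the number represents.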
- Residual Plots
Plotting the standardized deviance residuals against the predicted counts is another way to judge which model, Poisson or negative binomial, fits the data better.
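For readers who want to see what goes into such a plot, here is a minimal sketch of the Poisson deviance residual and its standardized form. The function names are hypothetical. The leverage values are assumed to come from the fitted model (they are the diagonal of its hat matrix), and the dispersion is assumed to be one unless you supply an estimate.

```python
import math


def poisson_deviance_residual(y, mu):
    """Signed square root of the Poisson unit deviance for one observation."""
    # By convention y * log(y / mu) is taken to be 0 when y == 0.
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (log_term - (y - mu))
    unit_deviance = max(unit_deviance, 0.0)  # guard against tiny negative rounding
    return math.copysign(math.sqrt(unit_deviance), y - mu)


def standardized_deviance_residual(y, mu, leverage, dispersion=1.0):
    """Deviance residual scaled by sqrt(dispersion * (1 - leverage))."""
    return poisson_deviance_residual(y, mu) / math.sqrt(dispersion * (1.0 - leverage))


# Hypothetical observation: a count of 0 where the model predicts 2.
print(poisson_deviance_residual(0, 2.0))
print(standardized_deviance_residual(0, 2.0, leverage=0.1))
```

These residuals go on the y axis of the plot, with the predicted counts on the x axis.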
Here is the plot from a Poisson model regressing the number of visits to the doctor in a two-week period on gender, income and health status.
The series of waves in the graph is not unusual when count model residuals are plotted against predicted outcomes. Because the outcome takes only whole-number values, each band corresponds to one observed count.
Our primary focus is on the scale of the y axis. A well-fitting model will have the majority of the points between negative 2 and positive 2. There should be few points below negative 3 or above positive 3.
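The same rule of thumb can be checked numerically as well as by eye. Here is a minimal sketch, assuming you have already extracted the standardized deviance residuals into a list. The function name `summarize_residuals` and the residual values are hypothetical.

```python
def summarize_residuals(residuals):
    """Summarize residuals against the +/-2 and +/-3 rules of thumb."""
    n = len(residuals)
    within_2 = sum(1 for r in residuals if -2 <= r <= 2)
    beyond_3 = sum(1 for r in residuals if r < -3 or r > 3)
    return {
        "share_within_2": within_2 / n,   # ideally the large majority
        "count_beyond_3": beyond_3,       # ideally very few
        "max_abs": max(abs(r) for r in residuals),
    }


# Hypothetical standardized deviance residuals, for illustration only.
residuals = [-1.2, 0.3, 1.8, 2.5, -0.7, 3.4, 0.0, 8.0]
summary = summarize_residuals(residuals)
print(summary)
```

A summary like this does not replace the plot, which also shows patterns across the range of predicted counts, but it makes the comparison between two models easy to state.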
Adding more predictors to the model might improve the plot, but the Poisson model is clearly a very poor fit for these data.
If we keep the same predictors but use a negative binomial model, the graph improves substantially.
Notice that the maximum value of the standardized deviance residual is now 4, compared to 8 for the Poisson model. The model still has room for improvement. That would require selecting better predictors of the outcome, if they are available.
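To see why the residuals shrink, it can help to look at the negative binomial deviance residual itself. The sketch below assumes the common NB2 parameterization, in which the variance is mu + alpha * mu^2, and treats the dispersion parameter `alpha` as known. The function name and the example values are hypothetical, and your software may use a different parameterization.

```python
import math


def negbin_deviance_residual(y, mu, alpha):
    """Signed square root of the NB2 unit deviance (variance = mu + alpha * mu**2)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # By convention y * log(y / mu) is taken to be 0 when y == 0.
    first = y * math.log(y / mu) if y > 0 else 0.0
    second = (y + 1.0 / alpha) * math.log((1.0 + alpha * y) / (1.0 + alpha * mu))
    unit_deviance = max(2.0 * (first - second), 0.0)  # guard against rounding
    return math.copysign(math.sqrt(unit_deviance), y - mu)


# Hypothetical observation: a count of 8 where the model predicts 2.
print(negbin_deviance_residual(8, 2.0, alpha=1.0))
```

Because the negative binomial model allows more variance at a given mean, the same gap between observed and predicted counts produces a smaller residual than it would under a Poisson model.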
Now let’s compare the graphs when the Pearson Chi2 dispersion is closer to one. We will regress the count of rabbits per 400-square-yard plot on shrub coverage, density of shrubbery and variety of shrubbery. The Pearson Chi2 dispersion for this model is 1.15.
Using a Poisson model our graph looks like this:
Almost all of the residual points are now between negative 2 and positive 2.
Here is the graph of the negative binomial model using the same predictors:
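The two graphs are nearly identical, so the negative binomial model offers little improvement over the simpler Poisson model for these data.
As you have seen, graphing the standardized deviance residuals against the predicted outcomes can help you verify which type of model is a better fit for your data.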