The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Regression Models for Count Data

by Karen Grace-Martin

One of the main assumptions of linear models such as linear regression and analysis of variance is that the residual errors follow a normal distribution. To meet this assumption when a continuous response variable is skewed, a transformation of the response variable can produce errors that are approximately normal. Often, however, the response variable of interest is categorical or discrete, not continuous. In this case, a simple transformation cannot produce normally distributed errors.

A common example is when the response variable is the counted number of occurrences of an event. The distribution of counts is discrete, not continuous, and is limited to non-negative values. There are two problems with applying an ordinary linear regression model to these data. First, many distributions of count data are positively skewed with many observations in the data set having a value of 0. The high number of 0’s in the data set prevents the transformation of a skewed distribution into a normal one. Second, it is quite likely that the regression model will produce negative predicted values, which are theoretically impossible.
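
The second problem can be seen in a small sketch. The data below are made up for illustration and do not come from any study mentioned here. `fit_line` is a hypothetical helper that performs an ordinary least squares fit with a single predictor.

```python
# Illustrative only: made-up counts that fall toward 0 as x increases.
x = [0, 1, 2, 3, 4, 5]
y = [6, 3, 1, 1, 0, 0]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(x, y)
predictions = [intercept + slope * xi for xi in x]

# The fitted line falls below zero at the largest observed x,
# even though a count can never be negative.
for xi, pred in zip(x, predictions):
    print(xi, round(pred, 2))
```

With these particular numbers the straight line predicts a negative count at the largest value of x, which is the kind of impossible prediction described above.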

An example of a regression model with a count response variable is the prediction of the number of times a person perpetrated domestic violence against his or her partner in the last year based on whether he or she had witnessed domestic violence as a child and who the perpetrator of that violence was. Because many individuals in the sample had not perpetrated violence at all, many observations had a value of 0, and any attempts to transform the data to a normal distribution failed.

An alternative is to use a Poisson regression model or one of its variants. These models have a number of advantages over an ordinary linear regression model, including a skewed, discrete distribution and the restriction of predicted values to non-negative numbers. A Poisson model is similar to an ordinary linear regression, with two exceptions. First, it assumes that the response, conditional on the predictors, follows a Poisson, not a normal, distribution. Second, rather than modeling Y as a linear function of the regression coefficients, it models the natural log of the expected value of the response, ln(E[Y]), as a linear function of the coefficients. Because the log is taken of the expected count rather than of each observed count, observations equal to 0 cause no difficulty.
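
As a rough sketch of how such a model can be fit, the code below maximizes the Poisson log-likelihood with Newton-Raphson steps for one predictor. The data are made up, the binary predictor `x` is a hypothetical risk-factor indicator, and `fit_poisson` is an illustrative name, not a function from any statistical package. Real software uses closely related iterative algorithms with more safeguards.

```python
import math

# Made-up data: x = 1 if a (hypothetical) risk factor is present, else 0.
x = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 1, 0, 2, 0, 3, 0, 7, 2]

def fit_poisson(x, y, iterations=25):
    """Fit ln(E[Y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: add inverse(information) times gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_poisson(x, y)
predicted = [math.exp(b0 + b1 * xi) for xi in (0, 1)]

print("b0:", round(b0, 4), "b1:", round(b1, 4))
print("predicted counts for x=0 and x=1:", [round(p, 3) for p in predicted])
print("rate ratio exp(b1):", round(math.exp(b1), 3))
```

Because the model is linear on the log scale, the predicted counts exp(b0 + b1 * x) are always positive. With a single binary predictor the fitted values simply reproduce the two group means (0.5 and 3.0 in this made-up data), so exp(b1) is the ratio of those means.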

The Poisson model assumes that the conditional mean and variance of the response are equal. In practice, however, the variance is usually larger than the mean (although it can also be smaller). When the variance is larger than the mean, a condition known as overdispersion, two extensions of the Poisson model work well. In the over-dispersed Poisson model, an extra parameter estimates how much larger the variance is than the mean. This estimate is then used to correct the standard errors, and therefore the p-values, for the effects of the larger variance. An alternative is a negative binomial model. The negative binomial distribution can be viewed as a Poisson distribution whose rate parameter is itself a random variable, following a gamma distribution. The variation of this parameter can account for a variance of the data that is higher than the mean.
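
The idea of a Poisson rate that is itself random can be illustrated with a small simulation. This is only a sketch: the mean rate, the gamma shape, the seed, and the helper name `draw_poisson` are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

def draw_poisson(rate, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(2024)
mean_rate = 2.0
shape = 0.5      # smaller shape means more variation in the rate
n = 20000

# Plain Poisson: every observation shares the same rate.
poisson_counts = [draw_poisson(mean_rate, rng) for _ in range(n)]

# Gamma-Poisson mixture (negative binomial): each observation gets its own rate.
# A gamma with this shape and scale mean_rate / shape has mean equal to mean_rate.
nb_counts = [
    draw_poisson(rng.gammavariate(shape, mean_rate / shape), rng)
    for _ in range(n)
]

poisson_ratio = statistics.variance(poisson_counts) / statistics.mean(poisson_counts)
nb_ratio = statistics.variance(nb_counts) / statistics.mean(nb_counts)

print("Poisson variance/mean:", round(poisson_ratio, 2))
print("gamma-Poisson variance/mean:", round(nb_ratio, 2))
```

Both samples have roughly the same mean, but only the mixture has a variance well above it. In theory the mixture variance is mean + mean² / shape, which is 10 for the values chosen here, against a variance of 2 for the plain Poisson.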

A negative binomial model proved to fit well for the domestic violence data described above. Because the majority of individuals in the data set perpetrated 0 times, but a few individuals perpetrated many times, the variance was over 6 times larger than the mean. Therefore, the negative binomial model was clearly more appropriate than the Poisson.
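
A quick first check for this kind of overdispersion is the ratio of the sample variance to the sample mean. The counts below are hypothetical and are not the domestic violence data. They were chosen only to mimic the pattern of many zeros and a few large values.

```python
import math
import statistics

# Hypothetical counts: most people report 0 events, a few report many.
counts = [0] * 14 + [1, 1, 2, 3, 9, 14]

mean = statistics.mean(counts)
variance = statistics.variance(counts)   # sample variance (n - 1 denominator)
ratio = variance / mean

# For an intercept-only Poisson model, the Pearson dispersion statistic,
# sum((y - mean)^2 / mean) / (n - 1), is algebraically the same ratio.
pearson_dispersion = sum((c - mean) ** 2 / mean for c in counts) / (len(counts) - 1)

# An over-dispersed Poisson model widens the standard errors by this factor.
se_inflation = math.sqrt(pearson_dispersion)

print("mean:", mean, "variance:", variance)
print("variance/mean:", round(ratio, 2), "SE inflation:", round(se_inflation, 2))
```

A ratio near 1 is consistent with a Poisson model, while a ratio far above 1 points toward the over-dispersed Poisson or negative binomial alternatives. In a model with predictors, the same comparison is made using the residual variation after fitting rather than the raw counts.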

All three variations of the Poisson regression model are available in many general statistical packages, including SAS, Stata, and S-Plus.

References:

  • Gardner, W., Mulvey, E.P., and Shaw, E.C. (1995). “Regression Analyses of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models”, Psychological Bulletin, 118, 392-404.
  • Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables, Chapter 8. Thousand Oaks, CA: Sage Publications.

Comments

  1. Sharon says

    May 14, 2021 at 8:27 pm

    Dear Karen,

    Such a great article – thank you!

    I was wondering – are there any “classic” variables that can almost always be used in (and provide good fit to) linear regression? That is, variables that are not count data?

    Thank you very much 🙂

    • Karen Grace-Martin says

      July 16, 2021 at 11:50 am

      That’s a good question Sharon. There are definitely variables that tend to follow normal distributions, like human height. But in any given data set, that might not be true. For example, it might not hold if your population of interest isn’t all humans, but only infants.

  2. Manori Dilhani says

    January 12, 2021 at 11:34 am

    Hello,

    Can someone recommend some datasets for Poisson regression analysis?

  3. Jordan says

    August 6, 2020 at 11:15 am

    This was a very informative article. I am glad you mentioned something that has been bothering me: that Negative Binomial “models the natural log of the response variable, ln(Y), as a linear function of the coefficients.”

    I am working with a dataset which seems well suited for a count model. My Y is a discrete integer (0, 1, 2, ...). The conditional distributions are skewed with variance much larger than the mean. However, when I run linear models on subsamples (broken down by E[Y | Other Covariates]) I find that the effect of a one unit increase in X is fairly constant across the subsamples. This suggests the effect of X is linear. Are there any models designed for count data which allow the effect of a one unit increase in X to be linear instead of multiplicative?

    Thanks again.

  4. Jason says

    March 9, 2020 at 7:36 pm

    Thanks for this helpful website! I have a question. The binomial and Poisson distributions both seem to assume that the individual events they are modeling are independent. What about cases where they don’t seem independent (e.g. if one event occurs, another event is more likely to occur)? Is it okay to still use one of these models? If not, what is a more appropriate count model?

  5. Mirketa says

    November 26, 2019 at 5:51 am

    Thanks for sharing this valuable information

  6. Agnes says

    April 30, 2019 at 10:02 am

    Hi Karen, thank you so much for your helpful article. I really appreciate that.
    I’m a student at Universitas Indonesia in Statistics major. I have a question related to my final project. If we have Poisson Regression models, is it true that the mean of error of the models could never be 0?
    Many thanks,
    Agnes.

    • Karen Grace-Martin says

      May 9, 2019 at 9:54 am

      Hi Agnes,

      I’m not sure I understand exactly what you’re asking. As a generalized linear model, Poisson Regression “errors” are a little different than in linear models. Are you talking about something like Deviance Residuals?

  7. Omer Abid says

    June 1, 2018 at 5:56 am

    This is perhaps the clearest explanation of why count data uses Poisson that I have read on the web. Thank you Karen.

  8. obu, obu Enang says

    September 24, 2017 at 7:06 pm

    Thank you for the article. I have a question: (1) what are the possible methods of modeling count data on Sunil distribution, and (2) how can I use the R program to run count data?

  9. Mushi Solomon says

    June 7, 2017 at 6:15 pm

    Thanks Karen
    I am a student at the University of Dar es Salaam, taking MA Economics.
    I really appreciated your article. It adds knowledge.

    Thank you
    Mushi Solomon

  10. Davies says

    July 3, 2015 at 11:01 am

    Thanks for this short and highly educative piece about our closest everyday life distribution, the Poisson. I am currently doing PhD research on Bayesian spatial modeling and my response variable is in counts. In extending the model to accommodate spatial random effects in the presence of overdispersion, I assume I can use the negative binomial to model the count data. For the spatial random effects, my question now is this: can I assume a multivariate Student t distribution instead of the widely assumed Gaussian? Please, I need your expertise.

    Davies, South Africa.

  11. Karen says

    December 9, 2011 at 1:56 pm

    Great question, Anna.

    You will probably get very similar parameter estimates whether you run it as a normal or Poisson model. As the mean gets further away from zero, even as low as 10, the Poisson distribution looks more and more like a normal distribution. It becomes symmetric with a mode at the mean.

    However, the normal distribution really is assuming that the tails extend forever. Therefore, it can give you predicted values that are negative. The Poisson model won’t do that, because of the log link. So you only get positive predicted values.

    So it depends on whether you’re just interested in the regression coefficients, which would be slightly easier to interpret using a normal model, or the predicted values, which will be more accurate using a Poisson model.

    Karen

  12. Anna says

    November 30, 2011 at 1:53 pm

    Such a good article! It answered most of my questions on modeling count data.
    Just one more: If the distribution of the count data is not skewed, but following a normal-like distribution, could I still use Poisson regression. If so, which one is better? Poisson or OLS?
    Thank you very much.
    Have a good one,
    Anna

  13. COLIN ATKINSON says

    May 5, 2011 at 12:30 pm

    THIS IS AN EXCELLENT ARTICLE. IT IS SO EASY TO FOLLOW AND THEREFORE TO REMEMBER.
    MANY THANKS,
    COLIN ATKINSON

  14. Jay says

    November 5, 2010 at 2:46 am

    Just wanted to say that this article is going to be a life saver for me.

    Thanks so much for reminding me about Poisson. It’d been years since my schooling, and without application, all but the most basic elements of my stat teachings had abandoned me. This article gave me the info I needed to get me asking the right questions that will get me to my answers.

    Thanks!

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.