Count Regression Models



online workshops

Analyzing Count Data: Poisson, Negative Binomial, and Other Essential Models

Once you learn the ins and outs of linear models, it can seem that you’re ready to tackle any dependent variable. But not all numerical dependent variables are created equal! Some are discrete, not continuous. If you apply linear regression, which is designed for continuous dependent variables, to discrete dependent variables, you’re going to run into some BIG issues. learn more


the craft of statistical analysis free webinars

Poisson and Negative Binomial Regression for Count Data

Ever discover that your data are not normally distributed, no matter what transformation you try? It may be that they follow another distribution altogether. Although they are numerical, discrete count data often follow a Poisson or Negative Binomial distribution, not a normal one. learn more


statistically speaking member trainings

Zero Inflated Models

A common situation with count outcome variables is that there are a lot of zero values. The Poisson distribution used for modeling count variables takes into account that zeros are often the most common value, but sometimes there are even more zeros than the Poisson distribution can account for. learn more
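As a rough illustration of "more zeros than the Poisson distribution can account for", the sketch below compares the observed share of zeros with the share a Poisson distribution with the same mean would produce, which is exp(-mean). The counts are made up for illustration and the function names are hypothetical. The check also ignores predictors, so treat it as a first look rather than a formal test.

```python
import math

def poisson_zero_share(mean_count: float) -> float:
    """Probability of a zero under a Poisson distribution with this mean."""
    return math.exp(-mean_count)

def observed_zero_share(counts: list[int]) -> float:
    """Fraction of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)

# Made-up counts for illustration only (not from any real study).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]

mean_count = sum(counts) / len(counts)      # sample mean of the counts
expected = poisson_zero_share(mean_count)   # zeros a Poisson would predict
observed = observed_zero_share(counts)      # zeros actually seen

print(f"mean count:           {mean_count:.2f}")
print(f"Poisson share of 0s:  {expected:.3f}")
print(f"observed share of 0s: {observed:.3f}")
```

When the observed share sits well above the Poisson share, a zero-inflated model is one option worth considering.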

Making Sense of Statistical Distributions

Many who work with statistics are already functionally familiar with the normal distribution, and maybe even the binomial distribution. These common distributions are helpful in many applications, but what happens when they just don’t work? learn more

Generalized Linear Models

Generalized linear models are designed to work with outcomes that aren’t normally distributed, but have other recognizable characteristics, such as being counts, proportions, or belonging to categories. They are often exactly what you need when you just can’t get a normal distribution to fit. learn more

Types of Regression Models and When to Use Them

Linear, Logistic, Tobit, Cox, Poisson, Zero Inflated… The list of regression models goes on and on before you even get to things like ANCOVA or Linear Mixed Models. learn more


articles at the analysis factor

Poisson and Negative Binomial Regression

Poisson Regression Analysis for Count Data

There are many dependent variables that you cannot get to be normally distributed, no matter how many transformations you try. The most common culprits are count variables: variables that measure the count or rate of some event in a sample. learn more

Differences Between the Normal and Poisson Distributions

The normal distribution is so ubiquitous in statistics that those of us who use a lot of statistics tend to forget it’s not always so common in actual data. And since the normal distribution is continuous, many people describe all numerical variables as continuous. I get it: I’m guilty of using those terms interchangeably, too, but they’re not exactly the same. learn more

Analyzing Zero-Truncated Count Data: Length of Stay in the ICU for Flu Victims

Let’s imagine you have been asked to determine the factors that will help a hospital determine the length of stay in the intensive care unit once a patient is admitted. The hospital tells you that once the patient is admitted to the ICU, he or she has a day count of one. As soon as they spend 24 hours plus 1 minute, they have stayed an additional day. Clearly this is count data. There are no fractions, only whole numbers. learn more
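The counting rule described above can be sketched in a few lines. This is only an illustration of the rule as stated (admission counts as day one, and each started 24-hour block after that adds a day). The function name and the choice to measure the stay in whole minutes are assumptions made for the example.

```python
MINUTES_PER_DAY = 24 * 60

def icu_day_count(minutes_in_icu: int) -> int:
    """Whole-day count for an ICU stay; never less than 1 once admitted."""
    # Round up to the next whole day using integer arithmetic,
    # then enforce the minimum of one day that admission itself implies.
    started_days = (minutes_in_icu + MINUTES_PER_DAY - 1) // MINUTES_PER_DAY
    return max(1, started_days)

print(icu_day_count(0))                    # just admitted
print(icu_day_count(MINUTES_PER_DAY))      # exactly 24 hours
print(icu_day_count(MINUTES_PER_DAY + 1))  # 24 hours plus 1 minute
```

Because the result can never be zero, the outcome is a zero-truncated count, which is why an ordinary Poisson model is not a natural fit.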

Interpreting Regression Coefficients in Models other than Ordinary Linear Regression

Someone who registered for my Interpreting (Even Tricky) Regression Models workshop asked if the content applies to logistic regression as well. The short answer: Yes. The detailed explanation of why this is true and the one caveat: One of the greatest things about regression models is that they all have the same set up. learn more

Understanding Incidence Rate Ratios through the Eyes of a Two-Way Table

The coefficients of count model regression tables are shown in either logged form or as incidence rate ratios. Trying to explain the coefficients in logged form can be a difficult process. Incidence rate ratios are much easier to explain. You probably didn’t realize you’ve seen incidence rate ratios before, expressed differently. learn more
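To make the link between the two forms concrete, here is a minimal sketch with made-up numbers: an incidence rate ratio computed directly from two groups' event counts and exposure, and the logged coefficient that corresponds to it. Exponentiating the logged coefficient gives back the ratio. The function names and all figures are hypothetical.

```python
import math

def incidence_rate(events: int, exposure: float) -> float:
    """Events per unit of exposure (for example, per person-year)."""
    return events / exposure

def incidence_rate_ratio(events_a: int, exposure_a: float,
                         events_b: int, exposure_b: float) -> float:
    """Rate in group A divided by the rate in reference group B."""
    return incidence_rate(events_a, exposure_a) / incidence_rate(events_b, exposure_b)

# Made-up two-group table: 30 events in 100 units of exposure for group A,
# 15 events in 100 units of exposure for the reference group B.
irr = incidence_rate_ratio(30, 100, 15, 100)
log_coefficient = math.log(irr)   # the "logged form" of the same effect

print(f"incidence rate ratio: {irr:.3f}")
print(f"logged coefficient:   {log_coefficient:.3f}")
print(f"exp(coefficient):     {math.exp(log_coefficient):.3f}")
```

Read this way, a ratio of 2 says the event rate in group A is twice the rate in the reference group.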

Two-Way Tables and Count Models: Expected and Predicted Counts

Previously, we discussed how incidence rate ratios calculated in a Poisson regression can be determined from a two-way table of categorical variables. Statistical software can also calculate the expected (aka predicted) count for each group. learn more

The Exposure Variable in Poisson Regression Models

Poisson regression models and their extensions (Zero-Inflated Poisson, Negative Binomial Regression, etc.) are used to model counts and rates. Most count variables follow one of these distributions in the Poisson family. Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables. learn more

Zero-Inflated Poisson Models for Count Outcomes

There are quite a few types of outcome variables that will never meet the ordinary linear model’s assumption of normally distributed residuals. A non-normal outcome variable can have normally distributed residuals, but it does need to be continuous, unbounded, and measured on an interval or ratio scale. learn more

When Can Count Data be Considered Continuous?

Recently I did a webinar on Poisson and negative binomial models for count data. With a few hundred participants, we ran out of time to get through all the questions, so I’m answering some of them here on the blog. These questions all relate to when it’s appropriate to treat count data as continuous and run the more familiar and simpler linear model. learn more

Count Models: Understanding the Log Link Function

In linear regression, we assume that the probability distribution is normal. But there are a lot of outcome variables for which a normal distribution doesn’t fit. Generalized linear models allow a few other distributions, including Poisson, binomial, and Gamma (among others). learn more

Issues with Truncated Data

Previously, we explored bounded variables and the difference between truncated and censored data. Can we ignore the fact that a variable is bounded and just run our analysis as if the data weren’t bounded? Count data, which consist of non-negative integers, are naturally bounded – you can’t have negative counts. learn more

Overdispersion in Count Models: Fit the Model to the Data, Don’t Fit the Data to the Model

If you have count data, you use a Poisson model for the analysis, right? The key criterion for using a Poisson model is that, after accounting for the effect of predictors, the mean must equal the variance. If the mean doesn’t equal the variance, then all we have to do is transform the data or tweak the model, correct? learn more
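A quick way to see what "the mean must equal the variance" looks like in practice is to compare the two on a sample of counts. The sketch below uses made-up data and a hypothetical function name. It checks only the raw outcome, whereas the criterion above applies after accounting for predictors, so it is a first screen and not a substitute for checking a fitted model.

```python
from statistics import mean, variance

def dispersion_ratio(counts: list[int]) -> float:
    """Sample variance divided by sample mean; about 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

# Made-up counts for illustration only: mostly small values plus a long right tail.
counts = [0, 1, 1, 2, 2, 3, 9, 14]

sample_mean = mean(counts)
sample_variance = variance(counts)   # uses the n - 1 denominator
ratio = dispersion_ratio(counts)

print(f"mean:     {sample_mean:.2f}")
print(f"variance: {sample_variance:.2f}")
print(f"ratio:    {ratio:.2f}")
```

A ratio well above 1 points toward overdispersion, where a negative binomial model is a common alternative to Poisson.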


Count Models in the Context of Generalized Linear Models

Confusing Statistical Term #7: GLM

Like some of the other terms in our list, GLM has two different meanings. It’s a little different than the others, though, because it’s an abbreviation for two different terms: General Linear Model and Generalized Linear Model. It’s extra confusing because their names are so similar on top of having the same abbreviation. learn more

Generalized Linear Models in R, Part 6: Poisson Regression for Count Variables

Earlier in the series, I demonstrated a logistic regression model with binomial errors on binary data in R’s glm() function. But one of the wonderful things about glm() is that it is so flexible. It can run so much more than logistic regression models. The flexibility, of course, also means that you have to tell it exactly which model you want to run, and how. In fact, we can use generalized linear models to model count data as well. learn more

Generalized Linear Models in R, Part 7: Checking for Overdispersion in Count Regression

Last time, we fitted a generalized linear model to count data using a Poisson error structure. We found, however, that there was over-dispersion in the data – the variance was larger than the mean in our dependent variable. Over-dispersion is a problem if the conditional variance (residual variance) is larger than the conditional mean. learn more

Five Extensions of the General Linear Model

Generalized linear models, linear mixed models, generalized linear mixed models, marginal models, GEE models. You’ve probably heard of more than one of them and you’ve probably also heard that each one is an extension of our old friend, the general linear model. learn more

How to Combine Complicated Models with Tricky Effects

Need to dummy code in a Cox regression model? Interpret interactions in a logistic regression? Add a quadratic term to a multilevel model? This is where statistical analysis starts to feel really hard. You’re combining two difficult issues into one. learn more

When Linear Models Don’t Fit Your Data, Now What?

When your dependent variable is not continuous, unbounded, and measured on an interval or ratio scale, linear models don’t fit. The data just will not meet the assumptions of linear models. But there’s good news: other models exist for many types of dependent variables. learn more

6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

The assumptions of normality and constant variance in a linear model (both OLS regression and ANOVA) are quite robust to departures. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates. But you need to check the assumptions anyway, because some departures are so large that the p-values become inaccurate. learn more
