When our outcome variable is the frequency of occurrence of an event, we will typically use a count model to analyze the results. There are numerous count models. A few examples are: Poisson, negative binomial, zero-inflated Poisson and truncated negative binomial.
There are specific requirements for which count model to use. The models are not interchangeable. But regardless of the model we use, there is a very important prerequisite that they all share.
One important yet difficult skill in statistics is choosing a type model for different data situations. One key consideration is the dependent variable.
For linear models, the dependent variable doesn’t have to be normally distributed, but it does have to be continuous, unbounded, and measured on an interval or ratio scale.
Percentages don’t fit these criteria. Yes, they’re continuous and ratio scale. The issue is the (more…)
We previously examined why a linear regression and negative binomial regression were not viable models for predicting the expected length of stay in the hospital for people with the flu. A linear regression model was not appropriate because our outcome variable, length of stay, was discrete and not continuous.
A negative binomial model wasn’t the proper choice because the minimum length of stay is not zero. The minimum length of stay is one day. Negative binomial and Poisson models can only be used on data where the observations’ outcome have the possibility of having a zero count.
We need to use a truncated negative binomial model to analyze the expected length of stay of people admitted to the hospital who have the flu. Calculating the expected length of stay is an easy task once we create our model. (more…)
Imagine this scenario:
This year’s flu strain is very vigorous. The number of people checking in at hospitals is rapidly increasing. Hospitals are desperate to know if they have enough beds to handle those who need their help.
You have been asked to analyze a previous year’s hospitalization length of stay by people with the flu who had been admitted to the hospital. The predictors in your data set are age group, gender and race of those admitted. You also have an indicator that signifies whether the hospital was privately or publicly run.
It’s that time of year: flu season.
Let’s imagine you have been asked to determine the factors that will help a hospital determine the length of stay in the intensive care unit (ICU) once a patient is admitted.
The hospital tells you that once the patient is admitted to the ICU, he or she has a day count of one. As soon as they spend 24 hours plus 1 minute, they have stayed an additional day.
Clearly this is count data. There are no fractions, only whole numbers.
To help us explore this analysis, let’s look at real data from the State of Illinois. We know the patients’ ages, gender, race and type of hospital (state vs. private).
A partial frequency distribution looks like this: (more…)
Count variables are common dependent variables in many fields. For example:
- Number of diseased trees
- Number of salamander eggs that hatch
- Number of crimes committed in a neighborhood
Although they are numerical and look like they should work in linear models, they often don’t.
Not only are they discrete instead of continuous (you can’t have 7.2 eggs hatching!), they can’t go below 0. And since 0 is often the most common value, they’re often highly skewed — so skewed, in fact, that transformations don’t work.
There are, however, generalized linear models that work well for count data. They take into account the specific issues inherent in count data. They should be accessible to anyone who is familiar with linear or logistic regression.
In this webinar, we’ll discuss the different model options for count data, including how to figure out which one works best. We’ll go into detail about how the models are set up, some key statistics, and how to interpret parameter estimates.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
About the Instructor
Karen Grace-Martin helps statistics practitioners gain an intuitive understanding of how statistics is applied to real data in research studies.
She has guided and trained researchers through their statistical analysis for over 15 years as a statistical consultant at Cornell University and through The Analysis Factor. She has master’s degrees in both applied statistics and social psychology and is an expert in SPSS and SAS.
Not a Member Yet?
It’s never too early to set yourself up for successful analysis with support and training from expert statisticians.
Just head over and sign up for Statistically Speaking
You'll get access to this training webinar, 100+ other stats trainings, a pathway to work through the trainings that you need — plus the expert guidance you need to build statistical skill with live Q&A sessions and an ask-a-mentor forum.