When we run a statistical model, we are in a sense creating a mathematical equation. The simplest regression model looks like this:
Yi = β0 + β1Xi + εi
The left side of the equation is the sum of two parts on the right: the fixed component, β0 + β1Xi, and the random component, εi.
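To make the two components concrete, here is a minimal simulation sketch. The coefficient values, the error standard deviation, and the assumption of normally distributed errors are all hypothetical choices for illustration, not values from any real data set.

```python
import random

def simulate_simple_regression(x_values, beta0, beta1, sigma, seed=0):
    """Return (x, fixed, noise, y) tuples, where y = fixed + noise.

    Assumes normally distributed errors with mean 0 and standard
    deviation sigma (an illustrative assumption).
    """
    rng = random.Random(seed)
    rows = []
    for x in x_values:
        fixed = beta0 + beta1 * x      # fixed component: β0 + β1Xi
        noise = rng.gauss(0.0, sigma)  # random component: εi
        rows.append((x, fixed, noise, fixed + noise))
    return rows

# Hypothetical parameter values, chosen only for the demonstration.
rows = simulate_simple_regression([1, 2, 3, 4, 5], beta0=2.0, beta1=0.5, sigma=1.0)
for x, fixed, noise, y in rows:
    print(f"x={x}  fixed={fixed:.2f}  random={noise:+.2f}  y={y:.2f}")
```

Each printed row shows the observed value as the sum of the part the model explains and the part it does not.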
You’ll also sometimes see the equation written (more…)
One important yet difficult skill in statistics is choosing a type of model for different data situations. One key consideration is the dependent variable.
For linear models, the dependent variable doesn’t have to be normally distributed, but it does have to be continuous, unbounded, and measured on an interval or ratio scale.
Percentages don’t fit these criteria. Yes, they’re continuous and ratio scale. The issue is the (more…)
We previously examined why a linear regression and negative binomial regression were not viable models for predicting the expected length of stay in the hospital for people with the flu. A linear regression model was not appropriate because our outcome variable, length of stay, was discrete and not continuous.
A negative binomial model wasn’t the proper choice because the minimum length of stay is one day, not zero. Standard negative binomial and Poisson models assume that every observation could have a count of zero.
We need to use a truncated negative binomial model to analyze the expected length of stay of people admitted to the hospital who have the flu. Calculating the expected length of stay is an easy task once we create our model. (more…)
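As a rough sketch of that calculation: under a zero-truncated negative binomial model, the expected count is the untruncated mean divided by the probability of a nonzero count. The code below assumes the common NB2 parameterization (variance = mu + alpha * mu^2); the linear predictor and dispersion values are hypothetical stand-ins for what a fitted model would supply.

```python
import math

def nb_prob_zero(mu, alpha):
    """P(Y = 0) for an NB2 negative binomial with mean mu and dispersion alpha > 0."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)

def truncated_nb_mean(mu, alpha):
    """E[Y | Y > 0]: the mean once zero counts are ruled out."""
    return mu / (1.0 - nb_prob_zero(mu, alpha))

# Hypothetical values: a linear predictor of 0.9 on the log scale
# and a dispersion of 0.5. A fitted model would supply both.
mu = math.exp(0.9)
alpha = 0.5
expected_stay = truncated_nb_mean(mu, alpha)
print(f"untruncated mean: {mu:.2f}, expected length of stay: {expected_stay:.2f}")
```

Because zeros are excluded, the truncated mean is always larger than the untruncated mean and never falls below one day.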
Count variables are common dependent variables in many fields. For example:
- Number of diseased trees
- Number of salamander eggs that hatch
- Number of crimes committed in a neighborhood
Although they are numerical and look like they should work in linear models, they often don’t.
Not only are they discrete instead of continuous (you can’t have 7.2 eggs hatching!), they can’t go below 0. And since 0 is often the most common value, they’re often highly skewed — so skewed, in fact, that transformations don’t work.
There are, however, generalized linear models that work well for count data. They take into account the specific issues inherent in count data. They should be accessible to anyone who is familiar with linear or logistic regression.
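One way to see how these models respect the features of count data is a small sketch of a Poisson model with a log link. The coefficients below are hypothetical. The point is only that the predicted mean stays positive for any predictor value, and that a low mean puts the most probability on 0.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_mean(x, b0, b1):
    """Predicted mean under a log link: log(mean) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

b0, b1 = -0.5, 0.3  # hypothetical coefficients, for illustration only

# The predicted mean is positive even for extreme predictor values.
means = [poisson_mean(x, b0, b1) for x in (-5, 0, 5)]
print(means)

# With a low mean, 0 is the most likely count and the distribution is right-skewed.
lam = poisson_mean(0, b0, b1)
probs = [poisson_pmf(k, lam) for k in range(6)]
print([round(p, 3) for p in probs])
```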
In this webinar, we’ll discuss the different model options for count data, including how to figure out which one works best. We’ll go into detail about how the models are set up, some key statistics, and how to interpret parameter estimates.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
About the Instructor
Karen Grace-Martin helps statistics practitioners gain an intuitive understanding of how statistics is applied to real data in research studies.
She has guided and trained researchers through their statistical analysis for over 15 years as a statistical consultant at Cornell University and through The Analysis Factor. She has master’s degrees in both applied statistics and social psychology and is an expert in SPSS and SAS.
Not a Member Yet?
It’s never too early to set yourself up for successful analysis with support and training from expert statisticians.
Just head over and sign up for Statistically Speaking.
You'll get access to this training webinar, 100+ other stats trainings, a pathway to work through the trainings that you need — plus the expert guidance you need to build statistical skill with live Q&A sessions and an ask-a-mentor forum.
1. For a general overview of modeling count variables, you can get free access to the video recording of one of my Craft of Statistical Analysis webinars:
Poisson and Negative Binomial for Count Outcomes
2. One of my favorite books on categorical data analysis is:
Long, J. Scott. (1997). Regression Models for Categorical and Limited Dependent Variables. Sage Publications.
It’s moderately technical, but written with social science researchers in mind. It’s so well written, it’s worth it. It has a section specifically about Zero-Inflated Poisson and Zero-Inflated Negative Binomial regression models.
3. Slightly less technical, but useful mainly if you use Stata, is Regression Models for Categorical Dependent Variables Using Stata, by J. Scott Long and Jeremy Freese.
4. UCLA’s ATS Statistical Software Consulting Group has some nice examples of Zero-Inflated Poisson and other models in various software packages.
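For readers new to the zero-inflated models mentioned in these resources, here is a minimal sketch of the Zero-Inflated Poisson probability function. It mixes a point mass at zero (with probability pi) with an ordinary Poisson distribution. The parameter values are hypothetical, and this is an illustration rather than code from any of the resources above.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """P(Y = k) under a Zero-Inflated Poisson.

    pi is the probability of a structural ("always zero") observation;
    otherwise the count follows Poisson(lam).
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

lam, pi = 2.0, 0.3  # hypothetical values, for illustration only
print("P(0), plain Poisson:", round(poisson_pmf(0, lam), 3))
print("P(0), zero-inflated:", round(zip_pmf(0, lam, pi), 3))
```

The extra zeros pull the overall mean down to (1 - pi) * lam, which is one reason a plain Poisson model fits such data poorly.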
Many dependent variables cannot be made normally distributed, no matter how many transformations you try. The most common culprits are count variables, which measure the count or rate of some event in a sample. Some examples I’ve seen from a variety of disciplines are:
- Number of eggs in a clutch that hatch
- Number of domestic violence incidents in a month
- Number of times juveniles needed to be restrained during tenure at a correctional facility
- Number of infected plants per transect
A common quality of these variables is that 0 is the mode–the most common value. 1 is the next most common, 2 the next, and so on. In variables with low expected counts (number of cars in a household, number of degrees earned), (more…)