When our outcome variable is the frequency of occurrence of an event, we will typically use a count model to analyze the results. There are numerous count models. A few examples are: Poisson, negative binomial, zero-inflated Poisson and truncated negative binomial.
Each count model comes with its own requirements, so the models are not interchangeable. But regardless of the model we use, they all share one very important prerequisite.
How do you choose between Poisson and negative binomial models for discrete count outcomes?
One key criterion is the size of the variance relative to the mean after accounting for the effect of the predictors. A previous article discussed the concept of a variance that is larger than the model assumes: overdispersion.
(Underdispersion is also possible, but much less common.)
There are two ways to check for overdispersion: (more…)
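As a rough illustration of comparing the variance to the mean after accounting for a predictor, here is a minimal Python sketch of one widely used diagnostic, the Pearson dispersion statistic (the Pearson chi-square divided by the residual degrees of freedom). It is not necessarily one of the two checks referred to above. The sketch assumes a Poisson model with a single categorical predictor, in which case the fitted mean for each observation is simply its group mean. The function name `pearson_dispersion` and the data are made up for illustration.

```python
from collections import defaultdict


def pearson_dispersion(counts, groups):
    """Pearson chi-square / residual df for a Poisson model
    with one categorical predictor (hypothetical helper).

    Assumes every group has a mean greater than zero.
    """
    # Collect the observed counts within each level of the predictor.
    by_group = defaultdict(list)
    for y, g in zip(counts, groups):
        by_group[g].append(y)

    # With a single categorical predictor, the Poisson fitted value
    # for an observation is the mean of its group.
    fitted = {g: sum(ys) / len(ys) for g, ys in by_group.items()}

    # Under the Poisson assumption the variance equals the mean,
    # so each squared residual is scaled by the fitted mean.
    pearson_chi2 = sum(
        (y - fitted[g]) ** 2 / fitted[g] for y, g in zip(counts, groups)
    )

    # Residual degrees of freedom: observations minus estimated means.
    resid_df = len(counts) - len(fitted)
    return pearson_chi2 / resid_df


# Made-up lengths of stay (in days), for illustration only.
stay_days = [1, 2, 1, 3, 9, 1, 2, 14, 1, 4, 2, 21]
hospital = ["public"] * 6 + ["private"] * 6

ratio = pearson_dispersion(stay_days, hospital)
print(f"Pearson dispersion statistic: {ratio:.2f}")
```

Values near 1 are consistent with the Poisson assumption that the variance equals the mean. Values well above 1 point to overdispersion, which is when a negative binomial model is worth considering.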
Imagine this scenario:
This year’s flu strain is very vigorous. The number of people checking in at hospitals is rapidly increasing. Hospitals are desperate to know if they have enough beds to handle those who need their help.
You have been asked to analyze length-of-stay data from a previous year for people who were admitted to the hospital with the flu. The predictors in your data set are age group, gender and race of those admitted. You also have an indicator that signifies whether the hospital was privately or publicly run.
Many who work with statistics are already functionally familiar with the normal distribution, and maybe even the binomial distribution.
These common distributions are helpful in many applications, but what happens when they just don’t work?
This webinar will cover a number of statistical distributions, including the:
- Poisson and negative binomial distributions (especially useful for count data)
- Multinomial distribution (for responses with more than two categories)
- Beta distribution (for continuous percentages)
- Gamma distribution (for right-skewed continuous data)
- Bernoulli and binomial distributions (for probabilities and proportions)
- And more!
We’ll also explore the relationships among statistical distributions, including those you may already use, like the normal, t, chi-squared, and F distributions.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
It’s that time of year: flu season.
Let’s imagine you have been asked to identify the factors that will help a hospital determine the length of stay in the intensive care unit (ICU) once a patient is admitted.
The hospital tells you that once a patient is admitted to the ICU, they have a day count of one. As soon as they have spent 24 hours plus 1 minute there, they have stayed an additional day.
Clearly this is count data. There are no fractions, only whole numbers.
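To see how that counting rule turns elapsed time into whole numbers, here is a small Python sketch. It assumes admission and discharge times are recorded to the minute. The function name `icu_day_count` and the timestamps are hypothetical, not part of the hospital's data.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60


def icu_day_count(admitted, discharged):
    """Day count under the rule described above (hypothetical helper).

    Admission counts as day one, and each further day starts once
    the stay passes another full 24 hours by at least one minute.
    """
    if discharged < admitted:
        raise ValueError("discharge time is before admission time")

    # Work in whole minutes, since the rule is stated to the minute.
    minutes = int((discharged - admitted).total_seconds() // 60)

    # Ceiling division: 1440 minutes is still day 1, 1441 is day 2.
    days = -(-minutes // MINUTES_PER_DAY)

    # A patient who has just been admitted already has a count of one.
    return max(1, days)


# Made-up admission time, for illustration only.
admitted = datetime(2024, 1, 1, 8, 0)
for stay in (timedelta(hours=5),
             timedelta(hours=24),
             timedelta(hours=24, minutes=1)):
    print(stay, "->", icu_day_count(admitted, admitted + stay))
```

Whatever the elapsed time, the result is always a positive whole number, which is also why a day count of zero never appears in data recorded this way.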
To help us explore this analysis, let’s look at real data from the State of Illinois. We know the patients’ ages, gender, race and type of hospital (state vs. private).
A partial frequency distribution looks like this: (more…)
The normal distribution is so ubiquitous in statistics that those of us who use a lot of statistics tend to forget it’s not always so common in actual data.
And since the normal distribution is continuous, many people describe all numerical variables as continuous. I get it: I’m guilty of using those terms interchangeably, too, but they’re not exactly the same.
Numerical variables can be either continuous or discrete.
The difference? Continuous variables can take any number within a range. Discrete variables can only be whole numbers.
So 3.04873658 is a possible value of a continuous variable, but not of a discrete one.
Count variables, as the name implies, are frequencies of some event or state. Number of arrests, fish (more…)