The Analysis Factor


Statistical Consulting, Resources, and Statistics Workshops for Researchers


Zero Inflated

Member Training: Zero Inflated Models

by Karen Grace-Martin Leave a Comment

A common situation with count outcome variables is that there are a lot of zero values.  The Poisson distribution used for modeling count variables takes into account that zeros are often the most common value, but sometimes there are even more zeros than the Poisson distribution can account for.

This can happen in continuous variables as well: most of the distribution follows a beautiful normal distribution, except for the big stack of zeros.

This webinar will explore two ways of modeling zero-inflated data: the Zero Inflated model and the Hurdle model. Both assume there are two different processes: one that affects the probability of a zero and one that affects the actual values, and both allow different sets of predictors for each process.
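
To make the difference between the two models concrete, here is a rough sketch of their probability functions. The symbols are my own shorthand, not notation from the webinar: π is the probability that the first process produces a zero, and f is the count distribution (Poisson or negative binomial) used for the second process.

```latex
% Zero-inflated model: zeros can come from either process
P(Y = 0) = \pi + (1 - \pi)\, f(0), \qquad
P(Y = y) = (1 - \pi)\, f(y), \quad y = 1, 2, \dots

% Hurdle model: every zero comes from the first process,
% and the counts follow f truncated at zero
P(Y = 0) = \pi, \qquad
P(Y = y) = (1 - \pi)\, \frac{f(y)}{1 - f(0)}, \quad y = 1, 2, \dots
```

In the zero-inflated version, some zeros are produced by the count distribution itself. In the hurdle version, once an observation clears the hurdle it can only be positive.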

We’ll explore these models as well as some related models, like Zero-One Inflated Beta models for proportion data.


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.


Tagged With: beta regression, count model, hurdle model, outcome variable, Zero Inflated, zero values


When to Check Model Assumptions

by Karen Grace-Martin 1 Comment

Like the chicken and the egg, there’s a question about which comes first: run a model or test assumptions? Unlike the chicken’s, the model’s question has an easy answer.

There are two types of assumptions in a statistical model.  Some are distributional assumptions about the residuals.  Examples include independence, normality, and constant variance in a linear model.

Others are about the form of the model.  They include linearity and [Read more…] about When to Check Model Assumptions

Tagged With: categorical outcome, Censored, Model Assumptions, testing normality, Zero Inflated


Member Training: Count Models

by Karen Grace-Martin Leave a Comment

Count variables are common dependent variables in many fields. For example:

  • Number of diseased trees
  • Number of salamander eggs that hatch
  • Number of crimes committed in a neighborhood

Although they are numerical and look like they should work in linear models, they often don’t.

Not only are they discrete instead of continuous (you can’t have 7.2 eggs hatching!), they can’t go below 0. And since 0 is often the most common value, they’re often highly skewed — so skewed, in fact, that transformations don’t work.

There are, however, generalized linear models that work well for count data. They take into account the specific issues inherent in count data. They should be accessible to anyone who is familiar with linear or logistic regression.

In this webinar, we’ll discuss the different model options for count data, including how to figure out which one works best. We’ll go into detail about how the models are set up, some key statistics, and how to interpret parameter estimates.


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.

Not a Member? Join!

About the Instructor

Karen Grace-Martin helps statistics practitioners gain an intuitive understanding of how statistics is applied to real data in research studies.

She has guided and trained researchers through their statistical analysis for over 15 years as a statistical consultant at Cornell University and through The Analysis Factor. She has master’s degrees in both applied statistics and social psychology and is an expert in SPSS and SAS.

Not a Member Yet?

It’s never too early to set yourself up for successful analysis with support and training from expert statisticians. Just head over and sign up for Statistically Speaking. You'll get access to this training webinar and 85+ other stats trainings — plus the expert guidance you need to progress with live Q&A sessions and an ask-a-mentor forum.

Tagged With: Count data, count model, hurdle model, incidence rate ratio, log link, Negative Binomial Regression, Poisson Regression, regression coefficients, Zero Inflated


Member Training: Types of Regression Models and When to Use Them

by Karen Grace-Martin Leave a Comment

Linear, Logistic, Tobit, Cox, Poisson, Zero Inflated… The list of regression models goes on and on before you even get to things like ANCOVA or Linear Mixed Models.

In this webinar, we will explore types of regression models, how they differ, how they’re the same, and most importantly, when to use each one.


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.


Tagged With: ancova, Cox Regression, linear mixed model, linear regression, logistic regression, Poisson Regression, Tobit Regression, Zero Inflated


A Few Resources on Zero-Inflated Poisson Models

by Karen Grace-Martin 4 Comments

1. For a general overview of modeling count variables, you can get free access to the video recording of one of my Craft of Statistical Analysis webinars:

Poisson and Negative Binomial for Count Outcomes

2. One of my favorite books on Categorical Data Analysis is:

Long, J. Scott. (1997).  Regression Models for Categorical and Limited Dependent Variables.  Sage Publications.

It’s moderately technical, but written with social science researchers in mind.  It’s so well written, it’s worth it.  It has a section specifically about Zero Inflated Poisson and Zero Inflated Negative Binomial regression models.

3. Slightly less technical, but useful mainly if you use Stata, is Regression Models for Categorical Dependent Variables Using Stata, by J. Scott Long and Jeremy Freese.

4. UCLA’s ATS Statistical Software Consulting Group has some nice examples of Zero-Inflated Poisson and other models in various software packages.



Tagged With: Count data, Negative Binomial Regression, Poisson Regression, Zero Inflated


Zero-Inflated Poisson Models for Count Outcomes

by Karen Grace-Martin 10 Comments

There are quite a few types of outcome variables that will never meet the ordinary linear model’s assumption of normally distributed residuals.  A non-normal outcome variable can have normally distributed residuals, but it does need to be continuous, unbounded, and measured on an interval or ratio scale.   Categorical outcome variables clearly don’t fit this requirement, so it’s easy to see that an ordinary linear model is not appropriate.  Neither do count variables.  It’s less obvious, because they are measured on a ratio scale, so it’s easier to think of them as continuous, or close to it.  But they’re neither continuous nor unbounded, and this really affects assumptions.

Continuous variables measure how much.  Count variables measure how many.  Count variables can’t be negative (0 is the lowest possible value), and they’re often skewed, sometimes so severely that 0 is by far the most common value.  And they’re discrete, not continuous.  All those jokes about the average family having 1.3 children have a ring of truth in this context.

Count variables often follow a Poisson distribution or one of its related distributions.  The Poisson distribution assumes that each count is the result of the same Poisson process, a random process in which each counted event is independent and equally likely.  If this count variable is used as the outcome of a regression model, we can use Poisson regression to estimate how predictors affect the number of times the event occurred.
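
For readers who like to see it written out, this is the standard form of the Poisson distribution and of a Poisson regression with a log link. The symbols (μ for the mean count, β for the coefficients, x for the predictors) are my own labels for illustration.

```latex
% Poisson probability of observing a count of y when the mean count is \mu
P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \quad y = 0, 1, 2, \dots

% Poisson regression: the log of the mean count is linear in the predictors
\log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
```

Because the model is linear on the log scale, a one-unit increase in a predictor multiplies the expected count by the exponentiated coefficient rather than adding to it.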

But the Poisson model has very strict assumptions.  One that is often violated is that the mean equals the variance.  When the variance is too large because there are many 0s as well as a few very high values, the negative binomial model is an extension that can handle the extra variance.
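
A quick, informal way to look at the mean-equals-variance assumption is to divide the sample variance by the sample mean. This is only a sketch on made-up counts, and the function name `dispersion_index` is my own. In a real regression you would look at dispersion after accounting for the predictors, not at the raw outcome.

```python
import statistics


def dispersion_index(counts):
    """Return sample variance divided by the mean (about 1 for Poisson data)."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when every count is 0")
    return statistics.variance(counts) / mean


# Made-up counts: mostly zeros plus one very high value.
example_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 16]
print(round(dispersion_index(example_counts), 2))
```

A value well above 1 suggests more variance than a Poisson distribution allows, which is the situation the negative binomial model is meant to handle.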

But sometimes it’s just a matter of having more zeros than a Poisson would predict.  In this case, a better solution is often the Zero-Inflated Poisson (ZIP) model.  (And when extra variation occurs too, its close relative is the Zero-Inflated Negative Binomial model.)
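
One rough way to see whether there are more zeros than a Poisson would predict is to compare the observed share of zeros with exp(-mean), the share a Poisson distribution with the same mean would produce. This is a sketch on made-up data with a hypothetical helper name. It ignores predictors, and plain overdispersion can also push the zero share up, so treat it as a first look rather than a test.

```python
import math


def zero_share_check(counts):
    """Return (observed share of zeros, share a Poisson with the same mean predicts)."""
    n = len(counts)
    observed = sum(1 for c in counts if c == 0) / n
    mean = sum(counts) / n
    predicted = math.exp(-mean)  # Poisson P(Y = 0) = exp(-mean)
    return observed, predicted


# Made-up counts for illustration only.
example_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 16]
observed, predicted = zero_share_check(example_counts)
print(f"observed zeros: {observed:.2f}, Poisson-predicted zeros: {predicted:.2f}")
```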

ZIP models assume that some zeros occurred by a Poisson process, but others were not even eligible to have the event occur.  So there are two processes at work—one that determines if the individual is even eligible for a non-zero response, and the other that determines the count of that response for eligible individuals.

The tricky part is either process can result in a 0 count.   Since you can’t tell which 0s were eligible for a non-zero count, you can’t tell which zeros were results of which process.  The ZIP model fits, simultaneously, two separate regression models.  One is a logistic or probit model that models the probability of being eligible for a non-zero count.  The other models the size of that count.
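
The two processes are easier to picture with a small simulation. The sketch below uses assumed values of my own choosing (60% of cases eligible, a mean count of 1.5 for eligible cases) and only the Python standard library. In real data the `eligible` flag is exactly the part you never get to observe.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_zip(n, p_eligible, lam, seed=0):
    """Simulate n (eligible, count) pairs from a zero-inflated Poisson process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        eligible = rng.random() < p_eligible                 # process 1: eligibility
        count = draw_poisson(rng, lam) if eligible else 0    # process 2: the count
        rows.append((eligible, count))
    return rows


rows = simulate_zip(n=1000, p_eligible=0.6, lam=1.5)
structural_zeros = sum(1 for eligible, _ in rows if not eligible)
chance_zeros = sum(1 for eligible, count in rows if eligible and count == 0)
print("zeros from ineligible cases:", structural_zeros)
print("zeros from eligible cases:  ", chance_zeros)
```

Both kinds of zero look identical in the data, which is why the model has to estimate the two parts together.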

Both models can use the same predictor variables, but their coefficients are estimated separately.  So the predictors can have vastly different effects on the two processes.

But a ZIP model requires that it be theoretically plausible that some individuals are ineligible for a count.  For example, consider a count of the number of disciplinary incidents in a day in a youth detention center.  True, there may be some youth who would never instigate an incident, but the unit of observation in this case is the center.  It is hard to imagine a situation in which a detention center would have no possibility of any incidents, even if they didn’t occur on some days.

Compare that to the number of alcoholic drinks consumed in a day, which could plausibly be fit with a ZIP model.  Some participants do drink alcohol, but will have consumed 0 that day, by chance.   But others just do not drink alcohol, so will never have a non-zero response.  The ZIP model can determine which predictors affect the probability of being an alcohol consumer and which predictors affect how many drinks the consumers consume.  They may not be the same predictors for the two models, or they could even have opposite effects on the two processes.
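
Here is a small sketch of how one predictor can push the two processes in opposite directions. The coefficients are invented for illustration and do not come from any fitted model. The logistic part is written, as in this post, as the probability of being eligible (a drinker). Many software packages report that part the other way round, as the probability of an excess zero, so check the sign convention before interpreting output.

```python
import math


def zip_predict(x, eligibility_coefs, count_coefs):
    """Return (P(eligible), mean count if eligible, overall expected count)."""
    a0, a1 = eligibility_coefs
    b0, b1 = count_coefs
    p_eligible = 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))  # logistic part
    mean_if_eligible = math.exp(b0 + b1 * x)              # Poisson part, log link
    return p_eligible, mean_if_eligible, p_eligible * mean_if_eligible


# Hypothetical coefficients: x raises the chance of being a drinker (a1 > 0)
# but lowers the number of drinks among drinkers (b1 < 0).
for x in (0, 1, 2):
    p, mu, expected = zip_predict(x, eligibility_coefs=(-0.5, 1.0), count_coefs=(1.0, -0.3))
    print(f"x={x}: P(drinker)={p:.2f}, drinks if drinker={mu:.2f}, expected drinks={expected:.2f}")
```

The overall expected count is the product of the two parts, so a predictor with opposite signs in the two models can leave the overall mean nearly flat even though it matters a great deal to each process.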



Tagged With: Count data, Discrete Counts, Poisson Regression, Zero Inflated


Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.