The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Zero One Inflated Beta Models for Proportion Data

by Karen Grace-Martin

Proportion and percentage data are tricky to analyze.

Much like count data, they look like they should work in a linear model.

They’re numerical.  They’re often continuous.

And sometimes they do work.  Some proportion data do look normally distributed so estimates and p-values are reasonable.

But more often they don’t. So estimates and p-values are a mess.  Luckily, there are other options.  One is beta regression.

Beta Regression

Like logistic and Poisson regression, beta regression is a type of generalized linear model.

It works nicely for proportion data because the values of a variable with a beta distribution must fall between 0 and 1.

It’s a bit of a funky distribution in that its shape can change a lot depending on the values of the mean and dispersion parameters.

Here are a few examples of the possible shapes of a beta distribution, with different means and variances:

You can see that in some, the distribution looks quite normal.  In that situation, you would get reasonable estimates and p-values if you assumed normality.
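The same range of shapes can be reproduced numerically. The sketch below is only an illustration: it assumes a mean/precision parameterization of the beta distribution (shape parameters a = mean × precision and b = (1 − mean) × precision), which is common in beta regression but is not spelled out in this post. The function name and the parameter values are made up for the example.

```python
import math

def beta_pdf(y, mean, precision):
    """Beta density at y in (0, 1), using an assumed mean/precision parameterization."""
    a = mean * precision          # first shape parameter
    b = (1.0 - mean) * precision  # second shape parameter
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

grid = [0.1, 0.3, 0.5, 0.7, 0.9]

# Mean 0.5 with high precision: symmetric and roughly bell-shaped.
symmetric = [beta_pdf(y, mean=0.5, precision=20.0) for y in grid]

# Mean 0.9 with low precision: highly skewed, piling up near 1.
skewed = [beta_pdf(y, mean=0.9, precision=5.0) for y in grid]

for y, s, k in zip(grid, symmetric, skewed):
    print(f"y={y:.1f}  symmetric={s:.3f}  skewed={k:.3f}")
```

The first set of values peaks in the middle and falls off evenly on both sides, while the second climbs steadily toward 1, which is the kind of contrast described in the medication example that follows.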

But here is just the kind of sticky situation you commonly see in real data.  Let’s say you want to compare the mean proportion of days out of 30 that people do some behavior–take their prescribed medication, exercise for at least 30 minutes, or act physically aggressively toward peers.

Maybe you’ve got an intervention and you want to test whether it helps people take their medications.  Perhaps the control group indeed looks like the nice normal distribution in the third graph above.

But the treatment worked so well that in the intervention group, the distribution is highly skewed.  It looks like the last graph.

Assuming normality isn’t going to work here.  That’s where a beta regression can work instead.

One big problem.

0 and 1 aren’t possible values in a beta distribution.  So if Y|X follows a beta distribution, Y can have values close to 0 and 1, say .001 or .998.  But not 0 or 1 exactly.

So if a client takes their medication 30 out of 30 days, a beta regression won’t run.  You can’t have any 0s or 1s in the data set.
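One way to see why is to look at the beta log density, which contains log(y) and log(1 − y). Both are undefined at the boundary. The sketch below is an illustration only: the helper names are hypothetical, and the shape values 4 and 2 are arbitrary, not estimates from any data. Real software may report the problem differently (an error message, or a refusal to fit).

```python
import math

def beta_log_pdf(y, a, b):
    """Log of the beta density with shape parameters a and b."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)

def safe_log_pdf(y, a, b):
    """Return the log density, or None when y is exactly 0 or 1."""
    try:
        return beta_log_pdf(y, a, b)
    except ValueError:
        # math.log(0.0) raises ValueError: the density is undefined here.
        return None

days_taken = [12, 29, 30]                   # days out of 30
proportions = [d / 30 for d in days_taken]  # the last one is exactly 1.0

log_densities = [safe_log_pdf(p, a=4.0, b=2.0) for p in proportions]
print(log_densities)
```

The first two clients contribute a finite log density. The client with 30 out of 30 days does not, so a likelihood built from these terms cannot be evaluated.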

Zero-One Inflated Beta Models

There is, however, a version of the beta regression model that can work in this situation.  It’s one of those models that has been around in theory for a while, but has only in the past few years become available in (some) mainstream statistical software.

It’s called a Zero-One-Inflated Beta and it works very much like a Zero-Inflated Poisson model.

It’s a type of mixture model that says there are really three processes going on.

One is a process that distinguishes between zeros and non-zeros. The idea is that there is something qualitatively different about people who never take their medication compared with those who do, at least sometimes.

Likewise, there is a process that distinguishes between ones and non-ones.  Again, there is something qualitatively different about people who always take their medication compared with those who do so sometimes or never.

And then there is a third process that determines how much someone takes their medication if they do some of the time.

The first and second processes are run through a logistic regression and the third through a beta regression.
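As a rough sketch of how the three pieces fit into one likelihood, here is the log-likelihood contribution of a single observation. It assumes one possible parameterization: p_zero is the probability of an exact 0, p_one_given_nonzero is the probability of an exact 1 among the non-zeros, and the remaining values follow a beta distribution with an assumed mean/precision form. Software packages parameterize the two binary parts in different ways, so treat the function and argument names as hypothetical.

```python
import math

def zoib_log_likelihood(y, p_zero, p_one_given_nonzero, mean, precision):
    """Log-likelihood of one observation under an assumed zero-one-inflated beta mixture."""
    if y == 0.0:
        # Zero process: the observation is an exact 0.
        return math.log(p_zero)
    if y == 1.0:
        # Not a zero, and then an exact 1.
        return math.log(1.0 - p_zero) + math.log(p_one_given_nonzero)
    # Neither 0 nor 1: beta density on the open interval (0, 1).
    a = mean * precision
    b = (1.0 - mean) * precision
    log_beta = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return math.log(1.0 - p_zero) + math.log(1.0 - p_one_given_nonzero) + log_beta

# Illustrative data: proportions of days out of 30, including an exact 0 and an exact 1.
observations = [0.0, 0.4, 29 / 30, 1.0]
total = sum(
    zoib_log_likelihood(y, p_zero=0.1, p_one_given_nonzero=0.3, mean=0.7, precision=10.0)
    for y in observations
)
print(total)
```

Unlike a plain beta regression, every observation here has a finite contribution, because the exact 0s and 1s are handled by the two binary parts rather than by the beta density.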

These three models are run simultaneously.  They can each have their own set of predictors and their own set of coefficients.  For example, maybe memory is a big predictor of how often someone takes their medication if they take it sometimes, but not at all an issue for whether or not someone takes it 0 times.  Perhaps those people aren’t forgetting–they can’t afford to purchase it.

So maybe whether someone has health insurance that pays for the medication is a predictor in the zero/non-zero logistic regression, but not in the other two parts.
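The idea of separate predictors for each part can be sketched with three small linear predictors. Everything in this block is hypothetical: the coefficient values are invented for illustration, the variable names are made up, and the logit link for the beta mean is an assumption (it is a common choice, but this post does not specify one).

```python
import math

def inv_logit(x):
    """Map a linear predictor onto the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients, one set per process.
zero_coefs = {"intercept": -1.0, "insured": -2.0}  # insurance matters only here
one_coefs = {"intercept": -0.5}                    # no predictors in this part
mean_coefs = {"intercept": 0.2, "memory": 0.8}     # memory matters only here

def component_parameters(insured, memory):
    """Return (p_zero, p_one_given_nonzero, beta_mean) for one person."""
    p_zero = inv_logit(zero_coefs["intercept"] + zero_coefs["insured"] * insured)
    p_one_given_nonzero = inv_logit(one_coefs["intercept"])
    beta_mean = inv_logit(mean_coefs["intercept"] + mean_coefs["memory"] * memory)
    return p_zero, p_one_given_nonzero, beta_mean

print(component_parameters(insured=0, memory=0.0))
print(component_parameters(insured=1, memory=0.0))
print(component_parameters(insured=1, memory=2.0))
```

In this made-up example, having insurance lowers the probability of never taking the medication but leaves the other two parts unchanged, while a higher memory score only shifts the mean of the beta part.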

Depending on the shape of the distribution, you may not need all three processes.  If there are no zeros in the data set, you may only need to accommodate inflation at 1.
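A small sketch shows how the model collapses when one process is not needed. It assumes a parameterization in which p_zero is the probability of an exact 0 and p_one_given_nonzero is the probability of an exact 1 among the non-zeros. Under that assumption, the overall expected proportion follows from the law of total expectation. The function name is hypothetical.

```python
def zoib_expected_value(p_zero, p_one_given_nonzero, beta_mean):
    """Expected proportion under the assumed mixture.

    Zeros contribute 0, ones contribute 1, and everyone else
    contributes the mean of the beta part.
    """
    return (1.0 - p_zero) * (p_one_given_nonzero + (1.0 - p_one_given_nonzero) * beta_mean)

# All three processes in play.
full = zoib_expected_value(p_zero=0.1, p_one_given_nonzero=0.25, beta_mean=0.8)

# No zeros in the data: setting p_zero to 0 leaves a one-inflated beta model.
one_inflated = zoib_expected_value(p_zero=0.0, p_one_given_nonzero=0.25, beta_mean=0.8)

# No zeros and no ones: the expected value is just the beta mean.
plain_beta = zoib_expected_value(p_zero=0.0, p_one_given_nonzero=0.0, beta_mean=0.8)

print(full, one_inflated, plain_beta)
```

With both inflation probabilities set to 0, the result is simply the beta mean, so an ordinary beta regression is the special case with no inflation at either end.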

It’s highly flexible and adds important options to your data analysis toolbox.


Comments

  1. Ceres says

    January 6, 2020 at 7:54 pm

    Thanks for the great summary on zero-and-one-inflated beta models. I was wondering how one uses these (hurdle) models for prediction, incorporating both the probabilities of having a 0 or 1 value (from the Poisson components) and the predicted values for (0,1) from the Beta component. many thanks

  2. Antonella says

    December 27, 2017 at 11:18 am

    I would like to know what R packages exist to run Zero-One Inflated Beta Models. Do they estimate parameters via maximum likelihood?
    Thank you very much in advance.

    • Carl says

      March 4, 2018 at 7:11 pm

      Zoib is an R pkg that handles Zero-One.

    • Jeff Girard says

      April 5, 2019 at 3:23 pm

      I use the brms package to estimate Bayesian Zero-One Inflated Beta models. You specify this by adding “family = zero_one_inflated_beta” to your call.
      https://cran.r-project.org/web/packages/brms/vignettes/brms_families.html

      • Hank Stevens says

        July 17, 2019 at 10:45 am

        Thanks Jeff! I am headed to brms now.

      • Rachael says

        April 7, 2020 at 7:44 pm

        Hi Jeff,

        Do you know any helpful documents for someone completely unfamiliar with Bayesian and brms? I’ve tried following along with the package documentation and continue to get lost.

Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.