The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Discrete Counts

When Linear Models Don’t Fit Your Data, Now What?

by Karen Grace-Martin

When your dependent variable is not continuous, unbounded, and measured on an interval or ratio scale, linear models don't fit. The data just will not meet the assumptions of linear models. But there's good news: other models exist for many types of dependent variables.

Today I'm going to go into more detail about 6 common types of dependent variables that are discrete, bounded, or measured on a nominal or ordinal scale, and about the tests that work for them instead. Some are all three.

[Read more…] about When Linear Models Don’t Fit Your Data, Now What?

Tagged With: binary variable, categorical variable, Censored, dependent variable, Discrete Counts, Multinomial, ordinal variable, Poisson Regression, Proportion, Proportional Odds Model, regression models, Truncated, Zero Inflated

Related Posts

  • 6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption
  • Member Training: Types of Regression Models and When to Use Them
  • When to Check Model Assumptions
  • Proportions as Dependent Variable in Regression–Which Type of Model?

Should I Specify a Model Predictor as Categorical or Continuous?

by Karen Grace-Martin

Predictor variables in statistical models can be treated as either continuous or categorical.

Usually, this is a very straightforward decision.

Categorical predictors, like treatment group, marital status, or highest educational degree, should be specified as categorical.

Likewise, continuous predictors, like age, systolic blood pressure, or percentage of ground cover, should be specified as continuous.

But some numerical predictors aren't continuous. These sometimes make sense to treat as continuous and sometimes make sense to treat as categorical.
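To make the choice concrete, here is a minimal sketch (not from the post) of what each specification does to the design matrix, using a hypothetical discrete predictor, number of children (0 to 3). Treated as continuous, it gets a single slope column, which assumes each extra child shifts the outcome by the same amount. Treated as categorical, it gets one dummy column per non-reference level, so each level gets its own estimated mean at the cost of more parameters.

```python
import numpy as np

# Hypothetical discrete numeric predictor: number of children for 8 people
children = np.array([0, 1, 2, 3, 1, 0, 2, 3])

# Continuous: intercept + one slope column (2 parameters, assumes a linear effect)
X_continuous = np.column_stack([np.ones(len(children)), children])

# Categorical: intercept + one dummy per non-reference level (level 0 is the reference)
levels = np.unique(children)                               # [0, 1, 2, 3]
dummies = (children[:, None] == levels[1:]).astype(float)  # columns for 1, 2, 3
X_categorical = np.column_stack([np.ones(len(children)), dummies])

print(X_continuous.shape)   # (8, 2)
print(X_categorical.shape)  # (8, 4)
```

With only a few distinct values, the categorical version costs only a couple of extra parameters; with many distinct values, that cost grows quickly.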

[Read more…] about Should I Specify a Model Predictor as Categorical or Continuous?

Tagged With: categorical predictor, continuous predictor, Discrete Counts, Linear Regression Model, Model Building, numeric variable, predictor variable

Related Posts

  • Recoding a Variable from a Survey Question to Use in a Statistical Model
  • Interpreting Regression Coefficients
  • Overfitting in Regression Models
  • Centering a Covariate to Improve Interpretability

Member Training: Logistic Regression for Count and Proportion Data

by Karen Grace-Martin

Most of us know that binary logistic regression is appropriate when the outcome variable has two possible outcomes: success and failure.

Two other situations also call for binary logistic regression, but they don't always look like they should.
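The excerpt doesn't name the two situations; this sketch assumes one of them is grouped data, where each row records k successes out of n trials (the binomial case suggested by the post's tags). It shows why binary logistic regression still applies: the binomial log-likelihood of the grouped count differs from the Bernoulli log-likelihood of the same data expanded to n individual 0/1 outcomes only by log C(n, k), which doesn't depend on p. So both versions lead to the same estimates. The function names here are illustrative.

```python
import math

def binomial_loglik(k, n, p):
    """Log-likelihood of k successes in n trials, including the binomial coefficient."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def bernoulli_loglik(outcomes, p):
    """Log-likelihood of individual 0/1 outcomes, each with success probability p."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)

k, n, p = 7, 20, 0.3
expanded = [1] * k + [0] * (n - k)   # the same data as 20 individual trials

diff = binomial_loglik(k, n, p) - bernoulli_loglik(expanded, p)
print(math.isclose(diff, math.log(math.comb(n, k))))  # True
```

Because the difference is constant in p, a logistic model fit to grouped counts and one fit to the expanded binary rows share the same coefficient estimates.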

[Read more…] about Member Training: Logistic Regression for Count and Proportion Data

Tagged With: Bernoulli, binomial, Discrete Counts, logistic regression, normal distribution, outcome variable, poisson

Related Posts

  • Member Training: Making Sense of Statistical Distributions
  • Member Training: Explaining Logistic Regression Results to Non-Researchers
  • Member Training: Types of Regression Models and When to Use Them
  • Member Training: Multinomial Logistic Regression

Zero-Inflated Poisson Models for Count Outcomes

by Karen Grace-Martin

There are quite a few types of outcome variables that will never meet the ordinary linear model's assumption of normally distributed residuals. A non-normal outcome variable can still have normally distributed residuals, but it does need to be continuous, unbounded, and measured on an interval or ratio scale. Categorical outcome variables clearly don't meet this requirement, so it's easy to see that an ordinary linear model is not appropriate. Count variables don't meet it either. That is less obvious: because counts are measured on a ratio scale, it's easy to think of them as continuous, or close to it. But they're neither continuous nor unbounded, and this really affects the assumptions.

Continuous variables measure how much. Count variables measure how many. Count variables can't be negative: 0 is the lowest possible value. They're often skewed, sometimes so severely that 0 is by far the most common value. And they're discrete, not continuous. All those jokes about the average family having 1.3 children have a ring of truth in this context.

Count variables often follow a Poisson distribution or one of its relatives. The Poisson distribution assumes that each count is the result of the same Poisson process: a random process in which each counted event is independent and equally likely. If this count variable is used as the outcome of a regression model, we can use Poisson regression to estimate how predictors affect the number of times the event occurred.
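As a minimal sketch of what Poisson regression estimates, the code below fits log(E[y]) = b0 + b1·x by Newton-Raphson on simulated data. The function name `fit_poisson_glm`, the simulated predictor, and the true coefficients (0.5, 0.3) are all hypothetical; in practice you would use a statistics package's Poisson GLM.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25):
    """Fit log(E[y]) = X @ beta by Newton-Raphson (log link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                # fitted means, always positive
        grad = X.T @ (y - mu)                # score vector
        hess = X.T @ (X * mu[:, None])       # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated counts with hypothetical true coefficients
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print(beta_hat.round(2))   # should land near [0.5, 0.3]
```

The log link keeps every fitted mean positive, which an ordinary linear model on counts does not guarantee.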

But the Poisson model has very strict assumptions. One that is often violated is that the mean equals the variance. When the variance is too large, because there are many 0s as well as a few very high values, the negative binomial model, an extension of the Poisson, can handle the extra variance.
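A quick simulation can make overdispersion visible. This sketch draws Poisson and negative binomial counts with the same mean (μ = 2) and compares their spread. Under the common negative binomial parameterization, Var(Y) = μ + μ²/k, so the assumed size parameter k = 0.5 gives a variance near 10 instead of 2. NumPy parameterizes the negative binomial by (n, p), and p = k / (k + μ) gives mean μ.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
mu, k = 2.0, 0.5   # assumed mean and negative binomial size (dispersion) parameter

pois = rng.poisson(mu, size=n)
nb = rng.negative_binomial(k, k / (k + mu), size=n)   # mean mu, variance mu + mu**2 / k

for name, y in [("Poisson", pois), ("Negative binomial", nb)]:
    print(f"{name}: mean={y.mean():.2f}, var={y.var():.2f}, "
          f"share of zeros={np.mean(y == 0):.2f}, max={y.max()}")
```

Both samples have about the same mean, but the negative binomial one has far more zeros and a much longer right tail, which is the pattern described above.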

But sometimes the problem is simply that there are more zeros than a Poisson would predict. In this case, a better solution is often the Zero-Inflated Poisson (ZIP) model. (When extra variation occurs too, its close relative is the Zero-Inflated Negative Binomial model.)
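One informal way to spot excess zeros is to compare the observed share of zeros with the share a Poisson of the same mean predicts, exp(-mean). This is a rough descriptive check, not a formal test, and it ignores predictors (with predictors, you would compare against the zeros implied by the fitted means). The simulated data and the helper `zero_check` are hypothetical: 40% structural zeros, with Poisson(3) counts for everyone else.

```python
import numpy as np

def zero_check(y):
    """Observed share of zeros vs. the Poisson-implied share exp(-mean)."""
    y = np.asarray(y)
    observed = np.mean(y == 0)
    expected = np.exp(-y.mean())   # Poisson P(Y = 0), lambda estimated by the sample mean
    return observed, expected

rng = np.random.default_rng(1)
n = 10_000
structural_zero = rng.random(n) < 0.4
y = np.where(structural_zero, 0, rng.poisson(3.0, size=n))

obs, expected = zero_check(y)
print(f"observed zeros: {obs:.3f}, Poisson-expected zeros: {expected:.3f}")
```

With these settings, the observed share should be near 0.43 (0.4 + 0.6·e^-3), while a Poisson with the sample mean of about 1.8 predicts only about 0.17.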

ZIP models assume that some zeros were produced by a Poisson process, but others came from individuals who were not even eligible to have the event occur. So there are two processes at work: one that determines whether the individual is even eligible for a non-zero response, and another that determines the count of that response for eligible individuals.
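Written out, the two processes give the standard ZIP probability model. The notation is introduced here for illustration: π_i is the probability that individual i is ineligible (a structural zero), λ_i is the Poisson mean for eligible individuals, and both parts share predictor vector x_i with separate coefficients γ and β. A logit link is shown for π_i; a probit link works the same way, and modeling eligibility instead of ineligibility simply flips the sign of γ.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\lambda_i} \\
P(Y_i = k) &= (1 - \pi_i)\, \frac{\lambda_i^{k}\, e^{-\lambda_i}}{k!}, \qquad k = 1, 2, \dots \\
\operatorname{logit}(\pi_i) &= \mathbf{x}_i^{\top} \boldsymbol{\gamma}, \qquad
\log(\lambda_i) = \mathbf{x}_i^{\top} \boldsymbol{\beta}
\end{aligned}
```

The first line shows why zeros are ambiguous: they can come from the ineligible group (π_i) or from an eligible individual whose Poisson count happened to be 0.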

The tricky part is that either process can result in a 0 count. Since you can't tell which 0s came from individuals eligible for a non-zero count, you can't tell which process produced which zeros. So the ZIP model fits two separate regression models simultaneously. One is a logistic or probit model of the probability of being eligible for a non-zero count. The other models the size of that count.

Both models use the same predictor variables, but their coefficients are estimated separately. So the predictors can have vastly different effects on the two processes.
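To show how the two models can be fit at once, here is a minimal sketch using the EM algorithm, a standard approach for mixture models. It isn't necessarily what any particular package does; many maximize the likelihood directly. The E-step estimates, for each zero, the probability that it is a structural zero. The M-step then fits a weighted Poisson regression and a logistic regression with that fractional response. Both parts use the same predictors X with separate coefficients, as in the post. All function names, the simulated data, and the true coefficients are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def newton_poisson(X, y, w, beta, steps=5):
    """Weighted Poisson regression (log link) by Newton-Raphson, warm-started at beta."""
    for _ in range(steps):
        mu = np.exp(X @ beta)
        grad = X.T @ (w * (y - mu))
        hess = X.T @ (X * (w * mu)[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def newton_logit(X, z, gamma, steps=5):
    """Logistic regression with a fractional response z in [0, 1], by Newton-Raphson."""
    for _ in range(steps):
        p = sigmoid(X @ gamma)
        grad = X.T @ (z - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def fit_zip_em(X, y, iters=200):
    """Zero-inflated Poisson by EM: same predictors in both parts, separate coefficients."""
    beta = np.zeros(X.shape[1])    # count model: log(lambda) = X @ beta
    gamma = np.zeros(X.shape[1])   # zero model:  logit(pi) = X @ gamma, pi = P(structural zero)
    for _ in range(iters):
        lam, pi = np.exp(X @ beta), sigmoid(X @ gamma)
        # E-step: posterior probability that each observation is a structural zero
        z = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        # M-step: given z, each part is an ordinary (weighted) regression
        beta = newton_poisson(X, y, 1.0 - z, beta)
        gamma = newton_logit(X, z, gamma)
    return beta, gamma

# Simulated ZIP data with hypothetical true coefficients
rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta, true_gamma = np.array([1.0, 0.5]), np.array([-0.5, 1.0])
ineligible = rng.random(n) < sigmoid(X @ true_gamma)
y = np.where(ineligible, 0, rng.poisson(np.exp(X @ true_beta)))

beta_hat, gamma_hat = fit_zip_em(X, y)
print("count model:", beta_hat.round(2), " zero model:", gamma_hat.round(2))
```

In this simulation, x increases both the expected count and the chance of being a structural zero, so it pushes the two processes in opposite directions on the number of zeros. A single Poisson model would blur those two effects together.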

But a ZIP model requires that it be theoretically plausible that some individuals are ineligible for a count. For example, consider the number of disciplinary incidents per day in a youth detention center. True, there may be some youth who would never instigate an incident, but the unit of observation in this case is the center. It is hard to imagine a detention center that has no possibility of any incidents, even if none occur on some days.

Compare that to the number of alcoholic drinks consumed in a day, which could plausibly be fit with a ZIP model. Some participants do drink alcohol but, by chance, consumed none that day. Others simply don't drink alcohol, so they will never have a non-zero response. The ZIP model can determine which predictors affect the probability of being an alcohol consumer and which predictors affect how many drinks the consumers consume. Different predictors may matter in the two models, and a predictor could even have opposite effects on the two processes.

Tagged With: Count data, Discrete Counts, Poisson Regression, Zero Inflated

Related Posts

  • A Few Resources on Zero-Inflated Poisson Models
  • Poisson Regression Analysis for Count Data
  • When Linear Models Don’t Fit Your Data, Now What?
  • The Importance of Including an Exposure Variable in Count Models

6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

by Karen Grace-Martin

Linear models (both OLS regression and ANOVA) are quite robust to departures from the assumptions of normality and constant variance. That means that even if the assumptions aren't met perfectly, the resulting p-values will still be reasonable estimates.

But you need to check the assumptions anyway, because some departures are large enough to make the p-values inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.

But sometimes you can’t.

Sometimes it's because the dependent variable just isn't appropriate for a linear model. The …

[Read more…] about 6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

Tagged With: Assumptions, categorical outcome, categorical variable, Censored, Constant Variance, dependent variable, Discrete Counts, normality, ordinal variable, Proportion, Truncated, Zero Inflated

Related Posts

  • When Linear Models Don’t Fit Your Data, Now What?
  • When to Check Model Assumptions
  • Statistical Models for Truncated and Censored Data
  • Member Training: Types of Regression Models and When to Use Them

Copyright © 2008–2023 The Analysis Factor, LLC.
All rights reserved.