Regression models

Member Training: Zero Inflated Models

June 1st, 2016 by
A common situation with count outcome variables is that there are a lot of zero values. The Poisson distribution used for modeling count variables allows zero to be the most common value, but sometimes there are even more zeros than the Poisson distribution can account for.

This can happen in continuous variables as well: most of the distribution follows a beautiful normal distribution, except for the big stack of zeros.

This webinar will explore two ways of modeling zero-inflated data: the Zero Inflated model and the Hurdle model. Both assume there are two different processes: one that affects the probability of a zero and one that affects the actual values, and both allow different sets of predictors for each process.
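To make the "two processes" idea concrete, here is a minimal sketch of the two probability functions for a Poisson count outcome. It is an illustration under stated assumptions, not the webinar's material: the function and parameter names (`zip_pmf`, `hurdle_pmf`, `lam`, `pi`, `p_zero`) are made up for this example, and the parameters are fixed numbers rather than functions of predictors.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing the count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson (hypothetical helper).

    With probability pi the observation is a structural zero;
    otherwise it is drawn from a Poisson, which can also produce zeros.
    """
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson (hypothetical helper).

    All zeros come from one process (probability p_zero); positive
    counts come from a Poisson truncated to exclude zero.
    """
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, lam) / (1 - poisson_pmf(0, lam))
    return (1 - p_zero) * truncated


if __name__ == "__main__":
    # Compare the probability of a zero under each model.
    print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3), hurdle_pmf(0, 2.0, 0.4))
```

The difference shows up at zero: in the zero-inflated version a zero can come from either process, while in the hurdle version every zero comes from the first process alone.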

We’ll explore these models as well as some related models, like Zero-One Inflated Beta models for proportion data.


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.

(more…)


Incorporating Graphs in Regression Diagnostics with Stata

May 24th, 2016 by

You put a lot of work into preparing and cleaning your data. Running the model is the moment of excitement.

You look at your tables and interpret the results. But first you remember that one or more variables had a few outliers. Did these outliers impact your results? (more…)


Linear Regression in Stata: Missing Data and the Stories they Might Tell

May 18th, 2016 by

In a previous post, Using the Same Sample for Different Models in Stata, we examined how to use the same sample when comparing regression models. Using different samples in our models could lead to erroneous conclusions when interpreting results.

But excluding observations can also lead to inaccurate results.

The coefficient for the variable “frequent religious attendance” was negative 58 in model 3 (more…)


Issues with Truncated Data

May 12th, 2016 by

In a previous post we explored bounded variables and the difference between truncated and censored data. Can we ignore the fact that a variable is bounded and just run our analysis as if the data weren’t bounded? (more…)


Member Training: An Introduction to Kaplan-Meier Curves

March 29th, 2016 by

Survival models are used to analyze data representing the time until an event occurs. In many situations the event is death, but it can also be another bad event, such as cancer relapse or failure of a medical device, or a positive event, such as pregnancy. Often patients are lost to follow-up prior to death, but you can still use the information about them while they were in your study to better estimate the survival probability over time.

This is done using the Kaplan-Meier curve, an approach developed by (more…)
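As a rough illustration of how patients lost to follow-up still contribute information, here is a minimal sketch of the Kaplan-Meier product-limit calculation. The function name `kaplan_meier` and the input layout (a list of follow-up times plus a list of 1/0 event flags) are assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Return a list of (time, survival probability) pairs.

    times  -- follow-up time for each patient
    events -- 1 if the event was observed at that time, 0 if censored
    (hypothetical input layout chosen for this sketch)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        # The curve only steps down at times with an observed event.
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without causing a step.
        n_at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Four patients; the second is lost to follow-up at time 2.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, s)
```

A censored patient counts in the denominator for every event time up to the point they leave the study, which is how their partial follow-up improves the estimate.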


Zero One Inflated Beta Models for Proportion Data

March 16th, 2016 by

Proportion and percentage data are tricky to analyze.

Much like count data, they look like they should work in a linear model.

They’re numeric. They’re often continuous.

And sometimes they do work. Some proportion data do look normally distributed, so estimates and p-values are reasonable.

But more often they don’t. So estimates and p-values are a mess. Luckily, there are other options. (more…)