by Jeff Meyer
As mentioned in a previous post, there is a significant difference between truncated and censored data.
Truncated data eliminates observations from an analysis based on a maximum and/or minimum value for a variable.
Censored data has limits on the maximum and/or minimum value for a variable but includes all observations in the analysis.
As a result, the models for analysis of these data are different. (more…)
Linear, Logistic, Tobit, Cox, Poisson, Zero Inflated… The list of regression models goes on and on before you even get to things like ANCOVA or Linear Mixed Models.
In this webinar, we will explore types of regression models, how they differ, how they’re the same, and most importantly, when to use each one.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
About the Instructor
Karen Grace-Martin helps statistics practitioners gain an intuitive understanding of how statistics is applied to real data in research studies.
She has guided and trained researchers through their statistical analysis for over 15 years as a statistical consultant at Cornell University and through The Analysis Factor. She has master’s degrees in both applied statistics and social psychology and is an expert in SPSS and SAS.
Not a Member Yet?
It’s never too early to set yourself up for successful analysis with support and training from expert statisticians.
Just head over and sign up for Statistically Speaking.
You'll get access to this training webinar, 100+ other stats trainings, a pathway to work through the trainings that you need — plus the expert guidance you need to build statistical skill with live Q&A sessions and an ask-a-mentor forum.
When the dependent variable in a regression model is a proportion or a percentage, it can be tricky to decide on the appropriate way to model it.
The big problem with ordinary linear regression is that the model can predict values that aren’t possible–values below 0 or above 1. But the other problem is that the relationship isn’t linear–it’s sigmoidal. A sigmoidal curve looks like a flattened S–linear in the middle, but flattened on the ends. So now what?
The simplest approach is to do a linear regression anyway. This approach can be justified only in a few situations.
1. All your data fall in the middle, linear section of the curve. This generally translates to all your data being between .2 and .8 (although I’ve heard that between .3-.7 is better). If this holds, you don’t have to worry about the two objections. You do have a linear relationship, and you won’t get predicted values much beyond those values–certainly not beyond 0 or 1.
2. It is a really complicated model that would be much harder to model another way. If you can assume a linear model, it will be much easier to do, say, a complicated mixed model or a structural equation model. If it’s just a single multiple regression, however, you should look into one of the other methods.
A second approach is to treat the proportion as a binary response then run a logistic or probit regression. This will only work if the proportion can be thought of and you have the data for the number of successes and the total number of trials. For example, the proportion of land area covered with a certain species of plant would be hard to think of this way, but the proportion of correct answers on a 20-answer assessment would.
The third approach is to treat it the proportion as a censored continuous variable. The censoring means that you don’t have information below 0 or above 1. For example, perhaps the plant would spread even more if it hadn’t run out of land. If you take this approach, you would run the model as a two-limit tobit model (Long, 1997). This approach works best if there isn’t an excessive amount of censoring (values of 0 and 1).
Reference: Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables. Sage Publishing.