Statistical Models for Truncated and Censored Data

by Jeff Meyer

As mentioned in a previous post, there is a significant difference between truncated and censored data.

With truncated data, observations are eliminated from the analysis based on a maximum and/or minimum value for a variable.

With censored data, a variable has limits on its maximum and/or minimum value, and values beyond a limit are recorded at that limit, but all observations are included in the analysis.
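A minimal sketch of the distinction, using made-up income values and an assumed upper limit of 100. The function names `truncate` and `censor` are hypothetical helpers for illustration, not part of any statistics package.

```python
def truncate(values, lower=None, upper=None):
    """Drop every observation that falls outside the limits."""
    return [
        v for v in values
        if (lower is None or v >= lower) and (upper is None or v <= upper)
    ]


def censor(values, lower=None, upper=None):
    """Keep every observation, but record out-of-range values at the limit."""
    result = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        result.append(v)
    return result


incomes = [12, 35, 48, 90, 150]            # illustrative data only
truncated = truncate(incomes, upper=100)   # the 150 observation is removed
censored = censor(incomes, upper=100)      # the 150 observation is kept as 100

print(len(truncated), truncated)
print(len(censored), censored)
```

Truncation shrinks the sample, while censoring keeps the sample size and piles observations up at the limit.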

As a result, the models for analysis of these data are different.

Models to consider with censored data:

For censored data, the standard model to use is tobit regression.

The economist James Tobin created this model, which was originally known as the “Tobin probit” model. It combines components of the binomial probit model and an OLS regression model.
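One way to see how the two components combine is the tobit log-likelihood. The sketch below assumes a single predictor, normally distributed errors, and left-censoring at a known limit (0 by default); the function and parameter names are illustrative and not taken from any package. Censored observations contribute a probit-style probability, and uncensored observations contribute the usual normal regression density.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, x, intercept, slope, sigma, lower=0.0):
    """Log-likelihood of a one-predictor tobit model, left-censored at `lower`."""
    total = 0.0
    for yi, xi in zip(y, x):
        mu = intercept + slope * xi
        if yi <= lower:
            # Censored: only know the latent value is at or below the limit.
            total += math.log(norm_cdf((lower - mu) / sigma))
        else:
            # Uncensored: ordinary normal regression density.
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total


# Illustrative data: two of the five responses are censored at 0.
y = [0.0, 0.0, 1.2, 2.5, 3.1]
x = [0.5, 1.0, 2.0, 3.0, 4.0]
print(tobit_loglik(y, x, intercept=-0.5, slope=0.9, sigma=1.0))
```

In practice a statistics package maximizes this function over the intercept, slope, and sigma; the sketch only evaluates it at chosen values.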

A potential drawback of the tobit model is that you have to use the same variables for both the probit component and the regression component.

Fortunately, James Heckman created a model that takes into account the selection bias noted previously and allows different variables to be used in the two components of the model.

In Stata the command is heckman; in SAS, use PROC QLIM and specify HECKIT. The model can also be run in R but not in SPSS.
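In the two-step version of Heckman's approach, a probit selection equation is fitted first, and a correction term computed from it, the inverse Mills ratio, is added as an extra regressor in the outcome equation. The sketch below computes only that correction term; the selection index values are made up and stand in for fitted probit linear predictors.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z):
    """Correction term phi(z) / Phi(z) for an observation selected into the sample."""
    return norm_pdf(z) / norm_cdf(z)


# Hypothetical linear predictors from a first-step probit selection model.
selection_index = [-1.0, 0.0, 1.5]
imr = [inverse_mills_ratio(z) for z in selection_index]

for z, value in zip(selection_index, imr):
    print(z, round(value, 4))
```

Observations that were unlikely to be selected (low index) get a larger correction, which is how the second-step regression adjusts for selection bias.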

Models to consider with truncated data:

For continuous data where you want to use a subset of the data based on a lower or upper boundary, a truncated regression model should be used.

In a truncated regression model you run the analysis on the full data set and tell the model the value(s) at which to truncate. The reported sample size used in the model will be that of the truncated group, but the results can be used to make inferences about the population.
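The reason the results can speak to the population is that the likelihood rescales each observation's density by the probability of falling inside the truncation limits. A minimal sketch, assuming one predictor, normal errors, and truncation from below at a known value; the names are illustrative rather than any package's API.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncreg_loglik(y, x, intercept, slope, sigma, lower=0.0):
    """Log-likelihood of a one-predictor regression truncated from below at `lower`."""
    total = 0.0
    for yi, xi in zip(y, x):
        if yi <= lower:
            raise ValueError("truncated samples contain no values at or below the limit")
        mu = intercept + slope * xi
        log_density = math.log(norm_pdf((yi - mu) / sigma) / sigma)
        # Probability that an observation with this mean lands above the limit.
        log_prob_observed = math.log(1.0 - norm_cdf((lower - mu) / sigma))
        total += log_density - log_prob_observed
    return total


# Illustrative data: every response lies above the truncation point of 0.
y = [0.4, 1.2, 2.5, 3.1]
x = [1.0, 2.0, 3.0, 4.0]
print(truncreg_loglik(y, x, intercept=-0.5, slope=0.9, sigma=1.0))
```

Compared with the tobit likelihood, there is no term for observations at the limit, because those observations never enter a truncated sample.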

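The command in Stata is truncreg, and R provides a truncreg function in the truncreg package. In SAS, the model is fit with PROC QLIM using the TRUNCATED option. For SPSS, one needs to obtain the Essentials for R package.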

To model zero-truncated count data, the procedure requires several steps to determine which probability distribution function (pdf) fits the data best.

Some of the choices for the optimal pdf are Poisson, Poisson-Gamma Mixture, Poisson-Inverse Gaussian Mixture, Generalized Poisson, negative binomial, and three-parameter negative binomial (Famoye).
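As an illustration of the simplest of these choices, the sketch below writes out a zero-truncated Poisson distribution: the ordinary Poisson probability divided by the probability of a non-zero count. The counts and the candidate rate values are made up, and comparing log-likelihoods at two rates only hints at the fuller model-comparison steps described above.

```python
import math


def ztp_pmf(k, lam):
    """Zero-truncated Poisson probability of observing count k (k >= 1)."""
    if k < 1:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # Rescale by the probability that an untruncated Poisson count is non-zero.
    return poisson / (1.0 - math.exp(-lam))


def ztp_loglik(counts, lam):
    """Log-likelihood of a sample of strictly positive counts."""
    return sum(math.log(ztp_pmf(k, lam)) for k in counts)


counts = [1, 1, 2, 3, 1, 2]   # illustrative data with no zeros
for lam in (1.1, 5.0):        # two arbitrary candidate rates
    print(lam, ztp_loglik(counts, lam))
```

The same pattern, a count distribution's probability divided by one minus its probability at zero, applies to the other candidate distributions listed.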

Stata’s command is trncregress, SAS uses PROC NLMIXED, and R uses the VGAM package.
