Statistical Models for Truncated and Censored Data

by Jeff Meyer

As mentioned in a previous post, there is a significant difference between truncated and censored data.

With truncated data, observations whose values fall beyond a maximum and/or minimum for a variable are excluded from the analysis altogether.

With censored data, every observation stays in the analysis, but values beyond a maximum and/or minimum for a variable are recorded at that limit.

As a result, the models for analysis of these data are different.
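To make the distinction concrete, here is a minimal sketch using simulated values. The variable names, the lower limit of 40, and the normal distribution are all assumptions chosen for illustration, not data from any real study.

```python
import random

random.seed(42)

# Hypothetical latent outcome: 1,000 draws with mean 50 and SD 15.
latent = [random.gauss(50, 15) for _ in range(1000)]
LOWER_LIMIT = 40.0  # assumed recording limit

# Censoring: every observation is kept, but values below the limit
# are recorded at the limit itself.
censored = [max(y, LOWER_LIMIT) for y in latent]

# Truncation: observations below the limit never enter the data set.
truncated = [y for y in latent if y >= LOWER_LIMIT]

print("latent n:   ", len(latent))
print("censored n: ", len(censored))
print("truncated n:", len(truncated))
print("values sitting at the limit:", sum(1 for y in censored if y == LOWER_LIMIT))
```

The censored list has the same length as the original but a pile of values at the limit, while the truncated list is simply shorter.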

Models to consider with censored data:

For censored data the correct model to use is the tobit regression.

The economist James Tobin created this model, which was originally known as the “Tobin probit” model. It combines components of the binomial probit model and an OLS regression model.
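One way to see the two components is in the log-likelihood. The sketch below is an illustration under stated assumptions (a single predictor, left-censoring at zero, simulated data, and a hypothetical function name `tobit_loglik`), not the code any statistical package actually runs. Censored observations contribute a probit-style probability, and uncensored observations contribute a normal density as in a linear regression.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tobit_loglik(intercept, slope, sigma, xs, ys, lower=0.0):
    """Log-likelihood of a tobit model that is left-censored at `lower`."""
    total = 0.0
    for x, y in zip(xs, ys):
        mean = intercept + slope * x
        if y <= lower:
            # Probit-like part: probability that the latent value
            # falls at or below the censoring limit.
            p = STD_NORMAL.cdf((lower - mean) / sigma)
            total += math.log(max(p, 1e-300))  # floor avoids log(0)
        else:
            # Regression-like part: normal log-density of the observed value.
            z = (y - mean) / sigma
            total += -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
    return total

# Simulated data (hypothetical): latent y* = 1 + 2x + e, observed y = max(y*, 0).
random.seed(1)
xs = [random.uniform(-2, 2) for _ in range(500)]
ys = [max(1 + 2 * x + random.gauss(0, 1), 0.0) for x in xs]

ll_true = tobit_loglik(1.0, 2.0, 1.0, xs, ys)
ll_wrong = tobit_loglik(0.0, 0.5, 1.0, xs, ys)
print("log-likelihood at the generating values:", round(ll_true, 1))
print("log-likelihood at poor values:          ", round(ll_wrong, 1))
```

In practice, software maximizes this function over the intercept, slope, and sigma rather than comparing two hand-picked sets of values.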

A potential drawback of the tobit model is that you have to use the same variables for both the probit component and the regression component.

Fortunately, James Heckman created a model that takes into account the selection bias noted previously and allows the use of different variables in the two components of Tobin's model.

In Stata the command is heckman; in SAS, use PROC QLIM and specify the HECKIT option. The model can also be run in R, but not in SPSS.
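As the two-step version of Heckman's estimator is usually described in textbooks, the first step fits a probit for selection and the second step adds the inverse Mills ratio from that probit as an extra regressor in the outcome equation. The sketch below shows only that ratio. The function name `inverse_mills` and the index values are assumptions for illustration, and it is meant for moderate values of z only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def inverse_mills(z):
    """Inverse Mills ratio phi(z) / Phi(z) for a probit selection index z."""
    return STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)

# Hypothetical fitted indices from a first-stage selection equation.
for z in (-1.0, 0.0, 1.0):
    print(z, round(inverse_mills(z), 4))
```

Observations that were unlikely to be selected (low z) get a large correction term, and observations that were very likely to be selected get a correction near zero.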

Models to consider with truncated data:

For continuous data where you want to use a subset of the data based on a lower or upper boundary, a truncated regression model should be used.

In a truncated regression model you run the analysis using the full data set but tell the model at what value(s) to truncate. The reported sample size will be that of the truncated group, but the results can be used to make inferences about the population.

In Stata the command is truncreg, and R provides a function of the same name in the truncreg package. In SAS, truncated regression is available through PROC QLIM with the TRUNCATED option. For SPSS one needs to obtain the Essentials for R package.
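The reason a truncated model can speak to the population is that its likelihood rescales each density by the probability of being observed at all. Here is a minimal intercept-only sketch under simplifying assumptions: simulated data, a lower truncation point of 40, a standard deviation treated as known, and a crude grid search in place of a real optimizer. The function name `truncated_loglik` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()
LOWER = 40.0   # assumed truncation point
SIGMA = 15.0   # treated as known to keep the sketch short

def truncated_loglik(mu, ys, lower=LOWER, sigma=SIGMA):
    """Log-likelihood of a normal mean when only values above `lower` are observed."""
    # Each density is rescaled by P(Y > lower), the share of the
    # population that can appear in the sample at all.
    log_tail = math.log(1.0 - STD_NORMAL.cdf((lower - mu) / sigma))
    total = 0.0
    for y in ys:
        z = (y - mu) / sigma
        total += -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z - log_tail
    return total

random.seed(7)
population = [random.gauss(50, SIGMA) for _ in range(20000)]
sample = [y for y in population if y > LOWER]  # what the analyst actually sees

naive_mean = mean(sample)
best_mu = max(range(40, 66), key=lambda mu: truncated_loglik(mu, sample))

print("observed n:", len(sample))
print("naive mean of truncated sample:", round(naive_mean, 2))
print("grid value maximizing the truncated likelihood:", best_mu)
```

The naive mean of the observed values overstates the population mean of 50, while the truncated likelihood peaks near it.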

Modeling zero-truncated count data requires several steps to determine which probability distribution function (pdf) fits the data best.

Some of the choices for the optimal pdf are Poisson, Poisson-gamma mixture, Poisson-inverse Gaussian mixture, generalized Poisson, negative binomial, and three-parameter negative binomial (Famoye).

In Stata the command is trncregress; SAS uses PROC NLMIXED, and R uses the VGAM package.
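For the simplest of these choices, the Poisson, zero-truncation means dividing each ordinary Poisson probability by the probability of a nonzero count. The sketch below illustrates that one case only. The function name and the rate of 2 are assumptions for the example.

```python
import math

def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y > 0) for a Poisson(lam) count; nonzero only for k = 1, 2, ..."""
    if k < 1:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # Rescale by the probability that a Poisson count is not zero.
    return poisson / (1.0 - math.exp(-lam))

lam = 2.0  # hypothetical rate
counts = range(1, 60)
probs = [zero_truncated_poisson_pmf(k, lam) for k in counts]

total = sum(probs)
zt_mean = sum(k * p for k, p in zip(counts, probs))

print("probabilities sum to:", round(total, 6))
print("mean of zero-truncated counts:", round(zt_mean, 4))
print("mean of ordinary Poisson:     ", lam)
```

The zero-truncated mean, lam / (1 - exp(-lam)), is larger than lam, which is why fitting an ordinary Poisson to data that cannot contain zeros gives misleading estimates.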

Jeff Meyer is a statistical consultant with The Analysis Factor, a stats mentor for Statistically Speaking membership, and a workshop instructor. Read more about Jeff here.
