The Exposure Variable in Poisson Regression Models

by Karen

Poisson regression models and their extensions (Zero-Inflated Poisson, Negative Binomial Regression, etc.) are used to model counts and rates. A few examples of count variables include:

- Number of words an eighteen month old can say

- Number of aggressive incidents performed by patients in an inpatient rehab center

Most count variables follow one of the distributions in the Poisson family. Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables.

Using these regression models gives much more accurate parameter estimates than fitting an ordinary linear regression model, whose assumptions, such as normal residuals and constant variance, rarely fit count data.

But how do the Poisson models handle rates?  A rate is just a count per unit time.

The first example would not need a rate, but the second probably would. If all patients are in the center the same number of days, a rate is unnecessary. But if there is variation in the number of days each patient is present, attendance itself could affect the count. A count of 10 incidents in 180 days reflects a much lower rate than a count of 10 incidents in 15 days.
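To make that comparison concrete, here is a quick sketch of the arithmetic. The function name `incident_rate` is just made up for illustration.

```python
def incident_rate(count, days):
    """Incidents per day of exposure."""
    return count / days

long_stay = incident_rate(10, 180)   # 10 incidents over 180 days
short_stay = incident_rate(10, 15)   # 10 incidents over 15 days

print(f"{long_stay:.3f} incidents per day")   # 0.056
print(f"{short_stay:.3f} incidents per day")  # 0.667

# Same count, but the short-stay patient's rate is 12 times higher.
ratio = short_stay / long_stay
```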

Poisson models handle exposure variables by using simple algebra to change the dependent variable from a rate into a count.

If the rate is count/exposure, multiplying both sides of the equation by exposure moves it to the right side of the equation. When both sides of the equation are then logged, the final model contains ln(exposure) as a term that is added to the linear predictor, alongside the usual intercept and coefficient terms. This logged variable, ln(exposure), is called the offset variable.
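Written out for a model with a single predictor, the algebra looks roughly like this. The symbols are my own shorthand, not anything your software will print: \mu is the expected count, t is the exposure, x is a predictor, and \beta_0 and \beta_1 are the regression coefficients.

```latex
\begin{align*}
\frac{\mu}{t} &= \exp(\beta_0 + \beta_1 x)
  && \text{model for the rate} \\
\mu &= t \cdot \exp(\beta_0 + \beta_1 x)
  && \text{multiply both sides by exposure} \\
\ln(\mu) &= \beta_0 + \beta_1 x + \ln(t)
  && \text{log both sides}
\end{align*}
```

Notice that ln(t) enters the last line with no coefficient of its own to estimate.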

Most statistical software will require you to create the logged variable yourself and define it as the offset variable. Some packages, such as Stata, allow you to define either the exposure or the offset variable.

One important feature of an offset variable is that it is required to have a coefficient of 1.  This is because it is part of the rate.  The coefficient of 1 allows you to theoretically move it back to the left side of the equation to turn your count back into a rate.
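Here is a small sketch of what the fixed coefficient of 1 buys you. The coefficient values and the function name `expected_count` are hypothetical, chosen only for illustration. Because ln(exposure) enters with a coefficient of exactly 1, the expected count is just exposure times the rate, and dividing by exposure recovers the same rate no matter how long the exposure was.

```python
import math

def expected_count(b0, b1, x, exposure):
    """Expected count from ln(mu) = b0 + b1*x + 1*ln(exposure)."""
    return math.exp(b0 + b1 * x + 1.0 * math.log(exposure))

# Hypothetical coefficients and predictor value, not estimates from real data.
b0, b1 = math.log(0.05), 0.3
x = 1.0

count_10 = expected_count(b0, b1, x, exposure=10)
count_20 = expected_count(b0, b1, x, exposure=20)

# Moving exposure back to the left side turns each count back into a rate.
rate_10 = count_10 / 10
rate_20 = count_20 / 20

print(rate_10, rate_20)        # the two rates agree
print(count_20 / count_10)     # doubling exposure doubles the expected count
```

If the coefficient on ln(exposure) were anything other than 1, the two rates printed above would no longer match.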

What this means theoretically is that by defining an offset variable, you are only adjusting for the amount of opportunity an event has to occur. The assumption here is that, for example, every day in rehab makes a patient equally likely to have an aggressive incident. Each day is simply an opportunity for an incident. A patient in for 20 days is expected to have twice as many incidents as a patient in for 10 days.

There is an assumption that the likelihood of events is not changing over time. If, for example, it takes patients a few weeks to learn the consequences of aggressive behavior, and they then stop or reduce that behavior, then time is not just a matter of exposure. Likewise, if patients start becoming more agitated after being in a program for a few months, so that the longer residence time is actually creating more aggression, then time is not just a matter of exposure. In either of these cases, number of days in a program would serve better as a predictor than as an exposure variable. As a predictor, the coefficient will be estimated from the data, not set to 1.
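One way to see the difference is to put ln(days) in the model as an ordinary predictor and look at how far its estimated coefficient is from 1. The sketch below does this with a bare-bones Newton-Raphson fit in plain Python. The function `fit_poisson_loglinear` and both tiny data sets are made up for illustration; in practice you would use your statistical package's Poisson procedure.

```python
import math

def fit_poisson_loglinear(x, y, iters=50, tol=1e-12):
    """Fit ln(mu) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up data: days in the program for four hypothetical patients.
days = [10, 20, 40, 80]
log_days = [math.log(d) for d in days]

# Case 1: incidents exactly proportional to days (pure exposure).
_, b1_prop = fit_poisson_loglinear(log_days, [2, 4, 8, 16])

# Case 2: incidents taper off the longer patients stay.
_, b1_taper = fit_poisson_loglinear(log_days, [4, 6, 8, 10])

print(b1_prop)    # essentially 1: days act only as exposure
print(b1_taper)   # well below 1: days are doing more than providing opportunity
```

When the estimated coefficient is close to 1, treating days as an exposure variable is reasonable. When it is clearly different from 1, as in the second case, forcing it to 1 with an offset would misrepresent the data.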

This logic can extend to any regression model that has a ratio as a dependent variable.  Make sure that you understand the implication that the denominator of that ratio is not affecting the numerator beyond opportunity.

{ 5 comments }

Tarak February 6, 2012 at 12:40 pm

Thanks so much! After reading this, I better understand the Poisson models that I am running on my data (comparing mortality rates during exposure time intervals vs. non-exposure time intervals). In medical school, we don’t receive great training on interpreting the stats behind the research we consume. If we don’t understand the basic assumptions underlying the stats, I don’t know that we can properly interpret anything we read. This is a great contribution!

Karen February 10, 2012 at 6:09 pm

Thanks, Tarak. Glad I could help!

Karen

Bryan March 31, 2012 at 4:41 pm

Let me start by saying that I think your site is fantastic. When using an offset it appears that it is only applied to the dependent variable, correct? I am curious if the same offset variable used to convert the dependent variable to a rate is also used to scale the independent variables in the model as well or if this must be done in another way. Thanks.

Karen April 2, 2012 at 10:06 am

Hi Bryan,

First, thanks for the kind words.

Yes. The offset is really only offsetting the DV. If the IVs are also rates, you’d have to express them in terms of those rates.

Best,
Karen

Bryan April 6, 2012 at 4:15 pm

Thanks for your answer.

I can express them as rates by dividing the numbers recorded (for example, kg of fruit) by the area sampled for each, then taking the log of each and using them as covariates. But this still doesn’t really reflect the different area sampled for each and how much each subject (fruit transects with different lengths) ought to contribute overall. Do you know of a way to do this in SPSS? Using the GEE scale weight is not an option because that is also only for the dependent variable. Standard “Weight Cases” also wouldn’t work because that would weight all the variables and cause problems with the offset.

Thanks again for your help, I am really glad that I came across your site and I plan to use you when I need consultation help in the future.

Bryan
