Poisson Regression Models and their extensions (Zero-Inflated Poisson, Negative Binomial Regression, etc.) are used to model counts and rates. A few examples of count variables include:
– Number of words an eighteen month old can say
– Number of aggressive incidents performed by patients in an inpatient rehab center
Most count variables follow one of these distributions in the Poisson family. Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables.
Using these regression models gives much more accurate parameter estimates than fitting an ordinary linear regression model, whose assumptions, such as normally distributed residuals and constant variance, rarely hold for count data.
But how do the Poisson models handle rates? A rate is just a count per unit time.
The first example would not need a rate, but the second probably will. If all patients are in the center for the same number of days, a rate is unnecessary. But if the number of days each patient is present varies, time in the center could itself affect the count. Ten incidents in 180 days is a much lower rate than ten incidents in 15 days.
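The comparison is simple arithmetic, and a quick sketch makes the gap concrete. The variable names are illustrative; the numbers are the ones from the example.

```python
# Same count, very different exposure (numbers from the example above).
incidents = 10
days_long_stay = 180
days_short_stay = 15

rate_long_stay = incidents / days_long_stay     # incidents per day
rate_short_stay = incidents / days_short_stay   # incidents per day

print(f"Long stay:  {rate_long_stay:.3f} incidents per day")
print(f"Short stay: {rate_short_stay:.3f} incidents per day")
print(f"Ratio of rates: {rate_short_stay / rate_long_stay:.1f}")
```

The counts are identical, but the short-stay patient's rate is 12 times higher.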
Poisson models handle exposure variables by using simple algebra to change the dependent variable from a rate into a count.
If the rate is count/exposure, multiplying both sides of the equation by exposure moves it to the right side of the equation. When both sides of the equation are then logged, the final model contains ln(exposure) as a term that is added to the linear predictor, alongside the regression terms. This logged variable, ln(exposure), is called the offset variable.
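Written out, the algebra looks roughly like this. The symbols are notation introduced here for illustration: mu_i is the expected count for observation i, t_i is its exposure, and x_i1 through x_ip are the predictors.

```latex
\begin{align*}
\frac{\mu_i}{t_i} &= \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \\
\mu_i &= t_i \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \\
\ln(\mu_i) &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \ln(t_i)
\end{align*}
```

The last line is an ordinary Poisson regression for the count, with ln(t_i) entering as an extra term that has no coefficient of its own to estimate.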
Most statistical software will require you to create the logged variable yourself and define it as the offset variable. Some packages, such as Stata, let you specify either the raw exposure variable or the offset variable.
One important feature of an offset variable is that it is required to have a coefficient of 1. This is because it is part of the rate. The coefficient of 1 allows you to theoretically move it back to the left side of the equation to turn your count back into a rate.
What this means theoretically is that by defining an offset variable, you are only adjusting for the amount of opportunity an event has to occur. The assumption here is that, for example, every day in rehab makes a patient equally likely to have an aggressive incident. Each day is simply an opportunity for an incident. All else being equal, a patient in for 20 days is expected to have twice as many incidents as a patient in for 10 days.
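As a rough illustration of what an offset does, here is a minimal sketch that fits a Poisson model with ln(days) as the offset, using Newton's method in plain Python. It is not how any particular package implements the fit. The data, the `treated` indicator, and the function name `fit_poisson_offset` are all made up for this example. With a single binary predictor, the fitted rates should match the raw incidents per day in each group, which gives a simple check.

```python
import math

def fit_poisson_offset(y, x, exposure, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x + 1*ln(exposure) by Newton's method."""
    offset = [math.log(t) for t in exposure]   # coefficient fixed at 1
    b0 = math.log(sum(y) / sum(exposure))      # start at the pooled rate
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi + oi) for xi, oi in zip(x, offset)]
        # Score vector: gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix: negative Hessian of the log-likelihood
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: incidents, days in the center, and a made-up group flag.
incident_counts = [10, 10, 4, 6, 2, 9]
days = [180, 15, 60, 90, 30, 45]
treated = [0, 1, 0, 1, 0, 1]

b0, b1 = fit_poisson_offset(incident_counts, treated, days)
rate_group0 = math.exp(b0)        # fitted incidents per day, treated = 0
rate_group1 = math.exp(b0 + b1)   # fitted incidents per day, treated = 1
print(f"Group 0 rate: {rate_group0:.4f}, group 1 rate: {rate_group1:.4f}")
print(f"Rate ratio exp(b1): {math.exp(b1):.4f}")
```

Because the offset's coefficient is fixed at 1, exp(b0) reads directly as incidents per day in the reference group and exp(b1) as a rate ratio, rather than as statements about raw counts.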
This rests on the assumption that the likelihood of an event does not change over time. If, for example, it takes patients a few weeks to learn the consequences of aggressive behavior, after which their incident rates stop or drop, then time is not just a matter of exposure. Likewise, if patients become more agitated after a few months in a program, so that longer residence actually creates more aggression, then time is not just a matter of exposure. In either case, number of days in the program would serve better as a predictor than as an exposure variable. As a predictor, its coefficient will be estimated from the data, not set to 1.
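One informal way to probe the exposure assumption is to enter ln(days) as an ordinary predictor and see whether its estimated coefficient lands near 1. The sketch below does that with a small Newton's method fit in plain Python. The data are constructed for illustration so that incidents grow with the square root of days rather than proportionally, and `fit_poisson_log_days` is a hypothetical helper, not a library function.

```python
import math

def fit_poisson_log_days(y, days, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*ln(days), with b1 estimated from the data."""
    x = [math.log(d) for d in days]
    b0 = math.log(sum(y) / len(y))   # start at the log of the mean count
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix: negative Hessian of the log-likelihood
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Constructed data: incidents equal 2 * sqrt(days), so they rise more
# slowly than a pure exposure effect would imply.
days = [4, 9, 16, 25, 36]
incidents = [4, 6, 8, 10, 12]

b0, b1 = fit_poisson_log_days(incidents, days)
print(f"Estimated coefficient on ln(days): {b1:.3f}")
```

For these constructed data the estimate is 0.5, well below 1, so treating days purely as exposure would overstate how quickly incidents accumulate for long-stay patients. With real data you would also want to look at the uncertainty around the estimate before drawing that conclusion.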
This logic can extend to any regression model that has a ratio as a dependent variable. Make sure that you understand the implication that the denominator of that ratio is not affecting the numerator beyond opportunity.