The Analysis Factor


Statistical Consulting, Resources, and Statistics Workshops for Researchers


The Exposure Variable in Poisson Regression Models

by Karen Grace-Martin 42 Comments

Poisson regression models and their extensions (zero-inflated Poisson, negative binomial regression, etc.) are used to model counts and rates. A few examples of count variables include:

– Number of words an eighteen month old can say

– Number of aggressive incidents performed by patients in an inpatient rehab center

Most count variables follow one of these distributions in the Poisson family. Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables.

Using these regression models gives much more accurate parameter estimates than fitting an ordinary linear regression model, whose assumptions, such as normally distributed residuals and constant variance, rarely hold for count data.

But how do the Poisson models handle rates?  A rate is just a count per unit time.

The first example would not need a rate, but the second probably will. If all patients are in the center for the same number of days, a rate is unnecessary. But if the number of days varies from patient to patient, attendance itself could affect the count: 10 incidents over 180 days is a much lower rate than 10 incidents over 15 days.

Poisson models handle exposure variables by using simple algebra to change the dependent variable from a rate into a count.

If the rate is count/exposure, multiplying both sides of the equation by exposure moves exposure to the right side. When both sides are then logged, the final model contains ln(exposure) as a term that is added to the linear predictor (the intercept plus the coefficient-weighted predictors). This logged variable, ln(exposure), is called the offset variable.
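Written out, the algebra in the previous paragraph looks like this. The symbols are our own shorthand, not the original's: μ is the expected count, t the exposure, and β0 + β1X stands in for a model with one hypothetical predictor X.

```latex
\frac{\mu}{t} = e^{\beta_0 + \beta_1 X}
\quad\Longrightarrow\quad
\mu = t \, e^{\beta_0 + \beta_1 X}
\quad\Longrightarrow\quad
\ln(\mu) = \ln(t) + \beta_0 + \beta_1 X
```

Note that ln(t) appears with no coefficient of its own, which is the same as giving it a coefficient of exactly 1.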

Most statistical software will require you to create the logged variable yourself and define it as the offset variable. Some packages, Stata among them, let you specify either the exposure or the offset variable.
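To make those two steps concrete (create ln(exposure), then fit with it as an offset), here is a minimal sketch in plain Python so nothing is hidden inside a package. The function name, the Newton-Raphson fitting loop, and the four patients' data are hypothetical illustrations, not any software's actual API. With only an intercept in the model, the fitted rate exp(b0) should work out to total incidents divided by total days.

```python
import math

def poisson_rate_with_offset(counts, exposure, max_iter=100, tol=1e-12):
    """Intercept-only Poisson regression with ln(exposure) as the offset.

    Model: ln(mu_i) = ln(exposure_i) + b0. The offset has no coefficient
    of its own (it is fixed at 1), so exp(b0) is a rate per unit of exposure.
    """
    # Step 1: create the logged exposure variable; this is the offset
    offset = [math.log(t) for t in exposure]
    # Step 2: maximize the Poisson log-likelihood for b0 by Newton-Raphson
    b0 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(o + b0) for o in offset]         # expected counts
        score = sum(y - m for y, m in zip(counts, mu))  # first derivative in b0
        info = sum(mu)                                  # minus the second derivative
        step = score / info
        b0 += step
        if abs(step) < tol:
            break
    return b0

# Hypothetical data: incidents and days in the center for four patients
incidents = [10, 10, 3, 5]
days = [15, 180, 30, 60]
b0 = poisson_rate_with_offset(incidents, days)
print(math.exp(b0))  # incidents per patient-day, approximately 28 / 285
```

With real covariates, the same logged column would be passed to your software's offset option rather than entered as an ordinary predictor.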

One important feature of an offset variable is that its coefficient is fixed at 1 rather than estimated from the data. This is because it is part of the rate. The coefficient of 1 is what allows you, in theory, to move it back to the left side of the equation and turn your count back into a rate.

What this means theoretically is that by defining an offset variable, you are only adjusting for the amount of opportunity an event has. The assumption here is that, for example, every day in rehab makes a patient equally likely to have an aggressive incident. Each day is simply an opportunity for an incident, so a patient in for 20 days is expected to have twice as many incidents as a patient in for 10 days.
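The 10-day versus 20-day comparison can be checked numerically. This sketch assumes a hypothetical patient profile whose linear predictor corresponds to 0.05 incidents per day; the helper name is ours, and only the exposure differs between the two patients.

```python
import math

def expected_count(exposure_days, linear_predictor):
    """Expected count when ln(exposure) enters with its coefficient fixed at 1."""
    return math.exp(math.log(exposure_days) + linear_predictor)

# Hypothetical linear predictor (b0 + b1*X) for one patient profile: 0.05 incidents/day
xb = math.log(0.05)
ten_days = expected_count(10, xb)
twenty_days = expected_count(20, xb)
print(ten_days, twenty_days)            # roughly 0.5 and 1.0 expected incidents
print(twenty_days / ten_days)           # 2 (up to rounding): doubling exposure doubles the count
print(ten_days / 10, twenty_days / 20)  # dividing by exposure recovers the same rate per day
```

Dividing each expected count by its exposure is the "move it back to the left side" step: the rate is identical for both patients, and only the opportunity differs.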

This assumes that the likelihood of events does not change over time. If, for example, it takes patients a few weeks to learn the consequences of aggressive behavior, after which they stop or reduce their incidents, then time is not just a matter of exposure. Likewise, if patients become more agitated after a few months in a program, so that longer residence actually creates more aggression, then time is not just a matter of exposure. In either case, the number of days in the program would serve better as a predictor than as an exposure variable. As a predictor, its coefficient will be estimated from the data, not set to 1.
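One way to see the difference is to enter days as an ordinary predictor and let the model estimate its coefficient. The sketch below fits ln(μ) = a + b·ln(days) by Newton-Raphson in plain Python; the function name and both data sets are hypothetical. We chose the log scale so that b = 1 corresponds exactly to the offset model (entering raw days is another option). If incidents are proportional to days, the estimate of b comes out at 1; if patients calm down over time, it falls below 1. With real data you would judge the estimate against its standard error rather than expect exactly 1.

```python
import math

def fit_log_days(counts, days, max_iter=50, tol=1e-10):
    """Fit ln(mu) = a + b*ln(days) by maximum likelihood (Newton-Raphson).

    ln(days) is an ordinary predictor here, so b is estimated from the data.
    An offset would instead force b = 1.
    """
    x = [math.log(d) for d in days]
    a, b = math.log(sum(counts) / len(counts)), 0.0  # start at the mean count
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (first derivatives of the Poisson log-likelihood)
        g_a = sum(y - m for y, m in zip(counts, mu))
        g_b = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Information matrix [[h_aa, h_ab], [h_ab, h_bb]]
        h_aa = sum(mu)
        h_ab = sum(m * xi for m, xi in zip(mu, x))
        h_bb = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system by Cramer's rule
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

days = [10, 20, 40, 80]        # hypothetical lengths of stay
proportional = [3, 6, 12, 24]  # incidents exactly proportional to days
calming_down = [4, 6, 9, 13]   # incidents grow more slowly than days
for label, counts in [("proportional", proportional), ("calming down", calming_down)]:
    a, b = fit_log_days(counts, days)
    print(label, round(b, 3))  # b is 1 for the first set and below 1 for the second
```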

This logic extends to any regression model that has a ratio as its dependent variable. Make sure you understand the assumption this implies: the denominator of the ratio affects the numerator only by providing opportunity.

Comments

  1. Max says

    November 29, 2020 at 3:13 am

    Hi:

    I’ve been told I can’t use negative binomial regression when there’s an offset variable. Is this true? Or is it true only under some specific conditions?

    Thanks
    Max

  2. Ned Levine says

    September 17, 2020 at 11:28 am

    CrimeStat IV allows a user to define an exposure/offset variable in a Poisson-based model without having to transform it to a log form (like STATA). The program estimates the function with Markov Chain Monte Carlo (MCMC) rather than with Maximum Likelihood. The dispersion can be modeled as either Gamma (negative binomial) or Lognormal. A spatial autocorrelation term can be modeled as either a Conditional autoregressive (CAR) or Simultaneous autoregressive (SAR). The program and documentation can be found at:

    http://www.nij.gov/CrimeStat

  3. Mario Gallego says

    April 11, 2020 at 5:36 am

    Dear Karen,

    First, thanks for the useful explanation.
    My question is whether the offset variable always needs a logarithmic transformation.

    In my case, the variable is already normally distributed.

    Best,
    Mario

    • Karen Grace-Martin says

      April 17, 2020 at 2:42 pm

      Hi Mario,

      Always double-check your software manual, but generally the offset is already logged. The exposure isn’t.

  4. Noriko says

    January 13, 2020 at 4:38 pm

    Hi Karen. Thanks for the wonderful information. It makes more sense to me now why I have to include the offset or exposure option in Stata when I am applying mepoisson.

    My question is simple. If we have multiple exposures, can we include them all in this option?

    Thank you.

  5. Pat Taggart says

    December 1, 2019 at 7:29 pm

    Hi Karen,

    can a model offset be used to account for differences in sampling intensity/exposure in the response variable and the predictor variables? Or can it only be used to account for differences in sampling intensity in the response variable?

    Let’s think of a hypothetical situation: if I modelled the count of animals positive to virus #1, then I can include an offset to account for the number of animals sampled during each observation. What if I also included virus #2 as a predictor of virus #1? Will the offset for the number of animals sampled during each observation also account for differences in sampling intensity/exposure in the predictor variables?

    Thanks,
    Pat

    • Karen Grace-Martin says

      December 2, 2019 at 10:53 am

      Hi Pat,

      I think it only works for the outcome because of the way the equation is set up.

  6. Natalie says

    September 2, 2019 at 8:24 pm

    Thank you very much for your helpful article. I was just wondering if for the patient example, would the offset be average outbursts per exposure length? Or would it be constructed differently? Just wanting to get an idea of what the actual offset would look like/how it would be made.

    Thank you for your time and help!
    Natalie

  7. Javier says

    April 19, 2019 at 10:53 pm

    First of all…THANK YOU VERY MUCH for your magnificent work!

    Next, it is my question:
    Some data sets have a variable as a count, but also as “rates” per 100000 people (patients, residents, etc.). Because it is more reasonable to compare rates when there are significant differences in the population (for instance among countries, counties, cities, etc), Which is the correct way to formulate the model when the dependent (response) variable is, for instance, “Murders per 100000 residents”?

    I hope you can guide me!…

    Thanks again and all the best!

  8. andres r says

    April 3, 2019 at 4:28 pm

    Hi! Great post, thanks a lot!! Here is my question: I am studying the rate of specific behaviors displayed by judges observed in a sample of case hearings (the count var goes from 0 behaviors to 3 behaviors). When running GOF using estat gof, or using glm, both the Pearson and deviance statistics are well below the threshold (yay, my models fit the data). However, when adding an exposure variable (duration of the hearings), the Pearson test and the deviance test show different and conflicting results (reject Ho for Pearson, fail to reject Ho for deviance). What should I do? Part of the problem may be that my sample size is small (380 hearings), so adding the exposure parameter may be creating instability for the Pearson test. Can I still move forward and calculate expected values for the count outcome, count distributions, etc.? Thanks in advance!!

  9. Helen Li says

    June 28, 2018 at 3:16 pm

    HI Karen,

    I want to assess the association between EmergencyVisitDueToInfection and Temperature. My question is: how can I control for age and gender in the model?

    For example, Table 1 has one row per patient. I created Table 2, which contains the infection count per date, before feeding it to the Poisson model. However, I have difficulty creating a gender or age variable in Table 2 for the Poisson model, because it has one row per date and there are lots of patients on each date.

    Table 1:
    ID Date Age Gender Infection (1=Yes, 0=No)
    1 2018-01-01 20 M 1
    2 2018-01-01 30 F 0
    3 2018-01-02 40 F 1
    4 2018-01-02 50 F 1
    ……

    Table 2:
    Date Count of Infection
    2018-01-01 1
    2018-01-02 2
    ……

    Many thanks,
    Helen

  10. Pindi says

    March 27, 2017 at 9:48 am

    Hi Karen,
    I’m unclear how to use negative binomial regression for my situation. My dependent variable is vaccine exemptions (count), but I also have rates. My independent variables are: school type, geographic location, free and reduced school lunch rates, and I’m trying to analyze the difference in exemption rates from 2014 to 2015. I’m not sure what the counts really relay when I’m more interested in the rates. Also, how do I get adjusted vs. unadjusted IRR using SPSS?

  11. Putra says

    June 14, 2016 at 4:24 am

    Very nice information, thank you very much
    but how do I interpret the results?

  12. Jon says

    August 28, 2015 at 7:01 am

    Hi Karen,

    Great post, thank you for taking the time to put it up!

    I am working on investigating trends in incidence rates over roughly 20 years. Besides joinpoint, do you have a recommendation on how to do this? For example, would you recommend using splines in poisson regression?

    Thank you for your help!

  13. John says

    July 30, 2015 at 2:36 am

    Hello,

    I just have a small question: is prevalence a count variable, so that Poisson regression can be conducted with a set of independent variables (age, country, population size)? Could you please help with your knowledge?

    Thanks in advance.

  14. Joana Correia says

    June 18, 2015 at 2:28 pm

    Hi Karen,

    This was very useful, but I couldn’t help get a bit confused. So, only if I have a ratio as dependent variable should I use an exposure variable?

    In my case, I would like to know how the incidence of a disease varies across countries. I have incidence as a count: total diagnoses over several years (the sum of yearly diagnoses). Of course, population affects total diagnoses. Should I use the population to transform incidence into a rate, or should I use population as an exposure variable, even though my dependent variable is already a count?

    Thank you! All the best

    • Karen says

      June 23, 2015 at 12:19 pm

      Hi Joana,

      You can’t use the rate as the DV–it has to be a count. So use the count as the DV and population as the exposure variable.

  15. Kalle says

    May 20, 2015 at 11:07 am

    Hi Karen,

    I found this very helpful, thank you !

    I’m just picking up on Anne’s second question above.

    Would it be possible to include a version of time both as an offset exposure variable (to control for the pure time effect, more time = more incidents, as you described above) and as an IV (let’s say as a dummy for people who stay longer than 10 days) in order to see how, e.g., the rate of aggressive behaviour is affected by time (outside the pure exposure mechanism)?

    Thanks,
    Kalle

  16. abdulaziz says

    February 20, 2015 at 3:56 am

    thank you very much, now i understand the offset on count model

  17. Peter Goff says

    July 29, 2014 at 7:19 pm

    Hi Karen,

    Here’s a hopefully quick question: What are the implications (perhaps anticipated types of bias) that are expected from using OLS rather than a poisson model when the dependent variable is a count variable? Is there a rule of thumb as to when it matters and when it doesn’t? Thanks!

  18. Andrea says

    June 22, 2014 at 4:44 pm

    Hi Karen,

    Quick question. I am running a poisson regression with an exposure variable. In Stata the syntax is pretty straightforward: poisson y x1, exposure(z), where y is my count var, x1 is my independent var, and z is my exposure var. In this context, do I interpret the coefficient on x1 (ie beta1) as the effect of x1 on the count y or instead as the effect of x1 on the rate y/z. Thanks and sorry for the simple question!!

  19. Amir says

    April 25, 2014 at 7:51 pm

    Hi Karen,

    First of all, I would like to thank you for the great article here and hosting this conversation room!

    I was wondering if you have seen any papers/text books, that discuss exposure in a setting similar to what I describe below:

    Let’s assume we are interested in the number of kids in a class who develop a specific type of disorder. Our explanatory variables are the number of kids with certain ages, the number of female, and male kids, and some other explanatory variables that are counts of kids in different cohorts:
    E(Y|X1,X2,…,Xn) = f (X1,X2,…,Xn)
    Y = # of kids with the disorder
    X1 = # of less-than-5-year-old kids
    X2 = # of higher-than-5-year-old kids
    X3 = # of females
    X4 = # of males
    and so on.
    Exposure = X = total number of kids

    As you see, all of variables are counts, as well as the exposure.

  20. Daniel says

    February 18, 2014 at 5:12 am

    Thank you for the great web site. I read the articles “The Exposure Variable in Poisson Regression Models” and “Poisson Regression Analysis for Count Data” and have a follow-up question.

    It would be a big help if you could give a practical example (by hand) how Poisson regression is used to calculate a time trend line, and calculate a confidence interval for whether there is a trend. Here is some sample data if you would like (Texas viral hepatitis deaths):

    Year, Events, Population, Rate per 100000
    -----------------------------------------
    1990, 108, 16986510, 0.64
    1991, 154, 17349000, 0.89
    1992, 141, 17655650, 0.80
    1993, 212, 18031484, 1.18
    1994, 254, 18378185, 1.38
    1995, 283, 18723991, 1.51
    1996, 353, 19128261, 1.85
    1997, 383, 19439337, 1.97
    1998, 432, 19759614, 2.19
    -----------------------------------------

    I understand the algorithm for least squares slope, and how to analyze that slope for significance. But I want a trend method to take into account the variance of the data points. Obviously, if each data point is based on hundreds of events, the slope is more reliable than if each data point is based on just a few events. I have also seen chi-square suggested, know how to do chi-square, plan to give chi-square a try, but I saw a lot more references to Poisson regression for time trend analysis.

    I am familiar with the Poisson distribution itself, so that’s not the problem. But I can’t find a practical example of Poisson regression anywhere. The only way I can understand an algorithm is to do it by hand.

    Thanks,
    Daniel

  21. ali says

    January 25, 2014 at 5:34 am

    hi .i want learn stata .you can help me .very NECESSARY
    thanks

    • Karen says

      February 3, 2014 at 4:15 pm

      Hmmm, not very quickly. Unfortunately, we haven’t yet included Stata in our trainings. (We will at some point, though). I know that Michael Mitchell has an introductory book on using Stata. I would suggest starting there.

  22. Grateful2U says

    August 5, 2013 at 9:15 am

    Hi Karen,

    First, I would like to say that since finding this great website I’ve returned to it an infinite number of times, and have recommended it to an infinite number of people. If I could, I would spend all day reading the material and viewing the webinars, but unfortunately my thesis is not going to write itself…

    I have a couple of what are probably silly questions, but I have found contradicting info on the matter, so just hoping you can settle this.

    1) Is there a certain sample size below which Poisson or Negative Binomial regressions are not recommended/credible? Is this in terms of, say, study subjects (e.g., 26 individuals are not enough, but 78 are much better) or in terms of subjects x the observations of their count data (e.g., each of the 26 subjects has 10 observations, so 260 observations overall is ok)?

    2) Is it true that when using exposure as an offset variable, it first has to be transformed into log? log10 or natural log? If so, do current versions of SPSS do this automatically when defining an offset variable or do we need to handle this simple data transformation prior?

    Also, I was wondering if there is any way to participate in workshops retrospectively since you write that videos are available for those. I missed out on registering for your recent workshop as the site said it was already full.

    Thanks again for being so good at explaining this stuff. You are appreciated by many!

    Grateful2U

    • Karen says

      August 7, 2013 at 4:03 pm

      Hi Grateful,

      First, thanks for the kind words. These are great questions.

      1. It’s a function of the number of subjects and the number of parameters in the model.
      2. This depends on your software. Some want it one way and some want it the other. I would suggest checking your manual.

      3. Yes. We don’t yet have past workshops available for sale in an organized way (though we’re working on it), but feel free to contact my team using the Contact form. We do make past workshops available on request.

  23. Anne says

    July 14, 2013 at 10:18 am

    Hi Karen,
    Question: can an exposure variable (i.e., if I am scaling my DV by a variable) ALSO be an independent variable? It seems from your responses above that it probably can’t be, but I’m not sure. In your example, clearly the longer a patient is admitted, the more opportunity there is for an incident, but as you also point out, the length of a patient’s stay may also affect the likelihood in other ways beyond just increased opportunity. Can you include the length of the stay as an exposure variable to represent the increased opportunity, and then also include it as an independent variable to pick up the information that is not exposure related?

    This website is incredibly helpful. I hope they pay you the big bucks.
    Anne

    • Karen says

      July 15, 2013 at 3:38 pm

      Hi Anne,

      The exposure variable is an independent variable in the model, but its coefficient is constrained to 1.

      I don’t know how you could include it twice, but you could always include it as a predictor instead.

      And I lol’d about the big bucks. Not sure who ‘they’ are. 🙂

      Karen

      • Anne says

        July 22, 2013 at 9:38 am

        Sorry that I don’t know who ‘they’ are either, or I would send them your way …

        If a measurement W is in there twice (once as the exposure variable and once as an IV) then the logged value of W (as the exposure variable) could be swung to the lhs to scale the DV (because the exposure coefficient is 1) and if the coefficient on W as an IV was significant you would go from there interpreting the coefficient just like you would any other coefficient in the model, no? e^coef.? The only issue I can come up with is that the two variables (log W and W) are likely highly correlated, but wouldn’t that work against having findings on W? Then again with logW constrained to a coefficient of 1 … would that preclude me from using it that way though? If the coefficient on W is significant, then there is just a combined effect, no? I’m just not sure what to think … I am wondering if you can think of another angle to this that would make it problematic mathematically to consider.

      • Giovanni says

        May 28, 2018 at 11:04 pm

        Why can’t you use an explanatory variable already included in the model as an exposure term? For example, I’m running a regression of crime on guns and want to control for the effect of population on the other covariates, but also want to adjust the amount of opportunity for crime by the population–thus the inclusion of population as an independent variable and exposure term.

        • Karen Grace-Martin says

          October 26, 2018 at 5:20 pm

          Hi Giovanni,

          When you put an exposure term in, you are already including it as a predictor. You’re just setting its coefficient to 1 instead of letting the model estimate it. This is what allows you to interpret your other IRRs in terms of rate per unit of exposure.

          So if you include it as both predictor and exposure, you’ll have perfect multicollinearity between two Xs.

          • Jude says

            May 18, 2019 at 6:45 pm

            I am trying to understand the idea of perfect collinearity as a show stopper. Wouldn’t the same argument apply to polynomial terms?

  24. Evan says

    April 3, 2013 at 4:31 pm

    Karen,
    Thanks for the helpful article. When running generalized estimating equations, would you advise doing what you described above, or would it be acceptable to use the calculated rate as the DV and specify a poisson distribution and a log link? Thanks!
    Evan

    Reply
    • Karen says

      April 8, 2013 at 9:50 am

      Hi Evan. Most (I want to say all) stat software requires that the data be discrete if you specify a Poisson distribution. If anything has decimals, as will happen if you calculate a rate, it won’t run. Hence the need for the exposure variable.

      Reply
      • Evan says

        April 29, 2013 at 11:10 am

        Thanks for the response Karen. I’m running Stata 12 MP.
        From your earlier response to a comment, it looks like any IVs about my patient population (the exposure) would have to be expressed as rates. My initial thought was that I could include the count of adult females instead of the % of adult females, because the exposure would divide the count and therefore create the rate itself. But this seems to be incorrect?
        Also, are there any resources you would suggest consulting regarding exposure for count models? I’ve searched online but haven’t found much. Thanks for your time!
        Evan

        • Karen says

          April 29, 2013 at 6:32 pm

          Hi Evan,

          One book I really like on Poisson Regression in general is Scott Long’s book. The title is something like Regression Models for Categorical and Limited Dependent Variables. I know there is info there on the exposure variable, and it’s relatively non-math laden.

          He also has a book on the same topic, but where everything is applied in Stata. I haven’t read that one.

          Karen

  25. Bryan says

    March 31, 2012 at 4:41 pm

    Let me start by saying that I think your site is fantastic. When using an offset it appears that it is only applied to the dependent variable, correct? I am curious if the same offset variable used to convert the dependent variable to a rate is also used to scale the independent variables in the model as well or if this must be done in another way. Thanks.

    • Karen says

      April 2, 2012 at 10:06 am

      Hi Bryan,

      First, thanks for the kind words.

      Yes. The offset is really only offsetting the DV. If the IVs are also rates, you’d have to express them in terms of those rates.

      Best,
      Karen

      • Bryan says

        April 6, 2012 at 4:15 pm

        Thanks for your answer.

        I can express them as rates by dividing the numbers recorded (for example kg of fruit) by the area sampled for each and then taking the log of each and use them as covariates but this still doesn’t really reflect the different area sampled for each and how much each subject (fruit transects with different lengths) ought to contribute overall. Do you know of a way to do this in SPSS? Using the GEE scale weight is not an option because that is also only for the dependent variable. Standard ‘Weight cases” also wouldn’t work because that would weight all the variables and cause problems with the offset.

        Thanks again for your help, I am really glad that I came across your site and I plan to use you when I need consultation help in the future.

        Bryan

  26. Tarak says

    February 6, 2012 at 12:40 pm

    Thanks so much! After reading this, I better understand the Poisson models that I am running on my data (comparing mortality rates during exposure time intervals vs. non-exposure time intervals). In medical school, we don’t receive great training on interpreting the stats behind the research we consume. If we don’t understand the basic assumptions underlying the stats, I don’t know that we can properly interpret anything we read. This is a great contribution!

    • Karen says

      February 10, 2012 at 6:09 pm

      Thanks, Tarak. Glad I could help!

      Karen


Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.
877-272-8096   Contact Us
