• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
The Analysis Factor

The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

  • Home
  • About
    • Our Programs
    • Our Team
    • Our Core Values
    • Our Privacy Policy
    • Employment
    • Guest Instructors
  • Membership
    • Statistically Speaking Membership Program
    • Login
  • Workshops
    • Online Workshops
    • Login
  • Consulting
    • Statistical Consulting Services
    • Login
  • Free Webinars
  • Contact
  • Login

The Importance of Including an Exposure Variable in Count Models

by Jeff Meyer 5 Comments

When our outcome variable is the frequency of occurrence of an event, we will typically use a count model to analyze the results. There are numerous count models. A few examples are: Poisson, negative binomial, zero-inflated Poisson and truncated negative binomial.

There are specific requirements for which count model to use. The models are not interchangeable. But regardless of the model we use, there is a very important prerequisite that they all share.

We must identify the period of time or area of space in which the counts were generated.

The term used for modeling the period of time or area of space is exposure. The exposure variable modifies each observation from a count into a rate per period of time or area.

For example, if you were to count birds at various locations, you would need to know the area of space in which you are doing the count. Ten birds counted within 100 square feet is more than 10 birds counted within 625 square feet.

Counting the number of births during the month of February (28 days) represents a different length of time than the number of births during the month of January (31 days).

If we don’t take into account the different exposures, we will have biased results.

An Exposure Variable Example

Let’s look at a model where the outcome is the number of deaths.

The predictors in the model are whether the deceased smoked and what age bracket they were in. The exponentiated model coefficients represent the incidence rate ratio (IRR) of the category compared to the base category.

The results tell us that smokers have a rate of death 6.24 times greater than non-smokers when controlling for their age bracket.

We also find that 55- to 64-year-olds  have a rate of death 6.88 times more than 35- to 44-year-olds.

Interesting enough, we see that the rate of death for 75- to 84-year-olds is lower than 55- to 64-year-olds, when controlling for smoking. I would have to think that doesn’t make sense.

Converting Counts to Rates

Question: Over what period or area were the outcomes measured? Were they measured over the same period of time and over the same size population?

It turns out they were not.

Each observation measures the number of deaths by person-years. The data in this analysis were collected from English counties. It contains the number of smokers and non-smokers per age category and the number of deaths over a specific time period in each county.

As you can imagine, the number of people living in county A and county B will differ. So will the number of years each county is measured.

Including an exposure variable, such as person-years, allows the counts of deaths to be comparable. We don’t want to predict more deaths just because a county has more people or because we measured it for a longer period of time.

After including person-years in our model as the exposure variable, we get very different results.

The incidence rate ratio drops from 6.24 to 1.43 when comparing smokers to non-smokers. In addition, as age increases, the incident rate ratio (as compared to the base category) increases. This intuitively makes sense.

Note: Some statistical software requires the analyst to include the “offset” variable rather than the “exposure” variable. If that is the case with your software, you will need to take the natural log of the variable in order to include it in the model.

**This article was updated Nov. 19th, 2020.

Jeff Meyer is a statistical consultant with The Analysis Factor, a stats mentor for Statistically Speaking membership, and a workshop instructor. Read more about Jeff here.

Poisson and Negative Binomial Regression for Count Data
Learn when you need to use Poisson or Negative Binomial Regression in your analysis, how to interpret the results, and how they differ from similar models.

Tagged With: Count data, count model, exposure variable, incidence rate ratio, linear regression, negative binomial, offset variable, Poisson Regression

Related Posts

  • The Problem with Linear Regression for Count Data
  • The Exposure Variable in Poisson Regression Models
  • Count Models: Understanding the Log Link Function
  • Getting Accurate Predicted Counts When There Are No Zeros in the Data

Reader Interactions

Comments

  1. Julian Schenkenhofer says

    October 9, 2020 at 5:01 am

    Thank you so much for your great article. But exactly is the exposure variable doing? Is it diving the number of y/exposure? But wouldn’t we have to divide the X variables, then, too? (for each case)

    How do we actually include the exposure variable in the model? Is it included as a new X variable? Or do we divide Y through exposure?

    Thanks for helping me.

    Reply
    • Jeff Meyer says

      October 9, 2020 at 12:00 pm

      Hi Julian,
      The exposure variable is used by the statistical software to weight the observations so that each observation’s time period or area of measurement are equivalent. This type of model is also known as a proportional intensity model. You as the analyst do not need to make any adjustments, the software will make the adjustments for you. How you include it in the model depends upon which statistical software package you are using. It’s basically an option that you add to your model.

      Jeff

      Reply
      • Sam says

        November 20, 2020 at 2:23 pm

        So does this mean all the predictor variables are also weighted by the exposure/offset variable?

        For example, I have an outcome variable of crime count, and want to use neighbourhood size as the offset variable. My main predictor is number of alcohol outlets. Does having neighbourhood size as the offset variable mean that the number of bars is also turned into a rate (number of alcohol outlets/neighbourhood size). Or should I construct an alcohol outlets rate variable myself (i.e. Number of alcohol outlets / neighbourhood size)?

        Thanks

        Reply
  2. Natalie says

    September 1, 2019 at 11:00 pm

    Thank you so much for this very helpful article, as well as the very helpful Exposure Variable in Poisson Regression Models article.

    I was wondering if you can give an example of what the person-years exposure variable would look like from this article? Would this be the number of total people observed per county per duration of measurement year such as:
    – 200k for county A for measurement of 7 years
    – 100k for county B for measurement of 7 years
    – 600k for county A for measurement of 5 years
    or the total people observed per duration of measurement year such as:
    – 300k for measurement of 7 years
    – 600k for measurement of 5 years
    or something else? Should it be number of people total observed, or only those that died?

    Thank you so much for your time and help!
    Natalie

    Reply
    • Jeff Meyer says

      September 3, 2019 at 9:30 am

      Hi Natalie,

      You will want each observation’s exposure to have the same common denominator. Let’s suppose that the outcome is number of deaths that were measured over different periods of time and different sizes of population, similar to what you stated above. The exposure is the total number of people observed. Your exposure for county A is 200k*7, county B 100k*7 and county C 600k * 5. The exposure relates to each individual observation, not groups of observations.

      Jeff

      Reply

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Please note that, due to the large number of comments submitted, any questions on problems related to a personal study/project will not be answered. We suggest joining Statistically Speaking, where you have access to a private forum and more resources 24/7.

Primary Sidebar

This Month’s Statistically Speaking Live Training

  • February Member Training: Choosing the Best Statistical Analysis

Upcoming Workshops

  • Logistic Regression for Binary, Ordinal, and Multinomial Outcomes (May 2021)
  • Introduction to Generalized Linear Mixed Models (May 2021)

Read Our Book



Data Analysis with SPSS
(4th Edition)

by Stephen Sweet and
Karen Grace-Martin

Statistical Resources by Topic

  • Fundamental Statistics
  • Effect Size Statistics, Power, and Sample Size Calculations
  • Analysis of Variance and Covariance
  • Linear Regression
  • Complex Surveys & Sampling
  • Count Regression Models
  • Logistic Regression
  • Missing Data
  • Mixed and Multilevel Models
  • Principal Component Analysis and Factor Analysis
  • Structural Equation Modeling
  • Survival Analysis and Event History Analysis
  • Data Analysis Practice and Skills
  • R
  • SPSS
  • Stata

Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.
877-272-8096   Contact Us

The Analysis Factor uses cookies to ensure that we give you the best experience of our website. If you continue we assume that you consent to receive cookies on all websites from The Analysis Factor.
Continue Privacy Policy
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled

Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.

Non-necessary

Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.

SAVE & ACCEPT