We’re changing how we teach our statistics workshops to support more software options. Each module will feature a live webinar lecture (along with all the supplementary material — code, exercises, Q&As, etc.). This lecture used to include all the statistical concepts, the steps to implement, and a demonstration in one or more software packages. Now we’re splitting those things up so that you have easier access to the software support you need.

## Analyzing Zero-Truncated Count Data: Length of Stay in the ICU for Flu Victims

Let’s imagine you have been asked to identify the factors that will help a hospital determine the length of stay in the intensive care unit (ICU) once a patient is admitted. The hospital tells you that once a patient is admitted to the ICU, they have a day count of one. As soon as they spend 24 hours plus 1 minute, they have stayed an additional day. Clearly this is count data. There are no fractions, only whole numbers.
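Because the day count starts at one, a zero can never be observed. One common way to handle that is a zero-truncated Poisson distribution, which rescales the ordinary Poisson probabilities so that the counts 1, 2, 3, ... sum to one. Here is a minimal sketch of that idea; the rate `lam = 2.5` is a made-up value for illustration, not an estimate from any ICU data.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability of observing exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zero_truncated_poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability conditional on the count being at least 1."""
    if k < 1:
        return 0.0  # a stay of zero days cannot occur
    # Divide by P(count > 0) so the remaining probabilities sum to 1.
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

lam = 2.5  # hypothetical rate parameter, for illustration only

probs = [zero_truncated_poisson_pmf(k, lam) for k in range(1, 60)]
total = sum(probs)

# Mean of the truncated distribution; it is larger than lam
# because the zeros have been removed.
truncated_mean = lam / (1.0 - math.exp(-lam))

print(f"sum of probabilities for 1..59 days: {total:.6f}")
print(f"mean length of stay under truncation: {truncated_mean:.3f}")
```

Notice that the truncated mean is always a bit higher than the untruncated rate, which is one reason an ordinary Poisson model can mislead with data like these.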

## January 2017 Webinar: Communicating Statistical Results to Non-Statisticians

One of the biggest challenges that data analysts face is communicating statistical results to our clients, advisors, and colleagues who don’t have a statistics background. Unfortunately, the way that we learn statistics is not usually the best way to communicate our work to others, and many of us are left on our own to navigate what is arguably the most important part of our work.

## Two-Way Tables and Count Models: Expected and Predicted Counts

In a previous article, we discussed how incidence rate ratios calculated in a Poisson regression can be determined from a two-way table of categorical variables. Statistical software can also calculate the expected (or predicted) count for each group. Below are the actual and expected counts of boys and girls participating and not participating in organized sports.
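To see where expected counts come from, here is a small sketch. The counts are invented for illustration and are not the article's data. It assumes a Poisson model with main effects only (no interaction between sex and participation), in which case each expected count is simply row total times column total divided by the grand total.

```python
# Hypothetical observed counts, invented for illustration only.
observed = {
    "boys":  {"participate": 60, "not_participate": 40},
    "girls": {"participate": 45, "not_participate": 55},
}

# Marginal totals for rows (sex) and columns (participation).
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())

# Expected count under a main-effects-only model:
# row total * column total / grand total.
expected = {
    row: {col: row_totals[row] * col_totals[col] / grand_total
          for col in col_totals}
    for row in observed
}

for row in observed:
    for col in col_totals:
        print(f"{row:5s} {col:16s} "
              f"observed={observed[row][col]:3d} "
              f"expected={expected[row][col]:.1f}")
```

A model that also includes the interaction term would reproduce the observed counts exactly, so the gap between observed and expected here reflects the association the main-effects model leaves out.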

## Understanding Incidence Rate Ratios through the Eyes of a Two-Way Table

The coefficients of count model regression tables are shown in either logged form or as incidence rate ratios. Trying to explain the coefficients in logged form can be a difficult process. Incidence rate ratios are much easier to explain. You probably didn’t realize you’ve seen incidence rate ratios before, expressed differently.
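A quick sketch of the relationship between the two forms, using made-up numbers rather than output from any real model: the incidence rate ratio is the ratio of two rates, the coefficient in logged form is its natural log, and exponentiating the coefficient gets you back to the ratio.

```python
import math

# Hypothetical counts, for illustration only:
# 60 of 100 boys and 45 of 100 girls participate in organized sports.
rate_boys = 60 / 100
rate_girls = 45 / 100

# Incidence rate ratio: the boys' rate relative to the girls' rate.
irr = rate_boys / rate_girls

# A count model would report this same comparison in logged form.
log_coefficient = math.log(irr)

# Exponentiating the logged coefficient recovers the IRR.
recovered_irr = math.exp(log_coefficient)

print(f"IRR (boys vs. girls):  {irr:.3f}")
print(f"coefficient (logged):  {log_coefficient:.3f}")
print(f"exp(coefficient):      {recovered_irr:.3f}")
```

An IRR of about 1.33 reads naturally as "the rate for boys is about a third higher than for girls," which is much easier to say out loud than the logged coefficient.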

## Differences Between the Normal and Poisson Distributions

The normal distribution is so ubiquitous in statistics that those of us who use a lot of statistics tend to forget it’s not always so common in actual data. And since the normal distribution is continuous, many people describe all numerical variables as continuous. I get it: I’m guilty of using those terms interchangeably, too, but they’re not exactly the same. Numerical variables can be either continuous or discrete. The difference? Continuous variables can take any number within a range. Discrete variables can only be whole numbers.

## Overdispersion in Count Models: Fit the Model to the Data, Don’t Fit the Data to the Model

If we have count data, we use a Poisson model for our analysis, right? The key criterion for using a Poisson model is that, after accounting for the effect of predictors, the mean must equal the variance. If the mean doesn’t equal the variance, then all we have to do is transform the data or tweak the model, correct? Let’s see how we can do this with some real data.
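Before fitting anything, it helps to check how far the data are from the mean-equals-variance assumption. The sketch below uses a handful of invented counts (not the real data discussed in the article) and computes the ratio of the sample variance to the sample mean. With no predictors in the model, this ratio equals the Pearson chi-square statistic divided by its degrees of freedom, a common dispersion measure. Values well above 1 suggest overdispersion.

```python
from statistics import mean, variance

# Hypothetical counts, invented for illustration only.
counts = [0, 1, 0, 2, 7, 0, 3, 12, 1, 0, 5, 0]

sample_mean = mean(counts)
sample_variance = variance(counts)  # uses the n - 1 denominator

# Dispersion ratio for an intercept-only Poisson model.
dispersion = sample_variance / sample_mean

# The same quantity written as Pearson chi-square / degrees of freedom.
pearson_chi2 = sum((y - sample_mean) ** 2 / sample_mean for y in counts)
pearson_dispersion = pearson_chi2 / (len(counts) - 1)

print(f"mean = {sample_mean:.2f}, variance = {sample_variance:.2f}")
print(f"variance / mean = {dispersion:.2f}")
print(f"Pearson chi2 / df = {pearson_dispersion:.2f}")
```

Once predictors are in the model, the same idea applies, but the Pearson statistic is computed from each observation's fitted mean and divided by the residual degrees of freedom.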

## The Impact of Removing the Constant from a Regression Model: The Categorical Case

In a simple linear regression model, how the constant (a.k.a. the intercept) is interpreted depends on the type of predictor (independent) variable.

If the predictor is categorical and dummy-coded, the constant is the mean value of the outcome variable for the reference category only. If the predictor variable is continuous, the constant equals the predicted value of the outcome variable when the predictor variable equals zero.
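Here is a minimal sketch of the categorical case, using a tiny invented data set and a hand-written least-squares helper (`simple_ols` is a hypothetical name, not a library function). With a 0/1 dummy predictor, the fitted constant should match the mean of the outcome in the reference category, and the slope should match the difference between the two group means.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: group 0 is the reference category.
group = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]

intercept, slope = simple_ols(group, outcome)

# Group means computed directly, for comparison with the model.
reference_mean = mean(y for g, y in zip(group, outcome) if g == 0)
other_mean = mean(y for g, y in zip(group, outcome) if g == 1)

print(f"intercept = {intercept:.2f}, reference mean = {reference_mean:.2f}")
print(f"slope = {slope:.2f}, difference in means = {other_mean - reference_mean:.2f}")
```

Dropping the constant from this model would force the line through zero for the reference category, which is why the interpretation of every other coefficient changes.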

## Count Models: Understanding the Log Link Function

When we fit a statistical model, we are in a sense creating a mathematical equation. The outcome variable is the sum of two parts: a fixed (structural) component and a random component. The random component has a probability distribution. Since the outcome variable includes that random component, it too follows a probability distribution. The link function is the link between the mean of Y and the structural component. It’s very possible you have run models without being aware of this. Some software packages have dedicated commands (e.g., Stata’s poisson and nbreg) that use a default link function. But if you run a generalized linear model (GLM), then you must select the link function that fits your random component.
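For count models the usual choice is the log link, which says that the log of the mean of Y equals the structural component. The sketch below shows what that implies, using made-up coefficients `b0` and `b1` (assumptions for illustration, not estimates from any model): the mean is always positive, and a one-unit increase in the predictor multiplies the mean by exp(b1).

```python
import math

# Hypothetical coefficients on the log scale, for illustration only.
b0 = 0.5  # constant
b1 = 0.3  # coefficient for a single predictor x

def linear_predictor(x: float) -> float:
    """Structural component: b0 + b1 * x."""
    return b0 + b1 * x

def mean_count(x: float) -> float:
    """Invert the log link: mean of Y = exp(structural component)."""
    return math.exp(linear_predictor(x))

for x in (0, 1, 2, 3):
    mu = mean_count(x)
    print(f"x={x}: linear predictor={linear_predictor(x):.2f}, "
          f"mean count={mu:.3f}, log(mean)={math.log(mu):.2f}")

# A one-unit increase in x multiplies the mean by exp(b1).
ratio = mean_count(3) / mean_count(2)
print(f"mean(3) / mean(2) = {ratio:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

That multiplicative effect is exactly why exponentiated coefficients from a count model are read as incidence rate ratios.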

## December 2016 Member Webinar: A Gentle Introduction to Generalized Linear Mixed Models – Part 2

Generalized linear mixed models (GLMMs) are incredibly useful tools for working with complex, multi-layered data. But they can be tough to master. In this follow-up to October’s webinar (“A Gentle Introduction to Generalized Linear Mixed Models – Part 1”), you’ll learn the major issues involved in working with GLMMs and how to incorporate these models into your own work.