What is the Purpose of a Generalized Linear Mixed Model?

If you are new to generalized linear mixed models (GLMMs), or if you have heard of them but never used them, you might be wondering what a GLMM is for.

Mixed effects models are useful when we have data with more than one source of random variability. For example, an outcome may be measured more than once on the same person (repeated measures taken over time).

When we do that, we have to account for both within-person and between-person variability. A single residual variance can’t account for both.

Or maybe multiple fields each contain multiple plots with different varieties of produce to be compared, but the fields themselves are treated differently. In that case, we would need to account for the field-to-field (cluster) variability as well as the plot-to-plot variability.
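
To make the idea of two sources of variability concrete, here is a minimal simulation sketch of the repeated-measures case. It assumes a normally distributed outcome (the simplest, linear mixed model setting) and a balanced design. The function names, sample sizes, and standard deviations are all made up for illustration.

```python
import random
import statistics

def simulate_repeated_measures(n_people=500, n_obs=6,
                               sd_between=2.0, sd_within=1.0, seed=1):
    """Each person has their own mean level (between-person variability)
    plus occasion-to-occasion noise around it (within-person variability)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        person_mean = rng.gauss(10.0, sd_between)  # person-specific level
        data.append([rng.gauss(person_mean, sd_within) for _ in range(n_obs)])
    return data

def variance_components(data):
    """Method-of-moments estimates; assumes every person has the same n_obs."""
    n_obs = len(data[0])
    within = statistics.mean(statistics.variance(person) for person in data)
    person_means = [statistics.mean(person) for person in data]
    # Var(person means) = between + within / n_obs, so remove the second part.
    between = statistics.variance(person_means) - within / n_obs
    return between, within

data = simulate_repeated_measures()
between, within = variance_components(data)
pooled = statistics.variance([y for person in data for y in person])

print(f"between-person variance: {between:.2f}")
print(f"within-person variance:  {within:.2f}")
print(f"single pooled variance:  {pooled:.2f}")
```

The pooled variance lumps both sources together, so on its own it cannot say how much of the spread comes from differences between people and how much from measurement-to-measurement noise. A mixed model estimates the two pieces separately.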

However, maybe you are really wondering about the types of questions that a GLMM can answer. If you do use a GLMM for one of the situations described above, what can it tell you?

As it turns out, GLMMs are quite flexible in terms of what they can accomplish. In that sense, they are not much different from many other models in the “linear family” (general linear models, like regression and ANOVA, or generalized linear models, like logistic regression). As with these models, the goal may be to answer relatively specific research questions, such as:

  • What is the relationship between a particular independent variable and the expected outcome?
    • You can adjust for relationships of other variables with the outcome if they are important (covariates)
  • Does one particular independent variable change the relationship of another particular independent variable with the expected outcome? (Or phrased slightly differently, is there an interaction of two variables?)

GLMMs may also answer more general questions, such as:

  • What is the “best” combination of independent variables for estimating the expected outcome?
  • For a given set of values of independent variables, what is the estimated expected outcome?
    • A variant of this: what is the most likely value of the outcome variable itself, and how likely is the outcome to equal or fall close to that value?

In addition to answering these kinds of questions, mixed effects models (whether linear or generalized) can also be used to understand sources of random variability in outcomes. While we often think of these additional sources of variability as annoyances, being able to describe them can be extremely useful for both summary purposes and decision making.

In The Craft of Statistical Analysis free webinar, Introduction to Generalized Linear Mixed Models, we can see an example of this. A simulated data set contains information about patients being treated for cancer, their doctors (who cared for multiple patients), and whether or not each patient was in remission following treatment by their doctor.

From a GLMM, we learn about:

1) the relationships of measurable patient and doctor traits with the probability of remission,

2) random patient-to-patient variability in individual remission outcomes, and

3) random doctor-to-doctor variability in the probability of remission.
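
One common way to write a model with these three pieces is sketched below. It assumes a logit link and a single random intercept per doctor; the specification actually used in the webinar is not stated here, and the symbols are introduced only for illustration.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid u_j)
  = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here y_ij is 1 if patient i of doctor j is in remission, x_ij holds the measured patient and doctor traits, and the coefficients β describe item 1. The Bernoulli variation of y_ij around its probability corresponds to item 2, and the variance σ_u² of the doctor effects u_j corresponds to item 3.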

The GLMM gives us the information we need to make an informed determination about whether individual variation among doctors is notably related to remission probabilities, even after we account for their measurable traits, such as experience and education. This information could be of considerable interest to patients, treatment facilities, and others involved in the healthcare industry.
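
As a rough illustration of how a doctor-level variance translates into remission probabilities, the sketch below assumes a logistic model with a random intercept for each doctor. The baseline log-odds and the doctor standard deviation are hypothetical values chosen for the example, not estimates from the webinar's data set, and the function names are invented here.

```python
import math

def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def doctor_probabilities(baseline_log_odds, sd_doctor, z_values=(-2, -1, 0, 1, 2)):
    """Remission probability for an otherwise typical patient seen by doctors
    whose random intercepts sit z standard deviations from the average doctor."""
    return {z: inv_logit(baseline_log_odds + z * sd_doctor) for z in z_values}

def latent_icc(sd_doctor):
    """Share of latent-scale variance attributable to doctors.
    pi^2 / 3 is the variance of the standard logistic distribution."""
    var_doctor = sd_doctor ** 2
    return var_doctor / (var_doctor + math.pi ** 2 / 3)

# Hypothetical values for illustration only.
probs = doctor_probabilities(baseline_log_odds=-0.5, sd_doctor=1.0)
for z, p in probs.items():
    print(f"doctor at {z:+d} SD: P(remission) = {p:.2f}")
print(f"latent-scale ICC: {latent_icc(1.0):.2f}")
```

With these made-up numbers, the same typical patient has a remission probability of roughly 0.08 with a doctor two standard deviations below average and roughly 0.82 with one two standard deviations above. That spread is the kind of doctor-to-doctor variability the random effect variance summarizes.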


