One of the most common—and one of the trickiest—challenges in data analysis is deciding how to include multiple predictors in a model, especially when they’re related to each other.
Here’s an example. Let’s say you are interested in studying work spillover into personal time as a predictor of job burnout.
You have 5 categorical yes/no variables that indicate whether a particular symptom of work spillover is present (see below).
You could use each variable individually, but you’re not really interested in whether any one of them in particular is related to the outcome. Perhaps it’s not each symptom that matters, but the fact that spillover is happening at all.
One possibility is to count up the number of items to which each respondent said yes. This variable will measure the degree to which spillover is happening. In many studies, this is just what you need.
But this count doesn’t tell you something important: whether certain combinations of symptoms tend to co-occur, and whether it is these combinations that affect burnout.
In other words, what if it’s not just the degree of spillover that’s important, but the type?
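To make the limitation concrete, here is a minimal Python sketch with two hypothetical respondents. The response data and the short item labels are made up for illustration. The sum score is just the number of ‘yes’ answers, so both respondents get the same score even though they endorse different items.

```python
# Short labels for the five spillover items (hypothetical wording),
# in the same order as the items in the table.
ITEMS = [
    "brings work home in evenings",
    "works weekends for deadlines",
    "answers email within an hour",
    "checks work email from home",
    "on call during vacations",
]

def spillover_count(responses):
    """Sum score: the number of items answered 'yes' (coded 1)."""
    return sum(responses)

# Two hypothetical respondents with different kinds of spillover.
always_reachable = [0, 0, 1, 1, 1]
works_late = [1, 1, 0, 1, 0]

for name, r in [("always_reachable", always_reachable), ("works_late", works_late)]:
    endorsed = [item for item, y in zip(ITEMS, r) if y == 1]
    print(name, spillover_count(r), endorsed)
```

Both respondents score 3, but they share only one endorsed item.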
Enter Latent Class Analysis (LCA).
LCA is a measurement model in which individuals can be classified into mutually exclusive and exhaustive types, or latent classes, based on their pattern of answers on a set of categorical variables.
Probability of ‘Yes’ response for each class

| Item | Class 1 (20%) | Class 2 (61%) | Class 3 (12%) | Class 4 (7%) |
|---|---|---|---|---|
| Regularly brings home work to work on in the evenings | .30 | .08 | 1.0 | .66 |
| Is asked to work weekends to meet deadlines | .10 | .03 | .47 | 1.0 |
| Is expected to answer emails from the office within an hour outside of working hours | .93 | .04 | .15 | .96 |
| Checks work email from home | .84 | .45 | .91 | .94 |
| Is expected to be on call during vacations | .66 | .15 | .06 | .88 |
True class membership is unknown for each individual. As categories of a latent variable, these classes can’t be directly measured other than through the patterns of responses on the indicator items.
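Because membership can’t be observed, an LCA expresses it as a posterior probability of belonging to each class given a person’s response pattern, computed with Bayes’ rule. The Python sketch below plugs in the rounded estimates from the table. The names `PREVALENCE`, `RHO`, `pattern_likelihood`, and `posterior` are illustrative. The sketch assumes local independence (items are independent within a class), which is the standard LCA assumption.

```python
# Rounded estimates from the table.
PREVALENCE = [0.20, 0.61, 0.12, 0.07]  # Classes 1..4

# RHO[c][j] = P('yes' to item j | class c), items in table order.
RHO = [
    [0.30, 0.10, 0.93, 0.84, 0.66],  # Class 1
    [0.08, 0.03, 0.04, 0.45, 0.15],  # Class 2
    [1.00, 0.47, 0.15, 0.91, 0.06],  # Class 3
    [0.66, 1.00, 0.96, 0.94, 0.88],  # Class 4
]

def pattern_likelihood(y, rho_c):
    """P(response pattern y | class), assuming local independence."""
    p = 1.0
    for y_j, r in zip(y, rho_c):
        p *= r if y_j == 1 else 1.0 - r
    return p

def posterior(y):
    """Bayes' rule: P(class c | y) for each class c."""
    joint = [g * pattern_likelihood(y, rho_c) for g, rho_c in zip(PREVALENCE, RHO)]
    total = sum(joint)
    return [j / total for j in joint]

# A respondent who says 'yes' only to the three availability items.
probs = posterior([0, 0, 1, 1, 1])
print([round(p, 3) for p in probs])
```

For this pattern, nearly all of the posterior probability goes to Class 1. Classes 3 and 4 get exactly zero. This is because the table reports 1.0 for the evenings item in Class 3 and the weekends item in Class 4, so under these rounded estimates a ‘no’ on either item rules those classes out.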
There are two sets of parameters in an LCA. The first is the set of latent class prevalences: the probability that a randomly chosen person belongs to each latent class. In the example above there are 4 classes; 20% of respondents are in Class 1, 61% are in Class 2, and so on.
The numbers in each class column are the second set of parameters, the item-response probabilities. They play a role similar to factor loadings in confirmatory factor analysis. Each is the conditional probability that someone in a particular class would respond ‘yes’ to a certain item. These parameters are used to interpret the classes.
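Written out, the two sets of parameters combine into the model’s probability for a full response pattern. The notation is an assumption: it follows a common convention in the LCA literature rather than anything defined above. Here γ_c is the prevalence of class c, ρ_{j|c} is the probability of a ‘yes’ to item j in class c, and y_j is 1 for ‘yes’ and 0 for ‘no’.

```latex
P(\mathbf{Y} = \mathbf{y})
  = \sum_{c=1}^{C} \gamma_c \prod_{j=1}^{J}
    \rho_{j \mid c}^{\,y_j} \left(1 - \rho_{j \mid c}\right)^{1 - y_j},
\qquad \sum_{c=1}^{C} \gamma_c = 1
```

The product over items encodes local independence. Within a class, answers to different items are assumed to be unrelated, so any association among the items across the whole sample is attributed to the mixture of classes. In this example, C = 4 and J = 5.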
For example, the largest class, Class 2, might be interpreted as the “Low Spillover” group. Its members have a relatively low probability of answering ‘yes’ to each of the 5 questions. The only one that is even moderately high is ‘Checks work email from home’ (.45). Even there, this group has the lowest probability of any of the classes.
Likewise, Class 4, the smallest, has a pretty high probability of answering ‘yes’ to every single question. This class would be the “High Spillover” group.
So far, it’s not very interesting, right? The classes just seem to differ by degree.
But Classes 1 and 3 are more interesting.
Class 1 has high probabilities of answering ‘yes’ to three of the questions and low probabilities of answering ‘yes’ to the other two. If you examine what they’re saying yes to, all three are about being available to the company outside of work hours. So their personal lives are often interrupted, but they’re not regularly working long hours.
Compare this to Class 3, which is quite different. Members of Class 3 are highly likely to check work email from home, but they’re also regularly putting in extra work in the evenings and, to a lesser extent, on weekends. They’re not expected to be at the beck and call of work, however. (Maybe they’re the ones in the office working late.)
These are two qualitatively different ways of having work spill into home life, and they could have different impacts on burnout. This is how Latent Class Analysis can be so useful.
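One way to make this comparison systematic is to list, for each class, the items its members are more likely than not to endorse. The Python sketch below does that with the table’s estimates. The 0.5 cutoff and the short item labels are assumptions for illustration, not part of the analysis described here.

```python
# Short labels for the table's items (hypothetical wording).
ITEMS = [
    "Brings work home in the evenings",
    "Works weekends for deadlines",
    "Answers office email within an hour",
    "Checks work email from home",
    "On call during vacations",
]

# Item-response probabilities from the table (class -> per-item probabilities).
RHO = {
    1: [0.30, 0.10, 0.93, 0.84, 0.66],
    2: [0.08, 0.03, 0.04, 0.45, 0.15],
    3: [1.00, 0.47, 0.15, 0.91, 0.06],
    4: [0.66, 1.00, 0.96, 0.94, 0.88],
}

THRESHOLD = 0.5  # hypothetical cutoff for "likely to say yes"

# For each class, keep the items its members are more likely than not to endorse.
profiles = {
    cls: [item for item, p in zip(ITEMS, probs) if p >= THRESHOLD]
    for cls, probs in RHO.items()
}

for cls, items in profiles.items():
    print(f"Class {cls}: {items if items else 'no item above threshold'}")
```

Class 1’s profile is all about availability and Class 3’s is about extra hours. ‘Checks work email from home’ is the only item the two share. A single cutoff is crude: Class 3’s .47 for weekends falls just below it. In practice you would read the full probabilities rather than a thresholded summary.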
In this example, we used Latent Class Analysis to identify a latent typology that could then serve as a predictor of burnout, but LCA has many other uses in statistics, too.
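Estimates like those in the table come from fitting the model to data. As a rough illustration only, here is a minimal expectation-maximization (EM) sketch in Python with NumPy. EM treats class membership as missing data. The sketch simulates hypothetical data from the table’s estimates and fits a 4-class model. It then assigns each person to their most likely class (modal assignment), so the class could enter a burnout model as a categorical predictor. The function and variable names are hypothetical. Dedicated LCA software adds things this sketch omits, such as multiple random starts to avoid local maxima and fit statistics for choosing the number of classes.

```python
import numpy as np

def e_step(Y, gamma, rho):
    """Posterior class probabilities (n x C) and the log-likelihood."""
    # log P(y_i | class c), assuming local independence within classes
    log_lik = Y @ np.log(rho).T + (1 - Y) @ np.log(1 - rho).T
    log_joint = log_lik + np.log(gamma)
    # log-sum-exp over classes, for numerical stability
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm), float(log_norm.sum())

def fit_lca(Y, n_classes, n_iter=1000, tol=1e-8, seed=0):
    """Fit a latent class model to binary data Y (n x J) by EM."""
    rng = np.random.default_rng(seed)
    gamma = np.full(n_classes, 1.0 / n_classes)
    rho = rng.uniform(0.2, 0.8, size=(n_classes, Y.shape[1]))
    prev_ll = -np.inf
    for _ in range(n_iter):
        post, ll = e_step(Y, gamma, rho)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: posterior-weighted class sizes and item proportions
        weights = post.sum(axis=0)
        gamma = weights / Y.shape[0]
        # Clip away from 0 and 1 so the logs in the E-step stay finite
        rho = np.clip((post.T @ Y) / weights[:, None], 1e-6, 1 - 1e-6)
    post, ll = e_step(Y, gamma, rho)
    return gamma, rho, post, ll

# Simulate hypothetical data from the table's estimates.
rng = np.random.default_rng(42)
true_gamma = np.array([0.20, 0.61, 0.12, 0.07])
true_rho = np.array([
    [0.30, 0.10, 0.93, 0.84, 0.66],
    [0.08, 0.03, 0.04, 0.45, 0.15],
    [1.00, 0.47, 0.15, 0.91, 0.06],
    [0.66, 1.00, 0.96, 0.94, 0.88],
])
z = rng.choice(4, size=2000, p=true_gamma)           # true (hidden) classes
Y = (rng.random((2000, 5)) < true_rho[z]).astype(float)

gamma, rho, post, ll = fit_lca(Y, n_classes=4)
modal_class = post.argmax(axis=1)  # most likely class for each person
print(np.round(gamma, 2))
print(np.round(rho, 2))
```

There are two caveats. First, the fitted classes come out in arbitrary order, so Class 1 here need not match Class 1 in the table. Second, modal assignment treats uncertain classifications as certain, which can bias the estimated relationship between class and burnout. Approaches that carry the posterior probabilities forward are generally preferred for that step.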
So be sure to keep LCA on your radar—you never know when it might come in handy.