What Is Latent Class Analysis?

One of the most common—and one of the trickiest—challenges in data analysis is deciding how to include multiple predictors in a model, especially when they’re related to each other.

Let’s say you are interested in studying work spillover into personal time as a predictor of job burnout.

You have 5 categorical yes/no variables that indicate whether a particular symptom of work spillover is present (see below).

While you could use each individual variable, you’re not really interested in whether any one symptom in particular is related to the outcome. Perhaps it’s not really each symptom that’s important, but the idea that spillover is happening.

One possibility is to count up the number of items to which each respondent said yes. This variable will measure the degree to which spillover is happening. In many studies, this is just what you need.
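As a minimal sketch of that count, here is how it could be computed in Python. The three respondents and their answers are made up for illustration, and the item order is assumed to follow the table in this post.

```python
# Hypothetical yes/no answers (1 = yes, 0 = no) to the five spillover items.
# Assumed item order: brings work home, works weekends, answers email within
# an hour, checks email from home, on call during vacations.
respondents = {
    "A": [0, 0, 1, 1, 1],
    "B": [1, 1, 0, 1, 0],
    "C": [0, 0, 0, 1, 0],
}


def spillover_count(answers):
    """Return the number of items answered 'yes'."""
    return sum(answers)


counts = {name: spillover_count(a) for name, a in respondents.items()}
print(counts)  # {'A': 3, 'B': 3, 'C': 1}
```

Respondents A and B end up with the same count of 3, even though they said yes to different items.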

But it leaves out something important: whether certain combinations of symptoms tend to co-occur, and whether it is these combinations that affect burnout.

In other words, what if it’s not just the degree of spillover that’s important, but the type?

Enter Latent Class Analysis (LCA).

LCA is a measurement model in which individuals can be classified into mutually exclusive and exhaustive types, or latent classes, based on their pattern of answers on a set of categorical indicator variables. (Factor Analysis is also a measurement model, but with continuous indicator variables.)

	Probability of ‘Yes’ response for each Class
Item	Class 1 (20%)	Class 2 (61%)	Class 3 (12%)	Class 4 (7%)
Regularly brings home work to work on in the evenings	.30	.08	1.0	.66
Is asked to work weekends to meet deadlines	.10	.03	.47	1.0
Is expected to answer emails from the office within an hour outside of working hours	.93	.04	.15	.96
Checks work email from home	.84	.45	.91	.94
Is expected to be on call during vacations	.66	.15	.06	.88

True class membership is unknown for each individual. As categories of a latent variable, these classes can’t be directly measured other than through the patterns of responses on the indicator variables.

There are two sets of parameters in an LCA. The first is the set of class membership probabilities: the probability that a randomly chosen person belongs to each latent class. You can see in the example above that there are 4 classes, and that 20% of respondents are in Class 1, 61% are in Class 2, etc.

The numbers in each column of the table are the second type of parameters, analogous to factor loadings in confirmatory factor analysis. Each is the conditional probability that someone in a particular class would respond ‘yes’ to a certain item. These parameters are used to interpret the classes.
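For readers who like to see it in symbols, here is one way to write down how the two sets of parameters fit together. The notation is my own assumption, not something from the software output: \pi_c is the probability of being in class c, \rho_{jc} is the probability that a member of class c says yes to item j, and y_j is 1 for a yes and 0 for a no. The formula also relies on the usual LCA assumption that the items are independent of each other within a class.

```latex
% Probability of a full response pattern y = (y_1, ..., y_5) with 4 classes
P(Y = y) = \sum_{c=1}^{4} \pi_c \prod_{j=1}^{5} \rho_{jc}^{\,y_j} \, (1 - \rho_{jc})^{1 - y_j}

% Probability of being in class c, given the observed pattern y
P(C = c \mid Y = y) =
  \frac{\pi_c \prod_{j=1}^{5} \rho_{jc}^{\,y_j} \, (1 - \rho_{jc})^{1 - y_j}}
       {\sum_{k=1}^{4} \pi_k \prod_{j=1}^{5} \rho_{jk}^{\,y_j} \, (1 - \rho_{jk})^{1 - y_j}}
```

In the table, the \pi_c are the percentages in the column headers and the \rho_{jc} are the entries in the body.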

For example, the largest class, Class 2, might be interpreted as the “Low Spillover” group. Their probability of answering ‘yes’ to any of the 5 questions is relatively low. The only one that is a little bit high is ‘Checks work email from home,’ but even so, this group does this at the lowest probability of any of the classes.

Likewise, Class 4, the smallest, has a pretty high probability of answering ‘yes’ to every single question. This class would be the “High Spillover” group.

So far, it’s not very interesting, right? It just seems like a matter of degree.

But Classes 1 and 3 are more interesting.

Class 1 has pretty high probabilities of answering ‘yes’ to three of the questions and low probabilities of answering ‘yes’ to the other two. The items they say yes to are all about being available to the company outside of work hours. So their personal lives are often interrupted, but they’re not regularly working long hours.

Compare this to Class 3, which is quite different. Members of Class 3 are highly likely to check work email from home, but they’re also regularly putting in extra work in the evenings and, to a lesser extent, on weekends. They’re not expected to be at the beck and call of work, however. (Maybe they’re the ones in the office working late.)

These are two qualitatively different ways of having work spill into home life, and they could have different impacts on burnout. This is how Latent Class Analysis can be so useful.
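To make the difference concrete, here is a rough Python sketch that takes the estimates from the table and works out which class a given response pattern most likely belongs to, using Bayes’ rule. The two respondents are hypothetical, the function names are mine, and the calculation assumes the items are independent within a class (the usual LCA assumption). Note that the probabilities of 1.0 in the table are rounded estimates, so this sketch treats some patterns as impossible for a class when a real model might not.

```python
# Class sizes and P('yes' | class) copied from the table in this post.
# Items are in table order: brings work home, works weekends, answers email
# within an hour, checks email from home, on call during vacations.
class_sizes = {"Class 1": 0.20, "Class 2": 0.61, "Class 3": 0.12, "Class 4": 0.07}
p_yes = {
    "Class 1": [0.30, 0.10, 0.93, 0.84, 0.66],
    "Class 2": [0.08, 0.03, 0.04, 0.45, 0.15],
    "Class 3": [1.00, 0.47, 0.15, 0.91, 0.06],
    "Class 4": [0.66, 1.00, 0.96, 0.94, 0.88],
}


def posterior(answers):
    """Posterior class probabilities for one pattern (1 = yes, 0 = no)."""
    joint = {}
    for cls, size in class_sizes.items():
        likelihood = 1.0
        for p, y in zip(p_yes[cls], answers):
            likelihood *= p if y == 1 else 1.0 - p
        joint[cls] = size * likelihood  # class size times pattern likelihood
    total = sum(joint.values())
    return {cls: value / total for cls, value in joint.items()}


def most_likely_class(answers):
    """Return the class with the highest posterior probability."""
    post = posterior(answers)
    return max(post, key=post.get)


# Two hypothetical respondents who each said yes to exactly three items.
on_call = [0, 0, 1, 1, 1]
extra_hours = [1, 1, 0, 1, 0]

print(most_likely_class(on_call))      # Class 1
print(most_likely_class(extra_hours))  # Class 3
```

Both respondents would get the same score of 3 if you simply counted yes answers, but they land in different classes.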

In this example, we were able to use Latent Class Analysis to identify a latent typology that is used as a predictor variable, but there are many other uses within statistics, too.

So be sure to keep LCA on your radar—you never know when it might come in handy.

Comments

Soundoftext says

February 21, 2024 at 4:21 am

Great post! I found the explanation of the Holland’s Publication Model particularly helpful in understanding the process of latent class analysis. It made me realize that this technique can be used to identify unobserved subpopulations within a larger population, which can be valuable in various research applications. Thank you for sharing your insights!

Tony Goodchild says

January 16, 2024 at 11:17 am

Is the LCA method similar to any kind of cluster analysis?

- Karen Grace-Martin says
  
  January 16, 2024 at 12:30 pm
  
  It’s used to answer a similar research question, but there are key differences. LCA uses all categorical variables and is a measurement model. Cluster analysis uses continuous variables and isn’t measuring a latent construct. It’s based on distances.
  
Michael Johnson Mahande says

October 23, 2023 at 2:25 am

This is a very interesting topic.

Lou says

November 11, 2022 at 3:09 pm

LatentGold from Statistical Innovations is a powerful app for latent class methods.

Jon K Peck says

November 11, 2022 at 10:26 am

Latent class analysis is available in SPSS Statistics via the PLS (Regression > Partial Least Squares) or STATS LATENT CLASS (Loglinear > Latent Class Analysis) extension commands.

- Karen Grace-Martin says
  
  November 11, 2022 at 11:48 am
  
  Hi Jon,
  
  Wow, that’s great to know. Thanks!
  
Meghna Chakraborty says

June 19, 2020 at 2:04 pm

Hi, I would like to estimate the different factors influencing the child restraint use. I have two age groups of children, 0 to 3 years, and 4 to 7 years. My independent variables are driver age, race, gender, vehicle type etc. Can I run an LCM model using the children group as a latent class? Do you think, I am getting it conceptually right? Please suggest. Thank you very much!

- Karen Grace-Martin says
  
  June 25, 2020 at 4:51 pm
  
  Hi Meghna,
  
  I don’t think so. The latent classes are groupings you don’t actually know, but are inferring from patterns in the data. I assume you have observed data on the child’s age?
  
Larry says

April 14, 2020 at 11:17 am

Interesting. How do we determine the classes? Do we use Maximum Likelihood?

- Karen Grace-Martin says
  
  April 17, 2020 at 2:35 pm
  
  Hi Larry,
  
  Yes, that’s what the LCA is doing. Finding the classes. And yes, it uses Maximum Likelihood to do so.
  
Annette Ponnock says

October 21, 2019 at 3:09 pm

I have card sort data and I want to run an LCA to determine classes of people who grouped cards similarly. How would I go about doing that?

Ella Ganio says

July 19, 2019 at 9:16 pm

Our survey is all about learning facilities and environment. These are the choices that will be used. Is this a 4-point or 5-point Likert scale?

4 = Very satisfied, 3 = Satisfied, 2 = Slightly satisfied, 1 = Not satisfied, 0 = No experience in the facility

Would I solve for 0 too?

- Karen Grace-Martin says
  
  August 22, 2019 at 1:18 pm
  
  Ella, It’s not really a 5-point scale b/c 0 isn’t part of the ordering. It’s a qualitatively different category. So this variable isn’t entirely ordinal, but it certainly is categorical, so you can definitely use LCA on it.
  
Noel McGinn says

April 14, 2018 at 2:45 pm

Can Latent Class Analysis be done using SPSS Statistics 23?

- Karen Grace-Martin says
  
  May 15, 2018 at 11:30 am
  
  Not version 23. I know Stata, R, MPlus, and SAS can all do it.
  