by Maike Rahn, PhD
One of the hardest things to determine when conducting a factor analysis is how many factors to settle on. Statistical programs provide a number of criteria to help with the selection.
Eigenvalue > 1
Programs usually have a default cut-off for the number of generated factors, such as all factors with an eigenvalue of ≥1.
This is because a factor with an eigenvalue of 1 accounts for as much variance as a single variable, and the logic is that only factors that explain at least as much variance as a single variable are worth keeping.
But a cut-off of 1 often either yields more factors than the user bargained for or leaves out a theoretically important factor whose eigenvalue is just below 1. So use this criterion only with extreme caution.
Another option is the scree plot. A scree plot shows the eigenvalues on the y-axis and the number of factors on the x-axis. It always displays a downward curve.
The point where the slope of the curve is clearly leveling off (the “elbow”) indicates the number of factors that should be generated by the analysis.
Unfortunately, both criteria sometimes yield an unreasonably high number of factors. In the above example, a cut-off of an eigenvalue ≥1 would give you seven factors. And the scree plot suggests either three or five factors due to the way the slope levels off twice.
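To make the two criteria concrete, here is a minimal sketch in plain Python. The eigenvalues are invented for illustration and are not the ones from the example discussed above. The function names and the 0.1 margin used to flag "just below 1" are also assumptions, not settings from any particular statistical program.

```python
# Hypothetical eigenvalues for 10 variables, sorted from largest to smallest.
# They are invented for illustration and are not the article's example data.
eigenvalues = [3.2, 1.9, 1.3, 0.95, 0.8, 0.6, 0.5, 0.35, 0.25, 0.15]


def kaiser_count(eigs, cutoff=1.0):
    """Number of factors whose eigenvalue is at least `cutoff`."""
    return sum(1 for e in eigs if e >= cutoff)


def near_misses(eigs, cutoff=1.0, margin=0.1):
    """1-based factor numbers whose eigenvalue falls just below the cutoff.

    The margin of 0.1 is an arbitrary choice made for this sketch.
    """
    return [i for i, e in enumerate(eigs, start=1) if cutoff - margin <= e < cutoff]


def successive_drops(eigs):
    """Drop in eigenvalue from each factor to the next (the scree slope)."""
    return [a - b for a, b in zip(eigs, eigs[1:])]


print("Factors with eigenvalue >= 1:", kaiser_count(eigenvalues))
print("Factors just below the cut-off:", near_misses(eigenvalues))

# Text-only scree plot: one row per factor, bar length proportional to the
# eigenvalue, followed by the drop to the next factor.
drops = successive_drops(eigenvalues) + [None]
for number, (eig, drop) in enumerate(zip(eigenvalues, drops), start=1):
    bar = "#" * round(eig * 10)
    note = "" if drop is None else f"  drop to next: {drop:.2f}"
    print(f"{number:2d} {eig:5.2f} {bar}{note}")
```

With these made-up values, three factors pass the cut-off and the fourth (0.95) sits just below it, which is exactly the kind of factor the eigenvalue rule can throw away. The drops between factors are what you would read off a scree plot by eye. The sketch does not try to pick the elbow automatically, because that call remains a judgment.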
It is important to keep in mind that one of the reasons for running a factor analysis is to reduce the large number of variables that describe a complex concept, such as socioeconomic status, to a few interpretable latent variables (factors). In other words, we would like to find a smaller number of interpretable factors that explain the maximum amount of variability in the data.
Total Percent Variance Explained
Therefore, another important metric to keep in mind is the total amount of variability of the original variables explained by each factor solution.
Remember that every factor analysis has the same number of factors as it does variables, and those factors are listed in the order of the variance they explain. You’ll always be able to explain more total variance by keeping more factors in the solution, but later factors explain so little variation that they don’t add much.
If the first three factors together explain most of the variability in the original 10 variables, then those factors are clearly a good, simpler substitute for all 10 variables. You can drop the rest without losing much of the original variability.
But if it takes 7 factors to explain most of the variance in those 10 variables, you might as well just use the original 10.
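The arithmetic behind this comparison can be sketched in a few lines of Python. It assumes the eigenvalues come from a correlation matrix, so that they sum to the number of variables. The eigenvalues, the function names, and the 60% target are all hypothetical, since how much counts as "most of the variability" is left to the researcher.

```python
# Hypothetical eigenvalues for 10 variables (they sum to 10, as they would
# for a correlation matrix). Invented for illustration only.
eigenvalues = [3.2, 1.9, 1.3, 0.95, 0.8, 0.6, 0.5, 0.35, 0.25, 0.15]


def cumulative_percent_variance(eigs):
    """Cumulative percent of total variance explained by the first k factors."""
    total = sum(eigs)
    running = 0.0
    percents = []
    for e in eigs:
        running += e
        percents.append(100.0 * running / total)
    return percents


def factors_needed(eigs, target_percent):
    """Smallest number of factors whose cumulative variance reaches the target."""
    for k, pct in enumerate(cumulative_percent_variance(eigs), start=1):
        if pct >= target_percent:
            return k
    return len(eigs)


for k, pct in enumerate(cumulative_percent_variance(eigenvalues), start=1):
    print(f"first {k:2d} factor(s): {pct:5.1f}% of total variance")

# The 60% target is an arbitrary threshold chosen for this sketch.
print("Factors needed for 60%:", factors_needed(eigenvalues, 60))
```

If `factors_needed` comes back close to the number of original variables, the factor solution is not buying much simplification.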
It is also important that the rotated factors make theoretical sense to the researcher.
Do the variables that are loading on the same factor make sense together? If you can name the concept they represent, that indicates the factor solution is a reasonable one.
Likewise, do the variables that are loading on different factors measure something different? If you’ve created a scale with two items that are just different wordings of the same underlying question, a factor solution that puts them on different factors doesn’t make a lot of sense.
Keep in mind that each of the identified factors should have at least three variables with high factor loadings, and that each variable should load highly on only one factor.
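These two rules of thumb are easy to check mechanically once you have a table of rotated loadings. The sketch below uses invented loadings for five made-up variables on two factors, and it treats an absolute loading of 0.5 or more as "high". That threshold, the variable names, and the function names are assumptions for illustration only.

```python
# Hypothetical rotated loadings: variable name -> loadings on factors 1 and 2.
loadings = {
    "income":     [0.82, 0.10],
    "education":  [0.76, 0.21],
    "occupation": [0.69, 0.15],
    "home_value": [0.58, 0.61],  # loads highly on both factors
    "savings":    [0.25, 0.71],
}

HIGH = 0.5  # assumed threshold for a "high" loading; not a fixed standard


def high_loading_counts(table, threshold=HIGH):
    """How many variables load highly on each factor."""
    n_factors = len(next(iter(table.values())))
    return [
        sum(1 for row in table.values() if abs(row[f]) >= threshold)
        for f in range(n_factors)
    ]


def cross_loading_variables(table, threshold=HIGH):
    """Variables that load highly on more than one factor."""
    return [
        name
        for name, row in table.items()
        if sum(1 for value in row if abs(value) >= threshold) > 1
    ]


def weak_factors(table, threshold=HIGH, minimum=3):
    """1-based numbers of factors with fewer than `minimum` high loadings."""
    counts = high_loading_counts(table, threshold)
    return [f for f, count in enumerate(counts, start=1) if count < minimum]


print("High loadings per factor:", high_loading_counts(loadings))
print("Cross-loading variables:", cross_loading_variables(loadings))
print("Factors with fewer than three high loadings:", weak_factors(loadings))
```

In this made-up table the second factor has only two high loadings and one variable loads on both factors, so the solution would fail both checks.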
After looking at the scree plot as a guide, I often wind up forcing my analysis to extract one, two, three, four, and then five factors, and evaluating the five resulting models separately.
Usually it quickly becomes clear when to drop a factor solution, especially when one factor has only two important variables and therefore does not explain much of the overall variability, or when the solution is not very convincing based on my theoretical expectations.
About the Author: Maike Rahn is a health scientist with a strong background in data analysis. Maike has a Ph.D. in Nutrition from Cornell University.