Factor Analysis: A Short Introduction, Part 5: Dropping unimportant variables from your analysis

by Maike Rahn, PhD

When are factor loadings not strong enough?

Once you run a factor analysis and think you have some usable results, it’s time to eliminate variables that are not “strong” enough. They are usually the ones with low factor loadings, although additional criteria should be considered before taking out a variable.

As a rule of thumb, your variable should have a rotated factor loading of at least |0.4| (meaning ≥ +.4 or ≤ –.4) onto one of the factors in order to be considered important.

Some researchers use much more stringent criteria such as a cut-off of |0.7|. In some instances, this may not be realistic: for example, when the highest loading a researcher finds in her analysis is |0.5|.

Other researchers relax the criteria to the point where they include variables with factor loadings of |0.2|. Which cut-off to use depends on whether you are running a confirmatory or an exploratory factor analysis, and on what is usually considered acceptable in your field. In addition, a variable should ideally load cleanly onto only one factor.
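As a minimal sketch of how such a cut-off might be applied, the snippet below screens a rotated loading matrix. The loading values and variable names are invented for illustration, and the default cut-off of 0.4 is just the rule of thumb mentioned above; your own field may call for a different value.

```python
import numpy as np

# Hypothetical rotated loadings: rows are variables, columns are factors.
loadings = np.array([
    [0.72, 0.10],
    [0.65, -0.05],
    [0.15, 0.58],
    [-0.08, -0.47],
    [0.30, 0.25],   # no loading reaches the cut-off
    [0.45, 0.52],   # reaches the cut-off on both factors
])
names = ["v1", "v2", "v3", "v4", "v5", "v6"]  # illustrative names


def screen_loadings(loadings, names, cutoff=0.4):
    """Label each variable by how many factors it loads on at |loading| >= cutoff."""
    n_strong = (np.abs(loadings) >= cutoff).sum(axis=1)
    labels = {}
    for name, count in zip(names, n_strong):
        if count == 0:
            labels[name] = "weak"           # candidate for removal
        elif count == 1:
            labels[name] = "clean"          # loads on exactly one factor
        else:
            labels[name] = "cross-loading"  # loads on more than one factor
    return labels


report = screen_loadings(loadings, names)
for name, label in report.items():
    print(name, label)
```

A variable labelled "weak" or "cross-loading" is only a candidate for removal; as noted above, additional criteria should be considered before taking it out.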

How many variables and observations?

Another question often asked is how many variables a researcher should use for analysis. Generally, each factor should have at least three variables with high loadings.
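The three-variables-per-factor guideline can be checked with a few lines of code. This is only a sketch: the loading matrix is made up, and treating |0.4| as the threshold for a "high" loading is an assumption carried over from the rule of thumb above.

```python
import numpy as np

# Hypothetical rotated loadings: 7 variables (rows) by 2 factors (columns).
loadings = np.array([
    [0.70, 0.05],
    [0.62, 0.12],
    [0.55, -0.10],
    [-0.48, 0.20],
    [0.10, 0.66],
    [0.08, -0.51],
    [0.22, 0.31],
])


def variables_per_factor(loadings, cutoff=0.4):
    """Count, for each factor, the variables with |loading| >= cutoff."""
    return (np.abs(loadings) >= cutoff).sum(axis=0)


counts = variables_per_factor(loadings)

# Factors (by column index) with fewer than three high-loading variables.
too_few = [j for j, c in enumerate(counts) if c < 3]

print("High-loading variables per factor:", counts.tolist())
print("Factors with fewer than three:", too_few)
```

In this invented example the second factor has only two high-loading variables, so it would fall short of the guideline.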

It is also important to have a sufficient number of observations to support your factor analysis: ideally you should have about 20 observations per variable in the data set to ensure stable results. A common minimum is the greater of 10 observations per variable and 100 observations in total. However, some statisticians would go as low as five observations per variable.
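To see where a given data set stands against these rules of thumb, a small helper like the one below can report the observations-per-variable ratio. The function name and the example numbers are hypothetical, and the thresholds are simply the ones quoted above; the helper reports each one separately rather than combining them into a single verdict.

```python
def sample_size_report(n_obs, n_vars):
    """Compare a data set's size with common factor-analysis rules of thumb."""
    if n_obs <= 0 or n_vars <= 0:
        raise ValueError("n_obs and n_vars must be positive")
    ratio = n_obs / n_vars
    return {
        "obs_per_variable": ratio,
        "meets_20_per_variable": ratio >= 20,  # the ideal mentioned above
        "meets_10_per_variable": ratio >= 10,
        "meets_5_per_variable": ratio >= 5,    # the most lenient rule
        "at_least_100_obs": n_obs >= 100,
    }


# Illustrative numbers: 150 observations on 12 variables.
report = sample_size_report(n_obs=150, n_vars=12)
for key, value in report.items():
    print(key, value)
```

With 150 observations and 12 variables the ratio is 12.5, which meets the 10-per-variable and 100-observation guidelines but not the ideal of 20 per variable.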

About the Author: Maike Rahn is a health scientist with a strong background in data analysis. Maike has a Ph.D. in Nutrition from Cornell University.

4 comments

Mohamed Zakeen

Imagine you had 42 variables for 6,000 observations. Imagine you ran a factor analysis on this dataset. Although you initially created 42 factors, a much smaller number, say 4, of uncorrelated factors might have been ‘retained’ under the criteria that the minimum eigenvalue be greater than 1 and that the factor rotation be orthogonal. As it turns out, the first factor has an eigenvalue of 8.5. Question: What does all that mean?

S. Caubet

Hi Professor Rahn,
First of all thank you so much! I wish this resource were available when I was in graduate school.

My question is about using factor analysis for scale development to assess a set of skills taught in a workshop. We have 28 items and hypothesize 4 factors, and we have 528 valid replies before the workshop and 109 for the post. We are still collecting data as this is an ongoing curriculum.

Would you advise that we run a separate factor analysis for the data we collect after the workshop for comparison? If so, what should we look for? I did look at some results (both exploratory and confirmatory) for the after-workshop data and there were some differences in the groupings of the factor loadings. I am wondering if this could be a real pre/post difference in latent variables or maybe there aren’t enough cases to be conclusive. Would a larger N bring more stable results? I suppose I should just try and see.

I guess my real question is if this violates any research protocols.

I should also mention that about 60% of the after-workshop group also replied to the pre-assessment, so they are not truly independent samples.

Thank you!
Suzanne

suresh kumar

Dear Prof Rahn,
I would like to ask: what is the minimum number of variables we need to run a factor analysis? I saw some researchers use at least 15. Is that the rule of thumb?
Or can we run a factor analysis with fewer variables, say fewer than 10?
Thank you.

Karen

You can use fewer than 10. The rule of thumb is that you need at least 3 clean-loading variables for each factor. (Clean loading = simple structure).

So if there is only one factor, you could technically use as few as 3 variables. However, it’s very common that at least one variable won’t load cleanly, so it’s always a good idea to have more variables to work with.

Whether you have any control over this depends on whether you’re designing a scale or whether you’re working with an existing data set, or something in between.
