*by Maike Rahn, PhD*

## Why use factor analysis?

Factor analysis is a useful tool for investigating variable relationships for complex concepts such as socioeconomic status, dietary patterns, or psychological scales.

It allows researchers to investigate concepts that are not easily measured directly by collapsing a large number of variables into a few interpretable underlying factors.

## What is a factor?

The key concept of factor analysis is that multiple observed variables have similar patterns of responses because they are all associated with a latent (i.e. not directly measured) variable.

For example, people may respond similarly to questions about income, education, and occupation, which are all associated with the latent variable socioeconomic status.

In every factor analysis, there are the same number of factors as there are variables. Each factor captures a certain amount of the overall variance in the observed variables, and the factors are always listed in order of how much variation they explain.

The eigenvalue is a measure of how much of the variance of the observed variables a factor explains. Any factor with an eigenvalue ≥1 explains at least as much variance as a single observed variable.

So if the factor for socioeconomic status had an eigenvalue of 2.3, it would explain as much variance as 2.3 of the three variables, or about 77% of their total variance. This factor, which captures most of the variance in those three variables, could then be used in other analyses.
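The arithmetic behind eigenvalues can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the correlation matrix below is hypothetical (its values are not from this article), and the eigenvalues are taken directly from the correlation matrix, as in a principal-component-style extraction. Other extraction methods will give somewhat different numbers.

```python
import numpy as np

# Hypothetical correlation matrix for income, education, occupation
# (illustrative values only, not taken from the article).
corr = np.array([
    [1.00, 0.70, 0.60],
    [0.70, 1.00, 0.65],
    [0.60, 0.65, 1.00],
])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending
# order; reverse them so the factor explaining the most variance is first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Each standardized variable contributes a variance of 1, so the total
# variance equals the number of variables.
proportion = eigenvalues / corr.shape[0]

for i, (ev, p) in enumerate(zip(eigenvalues, proportion), start=1):
    print(f"Factor {i}: eigenvalue = {ev:.2f}, share of variance = {p:.1%}")
```

The eigenvalues always add up to the number of variables, which is why an eigenvalue of 1 corresponds to the variance of a single observed variable.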

The factors that explain the least amount of variance are generally discarded. Deciding how many factors are useful to retain will be the subject of another post.

## What are factor loadings?

The relationship of each variable to the underlying factor is expressed by the so-called factor loading. Here is an example of the output of a simple factor analysis looking at indicators of wealth, with just six variables and two resulting factors.

| Variables | Factor 1 | Factor 2 |
|---|---|---|
| Income | 0.65 | 0.11 |
| Education | 0.59 | 0.25 |
| Occupation | 0.48 | 0.19 |
| House value | 0.38 | 0.60 |
| Number of public parks in neighborhood | 0.13 | 0.57 |
| Number of violent crimes per year in neighborhood | 0.23 | 0.55 |

The variable with the strongest association to the underlying latent variable, Factor 1, is income, with a factor loading of 0.65.

Since factor loadings can be interpreted like standardized regression coefficients, one could also say that the variable income has a correlation of 0.65 with Factor 1. This would be considered a strong association for a factor analysis in most research fields.

Two other variables, education and occupation, are also associated with Factor 1. Based on the variables loading highly onto Factor 1, we could call it “Individual socioeconomic status.”

House value, number of public parks, and number of violent crimes per year, however, have high factor loadings on the other factor, Factor 2. They seem to indicate the overall wealth within the neighborhood, so we may want to call Factor 2 “Neighborhood socioeconomic status.”

Notice that the variable house value also is marginally important in Factor 1 (loading = 0.38). This makes sense, since the value of a person’s house should be associated with his or her income.
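The reading of the loading table above can be reproduced with a short Python sketch. It uses the loadings from the table and assumes the two factors are uncorrelated (orthogonal), in which case a squared loading is the share of a variable's variance explained by that factor. The variable names and the grouping rule (assign each variable to the factor with its largest absolute loading) are illustrative choices, not a fixed rule.

```python
import numpy as np

variables = [
    "Income", "Education", "Occupation",
    "House value", "Public parks", "Violent crimes",
]

# Loadings from the table: column 0 is Factor 1, column 1 is Factor 2.
loadings = np.array([
    [0.65, 0.11],
    [0.59, 0.25],
    [0.48, 0.19],
    [0.38, 0.60],
    [0.13, 0.57],
    [0.23, 0.55],
])

# Assuming orthogonal factors, the sum of squared loadings in a column
# is the variance explained by that factor.
variance_explained = (loadings ** 2).sum(axis=0)

# The sum of squared loadings in a row is the variable's communality:
# the share of its variance explained by the two factors together.
communalities = (loadings ** 2).sum(axis=1)

# Assign each variable to the factor on which it loads most strongly.
dominant = np.abs(loadings).argmax(axis=1) + 1

for name, h2, factor in zip(variables, communalities, dominant):
    print(f"{name}: communality = {h2:.2f}, loads mainly on Factor {factor}")
print("Variance explained per factor:", variance_explained.round(2))
```

House value is assigned to Factor 2 by this rule, but its loading of 0.38 on Factor 1 still accounts for about 14% of its variance, which matches the cross-loading noted above.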

**About the Author:** *Maike Rahn is a health scientist with a strong background in data analysis. Maike has a Ph.D. in Nutrition from Cornell University.*

Muhammad Karim says

Explained nicely. Now the meaning of factor loading is clear. But there is still some confusion: what is an eigenvalue? If an eigenvalue is greater than 1, what does that mean?

BIBHU BHUSAN NAYAK says

Thank you so much for my first understanding on FA

tiffany field says

Very nice presentation. I have two questions: 1) On the SPSS output, which of the analyses do you prefer: component, pattern, or structure? 2) How do you interpret negative loadings? Thanks so much. Tiffany

Barbara says

Hi,

I am still confused about factor analysis. If I have 6 factors in my analysis table, is it necessary to reduce them to, say, only 2 factors?

Thanks

Idris shamsuddeen Yaradua says

Thank you, sir, for this explanation. My question is: can I combine principal component analysis and factor analysis in one analysis?

Jyotirmoy Pandit says

Dear, in my study I have selected some municipalities with their different indicators, viz. demographic, education, amenities, health. My query is: which analysis should I use to determine whether the situation of a given municipality is good or bad? Please reply.

Vithalani Bhargav says

Helpful thank you for help

maryam says

please help me

What is the minimum number of variables needed to run a factor analysis? I saw some researchers use at least 15. Is that the rule of thumb?

I have 3 variables, with 150 observations for each variable.

Can I use factor analysis?

Rahmatullah says

Well Explained, I found it very helpful and useful as described in the easiest way to understand it.

Thank u.

Tareq says

Very clear example and useful coverage to the FA concept

Mariya Zheleva says

Dear Mr Rahn,

I would like to ask for your piece of advice on the following questions in relation to factor analysis:

1) How do you decide how many factors should be extracted? For instance, I have 44 variables in my survey and data is mainly categorical.

2) Do you conduct the factor analysis for all of variables at once or it is best to first prepare a bunch of variables and conduct the analysis. In my case, should I make like for instance 4 bunches of 11 variables and on a separate case run the factor analysis for each of the bunches. Does this mean that I should in advance make a descriptive statistic for each variable?

3) After conducting a principal factor analysis for all variables, I see that the highest correlations have values of 0,252 or 0,314 (in the correlation matrix). Does this mean that the model is insignificant?

Thank you in advance for your kind guidance.

Kind regards,

Mariya Zheleva

PhD student at Sofia University “St. Kliment Ohridski”, Bulgaria and at UVSQ in Paris, France

Alphoncina says

can someone respond to this question please.

I am facing the same problem

Hassan Golshani says

Easy to understand. thank you.

YJC says

Really nice summary!

Precise and comprehensive!

Much appreciated,

eg tan says

easy to understand.thks

Ioanna Karaoulani says

Clear, precise, simple to understand!

Thank you.

Isah says

Hi, how are the factors obtained?

Tausif says

How you get factor 1 and Factor 2 ??

atheer says

You are happy evening

I would like to ask you about your effective position on whether it is possible to use counting variables with factor analysis

thanks

Best wishes from IRAQ

Karen says

Atheer,

It’s possible. The assumption is that all variables are normally distributed. Count variables are often skewed, but not always. So check your distributions.

Lanh says

Dear Maike,

thank you so much for your clear and useful explanation. I totally understand how to apply it well.

Best wishes from Germany

upasana says

Thank you. It was easy to understand.

Morobi Mothulatshipi says

thanks a lot for the information

Mark Norman says

The article states “In every factor analysis, there are the same number of factors as there are variables”. However the table used in the example shows 6 variables and 2 factors. Why are the two numbers not equal? Does “variable” have different meanings in the statement and the table?

Thanks in advance for any clarification.

Karen says

Mark,

Because although there are as many factors as variables, they aren’t all useful. So part of the job of the data analyst is to decide how many factors are useful and therefore retained.

Alex Hamed says

This is a clear and straight forward explanation.

Alex says

This is a clear and straightforward explanation.

Thank you

Daniel Lim says

Thank you for the clear explanation!

Fairouz says

Thanks for the simplicity and clear info 🙂

Sarah Andalib says

Thanks. It was explained very well.

Ashenafi says

Thank you

Dr. Ramnath Takiar says

It is a well written article. If I understood correctly, we may use many questionnaire to assess some construct like Motivation. For this, I may include questions related to Work environment, Supervisor relationship, pay and other benefits, job satisfaction, training facilities etc., So there are five subcategories under which I have framed the questions. A factor analysis, if done properly should result at least in five factors. So, a factor analysis tries to stratify the questions included in the survey to homogeneous sub groups. Whether my understanding is correct?

Mark says

commendable . best explanation so far

samuel says

So if I understood it well, FA can be used to analyse data on “barriers” to effective communication, that is, when I have about 20 barrier factors to analyse. Thank you

Arslan Saleem says

God Bless you. it was an interesting, simple and understandable. it was well written and to the point. helped me a lot

Jimoh says

Thanks for your contribution on FA. It is helpful, but it needs a hypothesis to support it.

David Akiiki Kalenzi says

Dr Maike Rahn, Thanks so much for the short explanation of what factor analysis is all about. I fully understand how to apply. I wish one day you read my piece of work.

Kindest regards from Queenstown in Eastern Cape-South Africa

Tamanna says

Hey, could you please name 4 psychological tests based on factor analysis, such as 16 PF and NEO, any other tests that you have come across?

Thanks.

James Tan says

I have read several articles trying to explain factor analysis. This one is the easiest to understand because it is clear and concise.

Mike says

Hi,

Is it safe to say that factor analysis is the analysis done to seek the relationship between demographics and the variables (dependent, mediator, moderator) in the study? Or is it the analysis done on every item under a construct, to see the loadings among the items that represent the construct?

Do help me, as I still can't figure out what factor analysis is. Kindly assist. Many thanks.

Mike

Karen says

Hi Mike,

No, FA isn’t done to seek relationship between different variables in a relationship model.

Factor Analysis is a measurement model for an unmeasured variable (a construct). So it’s closer to your latter definition.

Pablo Ramos says

Thank you very much!

The clearest explanation I ever read.

Regards from Spain.

Morobi Mothulatshipi says

Thank you very much. I fully understand how to apply it.