Factor Analysis: A Short Introduction, Part 1

by Maike Rahn, PhD

Why use factor analysis?

Factor analysis is a useful tool for investigating variable relationships for complex concepts such as socioeconomic status, dietary patterns, or psychological scales.

It allows researchers to investigate concepts that are not easily measured directly by collapsing a large number of variables into a few interpretable underlying factors.

What is a factor?

The key concept of factor analysis is that multiple observed variables have similar patterns of responses because of their association with an underlying latent variable, the factor, which cannot easily be measured.

For example, people may respond similarly to questions about income, education, and occupation, which are all associated with the latent variable socioeconomic status.

In every factor analysis, there are the same number of factors as there are variables.  Each factor captures a certain amount of the overall variance in the observed variables, and the factors are always listed in order of how much variation they explain.

The eigenvalue is a measure of how much of the variance of the observed variables a factor explains.  Any factor with an eigenvalue ≥1 explains at least as much variance as a single observed variable.

So if the factor for socioeconomic status had an eigenvalue of 2.3 it would explain as much variance as 2.3 of the three variables.  This factor, which captures most of the variance in those three variables, could then be used in other analyses.
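
To make the eigenvalue arithmetic concrete, here is a minimal Python sketch. It assumes a hypothetical correlation matrix for income, education, and occupation (the correlations are invented for illustration, not taken from any data set), and it uses a plain eigendecomposition of that matrix as a stand-in for the extraction step. Statistical packages may extract factors differently, so treat this as an illustration of the bookkeeping rather than a full factor analysis.

```python
import numpy as np

# Hypothetical correlation matrix for income, education, occupation.
# These values are invented for illustration only.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending
# order; reverse them so the largest eigenvalue comes first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Each standardized variable contributes a variance of 1, so the
# eigenvalues sum to the number of variables.
n_variables = R.shape[0]
proportion_explained = eigenvalues / n_variables


def share_of_variance(eigenvalue, n_vars):
    """Proportion of the total variance explained by one factor."""
    return eigenvalue / n_vars


# The example from the text: an eigenvalue of 2.3 with three variables.
example_share = share_of_variance(2.3, 3)

print("eigenvalues:", np.round(eigenvalues, 3))
print("proportion explained:", np.round(proportion_explained, 3))
print("share for eigenvalue 2.3:", round(example_share, 3))
```

Under these assumptions, an eigenvalue of 2.3 out of three variables corresponds to roughly 77% of the total variance.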

The factors that explain the least amount of variance are generally discarded.  Deciding how many factors are useful to retain will be the subject of another post.

What are factor loadings?

The relationship of each variable to the underlying factor is expressed by the so-called factor loading. Here is an example of the output of a simple factor analysis looking at indicators of wealth, with just six variables and two resulting factors.

Variables                                            Factor 1   Factor 2
Income                                               0.65       0.11
Education                                            0.59       0.25
Occupation                                           0.48       0.19
House value                                          0.38       0.60
Number of public parks in neighborhood               0.13       0.57
Number of violent crimes per year in neighborhood    0.23       0.55

The variable with the strongest association to the underlying latent variable, Factor 1, is income, with a factor loading of 0.65.

Since factor loadings can be interpreted like standardized regression coefficients, one could also say that the variable income has a correlation of 0.65 with Factor 1. This would be considered a strong association for a factor analysis in most research fields.

Two other variables, education and occupation, are also associated with Factor 1. Based on the variables loading highly onto Factor 1, we could call it “Individual socioeconomic status.”

House value, number of public parks, and number of violent crimes per year, however, have high factor loadings on the other factor, Factor 2. They seem to indicate the overall wealth within the neighborhood, so we may want to call Factor 2 “Neighborhood socioeconomic status.”

Notice that the variable house value also is marginally important in Factor 1 (loading = 0.38). This makes sense, since the value of a person’s house should be associated with his or her income.
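
The reading of the loadings table above can also be sketched in a few lines of Python. The loadings are copied from the table; the variable names `dominant`, `communality`, and `ss_loadings` are illustrative choices. Summing squared loadings to get the variance explained assumes the two factors are uncorrelated, which the post does not state, so treat that part as an assumption.

```python
import numpy as np

variables = [
    "Income",
    "Education",
    "Occupation",
    "House value",
    "Number of public parks in neighborhood",
    "Number of violent crimes per year in neighborhood",
]

# Loadings from the table: column 0 is Factor 1, column 1 is Factor 2.
loadings = np.array([
    [0.65, 0.11],
    [0.59, 0.25],
    [0.48, 0.19],
    [0.38, 0.60],
    [0.13, 0.57],
    [0.23, 0.55],
])

# Index of the factor on which each variable loads most strongly.
dominant = np.abs(loadings).argmax(axis=1)

# Assuming uncorrelated factors: the sum of squared loadings in a row
# is the share of that variable's variance explained by both factors.
communality = (loadings ** 2).sum(axis=1)

# Sum of squared loadings in a column: the variance each factor
# accounts for across these six variables (same assumption).
ss_loadings = (loadings ** 2).sum(axis=0)

for name, d, h2 in zip(variables, dominant, communality):
    print(f"{name}: loads mainly on Factor {d + 1}, communality {h2:.2f}")
print("sum of squared loadings per factor:", np.round(ss_loadings, 3))
```

The first three variables come out grouped under Factor 1 and the last three under Factor 2, which matches the interpretation given in the text.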

About the Author: Maike Rahn is a health scientist with a strong background in data analysis.   Maike has a Ph.D. in Nutrition from Cornell University.


13 comments

Clint August 17, 2013 at 7:33 pm

Hello Dr. Rahn

This was the best and easiest to understand explanation of Factor Analysis I have found. I will bookmark your page as a future reference. Thanks

Clint

Wilbert September 12, 2013 at 9:04 am

Very clear and useful description, also understandable for non-mathematicians, e.g. linguists. Many thanks for posting this!

Jakob September 17, 2013 at 10:14 pm

Dear Dr. Rahn,

I would like to hear your opinion if this method is valid:

I have used a PLS model and created a ‘factor’ (let’s call it “Loyalty”). To make that factor I’ve used four variables, and the factor loadings are the following:

s1 factorloading: 0.934
s2 factorloading: 0.886
s3 factorloading: 0.913
s4 factorloading: 0.937

Next I would like to estimate the loyalty of a respondent, who has the following values:

s1 = 3
s2 = 4
s3 = 4
s4 = 2

How can I merge these values into one value and group each respondent into, e.g., two groups (high loyalty, low loyalty)?

I have an idea:
I use this formula:

Sum of (factorloading(si) * values(si))

(0.934 * 3) + (0.886 * 4) + (0.913 * 4) + (0.937 * 2) = 11.872

or maybe this formula:

Sum of ((factorloading(si) / sum of factorloadings(s1,s2,s3,s4)) * values(si))

Using this formula in this example would give the respondent a value of:

((0.934 / (0.934+0.886+0.913+0.937)) * 3) + ((0.886 / (0.934+0.886+0.913+0.937)) * 4) + ((0.913 / (0.934+0.886+0.913+0.937)) * 4) + ((0.937 / (0.934+0.886+0.913+0.937)) * 2) = 3.23

Which formula is the right one (if any), and if neither of them is the right one, what is?

thanks

p.s. Anyone is welcome to answer this question :)

hari November 10, 2013 at 10:53 am

The first one is correct. The factor is a linear combination of the original variables. Hence, your first formula represents the required info.
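
For readers following this thread, the arithmetic of the two formulas in Jakob's question can be checked with a short Python sketch. The loadings and responses are the ones he posted; the names `weighted_sum` and `weighted_mean` are illustrative. This only verifies the numbers and is not a statement about how any particular package computes factor scores, since packages commonly work from standardized variables and their own score weights.

```python
# Loadings and one respondent's answers, as posted in the question above.
loadings = [0.934, 0.886, 0.913, 0.937]
responses = [3, 4, 4, 2]

# First formula: loading-weighted sum of the responses.
weighted_sum = sum(l * x for l, x in zip(loadings, responses))

# Second formula: the same sum divided by the total of the loadings,
# i.e. a loading-weighted mean of the responses.
weighted_mean = weighted_sum / sum(loadings)

print(f"weighted sum:  {weighted_sum:.3f}")
print(f"weighted mean: {weighted_mean:.2f}")
```

The two results differ only by a constant factor (the sum of the loadings), so they rank respondents identically. The weighted mean stays on the original response scale, which can make a cutoff between "high" and "low" loyalty easier to interpret.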

seatlathebe ephraim lepomane March 11, 2014 at 4:54 am

Dear Dr.

very simple and informative.

thanks

jessica June 30, 2014 at 11:01 am

Thanks, this was great. simple and to the point. many thanks.

Emily July 13, 2014 at 8:33 pm

Dr. Rahn- I’ve been trying all afternoon to understand a research article that used this method and this was the first explanation that has helped me. Thank you very much for posting it!

rania July 18, 2014 at 7:14 am

Thanks a lot, this made my life a lot easier in the PhD.
Thanks again!!

sangeetha July 21, 2014 at 7:40 am

Very useful and understandable explanation. It saved a lot of time because of this easy explanation. Thank you, sir Mikhe.

ashish August 3, 2014 at 2:42 am

This was simple and clear with commonsense.

Amaa September 23, 2014 at 3:55 pm

I am using factor analysis in SPSS for my master's research, and I got five factors related to my research. At the end of the SPSS results there is a 5×5 matrix (5 being the number of factors). What does this matrix indicate? In the beginning I thought it was a correlation matrix of the factors, but then I was told that it isn't (without being told what it is exactly). Can you help, please?
P.S. Everybody's answers are welcome.

Thank you.

john September 24, 2014 at 12:00 pm

Dear Dr., thanks very much for your explanation of factor analysis; even those who are beginners in statistics, like me, can follow your elaborations. It's so illuminating. I have gone through several texts on factor analysis but could hardly capture the concept.
Thanks

Bibi October 7, 2014 at 6:25 am

Thank you very much, Dr. Rahn. I have struggled for 13 months to understand Factor Analysis, and this has been simple and very helpful. Thank you again.
