# Factor Analysis: A Short Introduction, Part 1

by Maike Rahn, PhD

## Why use factor analysis?

Factor analysis is a useful tool for investigating variable relationships for complex concepts such as socioeconomic status, dietary patterns, or psychological scales.

It allows researchers to investigate concepts that are not easily measured directly by collapsing a large number of variables into a few interpretable underlying factors.

## What is a factor?

The key concept of factor analysis is that multiple observed variables have similar patterns of responses because they are all associated with a latent (i.e., not directly measured) variable, the factor.

For example, people may respond similarly to questions about income, education, and occupation, which are all associated with the latent variable socioeconomic status.

In every factor analysis, there are the same number of factors as there are variables.  Each factor captures a certain amount of the overall variance in the observed variables, and the factors are always listed in order of how much variation they explain.

The eigenvalue is a measure of how much of the variance of the observed variables a factor explains. Any factor with an eigenvalue ≥1 explains at least as much variance as a single observed variable.

So if the factor for socioeconomic status had an eigenvalue of 2.3, it would explain as much variance as 2.3 of the three variables. This factor, which captures most of the variance in those three variables, could then be used in other analyses.
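The arithmetic behind the eigenvalue example can be sketched in a few lines. This is only an illustration and assumes the observed variables are standardized, so that each contributes a variance of 1 and the total variance equals the number of variables; the function name `variance_share` is made up for this sketch.

```python
def variance_share(eigenvalue, n_variables):
    """Proportion of total variance explained by one factor.

    Assumes standardized variables, so total variance == n_variables.
    """
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return eigenvalue / n_variables


# Eigenvalue of 2.3 for a factor built from three observed variables.
share = variance_share(2.3, 3)
print(f"Share of total variance explained: {share:.1%}")
```

Under that assumption, an eigenvalue of 2.3 out of three variables corresponds to roughly three quarters of the total variance.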

The factors that explain the least amount of variance are generally discarded.  Deciding how many factors are useful to retain will be the subject of another post.

The relationship of each variable to the underlying factor is expressed by the so-called factor loading. Here is an example of the output of a simple factor analysis looking at indicators of wealth, with just six variables and two resulting factors.

| Variables | Factor 1 | Factor 2 |
|---|---|---|
| Income | 0.65 | 0.11 |
| Education | 0.59 | 0.25 |
| Occupation | 0.48 | 0.19 |
| House value | 0.38 | 0.60 |
| Number of public parks in neighborhood | 0.13 | 0.57 |
| Number of violent crimes per year in neighborhood | 0.23 | 0.55 |

The variable with the strongest association to the underlying latent variable, Factor 1, is income, with a factor loading of 0.65.

Since factor loadings can be interpreted like standardized regression coefficients, one could also say that the variable income has a correlation of 0.65 with Factor 1. This would be considered a strong association for a factor analysis in most research fields.

Two other variables, education and occupation, are also associated with Factor 1. Based on the variables loading highly onto Factor 1, we could call it “Individual socioeconomic status.”

House value, number of public parks, and number of violent crimes per year, however, have high factor loadings on the other factor, Factor 2. They seem to indicate the overall wealth within the neighborhood, so we may want to call Factor 2 “Neighborhood socioeconomic status.”

Notice that the variable house value also is marginally important in Factor 1 (loading = 0.38). This makes sense, since the value of a person’s house should be associated with his or her income.
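The reading of the loading table above can be mimicked with a short script. This is a sketch, not part of the original analysis: it takes the loadings from the table, reports which factor each variable loads on most strongly, and computes each variable's communality as the sum of its squared loadings. That last step is only valid if the two factors are uncorrelated, which is an assumption here; the helper names are invented for illustration.

```python
# Loadings copied from the table in this post: (Factor 1, Factor 2).
loadings = {
    "Income": (0.65, 0.11),
    "Education": (0.59, 0.25),
    "Occupation": (0.48, 0.19),
    "House value": (0.38, 0.60),
    "Number of public parks in neighborhood": (0.13, 0.57),
    "Number of violent crimes per year in neighborhood": (0.23, 0.55),
}


def dominant_factor(row):
    """Return the 1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda j: abs(row[j])) + 1


def communality(row):
    """Sum of squared loadings for one variable.

    Interpretable as the share of that variable's variance explained by
    the factors only when the factors are uncorrelated (assumed here).
    """
    return sum(value ** 2 for value in row)


for name, row in loadings.items():
    print(f"{name}: Factor {dominant_factor(row)}, "
          f"communality {communality(row):.2f}")

# Sum of squared loadings per factor (column-wise), again assuming
# uncorrelated factors.
ss_loadings = [sum(row[j] ** 2 for row in loadings.values()) for j in range(2)]
print("Sum of squared loadings per factor:", [round(s, 2) for s in ss_loadings])
```

Income, education, and occupation come out under Factor 1 and the three neighborhood variables under Factor 2, matching the interpretation in the text.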

About the Author: Maike Rahn is a health scientist with a strong background in data analysis.   Maike has a Ph.D. in Nutrition from Cornell University.


Jayashree Ramanan June 29, 2016 at 5:10 am

Explained in the simplest way, so that even a layman can understand. Thanks a bunch.

Rajendran June 26, 2016 at 1:57 pm

Simple and very clear explanation. It’s very clear for me now. Thank you.

Dr Altaf June 18, 2016 at 6:20 pm

Thanks sir,

Very nicely explained, in language as simple as a layman's.

Jeremy June 6, 2016 at 3:08 am

I wish everything had such an easy to understand definition! Thank you

surag_1 June 6, 2016 at 12:31 am

Very crisp, clear and concise explanation. Thanks a ton.

IceSwan May 30, 2016 at 7:37 am

I have been through many documents about factor analysis; yours is the clearest explanation. Thanks big time.

Baloyi May 19, 2016 at 4:02 pm

This is the best explanation that I have understood. Keep up the standard, Dr.

J. O. Kwapong May 19, 2016 at 9:56 am

I like it. kudos!

Roy May 19, 2016 at 3:11 am

Very nice explanation of factor analysis. Keep up the nice work. A small request to you sir – please start small regular tutorials on statistics & data analysis.

CMB April 14, 2016 at 5:42 pm

Just adding my thanks to the list so you keep the posts coming!

Monica April 10, 2016 at 10:21 am

OMG!
I have searched many websites for factor analysis. This is the best and easiest explanation I have found yet.
Really helpful! Great attempt! Keep on doing social service!

A3 (assalafiy) March 11, 2016 at 6:44 pm

That is a very nice explanation.
You are so wonderful.

R March 9, 2016 at 6:56 am

Very lucid introduction on factors which would be useful to any novice to FA.

Eric Francis Eshun March 5, 2016 at 9:37 am

Thank you

Godwin Kodituwakku February 20, 2016 at 1:26 am

Simple but valuable explanation. Thanks.

Rebecca mcmullen December 30, 2015 at 2:08 pm

baba iddi December 8, 2015 at 5:56 am

thanks for the introduction on factor analysis

Prof Sreekumar Pillai November 27, 2015 at 11:28 pm

Excellent explanation of the basics.
In my language there is a saying (around 2000 years old): “Good teachings should have the quality of mother's milk, being good, simple, digestible, and sustaining,” and I feel I have found it for factor analysis.
Keep up the good work!

Rishi September 30, 2015 at 7:40 am

Explained in one of the best ways possible!!! Helps you understand by just reading it once (quite the contrary for the definitions on the other websites)

Sat September 29, 2015 at 4:57 am

Hi Maike,
I have a survey with 15 questions: 3 measure reading ability, 3 writing, 3 understanding, 3 monetary values, and 3 aspects unrelated to literacy.
I am confused.
Do I pick the reading, writing, and understanding questions in SPSS for the factor analysis? What about the literacy-unrelated questions, which are controls?
Sat

Sanelise September 19, 2015 at 6:27 am

Very simple and straightforward… Thanks

magda September 16, 2015 at 7:18 pm

Very clear explanation and useful examples. Thanks. I would like to ask you something. I have a questionnaire of 52 items (I used it for a pilot study), and I have done FA, obtaining 10 factors after reduction. I need to reduce the number of questions, since 52 is too many, and keep the most ‘powerful’ ones. Can I use FA to reduce the number of questions? Thank you

Lucia Sauti July 30, 2015 at 7:15 am

I would like to design a questionnaire using a Likert scale that I can use for factor analysis. My challenge is: should I mix positive and negative statements when compiling the questionnaire? For example, let us say I need to find out whether a student has a negative attitude towards learning a subject. Should I say in my questionnaire, “I have a negative attitude towards Mathematics,” or “I do not have a negative attitude towards Mathematics”?

Ahmed Muhammed June 2, 2015 at 8:36 pm

A very good work, thank you sir.

Ali May 9, 2015 at 10:11 pm

It seems to me you have mixed up the difference between factor analysis and PCA (Principal Component Analysis).
You talked about the amount of variance a factor captures and the eigenvalue that measures it. It is the principal components in PCA that tell you that, because each principal component is orthogonal to the others and is associated with an eigenvector with a corresponding eigenvalue.

If not, please let me know how eigenvalues of factors are calculated in factor analysis.

Dr. Akhter April 23, 2015 at 2:24 pm

Very simple and nice explanations

issa stambuli April 17, 2015 at 6:48 am

Well done

Abel March 30, 2015 at 10:47 am

Thanks Doc
This has been the most understandable explanation I have had so far. You mentioned something about your next post, on determining the number of factors. Could you please also talk about factor analysis using R?

Jason lee March 29, 2015 at 6:32 am

Dear Dr.

Good day to you. I have a question on factor analysis. I have a pool of 30 items for my construct. I then conducted the PCs, which left nine items. After conducting the CFA, only three items remain. Is this acceptable? Thank you.

Al-Amin March 26, 2015 at 10:31 am

Fantastic explanation!! Thank you

Hassan January 12, 2015 at 3:38 am

I have two kinds of questions: one with a 5-option response and another with a 7-option one. Can I run exploratory FA on both at the same time? When I ran them with SPSS, it led to 8 factors that can explain 61% of the variance. But, mathematically, is it right?

S.S. December 31, 2014 at 11:28 am

Hi Rahn,
Great Job.!!!
How am I supposed to cite your website?

DR..H.K.LAKSHMANRAO December 11, 2014 at 10:16 pm

Factor analysis is a very useful method for analysing scientific data, particularly data relating to biotech, food technology, and animal behaviour.
Also: principal component analysis and exploratory factor analysis are both data reduction techniques, that is, techniques to combine a group of correlated variables into fewer variables. You can then use those combination variables (indices or subscales) in other analyses.

Rizwan September 8, 2015 at 1:33 am

Dear sir,

mohammed ibrahim, fut minna, nger state. nigeria October 25, 2014 at 5:57 pm

I am grateful to have a little idea of how to apply factor analysis. But still, sir! How would I enter the data in an Excel spreadsheet, and how would I start running the analysis? I am a Ph.D. student, and one of the objectives of my study has to do with factor analysis. I have identified four factors with twenty-three variables in question. Please explain it step by step for me. Thanks and best regards. Looking forward to hearing from you, sir.

Zimula October 24, 2014 at 3:15 pm

Good stuff

Bibi October 7, 2014 at 6:25 am

Thank you very much, Dr. Rahn. I have struggled for 13 months to understand factor analysis, and this has been simple and very helpful. Thank you again.

john September 24, 2014 at 12:00 pm

Dear Dr, thanks very much for your explanation of factor analysis; even those who are beginners in statistics, like me, can follow your elaborations. It's so illuminating. I have gone through several texts on factor analysis but could hardly capture the concept.
Thanks

Amaa September 23, 2014 at 3:55 pm

As I am using factor analysis in SPSS for my master's research, I got five factors related to my research. At the end of the SPSS results there is a 5×5 matrix (the 5 are the factors). What does this matrix indicate? In the beginning I thought it was a correlation matrix of the factors, but then I was told that it isn't (without being told what it is exactly). Can you help, please?
P.S. Everybody's answer is welcome.

Thank you.

ashish August 3, 2014 at 2:42 am

This was simple and clear with commonsense.

sangeetha July 21, 2014 at 7:40 am

Very useful and understandable explanation. Saved a lot of time because of this easy explanation. Thank you, sir Maike.

rania July 18, 2014 at 7:14 am

Thanks a lot this made my life a lot easier in the PHD
Thanks again!!

Emily July 13, 2014 at 8:33 pm

Dr. Rahn- I’ve been trying all afternoon to understand a research article that used this method and this was the first explanation that has helped me. Thank you very much for posting it!

jessica June 30, 2014 at 11:01 am

Thanks, this was great. simple and to the point. many thanks.

seatlathebe ephraim lepomane March 11, 2014 at 4:54 am

Dear Dr.

very simple and informative.

thanks

hari November 10, 2013 at 10:53 am

The first one is correct. The factor is a linear combination of the original variables. Hence, your first formula represents the required info.

Jakob September 17, 2013 at 10:14 pm

Dear Dr. Rahn,

I would like to hear your opinion if this method is valid:

I have used a PLS model and created a ‘factor’ (let's call it “Loyalty”). To make that factor I've used four variables and the factor loadings are the following:

Next I would like to estimate the loyalty of a respondent, who has the following values:

s1 = 3
s2 = 4
s3 = 4
s4 = 2

How can I merge these values into one value and sort each respondent into, e.g., two groups (high loyalty, low loyalty)?

I have an idea:
I use this formula:

(0.934 * 3) + (0.886 * 4) + (0.913 * 4) + (0.937 * 2) = 11.872

or maybe this formula:

((0.934 / (0.934 + 0.886 + 0.913 + 0.937)) * 3) + ((0.886 / (0.934 + 0.886 + 0.913 + 0.937)) * 4) + ((0.913 / (0.934 + 0.886 + 0.913 + 0.937)) * 4) + ((0.937 / (0.934 + 0.886 + 0.913 + 0.937)) * 2) = 3.23

Using this formula in this example would give the respondent a value of 3.23.

Which formula is the right one (if any), and if neither of them is the right one, what is?

thanks

p.s. Anyone is welcome to answer this question 🙂

Wilbert September 12, 2013 at 9:04 am

Very clear and useful description, also understandable for non-mathematicians, e.g. linguists. Many thanks for posting this!