One common reason for running Principal Component Analysis (PCA) or Factor Analysis (FA) is variable reduction.

In other words, you may start with a 10-item scale meant to measure something like Anxiety, which is difficult to accurately measure with a single question.

You could use all 10 items as individual variables in an analysis, perhaps as predictors in a regression model.

But you’d end up with a *mess*.

Not only would you have trouble interpreting all those coefficients, but you’re likely to have multicollinearity problems.

And most importantly, you’re not interested in the effect of *each* of those individual 10 items on your outcome. You’re interested in the effect of Anxiety *as a whole*.

So we turn to a variable reduction technique like FA or PCA to turn 10 related variables into one that represents the construct of Anxiety.

FA and PCA have different theoretical underpinnings and assumptions and are used in different situations, but the processes are very similar. We’ll use FA here for this example.

So let’s say you have successfully come up with a good factor analytic solution, and have found that indeed, these 10 items all represent a single factor that can be interpreted as Anxiety. There are two similar, but theoretically distinct ways to combine these 10 items into a single index.

**Factor Scores**

Part of the Factor Analysis output is a table of factor loadings. Each item’s loading represents how strongly that item is associated with the underlying factor.

Some loadings will be so low that we would consider that item as unassociated with the factor and we wouldn’t want to include it in the index. But even among items with reasonably high loadings, the loadings can vary quite a bit. If those loadings are very different from each other, you’d want the index to reflect that each item has an unequal association with the factor.

One approach to combining items is to calculate an index variable via an *optimally-weighted* linear combination of the items, called the Factor Scores. Each item’s weight is its factor loading. So each item’s contribution to the factor score depends on how strongly it relates to the factor.

Factor scores are essentially a weighted sum of the items. Because those weights are all between -1 and 1, the scale of the factor scores will be very different from a pure sum. I find it helpful to think of factor scores as standardized weighted averages.
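As a rough sketch of this idea (not the exact estimation method any particular package uses), scikit-learn’s `FactorAnalysis` can fit a one-factor model and return one factor score per respondent. The 10 “anxiety items” below are simulated, purely for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 200 respondents answering 10 items driven by one latent factor
# (toy data: latent trait times item loadings, plus noise)
n = 200
latent = rng.normal(size=(n, 1))
loadings = rng.uniform(0.5, 0.9, size=(1, 10))
items = latent @ loadings + rng.normal(scale=0.5, size=(n, 10))

# Fit a one-factor model; fit_transform returns the factor scores
fa = FactorAnalysis(n_components=1, random_state=0)
scores = fa.fit_transform(items)

print(scores.shape)          # (200, 1) -- one score per respondent
print(fa.components_.shape)  # (1, 10)  -- the estimated loadings
```

Note that these scores come out roughly standardized (mean near 0), which is why they won’t look like a simple sum of 1-to-5 item responses.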

**Factor-Based Scores**

The second, simpler approach is to calculate the linear combination ignoring weights. Either a sum or an average works, though averages have the advantage of being on the same scale as the items.

In this approach, you’re running the Factor Analysis simply to determine which items load on each factor, then combining the items for each factor.

The technical name for this new variable is a factor-based score.

Factor-based scores only make sense in situations where the loadings are all similar. In that case, the weights wouldn’t have done much anyway.
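A minimal sketch of a factor-based score, using hypothetical loadings and simulated 1-to-5 item responses (the 0.40 cutoff is just a common rule of thumb, not a fixed rule):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loadings for 10 items from a one-factor solution;
# item 7 loads poorly and should be dropped from the index
loadings = np.array([0.72, 0.68, 0.70, 0.74, 0.69, 0.71, 0.12, 0.73, 0.70, 0.67])

# Simulated responses on a 1-5 scale (6 respondents, 10 items)
items = rng.integers(1, 6, size=(6, 10)).astype(float)

# Keep only items that load reasonably on the factor
keep = np.abs(loadings) >= 0.40

# Factor-based score: unweighted average of the retained items,
# so it stays on the original 1-5 response scale
factor_based = items[:, keep].mean(axis=1)

print(factor_based.shape)  # (6,) -- one score per respondent
```

Because it is a plain average, each respondent’s score is directly interpretable on the original response scale, unlike a standardized factor score.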

**Which Scores to Use?**

It’s never wrong to use Factor Scores. If the factor loadings are very different, they’re a better representation of the factor. And all software will save and add them to your data set quickly and easily.

There are two advantages of Factor-Based Scores. First, they’re generally more intuitive. A non-research audience can understand an average of items much more easily than a standardized, optimally-weighted linear combination.

Second, you don’t have to worry about weights differing across samples. Factor loadings should be similar in different samples, but they won’t be identical. This will affect the actual factor scores, but won’t affect factor-based scores.

But before you use factor-based scores, make sure that the loadings really are similar. Otherwise you could be misrepresenting your factor.

**Comments**

Hi Karen,

is a high correlation between factor-based scores and factor scores (>.95 for example) any indication that it’s fine to use factor-based scores?

Thanks, Lisa

Hi Lisa,

I have never heard of this criterion, but it sounds reasonable. As a general rule, you’re usually better off using multiple criteria to make decisions like this.
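One way to explore Lisa’s criterion is to compute both kinds of scores on the same data and correlate them. This sketch uses simulated data with deliberately similar loadings, where the two methods should agree closely (the absolute value handles the arbitrary sign of factor scores):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 10 items driven by one latent factor, with similar loadings
n = 200
latent = rng.normal(size=(n, 1))
loads = rng.uniform(0.6, 0.8, size=(1, 10))
items = latent @ loads + rng.normal(scale=0.5, size=(n, 10))

# Factor scores from a one-factor model
factor_scores = FactorAnalysis(n_components=1, random_state=0).fit_transform(items).ravel()

# Factor-based scores: plain average of all the items
factor_based = items.mean(axis=1)

# Correlation between the two scoring methods
r = np.corrcoef(factor_scores, factor_based)[0, 1]
print(abs(r))  # close to 1 when loadings are similar
```

When loadings are very unequal, this correlation drops, which is exactly the situation where factor-based scores start to misrepresent the factor.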

I have a question on the phrase: “to calculate an index variable via an optimally-weighted linear combination of the items”.

Since the factor loadings are the (calculated, now fixed) weights that produce the factor scores, what does the ‘optimally’ refer to?

Before running PCA or FA, is it necessary to standardize the variables? In each case, what would the two different results (with and without standardization) signal?

The question I’d like to ask is: what is the relationship between regression and PCA?

From my understanding, the relationship between a factor and its constituent variables is a form of linear regression: multiplying the x-values by estimated coefficients produces the factor’s values.

And my most important question: can you perform (not necessarily linear) regression by estimating coefficients for *the factors*, which have their own (now constant) coefficients?

I found this easily understandable and clear. I would like to work on it. How can I get detailed resources that focus on implementing factor analysis in a research project, with some examples?

thank you

Hi,

I’m using factor analysis to create an index, but I’d like to compare this index over multiple years. What is the best way to do this? Can I use the weights of the first year for following years? Can I calculate the average of yearly weightings and use this?

Your help would be greatly appreciated!

I have data on income generated by four different types of crops. My crop of interest is cassava, and I want to compare income earned from it against the rest. Can I develop an index using factor analysis and make a comparison?

Hi Karen,

After obtaining factor scores, how do you use them as an independent variable in a regression?