The Analysis Factor

The end of the year seems to be rushing toward us as we finish up our 11th DABB webinar and 11th workshop of the year.

A quick reminder that our office will be closed Thursday and Friday next week for Thanksgiving.

This month in DABB we explored Matrix Algebra for Data Analysts and it proved to be quite popular. I thought it would be one of those topics that I was forcing on participants for their own good, but it turns out there was a lot of interest. If you missed it and are interested, you can still get the recording if you join by the end of November.

We have one more workshop in 2015 coming up in December: Principal Component Analysis and Exploratory Factor Analysis with SPSS, taught by yours truly. I’ve gotten questions about whether it will work to follow along in a different software package and the answer is yes. I’m only demonstrating SPSS, but the focus of the workshop is on the concepts and steps, so if you’re comfortable with figuring it out in your favorite software, it should be accessible to you. And if you’d like to see demonstrations in another package, let us know and we’ll see what we can put together.

One of the great things you can do with Factor Analysis is create optimally-weighted index scores from a group of individual variables. This month’s article describes two ways to do it. Enjoy!

Happy analyzing!
Karen

One common reason for running Principal Component Analysis (PCA) or Factor Analysis (FA) is variable reduction.

In other words, you may start with a 10-item scale meant to measure something like Anxiety, which is difficult to accurately measure with a single question.

You could use all 10 items as individual variables in an analysis--perhaps as predictors in a regression model.

But you'd end up with a mess.

Not only would you have trouble interpreting all those coefficients, but you're likely to have multicollinearity problems.

And most importantly, you're not interested in the effect of each of those individual 10 items on your outcome. You're interested in the effect of Anxiety as a whole.

So we turn to a variable reduction technique like FA or PCA to turn 10 related variables into one that represents the construct of Anxiety.

FA and PCA have different theoretical underpinnings and assumptions and are used in different situations, but the processes are very similar. We'll use FA here for this example.

So let's say you have successfully come up with a good factor analytic solution, and have found that indeed, these 10 items all represent a single factor that can be interpreted as Anxiety. There are two similar but theoretically distinct ways to combine these 10 items into a single index.

Part of the Factor Analysis output is a table of Factor Loadings. Each item's loading represents how strongly that item is associated with the underlying factor.

Some loadings will be so low that we would consider that item as unassociated with the factor and we wouldn't want to include it in the index. But even among items with reasonably high loadings, the loadings can vary quite a bit. If those loadings are very different from each other, you'd want the index to reflect that each item has an unequal association with the factor.

One approach to combining items is to calculate an index variable via an optimally-weighted linear combination of the items, called the Factor Scores. Each item's weight is derived from its factor loading, so each item's contribution to the factor score depends on how strongly it relates to the factor.

Factor scores are essentially a weighted sum of the items. Because those weights are all between -1 and 1, the scale of the factor scores will be very different from a pure sum. I find it helpful to think of factor scores as standardized weighted averages.
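
To make the "standardized weighted sum" idea concrete, here is a minimal Python sketch. It assumes the regression (Thurstone) method of turning loadings into scoring weights, which is one common option but not necessarily the one your software uses. The function name, the simulated data, and the loadings are all made up for illustration.

```python
import numpy as np

def regression_factor_scores(items, loadings):
    """Regression-method factor scores for a single factor (illustrative).

    items:    (n_respondents, n_items) array of raw item responses
    loadings: (n_items,) loadings from an already-fitted factor model
    """
    items = np.asarray(items, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    # Standardize each item to mean 0 and standard deviation 1.
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    # Correlation matrix of the items.
    r = np.corrcoef(items, rowvar=False)
    # Scoring weights solve R w = loadings (assumed regression method).
    weights = np.linalg.solve(r, loadings)
    # Each respondent's score is a weighted sum of standardized items.
    return z @ weights, weights

# Hypothetical data: 500 respondents, 4 items driven by one factor.
rng = np.random.default_rng(0)
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
factor = rng.standard_normal(500)
noise = rng.standard_normal((500, 4))
items = factor[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings**2)

scores, weights = regression_factor_scores(items, true_loadings)
```

In this sketch the items with higher loadings end up with larger weights, and the scores are centered at zero because they are built from standardized items.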

The second, simpler approach is to calculate the linear combination ignoring weights. Either a sum or an average works, though averages have the advantage of being on the same scale as the items.

In this approach, you're running the Factor Analysis simply to determine which items load on each factor, then combining the items for each factor.

The technical name for this new variable is a factor-based score.
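
As a rough illustration, a factor-based score is just the unweighted average of the items that load on the factor. The responses and the choice of which columns to keep below are hypothetical.

```python
import numpy as np

def factor_based_score(items, keep):
    """Unweighted mean of the item columns listed in `keep`."""
    items = np.asarray(items, dtype=float)
    return items[:, keep].mean(axis=1)

# Hypothetical responses on a 1-5 scale: 3 respondents, 4 items.
responses = np.array([
    [4, 5, 4, 2],
    [1, 2, 1, 5],
    [3, 3, 4, 3],
])

# Suppose the factor analysis showed that only the first three items
# load on the Anxiety factor; the fourth is left out of the index.
anxiety_items = [0, 1, 2]
index = factor_based_score(responses, anxiety_items)
```

Because it is an average, the index stays on the items' original 1-5 scale, which is what makes it easy to explain.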

Factor-based scores only make sense in situations where the loadings are all similar. In that case, the weights wouldn't have done much anyway.

It's never wrong to use Factor Scores. If the factor loadings are very different, they're a better representation of the factor. And most statistical software will save and add them to your data set quickly and easily.

There are two advantages of Factor-Based Scores. First, they're generally more intuitive. A non-research audience can understand an average of items more easily than a standardized optimally-weighted linear combination.

Second, you don't have to worry about weights differing across samples. Factor loadings should be similar in different samples, but they won't be identical. This will affect the actual factor scores, but won't affect factor-based scores.

But before you use factor-based scores, make sure that the loadings really are similar. Otherwise you may misrepresent your factor.
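
One simple way to eyeball this is to drop the items with very low loadings and look at the spread of what remains. The sketch below does that in plain Python. The loadings and the 0.4 cutoff are hypothetical, and there is no universal rule for how small the range must be, so treat the output as something to inspect rather than a pass/fail test.

```python
def loading_spread(loadings, min_loading=0.4):
    """Summarize how similar the retained loadings are.

    loadings:    dict mapping item name -> factor loading
    min_loading: hypothetical cutoff below which an item is dropped
    """
    kept = {name: abs(value) for name, value in loadings.items()
            if abs(value) >= min_loading}
    if not kept:
        raise ValueError("no items meet the minimum loading")
    values = list(kept.values())
    return {
        "kept": sorted(kept),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

# Hypothetical loadings from a one-factor solution.
example_loadings = {"item1": 0.72, "item2": 0.68, "item3": 0.75, "item4": 0.21}
summary = loading_spread(example_loadings)
```

Here item4 would be left out, and the remaining loadings sit within 0.07 of each other, which is the kind of situation where a simple average is a reasonable stand-in for factor scores.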


O’Rourke, N. & Hatcher, L. (2013). A Step-by-Step Approach to Using SAS for Factor Analysis and Structural Equation Modeling. SAS Press.