I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.
If you missed it, you can get the webinar recording here.
Question: In Principal Component Analysis, can loadings be both positive and negative?
Recall that in PCA, we are creating one index variable (or a few) from a set of variables. You can think of this index variable as a weighted average of the original variables (strictly speaking, a weighted sum of the standardized variables).
The loadings are the weights.
The goal of the PCA is to come up with optimal weights. “Optimal” means we’re capturing as much information in the original variables as possible, based on the correlations among those variables.
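To make "optimal weights" concrete, here is a minimal sketch in Python with NumPy. The data are synthetic and made up purely for illustration (they are not the webinar data), and the variable names are my own. It takes "capturing as much information as possible" to mean "giving the index the largest possible variance", which is the usual PCA criterion, and compares the PCA weights against plain equal weights.

```python
import numpy as np

# Synthetic data, invented for illustration: 200 cases, 4 variables that
# share one underlying factor plus noise. The fourth variable runs in the
# opposite direction to the other three.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = latent[:, None] * np.array([1.0, 0.8, 0.6, -0.7]) + rng.normal(size=(n, 4))

# Standardize each variable so the analysis is based on correlations.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last eigenvector
# holds the weights for the first principal component.
eigvals, eigvecs = np.linalg.eigh(R)
w_opt = eigvecs[:, -1]

# The index variable is a weighted sum of the standardized variables.
scores_opt = Z @ w_opt

# For comparison: equal weights with the same overall length (norm 1).
w_equal = np.ones(4) / 2.0
scores_equal = Z @ w_equal

var_opt = scores_opt.var(ddof=1)
var_equal = scores_equal.var(ddof=1)
print("variance with PCA weights:  ", round(var_opt, 3))
print("variance with equal weights:", round(var_equal, 3))
```

The variance of the PCA index equals the largest eigenvalue of the correlation matrix, and no other set of weights of the same length can beat it.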
So if all the variables are positively correlated with each other, all the loadings on the first component will share the same sign (by convention, positive).
But if there are some negative correlations among the variables, some of the loadings will be negative too.
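Here is a small sketch of that sign pattern, again in Python with NumPy. Both correlation matrices are hypothetical, chosen only to make the point, and `first_component_loadings` is a helper name I made up. It scales the weights by the square root of the eigenvalue, which is one common convention for reporting loadings and does not change their signs.

```python
import numpy as np

def first_component_loadings(R):
    """Loadings on the first principal component of correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                     # weights for the largest eigenvalue
    if v[0] < 0:                           # the overall sign is arbitrary, so
        v = -v                             # orient the first variable as positive
    return v * np.sqrt(eigvals[-1])

# Hypothetical case 1: every pair of variables correlates +0.5.
R_pos = np.full((4, 4), 0.5)
np.fill_diagonal(R_pos, 1.0)

# Hypothetical case 2: same strengths, but the fourth variable correlates
# -0.5 with each of the other three.
R_mixed = R_pos.copy()
R_mixed[3, :3] = -0.5
R_mixed[:3, 3] = -0.5

load_pos = first_component_loadings(R_pos)
load_mixed = first_component_loadings(R_mixed)
print("all positive correlations:", np.round(load_pos, 2))
print("one negative variable:    ", np.round(load_mixed, 2))
```

Keep in mind that the sign of a whole component is arbitrary: flipping every loading at once describes the same component, and different software may report either version. What matters is which loadings agree in sign and which oppose each other.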
Here’s a simple example that we used in the webinar. We want to combine four variables about mammal species into a single component.
The variables are weight, a predation rating, amount of exposure while sleeping, and the total number of hours an animal sleeps each day.
If you look at the correlation matrix, total hours of sleep is negatively correlated with the other three variables. Those other three are all positively correlated with each other.
It makes sense: species that sleep more tend to be smaller, less exposed while sleeping, and less prone to predation. Species that are high on these three variables must not be able to afford much sleep.
Think bats vs. zebras.
Accordingly, the PCA with one component has positive loadings for three of the variables and a negative loading for hours of sleep.
Species with a high component score will be those with high weight, high predation rating, high sleep exposure, and low hours of sleep.
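As a rough illustration of how the signs play out in a component score, here is a short Python sketch. The weights and the two species profiles are invented numbers, not values from the webinar data. They only mimic the pattern of three positive weights and one negative weight, applied to standardized (z-score) values.

```python
import numpy as np

# Hypothetical weights, in the order: weight, predation rating,
# sleep exposure, hours of sleep. Only the sign pattern matters here.
weights = np.array([0.5, 0.5, 0.5, -0.5])

# Hypothetical standardized profiles (z-scores) for two made-up species.
large_exposed_short_sleeper = np.array([1.5, 1.0, 1.2, -1.3])
small_sheltered_long_sleeper = np.array([-1.0, -0.8, -1.1, 1.4])

# The component score is the weighted sum of the standardized values.
score_high = float(weights @ large_exposed_short_sleeper)
score_low = float(weights @ small_sheltered_long_sleeper)

print("large, exposed, short sleeper: ", round(score_high, 2))
print("small, sheltered, long sleeper:", round(score_low, 2))
```

Because the weight for hours of sleep is negative, a below-average amount of sleep pushes the score up, while an above-average amount pulls it down.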