Here’s a question I get pretty often: In Principal Component Analysis, can loadings be both positive and negative?
Recall that in PCA, we are creating one index variable (or a few) from a set of variables. You can think of this index variable as a weighted average of the original variables.
The loadings are the correlations between the variables and the component. We compute the weights in the weighted average from these loadings.
The goal of the PCA is to come up with optimal weights. “Optimal” means we’re capturing as much information in the original variables as possible, based on the correlations among those variables.
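So if all the variables in a component are positively correlated with each other, all the loadings on that first component will share the same sign. By convention they are reported as positive, since the overall sign of a component is arbitrary.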
But if there are some negative correlations among the variables, some of the loadings will be negative too.
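To make the link between correlation signs and loading signs concrete, here is a minimal sketch in plain Python. It assumes the PCA is run on a correlation matrix and finds the first component by power iteration. The function name first_component_loadings and both small correlation matrices are made up for illustration; they are not taken from any real data set.

```python
import math

def first_component_loadings(corr, iterations=200):
    """Loadings of the first principal component of a correlation matrix.

    Power iteration finds the dominant eigenvector; scaling it by
    sqrt(eigenvalue) turns each entry into a variable-component correlation.
    Assumes the first variable has a nonzero weight on that component.
    """
    n = len(corr)
    v = [1.0] + [0.0] * (n - 1)  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))  # length of R*v, with |v| = 1
        v = [x / eigenvalue for x in w]
    # The overall sign of an eigenvector is arbitrary; orient it so the
    # entries sum to a non-negative number.
    if sum(v) < 0:
        v = [-x for x in v]
    return [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical correlations: every pair of variables is positively related.
all_positive = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

# Same strengths, but the third variable now correlates negatively
# with the other two.
mixed = [
    [1.0, 0.6, -0.5],
    [0.6, 1.0, -0.4],
    [-0.5, -0.4, 1.0],
]

print(first_component_loadings(all_positive))  # three loadings with the same sign
print(first_component_loadings(mixed))         # third loading has the opposite sign
```

In this sketch the two matrices differ only in the sign of the third variable's correlations, so the loadings keep the same magnitudes and only the third one changes sign.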
An Example of Negative Loadings in Principal Component Analysis
Here’s a simple example that we used in our Principal Component Analysis webinar. We want to combine four variables about mammal species into a single component.
The variables are weight, a predation rating, amount of exposure while sleeping, and the total number of hours an animal sleeps each day.
If you look at the correlation matrix, total hours of sleep correlates negatively with the other three variables. Those other three are all positively correlated with each other.
It makes sense: species that sleep more tend to be smaller, less exposed while sleeping, and less prone to predation. Species that are high on these three variables apparently can't afford much sleep.
Think bats vs. zebras.
Accordingly, the PCA with one component has positive loadings for three of the variables and a negative loading for hours of sleep.
Species with a high component score will be those with high weight, high predation rating, high sleep exposure, and low hours of sleep.
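Here is a rough sketch of how a component score like this could be computed as a weighted sum of standardized variables. Everything in it is an assumption for illustration: the four species and their values are invented, not the webinar's data, and the weights are hypothetical numbers chosen only so that their signs match the pattern above (positive for weight, predation, and exposure; negative for hours of sleep).

```python
from statistics import mean, pstdev

# Invented illustrative values, NOT the webinar's data set.
# Columns: weight (kg), predation rating, sleep-exposure rating, hours of sleep.
species = {
    "species_a": [0.05, 1, 1, 19.0],
    "species_b": [4.0, 2, 2, 12.0],
    "species_c": [60.0, 3, 4, 8.0],
    "species_d": [300.0, 5, 5, 3.0],
}

# Hypothetical weights: only the signs are meant to mirror the example.
weights = [0.5, 0.5, 0.5, -0.5]

def standardize(column):
    """Convert a column of raw values to z-scores."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]

# Standardize each variable, then regroup the z-scores by species.
columns = list(zip(*species.values()))
z_rows = list(zip(*(standardize(col) for col in columns)))

# Component score = weighted sum of a species' standardized values.
scores = {
    name: sum(w * z for w, z in zip(weights, row))
    for name, row in zip(species, z_rows)
}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
```

Because the weight for hours of sleep is negative, a species that sleeps little gets a boost to its score, while a heavy sleeper is pulled down. In this made-up data the heaviest, most exposed, shortest-sleeping species ends up with the highest score.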