Can You Use Principal Component Analysis with a Training Set Test Set Model?

I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.

If you missed it, you can get the webinar recording here.

Question: Can you use Principal Component Analysis with a Training Set Test Set Model?

Answer: Yes and no.

You can use Principal Component Analysis with training and test data sets, but doing so makes less sense than it does for Factor Analysis.

That’s because PCA is really just about creating an index variable from a set of correlated predictors.
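
To make the “index variable” idea concrete, here is a minimal Python sketch. It isn’t from the webinar: the data are simulated, the function names are hypothetical, and it finds only the first principal component, using power iteration on the correlation matrix rather than a full PCA routine. The means, standard deviations, and weights are estimated on the training set and then applied unchanged to the test set.

```python
import random

def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def column_sds(rows, means):
    n = len(rows)
    return [(sum((x - m) ** 2 for x in col) / (n - 1)) ** 0.5
            for col, m in zip(zip(*rows), means)]

def standardize(rows, means, sds):
    # Center and scale each variable using the supplied means and SDs.
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_component(z_rows, iterations=500):
    # Leading eigenvector of the correlation matrix, found by power iteration.
    n, p = len(z_rows), len(z_rows[0])
    corr = [[sum(r[i] * r[j] for r in z_rows) / (n - 1) for j in range(p)]
            for i in range(p)]
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return w

def index_scores(rows, means, sds, weights):
    # Weighted sum of the standardized variables: the index variable.
    return [sum(z * w for z, w in zip(z_row, weights))
            for z_row in standardize(rows, means, sds)]

# Simulated data (an assumption): three predictors sharing common variation.
rng = random.Random(0)

def simulate(n):
    rows = []
    for _ in range(n):
        common = rng.gauss(0, 1)
        rows.append([common + rng.gauss(0, 0.5) for _ in range(3)])
    return rows

train, test = simulate(200), simulate(100)

# Estimate everything on the training set only.
means = column_means(train)
sds = column_sds(train, means)
weights = first_component(standardize(train, means, sds))

# Apply the training-set means, SDs, and weights to both sets.
train_index = index_scores(train, means, sds, weights)
test_index = index_scores(test, means, sds, weights)
```

Nothing here tests a model on the second data set. The test set simply gets scored with the training weights, which is why the training/test split adds less for PCA than it does for Factor Analysis.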

Factor Analysis is an actual model that is measuring a latent variable. Any time you’re creating some sort of scale to measure an underlying construct, you want to use Factor Analysis.

Factor Analysis is definitely best done with a training and test data set.

In fact, ideally, you’d run multiple rounds of training and test data sets, in which the variables included on your scale are updated after each test.

Exploratory Factor Analysis (EFA) is used on the training data set, and Confirmatory Factor Analysis (CFA) is used on the test data set.

So the idea is that you put together a scale and run an Exploratory Factor Analysis on a training data set. You may have to drop variables, and you may find that the variables you intended to load together to create a subscale don’t load as intended. So you’ll go through a few rounds of EFA to come up with the best possible solution.

Once you have that solution, you’d run a Confirmatory Factor Analysis on the test data set. CFA needs to be run in a Structural Equation Modeling procedure, because you need to be able to specify which variable should load on which factor and to get overall model fit statistics.

Now, of course, the hard part is that EFA and CFA each often require hundreds of observations, and each needs its own portion of the data. So you’re going to need a very large sample in order to run both.
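
The split itself can be sketched in Python as follows. This is only an illustration: the function name, the 50/50 split, the seed, and the sample size of 600 are all assumptions, and the EFA and CFA would still be run in your statistical software.

```python
import random

def split_sample(n_rows, train_fraction=0.5, seed=42):
    # Randomly assign row numbers to a training set (for EFA)
    # and a test set (for CFA), with no row in both.
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])

# Hypothetical sample of 600 respondents.
efa_rows, cfa_rows = split_sample(600)

# Each half has to be large enough on its own.
print(len(efa_rows), len(cfa_rows))
```

Because each analysis sees only its own half, a sample of 600 leaves 300 observations for the EFA and 300 for the CFA.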


