I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.
If you missed it, you can get the webinar recording here.
Question: Can you use Principal Component Analysis with a Training Set Test Set Model?
Answer: Yes and no.
Principal Component Analysis can be used with training and test data sets, but doing so makes less sense than it does for Factor Analysis.
That’s because PCA is really just about creating an index variable from a set of correlated predictors.
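To make the "index variable" idea concrete, here is a minimal sketch of how a first-component index could be fit on a training set and then applied to a test set. It uses NumPy on simulated data. The function names `fit_pca_index` and `apply_pca_index`, the 200/100 split, and the simulated predictors are all illustrative assumptions, not something from the webinar.

```python
import numpy as np

def fit_pca_index(X_train):
    """Learn standardization constants and first-component weights from training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    corr = np.cov(Z, rowvar=False)           # covariance of standardized data = correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues come back in ascending order
    weights = eigvecs[:, -1]                 # eigenvector with the largest eigenvalue
    if weights.sum() < 0:                    # the sign is arbitrary, so fix it for readability
        weights = -weights
    explained = eigvals[-1] / eigvals.sum()  # share of total variance captured by the index
    return mean, std, weights, explained

def apply_pca_index(X, mean, std, weights):
    """Score new rows using the constants and weights learned on the training data."""
    return ((X - mean) / std) @ weights

# Simulated example (assumption): four predictors that share one common source of variation.
rng = np.random.default_rng(0)
shared = rng.normal(size=(300, 1))
X = shared + 0.5 * rng.normal(size=(300, 4))

X_train, X_test = X[:200], X[200:]
mean, std, weights, explained = fit_pca_index(X_train)
test_index = apply_pca_index(X_test, mean, std, weights)
```

Notice that nothing here gets "confirmed" on the test set. The test rows are simply scored with the weights learned from the training rows, which is part of why the training/test idea is a less natural fit for PCA than for Factor Analysis.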
Factor Analysis is an actual model that is measuring a latent variable. Any time you’re creating some sort of scale to measure an underlying construct, you want to use Factor Analysis.
Factor Analysis is definitely best done with a training and test data set.
In fact, ideally, you’d run multiple rounds of training and test data sets, in which the variables included on your scale are updated after each test.
Exploratory Factor Analysis and Confirmatory Factor Analysis are used on the training and test data sets, respectively.
So the idea is that you put together a scale and run an Exploratory Factor Analysis on a training data set. You may have to drop variables, and you may find that the variables you expected to load together on a subscale don’t load as intended. So you’ll go through a few rounds of EFA to come up with the best possible solution.
Once you have that solution, you’d run a Confirmatory Factor Analysis on the test data set. CFA needs to be run in Structural Equation Modeling procedures, as it’s important to be able to specify which variable should load on which factor and to get overall model fit statistics.
Now, of course, the hard part is that both EFA and CFA often require hundreds of observations. So you’re going to need a very large sample in order to run both.
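As a rough illustration of that sample size problem, the sketch below randomly splits row numbers into a training half (for EFA) and a test half (for CFA), and refuses to split if either half would be too small. The function name `split_train_test` and the minimum of 200 observations per set are hypothetical choices for illustration only. The right minimum depends on your scale and your model.

```python
import random

def split_train_test(n_obs, min_per_set, seed=0):
    """Randomly split row numbers 0..n_obs-1 into two halves, one for EFA and one for CFA.

    min_per_set is a hypothetical threshold chosen by the analyst, not a fixed rule.
    """
    if n_obs < 2 * min_per_set:
        raise ValueError(
            f"Need at least {2 * min_per_set} observations to give each set "
            f"{min_per_set}, but only have {n_obs}."
        )
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    half = n_obs // 2
    return sorted(indices[:half]), sorted(indices[half:])

# Example: 600 respondents, assuming (for illustration) 200 are needed per analysis.
train_rows, test_rows = split_train_test(n_obs=600, min_per_set=200)
```

With only 300 respondents and the same assumed minimum, the function raises an error instead of splitting, which is exactly the situation where running both analyses on separate data sets is not realistic.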