Training Data

Differences in Model Building Between Explanatory and Predictive Models

October 8th, 2018 by

Suppose you are asked to create a model that will predict who will drop out of a program your organization offers. You decide to use a binary logistic regression because your outcome has two values: “0” for not dropping out and “1” for dropping out.

Most of us were trained in building models for the purpose of understanding and explaining the relationships between an outcome and a set of predictors. But model building works differently for purely predictive models. Where do we go from here?
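
To make the predictive setup concrete, here is a minimal sketch of the dropout example: a binary logistic regression fit on a training set and then scored on held-out cases. It is an illustration under stated assumptions, not the method from the post. The data are simulated, the predictor names (attendance, distance) are hypothetical, and the model is fit with a plain gradient-descent loop rather than any particular statistics package.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, xi):
    # w[0] is the intercept; w[1:] are the slopes.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic regression by batch gradient descent on the log loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y, threshold=0.5):
    # Share of cases whose predicted class matches the observed outcome.
    hits = sum((predict_proba(w, xi) >= threshold) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(y)

# Simulated data (an assumption for illustration only):
# 1 = dropped out, 0 = did not drop out.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    attendance = rng.gauss(0, 1)   # hypothetical standardized predictor
    distance = rng.gauss(0, 1)     # hypothetical standardized predictor
    p_drop = sigmoid(-0.5 - 1.5 * attendance + 0.8 * distance)
    X.append([attendance, distance])
    y.append(1 if rng.random() < p_drop else 0)

# Rows were generated in random order, so a simple cut gives a random split.
split = int(0.75 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

w = fit_logistic(X_train, y_train)
train_acc = accuracy(w, X_train, y_train)
test_acc = accuracy(w, X_test, y_test)
print("coefficients (intercept first):", [round(c, 2) for c in w])
print("training accuracy:", round(train_acc, 3))
print("test accuracy:", round(test_acc, 3))
```

In an explanatory model the coefficients would be the main result. In a predictive model the number to watch is the test accuracy, because it comes from cases the model never saw during fitting.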


Can You Use Principal Component Analysis with a Training Set Test Set Model?

January 20th, 2017 by

I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.

If you missed it, you can get the webinar recording here.

Question: Can you use Principal Component Analysis with a Training Set Test Set Model?

Answer: Yes and no.

Principal Component Analysis can be used with training and test data sets, but doing so makes less sense than it does for Factor Analysis.

That’s because PCA is really just about creating an index variable from a set of correlated predictors.
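
If you do want to use PCA with a training and test set, one reasonable approach is to estimate the standardization and the component weights on the training set only, then apply those same weights to the test set. The sketch below assumes that approach. It uses simulated data and finds only the first principal component, by power iteration on the training correlation matrix. The function names are made up for this illustration.

```python
import math
import random

def column_stats(rows):
    # Column means and sample standard deviations.
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return means, sds

def standardize(rows, means, sds):
    return [[(r[j] - means[j]) / sds[j] for j in range(len(r))] for r in rows]

def first_component(Z, iters=200):
    """Leading eigenvector of the correlation matrix of Z, by power iteration."""
    n, p = len(Z), len(Z[0])
    corr = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)]
            for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        u = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return v

def index_scores(Z, loadings):
    # The index variable: a weighted sum of the standardized predictors.
    return [sum(l * x for l, x in zip(loadings, z)) for z in Z]

# Simulated data (an assumption): three predictors that share a common source.
rng = random.Random(1)
rows = []
for _ in range(200):
    common = rng.gauss(0, 1)
    rows.append([common + rng.gauss(0, 0.5) for _ in range(3)])

train, test = rows[:150], rows[150:]

# Everything is estimated from the training set only.
means, sds = column_stats(train)
loadings = first_component(standardize(train, means, sds))

# The test set reuses the training means, SDs, and weights.
train_scores = index_scores(standardize(train, means, sds), loadings)
test_scores = index_scores(standardize(test, means, sds), loadings)
print("component weights:", [round(l, 3) for l in loadings])
print("first test-set index scores:", [round(s, 2) for s in test_scores[:3]])
```

Reusing the training weights keeps the test set untouched by the fitting step, so the index means the same thing in both sets.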

Factor Analysis is an actual model that is measuring a latent variable. Any time you’re creating some sort of scale to measure an underlying construct, you want to use Factor Analysis.

Factor Analysis is definitely best done with a training and test data set.

In fact, ideally, you’d run multiple rounds of training and test data sets, in which the variables included on your scale are updated after each test.