An incredibly useful tool in evaluating and comparing predictive models is the ROC curve.
Its name is indeed strange. ROC stands for Receiver Operating Characteristic. The name comes from signal detection work in the 1940s: ROC curves were used to measure how well a sonar receiver could distinguish a true signal (e.g., an enemy submarine) from noise (e.g., a school of fish).
To discriminate signal from noise, a model needs to not only correctly predict a positive as a positive, but also correctly predict a negative as a negative.
The ROC curve does this by plotting sensitivity, the probability of predicting that a real positive will be a positive, against 1-specificity, the probability of predicting that a real negative will be a positive. (A previous article covered the specifics of sensitivity and specificity, in case you need a review of what they mean, and why it's important to know how accurately the model predicts positives and negatives separately.)
The best decision rule is high on sensitivity and low on 1-specificity. It’s a rule that predicts most true positives will be a positive and few true negatives will be a positive.
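To make those two quantities concrete, here is a minimal Python sketch that computes sensitivity and 1-specificity for one decision rule. The labels and predicted probabilities are made up for illustration; they are not from any real model.

```python
# Made-up actual outcomes (1 = positive) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.3, 0.2, 0.6, 0.8, 0.1]

# One decision rule: classify as positive when probability >= cutoff
cutoff = 0.5
y_pred = [1 if p >= cutoff else 0 for p in y_prob]

# Tally the four cells of the classification table
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

# Sensitivity: proportion of real positives predicted positive
sensitivity = tp / (tp + fn)
# 1 - specificity: proportion of real negatives predicted positive
one_minus_specificity = fp / (fp + tn)
```

Changing the cutoff trades one quantity against the other: lowering it catches more true positives but also flags more true negatives as positive.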
Decision rules and models
I’ve been talking about decision rules, but what about models?
The thing is, predictive models like logistic regression don’t give you one decision rule. They give a predicted probability of a positive for each individual, based on the values of that individual’s predictors.
Your software may print out a classification table based on a default probability cutoff (usually .5). But really it’s up to you to decide what the probability cutoff should be to classify an individual as “predicted positive.”
The default isn’t always the best decision rule. A cutoff of .5 represents chance only if positive and negative outcomes are equally likely.
They usually aren’t.
Likewise, sometimes the cost of misclassification is different for positives and negatives, so you are willing to increase one type of misclassification in order to avoid the other.
And the optimal cutoff point isn’t always obvious.
Different models may do better at different decision rules. It’s hard to say one model does better or worse than another when one performs better at one decision rule and the other does better at another.
Enter the ROC curve.
The ROC curve plots out the sensitivity and specificity for every possible decision rule cutoff between 0 and 1 for a model.
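That sweep over cutoffs can be sketched in a few lines of Python. The function name `roc_points` and the evenly spaced cutoff grid are my own illustration, not a standard implementation:

```python
def roc_points(y_true, y_prob, n_cutoffs=101):
    """Return (1-specificity, sensitivity) pairs for a grid of cutoffs in [0, 1]."""
    points = []
    for i in range(n_cutoffs):
        c = i / (n_cutoffs - 1)  # cutoff sweeps from 0 up to 1
        pred = [1 if p >= c else 0 for p in y_prob]
        tp = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 1)
        fn = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 0)
        fp = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 1)
        tn = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points
```

At a cutoff of 0 everything is called positive (the upper right corner of the plot); at a cutoff of 1 nothing is (the lower left corner). The curve traces every trade-off in between.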
This plot tells you a few different things.
A model that predicts at chance will have an ROC curve that looks like the diagonal green line. That is not a discriminating model.
The further the curve is from the diagonal line, the better the model is at discriminating between positives and negatives in general.
There are useful statistics that can be calculated from this curve, like the Area Under the Curve (AUC) and the Youden Index. The AUC summarizes how well the model discriminates overall, and the Youden Index identifies the optimal cut point for a given model (under specific circumstances, such as equal costs for both kinds of misclassification).
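As a rough sketch of how these two statistics come out of the curve: the AUC can be approximated with the trapezoid rule over the ROC points, and the Youden Index is J = sensitivity + specificity - 1, maximized over cutoffs. The function below, including its name, is my own illustration:

```python
def auc_and_youden(points):
    """Trapezoidal AUC and maximum Youden index from (1-specificity, sensitivity) pairs."""
    pts = sorted(points)  # order by 1-specificity along the x-axis
    # Trapezoid rule: area of each strip between adjacent ROC points
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    # Youden index J = sensitivity - (1 - specificity), maximized over cutoffs
    j = max(sens - fpr for fpr, sens in points)
    return auc, j
```

A chance-level model traces the diagonal, giving an AUC of .5 and a Youden Index of 0; a perfect model hugs the upper left corner, giving an AUC of 1 and a Youden Index of 1.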
Although ROC curves are often used for evaluating and interpreting logistic regression models, they’re not limited to logistic regression. A common usage in medical studies is to construct an ROC curve to see how much better a single continuous predictor (a “biomarker”) predicts disease status than chance.