In my last post I used the glm() command in R to fit a logistic model with binomial errors to investigate the relationships between the numeracy and anxiety scores and their eventual success.
Now we will create a plot for each predictor. This can be very helpful for helping us understand the effect of each predictor on the probability of a 1 response on our dependent variable.
We wish to plot each predictor separately, so first we fit a separate model for each predictor. This isn’t the only way to do it, but one that I find especially helpful for deciding which variables should be entered as predictors.
model_numeracy <- glm(success ~ numeracy, binomial) summary(model_numeracy) Call: glm(formula = success ~ numeracy, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.5814 -0.9060 0.3207 0.6652 1.8266 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.1414 1.8873 -3.254 0.001138 ** numeracy 0.6243 0.1855 3.366 0.000763 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 68.029 on 49 degrees of freedom Residual deviance: 50.291 on 48 degrees of freedom AIC: 54.291 Number of Fisher Scoring iterations: 5
We do the same for anxiety.
model_anxiety <- glm(success ~ anxiety, binomial) summary(model_anxiety) Call: glm(formula = success ~ anxiety, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.8680 -0.3582 0.1159 0.6309 1.5698 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 19.5819 5.6754 3.450 0.000560 *** anxiety -1.3556 0.3973 -3.412 0.000646 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 68.029 on 49 degrees of freedom Residual deviance: 36.374 on 48 degrees of freedom AIC: 40.374 Number of Fisher Scoring iterations: 6
Now we create our plots. First we set up a sequence of length values which we will use to plot the fitted model. Let’s find the range of each variable.
range(numeracy)  6.6 15.7 range(anxiety)  10.1 17.7
Given the range of both numeracy and anxiety. A sequence from 0 to 15 is about right for plotting numeracy, while a range from 10 to 20 is good for plotting anxiety.
xnumeracy <-seq (0, 15, 0.01) ynumeracy <- predict(model_numeracy, list(numeracy=xnumeracy),type="response")
Now we use the predict() function to set up the fitted values. The syntax type = “response” back-transforms from a linear logit model to the original scale of the observed data (i.e. binary).
plot(numeracy, success, pch = 16, xlab = "NUMERACY SCORE", ylab = "ADMISSION") lines(xnumeracy, ynumeracy, col = "red", lwd = 2)
The model has produced a curve that indicates the probability that success = 1 to the numeracy score. Clearly, the higher the score, the more likely it is that the student will be accepted.
Now we plot for anxiety.
xanxiety <- seq(10, 20, 0.1) yanxiety <- predict(model_anxiety, list(anxiety=xanxiety),type="response") plot(anxiety, success, pch = 16, xlab = "ANXIETY SCORE", ylab = "SUCCESS") lines(xanxiety, yanxiety, col= "blue", lwd = 2)
Clearly, those who score high on anxiety are unlikely to be admitted, possibly because their admissions test results are affected by their high level of anxiety.
About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.