# Generalized Linear Models in R, Part 1: Calculating Predicted Probability in Binary Logistic Regression

by

by David Lillis, Ph.D.

Ordinary Least Squares regression provides linear models of continuous variables. However, much data of interest to statisticians and researchers are not continuous and so other methods must be used to create useful predictive models.

The glm() command is designed to perform generalized linear models (regressions) on binary outcome data, count data, probability data, proportion data and many other data types.

In this blog post, we explore the use of R’s glm() command on one such data type. Let’s take a look at a simple example where we model binary data.

In the mtcars data set, the variable vs indicates if a car has a V engine or a straight engine.

We want to create a model that helps us to predict the probability of a vehicle having a V engine or a straight engine given a weight of 2100 lbs and engine displacement of 180 cubic inches.

First we fit the model:

We use the glm() function, include the variables in the usual way, and specify a binomial error distribution, as follows:

`model <- glm(formula= vs ~ wt + disp, data=mtcars, family=binomial)`
`summary(model)`
```Call:
glm(formula = vs ~ wt + disp, family = binomial, data = mtcars)```
```Deviance Residuals:
Min        1Q    Median        3Q       Max
-1.67506  -0.28444  -0.08401   0.57281   2.08234```
```Coefficients:
Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  1.60859    2.43903   0.660    0.510
wt           1.62635    1.49068   1.091    0.275
disp        -0.03443    0.01536  -2.241    0.025 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1```
`(Dispersion parameter for binomial family taken to be 1)`
```Null deviance: 43.86 on 31 degrees of freedom
Residual deviance: 21.40 on 29 degrees of freedom
AIC: 27.4```
`Number of Fisher Scoring iterations: 6`

We see from the estimates of the coefficients that weight influences vs positively, while displacement has a slightly negative effect.

The model output is somewhat different from that of an ordinary least squares model. I will explain the output in more detail in the next article, but for now, let’s continue with our calculations.

Remember, our goal here is to calculate a predicted probability of a V engine, for specific values of the predictors: a weight of 2100 lbs and engine displacement of 180 cubic inches.

To do that, we create a data frame called newdata, in which we include the desired values for our prediction.

`newdata = data.frame(wt = 2.1, disp = 180)`

Now we use the predict() function to calculate the predicted probability. We include the argument type=”response” in order to get our prediction.

```predict(model, newdata, type="response")
0.2361081```

The predicted probability is 0.24.

That wasn’t so hard! In our next article, I will explain more about the output we got from the glm() function.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics. Understanding Probability, Odds, and Odds Ratios in Logistic Regression
Despite the way the terms are used in common English, odds and probability are not interchangeable. Join us to see how they differ, what each one means, and how to tame that tricky beast: Odds Ratios.

### Related Posts mbikazi

thanks for the practical post! AKIN Yanik

Thanks for the sharing.
If you can provide us the data on which you apply the different model of glm it will be kind et useful for me Emmanuelle

Hello David,

Thanks for the post! Please how did you ascertain which of the categorical levels the predicted probability (0.24 in this case) ascribes to? In the article , you mentioned that 0.24 is the probability of the engine being V-shaped. Why is it not the probability of the engine being straight? Could you please explain? Thanks. Phoebe Rivory

What would happen if you were wanting to create a glm which takes into consideration the interaction between weight and displacement?
i.e. p(V engine) = Bo + B1*weight + B2*displacement + B3*weight*displacement.
Is there also a way to visually assess the deviance difference as a way to determine the model fit (similar to the residual diagnostic plots for general linear models)?

I’m pretty new to glms so hopefully that makes sense! Thanks in advance. Karen Grace-Martin

Hi Phoebe,

It’s easy. Instead of the +, use *. For example:

glm(formula = vs ~ wt*disp)

will give you a main effect for wt, main effect for disp, and an interaction between the two.

And I have not seen a visual depiction of deviance difference. aditya saraogi

In the example above – the value of 0.24. To which factor value – V Engine / Straight engine – is it leaning towards ? How do i interpret this output value against the two factors that dont have a rank to them ? Samuel Darkwa