The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Generalized Linear Models in R, Part 1: Calculating Predicted Probability in Binary Logistic Regression

by David Lillis, Ph.D.

Ordinary least squares regression fits linear models to continuous outcome variables. However, much of the data of interest to statisticians and researchers is not continuous, so other methods must be used to create useful predictive models.

The glm() function fits generalized linear models (regressions) to binary outcome data, count data, probability data, proportion data and many other data types.

In this blog post, we explore the use of R’s glm() command on one such data type. Let’s take a look at a simple example where we model binary data.

In the mtcars data set, the variable vs indicates whether a car has a V-shaped engine (vs = 0) or a straight engine (vs = 1).
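
Before modelling, it can help to confirm how vs is coded, because a binomial glm() models the probability that the outcome equals 1. The following is a minimal sketch using only base R; the name engine_type is introduced here purely for illustration and is not part of the original post.

```r
# mtcars ships with base R (datasets package), so no download is needed.
data(mtcars)

# Count the cars at each level of vs.
# Per ?mtcars: 0 = V-shaped engine, 1 = straight engine.
vs_counts <- table(mtcars$vs)
print(vs_counts)

# A labelled copy of vs (engine_type is a hypothetical name used for illustration).
engine_type <- factor(mtcars$vs, levels = c(0, 1),
                      labels = c("V-shaped", "straight"))
print(table(engine_type))
```

With this coding, any probability predicted from the model refers to vs = 1, that is, a straight engine.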

We want to create a model that helps us to predict the probability of a vehicle having a straight engine (vs = 1) rather than a V engine (vs = 0), given a weight of 2100 lbs and engine displacement of 180 cubic inches.

First we fit the model:

We use the glm() function, include the variables in the usual way, and specify a binomial error distribution, as follows:

model <- glm(formula = vs ~ wt + disp, data = mtcars, family = binomial)
summary(model)
Call:
glm(formula = vs ~ wt + disp, family = binomial, data = mtcars)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.67506  -0.28444  -0.08401   0.57281   2.08234
Coefficients:
            Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  1.60859    2.43903   0.660    0.510
wt           1.62635    1.49068   1.091    0.275
disp        -0.03443    0.01536  -2.241    0.025 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.86 on 31 degrees of freedom
Residual deviance: 21.40 on 29 degrees of freedom
AIC: 27.4
Number of Fisher Scoring iterations: 6

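We see from the coefficient estimates, which are on the log-odds scale, that weight is positively associated with vs (although not significantly, p = 0.275), while displacement has a negative and statistically significant association (p = 0.025).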
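
Because the estimates are on the log-odds scale, one common way to read them is to exponentiate them into odds ratios. The sketch below refits the same model and is only an illustration of that conversion, not part of the original analysis; the name odds_ratios is an assumption introduced here.

```r
# Refit the model from the post so this block is self-contained.
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Exponentiating a log-odds coefficient gives the multiplicative change
# in the odds of vs = 1 for a one-unit increase in that predictor.
odds_ratios <- exp(coef(model))
print(round(odds_ratios, 3))
```

An odds ratio above 1 (wt) means higher odds of vs = 1 per unit increase, and one below 1 (disp) means lower odds. Note that the units differ: one unit of wt is 1000 lbs, whereas one unit of disp is a single cubic inch, so the raw sizes of the two coefficients are not directly comparable.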

The model output is somewhat different from that of an ordinary least squares model. I will explain the output in more detail in the next article, but for now, let’s continue with our calculations.

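Remember, our goal here is to calculate a predicted probability for specific values of the predictors: a weight of 2100 lbs and engine displacement of 180 cubic inches. Because a binomial glm() models the probability that the outcome equals 1, and vs is coded 0 = V-shaped, 1 = straight, this is the predicted probability of a straight engine.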

To do that, we create a data frame called newdata, in which we include the desired values for our prediction. Note that wt is recorded in units of 1000 lbs, so a weight of 2100 lbs is entered as 2.1.

newdata <- data.frame(wt = 2.1, disp = 180)

Now we use the predict() function to calculate the predicted probability. We include the argument type="response" to get the prediction on the probability scale; the default, type="link", would return it on the log-odds scale.

predict(model, newdata, type="response")
0.2361081

The predicted probability of a straight engine (vs = 1) is 0.24, so the predicted probability of a V engine is 1 - 0.24 = 0.76.
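
To see what predict() is doing, the same number can be reproduced by hand: compute the linear predictor (the log-odds) from the fitted coefficients, then pass it through the inverse logit function. This is a minimal sketch using base R's coef() and plogis(); the names eta, p_manual and p_predict are introduced here for illustration.

```r
# Refit the model and rebuild newdata so this block is self-contained.
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)
newdata <- data.frame(wt = 2.1, disp = 180)

# Linear predictor (log-odds): intercept + b_wt * wt + b_disp * disp.
b <- coef(model)
eta <- b[["(Intercept)"]] + b[["wt"]] * 2.1 + b[["disp"]] * 180

# Inverse logit: plogis(x) = 1 / (1 + exp(-x)).
p_manual <- plogis(eta)

# The same quantities from predict().
eta_predict <- predict(model, newdata, type = "link")      # log-odds scale
p_predict   <- predict(model, newdata, type = "response")  # probability scale

print(c(manual = p_manual, predict = unname(p_predict)))
```

The linear predictor comes out at about -1.17, and the inverse logit of that value is the 0.236 reported by predict().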

That wasn’t so hard! In our next article, I will explain more about the output we got from the glm() function.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.

Tagged With: generalized linear models, GLM, logistic regression, predicted probability, R

Related Posts

  • Generalized Linear Models in R, Part 5: Graphs for Logistic Regression
  • Generalized Linear Models in R, Part 3: Plotting Predicted Probabilities
  • Generalized Linear Models (GLMs) in R, Part 4: Options, Link Functions, and Interpretation
  • Generalized Linear Models in R, Part 2: Understanding Model Fit in Logistic Regression Output

Reader Interactions

Comments

  1. Leah Kamin says

    February 25, 2020 at 7:40 am

    Thank you for these posts! Is there a way to run the upper and lower confidence intervals of the predicted probabilities?

  2. mbikazi says

    June 21, 2019 at 9:03 am

    thanks for the practical post!

  3. AKIN Yanik says

    February 14, 2019 at 12:33 pm

    Thanks for the sharing.
    If you could provide the data to which you apply the different glm models, it would be kind and useful for me.

  4. Emmanuelle says

    January 3, 2019 at 8:34 am

    Hello David,

    Thanks for the post! Please, how did you ascertain which of the categorical levels the predicted probability (0.24 in this case) refers to? In the article, you mentioned that 0.24 is the probability of the engine being V-shaped. Why is it not the probability of the engine being straight? Could you please explain? Thanks.

  5. Phoebe Rivory says

    April 20, 2018 at 9:25 pm

    Thanks for the helpful post!

    What would happen if you were wanting to create a glm which takes into consideration the interaction between weight and displacement?
    i.e. p(V engine) = Bo + B1*weight + B2*displacement + B3*weight*displacement.
    Is there also a way to visually assess the deviance difference as a way to determine the model fit (similar to the residual diagnostic plots for general linear models)?

    I’m pretty new to glms so hopefully that makes sense! Thanks in advance.

    • Karen Grace-Martin says

      May 15, 2018 at 11:28 am

      Hi Phoebe,

      It’s easy. Instead of the +, use *. For example:

      glm(formula = vs ~ wt*disp, data = mtcars, family = binomial)

      will give you a main effect for wt, main effect for disp, and an interaction between the two.

      And I have not seen a visual depiction of deviance difference.

  6. aditya saraogi says

    November 25, 2017 at 9:42 pm

    In the example above, the value is 0.24. Which factor value (V engine or straight engine) does it lean towards? How do I interpret this output value against the two factor levels, which don't have a rank to them?

  7. Samuel Darkwa says

    November 23, 2017 at 1:04 am

    Dear Dr. David Lillis,
    Hope this email finds you well. I am a graduate student (international student) interested in using R but have certain challenges. I could not get my methods while in class and I made a choice to pass the course and remain in the program and learn to use the software later.
    However, when I started to learn R on my own for my research, I realized that most of the videos and materials online are not compatible with the current version of R (version 4.2.0) that I am using. It rather frustrates my studies, as I am not able to replicate the examples. In most cases, too, I am not able to get the dataset used in online examples. Again, I find it difficult to run the tests for my models.
    Please, how can you help me?
    Thanks
    Samuel

    • Chibuike Ngene Nnamani says

      July 1, 2020 at 6:14 am

      What type of analysis do you want to perform?


Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.