The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Generalized Linear Models in R, Part 2: Understanding Model Fit in Logistic Regression Output

by guest contributor

by David Lillis, Ph.D.

In the last article, we saw how to create a simple Generalized Linear Model on binary data using the glm() command. We continue with the same glm on the mtcars data set (modeling the vs variable on the weight and engine displacement).

model <- glm(formula= vs ~ wt + disp, data=mtcars, family=binomial)
summary(model)
Call:
glm(formula = vs ~ wt + disp, family = binomial, data = mtcars)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.67506  -0.28444  -0.08401   0.57281   2.08234
Coefficients:
            Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  1.60859    2.43903   0.660    0.510
wt           1.62635    1.49068   1.091    0.275
disp        -0.03443    0.01536  -2.241    0.025 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.86 on 31 degrees of freedom
Residual deviance: 21.40 on 29 degrees of freedom
AIC: 27.4
Number of Fisher Scoring iterations: 6

We see that the coefficient of weight is positive, while the coefficient of displacement is negative (about -0.034 per unit of displacement). The coefficient of weight is not significant (p > 0.05), while the coefficient of displacement is significant at the 0.05 level. Later we will see how to investigate ways of improving our model.

Note that the estimates (the coefficients of the predictors weight and displacement) are in units called logits. We will define the logit in a later post.
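As a rough illustration of what "units called logits" means in practice, the sketch below refits the same model and converts its linear predictor to probabilities. It assumes only base R; the names log_odds and prob are introduced here for illustration and are not part of the original article.

```r
# Refit the model from the article so this block runs on its own.
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Linear predictor for each car, on the logit (log-odds) scale.
log_odds <- predict(model, type = "link")

# plogis() is the inverse logit: it maps log-odds to probabilities.
prob <- plogis(log_odds)

# These should agree with the fitted probabilities stored by glm().
all.equal(unname(prob), unname(fitted(model)))

# Exponentiating a coefficient gives an odds ratio per one-unit increase
# in that predictor, holding the other predictor fixed.
exp(coef(model))
```

An odds ratio below 1 for disp corresponds to the negative coefficient in the summary output.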

Deviance

We see the word Deviance twice in the model output. Deviance is a measure of goodness of fit of a generalized linear model. Or rather, it’s a measure of badness of fit: higher numbers indicate worse fit.

R reports two forms of deviance – the null deviance and the residual deviance. The null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean).
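One way to make the null deviance concrete is to fit the intercept-only model directly. The following is a minimal sketch in base R; null_model is a name chosen here for illustration.

```r
# Full model from the article, plus an intercept-only (null) model.
model      <- glm(vs ~ wt + disp, data = mtcars, family = binomial)
null_model <- glm(vs ~ 1,         data = mtcars, family = binomial)

# The deviance of the intercept-only fit is what summary(model)
# reports as the null deviance.
deviance(null_model)
model$null.deviance

# The single coefficient of the null model is the logit of the
# overall proportion of cars with vs = 1.
coef(null_model)
qlogis(mean(mtcars$vs))
```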

For our example, we have a value of 43.9 on 31 degrees of freedom. Including the independent variables (weight and displacement) decreased the deviance to 21.4 points on 29 degrees of freedom, a significant reduction in deviance.

The deviance has been reduced by 22.46 (43.86 - 21.40) at a cost of two degrees of freedom, one for each added predictor.
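The claim that this reduction is significant can be checked by comparing the drop in deviance with a chi-squared distribution on two degrees of freedom. The sketch below uses only components stored in the fitted glm object; dev_drop, df_drop and p_value are illustrative names.

```r
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Drop in deviance and in degrees of freedom when the two predictors
# are added to the intercept-only model.
dev_drop <- model$null.deviance - model$deviance
df_drop  <- model$df.null - model$df.residual

# Upper-tail chi-squared probability for the observed drop.
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)

c(deviance_drop = dev_drop, df = df_drop, p_value = p_value)
```

A very small p-value here indicates that the model with weight and displacement fits better than the intercept-only model.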

Fisher Scoring

What about the Fisher scoring algorithm? Fisher’s scoring algorithm is a variant of Newton’s method for solving maximum likelihood problems numerically.

For our model we see that Fisher’s Scoring Algorithm needed six iterations to perform the fit.

This doesn’t really tell you a lot that you need to know, other than the fact that the model did indeed converge, and had no trouble doing it.
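If you want to confirm convergence programmatically rather than by reading the summary, the fitted object stores this information. This is a small sketch using the converged and iter components of a glm object and the trace option of glm.control(); the object name traced is hypothetical.

```r
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# TRUE if the iteratively reweighted least squares fit converged.
model$converged

# Number of Fisher scoring iterations that were used.
model$iter

# Refit with trace = TRUE to print the deviance at each iteration.
traced <- glm(vs ~ wt + disp, data = mtcars, family = binomial,
              control = glm.control(trace = TRUE))
```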

Information Criteria

The Akaike Information Criterion (AIC) provides a method for assessing the quality of your model through comparison of related models.  It’s based on the Deviance, but penalizes you for making the model more complicated.  Much like adjusted R-squared, its intent is to prevent you from including irrelevant predictors.

However, unlike adjusted R-squared, the number itself is not meaningful. If you have several similar candidate models (where all of the variables of the simpler model occur in the more complex models), then you should select the model that has the smallest AIC.

So it’s useful for comparing models, but isn’t interpretable on its own.
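To see how the penalty enters, the AIC can be reproduced by hand as minus twice the log-likelihood plus twice the number of estimated coefficients. The sketch below assumes a 0/1 response, for which the residual deviance equals minus twice the log-likelihood; the comparison model named smaller is a hypothetical candidate chosen for illustration, not one discussed in the article.

```r
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Number of estimated coefficients (intercept, wt, disp).
k <- length(coef(model))

# AIC by hand: -2 * log-likelihood + 2 * k.
aic_by_hand <- -2 * as.numeric(logLik(model)) + 2 * k
all.equal(aic_by_hand, AIC(model))

# For a binary (0/1) response the residual deviance is -2 * log-likelihood,
# so the same value is deviance + 2 * k.
all.equal(deviance(model) + 2 * k, AIC(model))

# Compare with a simpler candidate that drops wt; the lower AIC is preferred.
smaller <- glm(vs ~ disp, data = mtcars, family = binomial)
AIC(smaller, model)
```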

Hosmer-Lemeshow Goodness of Fit

How well our model fits depends on the difference between the model and the observed data.  One approach for binary data is to implement a Hosmer-Lemeshow goodness of fit test.

To implement this test, first install the ResourceSelection package, as follows.

install.packages("ResourceSelection")

Then load the package using the library() function. The test is available through the hoslem.test() function.

library(ResourceSelection)
hoslem.test(mtcars$vs, fitted(model))
Hosmer and Lemeshow goodness of fit (GOF) test
data: mtcars$vs, fitted(model)
X-squared = 6.4717, df = 8, p-value = 0.5945

Our model appears to fit well because we have no significant difference between the model and the observed data (i.e. the p-value is above 0.05).

As with all measures of model fit, we’ll use this as just one piece of information in deciding how well this model fits.  It doesn’t work well in very large or very small data sets, but is often useful nonetheless.
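For readers who want to see what the test compares, the following base R sketch groups the cars by their fitted probabilities and contrasts observed with expected counts in each group. It is an illustration written under the assumption of ten quantile-based groups and g - 2 degrees of freedom, not the package's own code, and the variable names are chosen here.

```r
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)
y     <- mtcars$vs
p_hat <- fitted(model)

# Split the fitted probabilities into (up to) g quantile-based groups.
g      <- 10
breaks <- unique(quantile(p_hat, probs = seq(0, 1, length.out = g + 1)))
grp    <- droplevels(cut(p_hat, breaks = breaks, include.lowest = TRUE))

# Observed and expected counts of vs = 1 and vs = 0 in each group.
n_grp <- tapply(y, grp, length)
obs_1 <- tapply(y, grp, sum)
exp_1 <- tapply(p_hat, grp, sum)
obs_0 <- n_grp - obs_1
exp_0 <- n_grp - exp_1

# Pearson-type statistic summed over groups and both outcomes.
hl_stat <- sum((obs_1 - exp_1)^2 / exp_1 + (obs_0 - exp_0)^2 / exp_0)
hl_df   <- nlevels(grp) - 2
hl_p    <- pchisq(hl_stat, df = hl_df, lower.tail = FALSE)

c(statistic = hl_stat, df = hl_df, p.value = hl_p)
```

With only 32 cars each group holds about three observations, which is one reason to treat the result as a rough guide in small data sets.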

That wasn’t so hard! In our next article, we will plot our model.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.

 

Tagged With: AIC, Akaike Information Criterion, deviance, generalized linear models, GLM, Hosmer Lemeshow Goodness of Fit, logistic regression, R

Comments

  1. Nivedan I.B says

    August 3, 2017 at 12:47 pm

    Hello!
    You said, deviance is a measure of goodness of fit of a generalized linear model. What do you exactly mean by “fit”?

    Looking forward to hearing from you!

    Regards

    Nivedan I.B

  2. Matt says

    December 14, 2016 at 12:29 pm

    Hi Mr. Lillis,

    your description of “deviance” helped me understand it a bit better, but one question still comes up: how can I interpret the decrease from the null deviance when adding independent variables (residual deviance)?

    Does it mean the model with independent variables fits better than the null model because of the lower value?

    Thank you for your response in advance

    • chch says

      July 13, 2020 at 10:57 am

      difference b/w null deviance and residual deviance should be chi_squared distributed with 2 degrees of freedom in this case (df null – df model with more parameters)

      in this case it’s even highly significant (table below)

      https://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.