What is Degrees of Freedom?

by Jeff Meyer

No, degrees of freedom is not “having one foot out the door”!

Definitions are rarely very good at explaining the meaning of something, at least not in statistics. Take the dictionary definition of degrees of freedom: “the number of independent values or quantities which can be assigned to a statistical distribution.”

This is no exception.

Let’s dig into an example to show you what degrees of freedom (df) really are.

We will use linear regression output to explain. Our outcome variable is BMI (body mass index). 

Degrees of Freedom with one Parameter Estimate

The starting point for understanding degrees of freedom is the total number of observations in the model. This model has 303 observations, shown in the top right corner of the regression output. In any given sample, if we haven’t used it yet to calculate anything, every observation is free to vary. So we start with 303 df.

But once we use these observations to estimate a parameter the degrees of freedom change.

A model run with no predictors, the empty model, provides one estimated parameter value, the intercept (here labeled _cons).  The intercept in this model is just the mean of the outcome variable, BMI.  

Note that the “Residual” df and the “Total” df are both 302. The empty model has n-1 df, where  n = number of observations.

Why does the empty model have n-1 df and not n?

Once we calculate the mean of a series of numbers, we’ve restricted one of the observations. In other words, if I tell you the sample mean and I tell you the value of 302 of the observations, you can tell me with 100% certainty what the value is of the 303rd observation.

It’s like a (really bad) statistical card trick. You’ll always know the value of the last observation in the sample, once you know the mean and the other 302 observations.
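Here is a tiny numeric sketch of that card trick in Python (the BMI values and the mean below are made up purely for illustration):

```python
# Made-up illustration: once the mean and all but one observation are known,
# the last observation is completely determined; it has no freedom to vary.
known_values = [22.1, 27.4, 31.0, 24.8]   # 4 of 5 observations (invented BMIs)
sample_mean = 26.5                        # the reported mean of all 5 observations

# The mean fixes the total, so the remaining observation must be:
last_value = sample_mean * 5 - sum(known_values)
print(last_value)   # 27.2: no freedom left for this value
```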

The way we think of it statistically: there are no restrictions on the values of those numbers except for one of them.

The “other” subjects’ BMIs are free to vary in any way. But once we know the mean of BMI, the final subject’s BMI cannot vary.

Here is the mathematical equation: df = n - 1 = 303 - 1 = 302.

In terms of our model above, 302 observations can vary, one cannot. Our empty model has 302 degrees of freedom.
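If you want to verify this in software, here is a minimal sketch using Python and statsmodels (the 303 simulated BMI values are stand-ins for the article’s dataset, which we don’t have):

```python
# Minimal sketch: an intercept-only ("empty") model has n - 1 residual df.
# The 303 BMI values here are simulated stand-ins, not the article's data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
bmi = rng.normal(loc=27, scale=5, size=303)   # 303 made-up BMI observations

X = np.ones((303, 1))                         # intercept only: the empty model
empty_model = sm.OLS(bmi, X).fit()

print(empty_model.params[0])   # the intercept equals the sample mean of bmi
print(empty_model.df_resid)    # 302.0 = 303 observations - 1 estimated parameter
```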

Degrees of Freedom with more Parameter Estimates

What happens when we include a categorical predictor for body frame which has three categories: small, medium and large?

The “Total” number of degrees of freedom remains at n-1, 302. “Model” has been added as a “Source”. Its degrees of freedom is 2. Why?

Because we’ve added two new parameter estimates to the model: the regression coefficients for medium and large frame. The intercept (_cons) represents the mean value of BMI for the reference group, small frame. Medium frame is estimated to be 5.31 greater than small frame, for a mean BMI of 30.43. Large frame is estimated to be 8.01 greater than small frame, for a mean BMI of 33.13.

How do these estimates impact the degrees of freedom?

We use the same mathematical logic here as we did for the empty model.

The Calculations

If we know the mean of BMI for small frame, all but one small frame individual’s observed value can vary.

If we know the mean of BMI for medium frame, all but one medium frame individual’s observed value can vary.

If we know the mean of BMI for large frame, all but one large frame individual’s observed value can vary.

We know the “Total” degrees of freedom equal n-1 as a result of calculating the intercept (mean for small frame individuals). One medium frame observation is no longer free to vary since we know the mean BMI for medium frame observations. The same is true for large frame individuals.

Our model has used a total of 2 degrees of freedom for the two additional mean values estimated. That is why “Model” has 2 df. The residual df represents the number of observations whose BMI can still vary. To calculate the residual df we simply subtract the “Model” df from the “Total” df: 302 - 2 = 300.
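As a hedged check of this bookkeeping, here is a sketch in statsmodels with simulated data; the frame labels and group means below are invented stand-ins, not the article’s actual estimates:

```python
# Sketch: a 3-category predictor uses 2 model df, leaving 300 residual df here.
# The frame categories and group means are invented; only the df logic matters.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
data = pd.DataFrame({"frame": rng.choice(["small", "medium", "large"], size=303)})
group_means = {"small": 25.1, "medium": 30.4, "large": 33.1}
data["bmi"] = data["frame"].map(group_means) + rng.normal(scale=4, size=303)

fit = smf.ols("bmi ~ C(frame, Treatment('small'))", data=data).fit()

print(fit.df_model)   # 2.0   -> two dummy coefficients (medium, large)
print(fit.df_resid)   # 300.0 -> 302 total df minus 2 model df
```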

Each time we add predictors to the model we add parameters to estimate, so we increase the “Model” df. If the predictor is continuous, we add one df to the “Model” df. If the predictor is categorical, we add the number of categories minus one.

In other words, each parameter estimate summarizes the values of the sample observations. With each new summary, one fewer sample observation is free to have any value. So the “Model” df counts the number of new estimates, and the residual df counts what is left over.
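As one last hedged illustration of that rule, the sketch below adds a continuous predictor (an invented age variable) alongside the categorical one and checks how the df split up; the data are again simulated:

```python
# Sketch: a continuous predictor adds 1 model df; a k-category predictor adds k - 1.
# "age" is an invented predictor used only to illustrate the df accounting.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 303
data = pd.DataFrame({
    "bmi": rng.normal(27, 5, size=n),
    "age": rng.integers(20, 70, size=n),                        # continuous: +1 df
    "frame": rng.choice(["small", "medium", "large"], size=n),  # 3 categories: +2 df
})

fit = smf.ols("bmi ~ age + C(frame)", data=data).fit()
print(fit.df_model)   # 3.0   = 1 (age) + 2 (frame dummies)
print(fit.df_resid)   # 299.0 = 302 total df - 3 model df
```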

Tagged With: degree of freedom, distribution, Intercept, parameter estimates, predictor variable

