Regression models without intercepts

A recent question on the Talkstats forum asked about dropping the intercept in a linear regression model, since doing so makes the predictor's coefficient stronger and more significant.  Dropping the intercept forces the regression line to go through the origin: the y-intercept must be 0.

The problem with dropping the intercept is that the slope may be steeper only because you're forcing the line through the origin, not because the line fits the data better.  If the intercept really should be something other than 0, you're creating that steepness artificially.  A more significant model isn't better if it's inaccurate.
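A minimal sketch of this effect, using made-up data where the true intercept is clearly nonzero (roughly y = 2x + 5). The through-the-origin slope comes out steeper than the ordinary least squares slope, not because the fit is better, but because the slope has to compensate for the missing intercept:

```python
# Hypothetical toy data, roughly y = 2x + 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [7.1, 9.0, 10.9, 13.2, 15.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares WITH an intercept:
# slope = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
slope_full = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
              / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope_full * mean_x

# Regression through the origin (no intercept):
# slope = sum(xy) / sum(x²)
slope_origin = (sum(xi * yi for xi, yi in zip(x, y))
                / sum(xi ** 2 for xi in x))

print(round(slope_full, 2), round(intercept, 2))  # ≈ 2.02 and 5.0
print(round(slope_origin, 2))                     # ≈ 3.38 — artificially steeper
```

The no-intercept slope (≈3.38) looks "stronger" than the true slope (≈2.02), which is exactly the artificial steepness described above.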


Reader Interactions


  1. Iwan Awaludin says

    Hi there, I am using regression in my final-year project to predict energy consumption. Let's say that I have data from 1982–2006. I use data from 1982 to 1992 to find the coefficients and apply the obtained coefficients to data from 1993 to 2006. So it is as if I live in 1992 and try to predict 1993 through 2006 based on data from 1982–1992.
    The model itself is an ARX with 2 exogenous variables. When I use an intercept, the model is less robust than without one. That is, if I reduce the amount of data (say, to 1986–1992), the error for 1993–2006 with the intercept becomes larger than without it.
    Is there any reference that I should read? Thank you.

    • admin says

      Iwan–I can’t think of a good reference, other than a good regression book. My favorite is Neter, Kutner, et al.

      It could very well depend on how much data you have (one data point per year, or thousands?), how linear it is, and how close the intercept actually is to 0.

  2. admin says

    Mark: the F test that Andy is referring to is a test to compare the two models. It is NOT the same F test that appears on either model's output.

    The idea is that because the no-intercept model is nested within the full model (nested b/c it contains only a subset of the parameters), you can test the fit of the model with an F test.

    To do so, this is the formula:

    F = [(SSE(R) – SSE(F)) / (df(R) – df(F))] / [SSE(F) / df(F)]

    where (R) refers to values from the reduced model (the one with fewer parameters) and (F) refers to values from the full model.

    citation: p. 80 of Neter, Kutner, Nachtsheim, & Wasserman’s Applied Linear Regression Models, 3rd Ed.

    It’s not as ugly as it seems if you write it out on paper. 🙂
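To make the formula concrete, here is a sketch with entirely hypothetical numbers for SSE and degrees of freedom (n = 20; the reduced, no-intercept model has one parameter, the full model two):

```python
# General linear F test for nested models — hypothetical values
sse_reduced = 120.0   # SSE(R): no-intercept model (made-up number)
sse_full = 90.0       # SSE(F): model with intercept (made-up number)
df_reduced = 19       # n - 1 parameter, with n = 20
df_full = 18          # n - 2 parameters

f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(f_stat)  # (30 / 1) / (90 / 18) = 30 / 5 = 6.0
```

You would then compare the result against an F distribution with (df(R) – df(F), df(F)) degrees of freedom, here F(1, 18), to decide whether the full model fits significantly better.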

  3. Mark says

    I am using LINEST in Microsoft Excel to do regression analysis with multiple independent variables. I get the rationale for why forcing the y-intercept to zero matches the data better, but when I run the regression with the y-intercept my r^2 value is .99, yet when I apply the resulting equation the values don't match within a decent standard deviation. The program returns an F value; how can I use this to assess the strength of my data set?

  4. admin says

    Yes, exactly. My suggestion was to compare the fit of the two models. As you point out, since the models are nested, this is easily done with an F test.


  5. Andy says


    I’d want to see what happens when you compare the two nested models. Is the F significant?

    Making the slope estimate steeper wouldn’t be enough to make it a better fit as the residuals could well grow.

