Link Functions and Errors in Logistic Regression

I recently held a free webinar in our The Craft of Statistical Analysis program about Binary, Ordinal, and Nominal Logistic Regression.

It was a record crowd and we didn’t get through everyone’s questions, so I’m answering some here on the site. They’re grouped by topic, and you will probably get more out of it if you watch the webinar recording. It’s free.

The following questions refer to this logistic regression model:

Q: So the link function is everything on the left of the equals sign?

Yes, exactly. The right side looks pretty much like every other regression equation you’ve seen.

But the left side has a link function instead of Y. Since P is  the conditional mean of Y, this ugly mess is simply a function of the mean. That’s the definition of a link function — a function of the mean of Y.

And although it looks ugly at first, it’s really not so bad once you learn more about logistic regression. It’s simply the natural log of the odds.

Q: What is the base of the log? e or 10 or maybe something else?

It’s e.

This is the natural log.

Q: Why isn’t there an error term in the logit model?

It’s because we’re only modeling the mean here, not each individual value of Y.

Logistic Regression is one type of Generalized Linear Model and they all have that same feature. Rather than model each value of Y with the predicted mean plus an error term, it simply models the predicted mean.

This is related to the two ways we can write a linear model. (See the next question, below). For generalized linear models, we can only use the second form AND we have to apply a link function to that predicted mean on the left.

Q:  Is there an error term in that form of the linear model also?

(This question refers to these two ways of writing the linear regression model, which we reviewed at the beginning of the webinar):

Only the first has an error term, because here we are writing a model for each of the i individual values of Y. You’ll notice that not only does the error term have a subscript i, but so do all the Xs and Y.

The second one doesn’t need an error term, because the left side of the equation is not each value of Y, but the mean of Y at given values of X. That’s the value on the regression line.

We needed the error term in the first equation to move us up or down from the regression line to get to the actual data point. Here we are just staying on the regression line.

That’s what we do in generalized linear models, like logistic regression. Just model the regression line, but we are unable at all to model individual points.

Binary, Ordinal, and Multinomial Logistic Regression for Categorical Outcomes
Get beyond the frustration of learning odds ratios, logit link functions, and proportional odds assumptions on your own. See the incredible usefulness of logistic regression and categorical data analysis in this one-hour training.

Reader Interactions


  1. Axf says

    Thanks , that’s a really useful post. I have a question – does this mean that glm are not useful for predictive modelling? I want to predict some data that looks like it fits a gamma distribution, so I thought I’d use a glm with gamma link. But if “we are unable at all to model individual points”, does that mean glm is of no use here? How can I add an error to the predicted mean?

    • Karen Grace-Martin says

      HI Axf,

      No, it doesn’t. In any regression model, the predictions are always about the conditional mean, not the individual points. The errors are there, but considered a nuisance term, not what we’re interested in.

  2. Geraud N. says

    Nice presentation. But is there any endogeneity in logistic regression? if yes how to address it?

    • Karen Grace-Martin says

      Hi Geraud,

      Endogeneity is about the inference, not about the model. So yes, it’s possible in logistic regression. In my experience, this is an issue only for econometricians due to the types of inferences they’re trying to make based on the types of data collection methods, so I’m not sure how they address it. 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

Please note that, due to the large number of comments submitted, any questions on problems related to a personal study/project will not be answered. We suggest joining Statistically Speaking, where you have access to a private forum and more resources 24/7.