Regression models

Understanding Interactions Between Categorical and Continuous Variables in Linear Regression

May 14th, 2018 by

We’ve looked at the interaction effect between two categorical variables. Now let’s make things a little more interesting, shall we?

What if our predictors of interest, say, are a categorical and a continuous variable? How do we interpret the interaction between the two? (more…)


When to Use Logistic Regression for Percentages and Counts

April 30th, 2018 by

One important yet difficult skill in statistics is choosing a type model for different data situations. One key consideration is the dependent variable.

For linear models, the dependent variable doesn’t have to be normally distributed, but it does have to be continuous, unbounded, and measured on an interval or ratio scale.

Percentages don’t fit these criteria. Yes, they’re continuous and ratio scale. The issue is the (more…)


Why ANOVA is Really a Linear Regression, Despite the Difference in Notation

April 23rd, 2018 by

When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.”  But they never explained why.Stage 2

And I couldn’t figure it out.

The model notation is different.

The output looks different.

The vocabulary is different.

The focus of what we’re testing is completely different. How can they be the same model?

(more…)


Poisson or Negative Binomial? Using Count Model Diagnostics to Select a Model

March 19th, 2018 by

How do you choose between Poisson and negative binomial models for discrete count outcomes?

One key criterion is the relative value of the variance to the mean after accounting for the effect of the predictors. A previous article discussed the concept of a variance that is larger than the model assumes: overdispersion.

(Underdispersion is also possible, but much less common).

There are two ways to check for overdispersion: (more…)


Link Functions and Errors in Logistic Regression

March 14th, 2018 by

I recently held a free webinar in our The Craft of Statistical Analysis program about Binary, Ordinal, and Nominal Logistic Regression.

It was a record crowd and we didn’t get through everyone’s questions, so I’m answering some here on the site. They’re grouped by topic, and you will probably get more out of it if you watch the webinar recording. It’s free.

The following questions refer to this logistic regression model: (more…)


Getting Accurate Predicted Counts When There Are No Zeros in the Data

March 12th, 2018 by

We previously examined why a linear regression and negative binomial regression were not viable models for predicting the expected length of stay in the hospital for people with the flu.  A linear regression model was not appropriate because our outcome variable, length of stay, was discrete and not continuous.

A negative binomial model wasn’t the proper choice because the minimum length of stay is not zero. The minimum length of stay is one day. Negative binomial and Poisson models can only be used on data where the observations’ outcome have the possibility of having a zero count.

We need to use a truncated negative binomial model to analyze the expected length of stay of people admitted to the hospital who have the flu. Calculating the expected length of stay is an easy task once we create our model. (more…)