Linear Regression

What It Really Means to Remove an Interaction From a Model

September 17th, 2020 by

When you’re model building, a key decision is which interaction terms to include. And which interactions to remove.Stage 2

As a general rule, the default in regression is to leave them out. Add interactions only with a solid reason. It would seem like data fishing to simply add in all possible interactions.

And yet, that’s a common practice in most ANOVA models: put in all possible interactions and only take them out if there’s a solid reason. Even many software procedures default to creating interactions among categorical predictors.

(more…)


Simplifying a Categorical Predictor in Regression Models

January 14th, 2020 by

One of the many decisions you have to make when model building is which form each predictor variable should take. One specific version of thisStage 2 decision is whether to combine categories of a categorical predictor.

The greater the number of parameter estimates in a model the greater the number of observations that are needed to keep power constant. The parameter estimates in a linear (more…)


What is Multicollinearity? A Visual Description

November 20th, 2019 by

Multicollinearity is one of those terms in statistics that is often defined in one of two ways:

1. Very mathematical terms that make no sense — I mean, what is a linear combination anyway?

2. Completely oversimplified in order to avoid the mathematical terms — it’s a high correlation, right?

So what is it really? In English?

(more…)


Member Training: Practical Advice for Establishing Reliability and Validity

October 30th, 2019 by

How do you know your variables are measuring what you think they are? And how do you know they’re doing it well?

A key part of answering these questions is establishing reliability and validity of the measurements that you use in your research study. But the process of establishing reliability and validity is confusing. There are a dizzying number of choices available to you.

(more…)


Linear Regression for an Outcome Variable with Boundaries

July 22nd, 2019 by

The following statement might surprise you, but it’s true.

To run a linear model, you don’t need an outcome variable Y that’s normally distributed. Instead, you need a dependent variable that is:

  • Continuous
  • Unbounded
  • Measured on an interval or ratio scale

The normality assumption is about the errors in the model, which have the same distribution as Y|X. It’s absolutely possible to have a skewed distribution of Y and a normal distribution of errors because of the effect of X. (more…)


Confusing Statistical Terms #11: Confounder

June 26th, 2019 by

What is a Confounder?

Confounder (also called confounding variable) is one of those statistical terms that confuses a lot of people. Not because it represents a confusing concept, but because of how it’s used.

(Well, it’s a bit of a confusing concept, but that’s not the worst part).

It has slightly different meanings to different types of researchers. The definition is essentially the same, but the research context can have specific implications for how that definition plays out.

If the person you’re talking to has a different understanding of what it means, you’re going to have a confusing conversation.

Let’s take a look at some examples to unpack this.

(more…)