
The Difference Between Association and Correlation

September 10th, 2019

What does it mean for two variables to be correlated?

Is that the same as, or different from, saying they’re associated or related?

This is the kind of question that can feel silly to ask, but shouldn’t. It’s just a reflection of the confusing terminology used in statistics. In this case, the technical statistical term looks like the everyday English word, but doesn’t mean exactly the same thing. (more…)
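A quick sketch (with simulated data, not from the article) shows why the distinction matters: two variables can be perfectly associated while their Pearson correlation is essentially zero, because correlation only measures the *linear* part of a relationship.

```python
import numpy as np

# Hypothetical illustration: a deterministic association that
# Pearson correlation completely misses.
x = np.linspace(-3, 3, 201)   # symmetric around zero
y = x ** 2                    # y is perfectly determined by x

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation
print(round(r, 6))            # essentially 0: no linear relationship
```

Knowing x tells you y exactly here, so the association could not be stronger, yet the correlation coefficient is near zero.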


R-Squared for Mixed Effects Models

August 21st, 2019

When learning about linear models—that is, regression, ANOVA, and similar techniques—we are taught to calculate an R². The R² has the following useful properties:

  • The range is limited to [0,1], so it’s easy to judge whether a value is relatively large or small.
  • It is standardized, meaning its value does not depend on the scale of the variables involved in the analysis.
  • The interpretation is pretty clear: It is the proportion of variability in the outcome that can be explained by the independent variables in the model.

The calculation of the R² is also intuitive, once you understand the concepts of variance and prediction. (more…)
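The variance-and-prediction view of R² can be sketched in a few lines. This is a minimal example with simulated data (all names and numbers here are hypothetical): fit a line by least squares, then compare the unexplained variability to the total variability.

```python
import numpy as np

# Simulated outcome with one predictor.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(size=200)

# Fit a simple linear model by least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ss_res = np.sum((y - y_hat) ** 2)     # variability left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total variability in the outcome
r2 = 1 - ss_res / ss_tot
print(f"R-squared: {r2:.3f}")         # proportion of variance explained
```

Because ss_res can never exceed ss_tot in a model with an intercept, this ratio stays in [0, 1], which is exactly the first property in the list above.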


How Confident Are You About Confidence Intervals?

August 12th, 2019

Any time you report estimates of parameters in a statistical analysis, it’s important to include their confidence intervals.

How confident are you that you can explain what they mean? Even those of us who have a solid understanding of confidence intervals get tripped up by the wording.

The Wording for Describing Confidence Intervals

Let’s look at an example. (more…)
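One way to keep the wording straight is to remember that the 95% refers to the long-run behavior of the *procedure*, not to any single interval. A hedged simulation sketch (the population values here are made up) makes this concrete: draw many samples from a known population and count how often the interval captures the true mean.

```python
import numpy as np

# Simulate repeated sampling from a population with a known mean.
rng = np.random.default_rng(0)
true_mean, n, reps = 10.0, 30, 5000
hits = 0
for _ in range(reps):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    # 95% CI for the mean, using t(0.975, df=29) ≈ 2.045
    lo, hi = sample.mean() - 2.045 * se, sample.mean() + 2.045 * se
    hits += lo <= true_mean <= hi
coverage = hits / reps
print(f"coverage: {coverage:.3f}")  # close to 0.95
```

About 95% of the intervals contain the true mean; any one interval either does or doesn’t, which is why the careful wording matters.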


Linear Regression for an Outcome Variable with Boundaries

July 22nd, 2019

The following statement might surprise you, but it’s true.

To run a linear model, you don’t need an outcome variable Y that’s normally distributed. Instead, you need a dependent variable that is:

  • Continuous
  • Unbounded
  • Measured on an interval or ratio scale

The normality assumption is about the errors in the model, which have the same shape as the distribution of Y|X, just centered at zero. It’s absolutely possible to have a skewed distribution of Y and a normal distribution of errors because of the effect of X. (more…)
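A small simulation (all numbers hypothetical) illustrates the point: a skewed predictor can make Y skewed even though the errors around the regression line are perfectly normal.

```python
import numpy as np

# Skewed predictor, normal errors around the line.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1000)    # right-skewed X
y = 1.0 + 3.0 * x + rng.normal(size=1000)    # normal errors

def skewness(v):
    v = v - v.mean()
    return np.mean(v ** 3) / np.mean(v ** 2) ** 1.5

# Fit the line and look at the residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print(round(skewness(y), 2))          # clearly skewed outcome
print(round(skewness(residuals), 2))  # residuals roughly symmetric
```

Checking the marginal distribution of Y here would wrongly suggest the model is inappropriate; it’s the residuals that matter.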


How to Reduce the Number of Variables to Analyze

July 10th, 2019

by Christos Giannoulis

Many data sets contain well over a thousand variables. Such complexity, the speed of contemporary desktop computers, and the ease of use of statistical analysis packages can encourage ill-directed analysis.

It is easy to generate a vast array of poor ‘results’ by throwing everything into your software and waiting to see what turns up. (more…)
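One principled alternative to throwing everything in is to reduce correlated variables to a few components first, for example with principal component analysis. This is a rough sketch on simulated data (the structure is assumed, not from the article): fifty observed variables that are really driven by three underlying factors.

```python
import numpy as np

# Simulate 50 observed variables driven by 3 latent factors.
rng = np.random.default_rng(7)
latent = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 50))
data = latent @ loadings + 0.1 * rng.normal(size=(300, 50))

# PCA via SVD of the centered data.
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)   # variance share per component

# How many components are needed for 95% of the variance?
k = np.searchsorted(np.cumsum(explained), 0.95) + 1
print(f"{k} components capture 95% of the variance in 50 variables")
```

A handful of components stand in for all fifty variables, which shrinks the analysis to something that can be directed by theory rather than by whatever turns up.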


Confusing Statistical Terms #11: Confounder

June 26th, 2019

What is a Confounder?

Confounder (also called confounding variable) is one of those statistical terms that confuses a lot of people. Not because it represents a confusing concept, but because of how it’s used.

(Well, it’s a bit of a confusing concept, but that’s not the worst part).

It has slightly different meanings to different types of researchers. The definition is essentially the same, but the research context can have specific implications for how that definition plays out.

If the person you’re talking to has a different understanding of what it means, you’re going to have a confusing conversation.

Let’s take a look at some examples to unpack this.

(more…)
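Before getting into the examples, the core idea can be seen in a tiny simulation (entirely made up for illustration): a confounder Z drives both the exposure X and the outcome Y, so an analysis that ignores Z finds an effect of X that isn’t there.

```python
import numpy as np

# Z confounds X and Y; the true direct effect of X on Y is zero.
rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=n)             # confounder
x = z + rng.normal(size=n)         # exposure depends on Z
y = 2.0 * z + rng.normal(size=n)   # outcome depends on Z, not on X

def ols_coefs(design, outcome):
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta

# Unadjusted vs. adjusted estimate of the X effect.
naive = ols_coefs(np.column_stack([np.ones(n), x]), y)[1]
adjusted = ols_coefs(np.column_stack([np.ones(n), x, z]), y)[1]

print(f"naive X coefficient:    {naive:.2f}")     # spuriously near 1
print(f"adjusted X coefficient: {adjusted:.2f}")  # near the true 0
```

Whatever their research context, this is the mechanism everyone means by confounding: leaving Z out of the model biases the estimated effect of X.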