
Multicollinearity is one of those terms in statistics that is often defined in one of two ways:
1. Very mathematical terms that make no sense — I mean, what is a linear combination anyway?
2. Completely oversimplified in order to avoid the mathematical terms — it’s a high correlation, right?
So what is it really? In English?
(more…)
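As a rough illustration of the "linear combination" idea, here is a minimal sketch in plain Python. The four predictors `a`, `b`, `c`, and `d` are made-up toy data, not from any real study, and `pearson_r` is a small helper written for this example. The fourth predictor is built as the exact sum of the other three, so it carries no information of its own, yet none of its pairwise correlations is especially high.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Three toy predictors (hypothetical data) that are uncorrelated with each other.
a = [1, 1, -1, -1]
b = [1, -1, 1, -1]
c = [1, -1, -1, 1]

# A fourth predictor that is an exact linear combination of the first three.
d = [ai + bi + ci for ai, bi, ci in zip(a, b, c)]

# Pairwise correlations between d and each of the other predictors.
pairwise = {name: pearson_r(v, d) for name, v in (("a", a), ("b", b), ("c", c))}
print(d)
print(pairwise)
```

Each pairwise correlation here works out to 1/√3, or roughly 0.58, even though `d` is completely determined by the other three predictors. That is one reason "it's a high correlation" can be too simple a description.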
I’ve written about this before: there is just something about statistics that makes people feel…well, not so smart.

This makes people v-e-r-y reluctant to ask questions.
This fact really struck me years and years ago. Hit me hard.
(more…)
It’s easy to think that if you just knew statistics better, data analysis wouldn’t be so hard.

It’s true that more statistical knowledge is always helpful. But I’ve found that statistical knowledge is only part of the story.
Another key part is developing data analysis skills. These skills apply to all analyses. It doesn’t matter which statistical method or software you’re using. So even if you never need any statistical analysis harder than a t-test, developing these skills will make your job easier.
(more…)
Multilevel models and mixed models are generally the same thing. In our recent webinar on the basics of mixed models, Random Intercept and Random Slope Models, we had a number of questions about terminology that I’m going to answer here.
If you want to see the full recording of the webinar, get it here. It’s free.
Q: Is this different from multi-level modeling?
A: No. I don’t really know the history of why we have the different names, but the difference in multilevel modeling (more…)
What does it mean for two variables to be correlated?
Is that the same as, or different from, saying they’re associated or related?
This is the kind of question that can feel silly, but shouldn’t. It’s just a reflection of the confusing terminology used in statistics. In this case, the technical statistical term looks like, but is not exactly the same as, the way we mean it in everyday English. (more…)
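To make the question concrete, here is a minimal sketch in plain Python. It assumes "correlated" is meant in the usual technical sense of Pearson's correlation coefficient, which measures how strongly two variables are linearly related. The data and the `pearson_r` helper are made up for illustration. In the second case, `y_curved` is completely determined by `x`, so the two are clearly related, but the correlation coefficient comes out at zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]

# A straight-line relationship: y goes up by 2 for every 1-unit increase in x.
y_linear = [2 * xi + 1 for xi in x]

# A curved (U-shaped) relationship: y depends entirely on x, but not linearly.
y_curved = [xi ** 2 for xi in x]

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

Under that assumption, the straight-line case gives a correlation of 1 and the U-shaped case gives 0, which is one way two variables can be related without being correlated.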
When learning about linear models (that is, regression, ANOVA, and similar techniques), we are taught to calculate an R². The R² has the following useful properties:
- The range is limited to [0, 1], so we can easily judge how relatively large it is.
- It is standardized, meaning its value does not depend on the scale of the variables involved in the analysis.
- The interpretation is pretty clear: It is the proportion of variability in the outcome that can be explained by the independent variables in the model.
The calculation of the R² is also intuitive, once you understand the concepts of variance and prediction. (more…)
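As a rough sketch of that calculation, the plain-Python example below fits a one-predictor least-squares line and computes R-squared as one minus the ratio of unexplained (residual) variation to total variation in the outcome. The `hours` and `score` values are made-up toy data, and `r_squared` is a helper written for this illustration, not a function from any particular package.

```python
def r_squared(x, y):
    """R-squared for a simple (one-predictor) least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Predictions from the fitted line.
    predicted = [intercept + slope * xi for xi in x]

    # Total variation in y, and the part left unexplained by the model.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

    return 1 - ss_residual / ss_total


# Hypothetical toy data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r2 = r_squared(hours, score)

# Rescaling the outcome (say, to a different unit) leaves R-squared unchanged.
r2_rescaled = r_squared(hours, [10 * s for s in score])
print(r2, r2_rescaled)
```

With this toy data the result is 0.6, meaning the fitted line accounts for 60% of the variability in `score`. The rescaled version gives the same value, which matches the "standardized" property in the list above.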