One important yet difficult skill in statistics is choosing a type of model for different data situations. One key consideration is the dependent variable.

For linear models, the dependent variable doesn’t have to be normally distributed, but it does have to be continuous, unbounded, and measured on an interval or ratio scale.

Percentages don’t fit these criteria. Yes, they’re continuous and ratio scale. The issue is the boundaries at 0 and 100.

Likewise, counts have a boundary at 0 and are discrete, not continuous. The general advice is to analyze these with some variety of a Poisson model.

Yet there is a very specific type of variable that can be considered either a count or a percentage, but has its own specific distribution.

When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why. And I couldn’t figure it out. The model notation is different. The output looks different. The vocabulary is different. The focus of what we’re testing is completely different. How can they […]
