We’ve looked at the interaction effect between two categorical variables. Now let’s make things a little more interesting, shall we?
What if our predictors of interest are, say, a categorical and a continuous variable? How do we interpret the interaction between the two? (more…)
One important yet difficult skill in statistics is choosing a type of model for different data situations. One key consideration is the dependent variable.
For linear models, the dependent variable doesn’t have to be normally distributed, but it does have to be continuous, unbounded, and measured on an interval or ratio scale.
Percentages don’t fit these criteria. Yes, they’re continuous and ratio scale. The issue is the (more…)
When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.
And I couldn’t figure it out.
The model notation is different.
The output looks different.
The vocabulary is different.
The focus of what we’re testing is completely different. How can they be the same model?
(more…)
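One way to see the equivalence concretely is to compute the F statistic twice on the same data: once with the usual ANOVA sums of squares, and once from a least-squares regression on dummy-coded group indicators. The following is a minimal sketch, assuming a made-up data set of three groups; the data, the variable names, and the small `solve` helper are all hypothetical and are not taken from the article.

```python
# Hypothetical toy data: three groups with three observations each.
groups = {"a": [4.0, 5.0, 6.0], "b": [7.0, 8.0, 9.0], "c": [1.0, 2.0, 3.0]}
labels = sorted(groups)
y = [v for g in labels for v in groups[g]]
n, k = len(y), len(labels)
grand_mean = sum(y) / n
group_mean = {g: sum(groups[g]) / len(groups[g]) for g in labels}

# --- One-way ANOVA: between-group vs. within-group variation ---
ss_between = sum(len(groups[g]) * (group_mean[g] - grand_mean) ** 2 for g in labels)
ss_within = sum((v - group_mean[g]) ** 2 for g in labels for v in groups[g])
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * z for x, z in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]


# --- Regression: intercept plus dummy variables (group "a" is the reference) ---
X = [[1.0] + [1.0 if g == lev else 0.0 for lev in labels[1:]]
     for g in labels for _ in groups[g]]
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)  # least-squares coefficients from the normal equations

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_total = sum((yi - grand_mean) ** 2 for yi in y)
f_reg = ((ss_total - ss_error) / (p - 1)) / (ss_error / (n - p))

print(f_anova, f_reg)
```

For this toy data the two F statistics agree (both are 27, up to floating-point rounding). The regression intercept is the mean of the reference group, and each dummy coefficient is the difference between that group's mean and the reference mean.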
Have you ever heard that “2 tall parents will have shorter children”?
This phenomenon, known as regression to the mean, has been used to explain everything from patterns in hereditary stature (as Galton first did in 1886) to why movie sequels or sophomore albums so often flop.
So just what is regression to the mean (RTM)? (more…)
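A small simulation can make the idea tangible. This is a sketch under stated assumptions, not Galton's analysis: heights are expressed in standard units, and the parent-child correlation `r = 0.5` is a hypothetical value chosen only for illustration.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible
r = 0.5          # assumed parent-child correlation (hypothetical)
n = 20000        # number of simulated parent-child pairs

pairs = []
for _ in range(n):
    parent = random.gauss(0, 1)
    # Child height is correlated with parent height at r, with unit variance.
    child = r * parent + (1 - r ** 2) ** 0.5 * random.gauss(0, 1)
    pairs.append((parent, child))

# Keep only unusually tall parents (more than 1 SD above the mean).
tall = [(p, c) for p, c in pairs if p > 1.0]
mean_parent = sum(p for p, _ in tall) / len(tall)
mean_child = sum(c for _, c in tall) / len(tall)

print(mean_parent, mean_child)
```

In this setup the children of tall parents are still taller than average, but on average they sit closer to the population mean than their parents do, by a factor of roughly `r`.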
Most of the p-values we calculate are based on an assumption that our test statistic follows some distribution. These distributions are generally a good way to calculate p-values as long as their assumptions are met.
But they're not the only way to calculate a p-value.
Rather than come up with a theoretical probability based on a distribution, exact tests calculate a p-value empirically.
The simplest (and most common) exact test is Fisher’s exact test for a 2×2 table.
Remember calculating empirical probabilities from your intro stats course? All those red and white balls in urns? (more…)
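The urn picture carries over directly to a 2×2 table: with the row and column totals held fixed, the count in the top-left cell follows a hypergeometric distribution, and the p-value is obtained by adding up the probabilities of the possible tables. Below is a minimal sketch using only the standard library. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (sum every table that is no more probable than the observed one) is one common convention, assumed rather than taken from the article.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)  # ways to choose which col1 of the n units land in column 1

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Example: 4 + 4 units, 3 of the first row and 1 of the second in column 1.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

For the table [[3, 1], [1, 3]] there are 70 equally likely arrangements, and 34 of them are at least as extreme as the observed table, so the p-value is 34/70, about 0.486.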
How do you choose between Poisson and negative binomial models for discrete count outcomes?
One key criterion is the size of the variance relative to the mean after accounting for the effect of the predictors. A previous article discussed the concept of a variance that is larger than the model assumes: overdispersion.
(Underdispersion is also possible, but much less common.)
There are two ways to check for overdispersion: (more…)