Some terms mean one thing in the English language, but have another (usually more specific) meaning in statistics. [Read more…] about Member Training: Confusing Statistical Terms
by Christos Giannoulis
Many data sets contain well over a thousand variables. Such complexity, the speed of contemporary desktop computers, and the ease of use of statistical analysis packages can encourage ill-directed analysis.
It is easy to generate a vast array of poor ‘results’ by throwing everything into your software and waiting to see what turns up. [Read more…] about How to Reduce the Number of Variables to Analyze
One important yet difficult skill in statistics is choosing a type of model for different data situations. One key consideration is the dependent variable.
For linear models, the dependent variable doesn’t have to be normally distributed, but it does have to be continuous, unbounded, and measured on an interval or ratio scale.
Percentages don’t fit these criteria. Yes, they’re continuous and ratio scale. The issue is the [Read more…] about When to Use Logistic Regression for Percentages and Counts
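As a rough illustration of why bounded outcomes sit awkwardly in a linear model, the sketch below shows the logit transform that logistic regression relies on. It maps a proportion strictly between 0 and 1 onto the whole real line, and the inverse maps any real number back into that interval. The function names are hypothetical and chosen only for this example; this is not taken from the linked article.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the unbounded log-odds scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map any real number back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# A proportion of 0.5 sits at 0 on the log-odds scale.
midpoint = logit(0.5)

# Even a large value on the log-odds scale stays below 1 as a proportion,
# whereas a straight line fitted to raw percentages can run past 0% or 100%.
bounded = inverse_logit(10.0)
```

A model that is linear on the log-odds scale can therefore never predict a proportion outside the 0 to 1 range.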
by Jeff Meyer
We often have a continuous predictor in a model that we believe has a non-constant relationship with the dependent variable along the predictor’s range. But how can we be certain? What is the best way to measure this?
Sometimes including a quadratic term will capture the change in the slope as we move from the bottom of the range to the top of the range. But a quadratic term only works in two situations:
- The rate of change increases and then at some point decreases, or:
- The opposite happens – the rate of change decreases and at some point increases.
We could also create a categorical variable. Each category within the categorical variable would represent a specific range within the continuous variable. [Read more…] about Segmented Regression for Non-Constant Relationships
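The categorical approach mentioned above can be sketched in a few lines. This is a minimal example with made-up data and hypothetical cut points, not the method from the linked article: each value of the continuous predictor is assigned to a range, and the outcome is summarized within each range.

```python
from bisect import bisect_right
from statistics import mean


def bin_predictor(x, cut_points):
    """Return the index of the range that x falls into.

    cut_points must be sorted ascending. A value equal to a cut point
    is placed in the upper range.
    """
    return bisect_right(cut_points, x)


def mean_outcome_by_bin(xs, ys, cut_points):
    """Average the outcome within each range of the predictor."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(bin_predictor(x, cut_points), []).append(y)
    return {k: mean(v) for k, v in sorted(groups.items())}


# Made-up data: the outcome is flat at first, then rises with the predictor.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.0, 2.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
cut_points = [4, 7]  # hypothetical boundaries between the three ranges

summary = mean_outcome_by_bin(xs, ys, cut_points)
print(summary)
```

The obvious cost of this design is that the cut points have to be chosen by the analyst, and all variation within a range is collapsed into a single category.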
When your dependent variable is not continuous, unbounded, and measured on an interval or ratio scale, your model will not meet the assumptions of linear models.
Today I’m going to go into more detail about 6 common types of dependent variables that are not continuous, unbounded, and measured on an interval or ratio scale and the tests that work instead.
Side note: the usual advice is to use nonparametric tests when normality [Read more…] about When Dependent Variables Are Not Fit for Linear Models, Now What?
The assumptions of normality and constant variance in a linear model (both OLS regression and ANOVA) are quite robust to departures. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates.
But you need to check the assumptions anyway, because some departures are so large that the p-values become inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.
But sometimes you can’t.
Sometimes it’s because the dependent variable just isn’t appropriate for a linear model. The [Read more…] about 6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption
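For the assumption checks mentioned above, a minimal sketch of one possible check follows: fit a simple least-squares line, compute the residuals, and look at their skewness. The data are made up, the helper names are hypothetical, and skewness is only one rough numeric summary. Residual plots are the more usual tool.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Observed minus fitted values from the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def skewness(values):
    """Moment-based sample skewness; roughly 0 for symmetric data."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up data with one unusually high outcome at the top of the range.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 9.0]

res = residuals(xs, ys)
print(skewness(res))  # positive here, reflecting the long right tail
```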