When Linear Models Don’t Fit Your Data, Now What?

June 20th, 2022 by

When your dependent variable is not continuous, unbounded, and measured on an interval or ratio scale, linear models don’t fit. The data just will not meet the assumptions of linear models. But there’s good news: other models exist for many types of dependent variables.

Today I’m going to go into more detail about 6 common types of dependent variables that are either discrete, bounded, or measured on a nominal or ordinal scale, and the models that work for them instead. Some of these variables are all three at once.


Member Training: Interpretation of Effect Size Statistics

August 30th, 2019 by

Effect size statistics are required by most journals and committees these days, and for good reason.

They communicate just how big the effects are in your statistical results, something p-values can’t do.

But they’re only useful if you can choose the most appropriate one and if you can interpret it.

This can be hard even in simple statistical tests. But once you get into complicated models, it’s a whole new story. (more…)

Member Training: Equivalence Tests and Non-Inferiority

April 2nd, 2018 by

Statistics is, to a large extent, a science of comparison. You are trying to test whether one group is bigger, faster, or smarter than another.

You do this by setting up a null hypothesis that your two groups have equal means or proportions and an alternative hypothesis that one group is “better” than the other. The test has interesting results only when the data you collect lead you to reject the null hypothesis.

But there are times when the interesting research question you’re asking is not about whether one group is better than the other, but whether the two groups are equivalent.


How to do a Chi-square test when you only have proportions and denominators

March 18th, 2011 by

by Annette Gerritsen, Ph.D.

In an earlier article I discussed how to do a cross-tabulation in SPSS. But what if you do not have a data set with the values of the two variables of interest?

For example, you might be doing a critical appraisal of a published study and have only the proportions and denominators.

This article demonstrates how SPSS can produce a cross table and run a Chi-square test in both situations. You will see that the results are exactly the same.
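The same idea can be sketched outside SPSS. The Python below is only an illustration, not the SPSS procedure this article walks through: it rebuilds the cell counts from each proportion and its denominator, then computes a Pearson Chi-square without continuity correction for the resulting 2x2 table. The function names and the figures (30% of 200 and 45% of 180) are invented for the example, and rounding to whole counts assumes the published proportions were not rounded too coarsely.

```python
import math

def counts_from_proportion(proportion, denominator):
    """Recover (events, non-events) from a reported proportion and its denominator."""
    events = round(proportion * denominator)
    return events, denominator - events

def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented figures for illustration: 30% of 200 in one group, 45% of 180 in the other.
table = [
    list(counts_from_proportion(0.30, 200)),
    list(counts_from_proportion(0.45, 180)),
]
chi2, p_value = pearson_chi_square_2x2(table)
print(table, round(chi2, 3), round(p_value, 4))
```

Once the counts are recovered, the test is the same one you would run on a case-level dataset, which is why the two routes should agree.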

‘Normal’ dataset

If you want to test if there is an association between two nominal variables, you do a Chi-square test.

In SPSS you just indicate that one variable (the independent one) should come in the row, (more…)

6 Types of Dependent Variables that will Never Meet the Linear Model Normality Assumption

September 17th, 2009 by

The assumptions of normality and constant variance in a linear model (both OLS regression and ANOVA) are quite robust to departures. That means that even if the assumptions aren’t met perfectly, the resulting p-values will still be reasonable estimates.

But you need to check the assumptions anyway, because some departures are so severe that the p-values become inaccurate. And in many cases there are remedial measures you can take to turn non-normal residuals into normal ones.

But sometimes you can’t.

Sometimes it’s because the dependent variable just isn’t appropriate for a linear model.  The (more…)

Proportions as Dependent Variable in Regression–Which Type of Model?

January 26th, 2009 by

When the dependent variable in a regression model is a proportion or a percentage, it can be tricky to decide on the appropriate way to model it.

The big problem with ordinary linear regression is that the model can predict values that aren’t possible–values below 0 or above 1.  But the other problem is that the relationship isn’t linear–it’s sigmoidal.  A sigmoidal curve looks like a flattened S–linear in the middle, but flattened on the ends.  So now what?

The simplest approach is to do a linear regression anyway.  This approach can be justified only in a few situations.

1. All your data fall in the middle, linear section of the curve. This generally translates to all your data being between .2 and .8 (although I’ve heard that between .3 and .7 is better). If this holds, you don’t have to worry about the two objections. You do have a linear relationship, and you won’t get predicted values much beyond that range–certainly not beyond 0 or 1.

2. It is a really complicated model that would be much harder to model another way.  If you can assume a linear model, it will be much easier to do, say, a complicated mixed model or a structural equation model.  If it’s just a single multiple regression, however, you should look into one of the other methods.
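To see why the middle of the curve is the safe zone, here is a small sketch. It assumes the sigmoid is a logistic curve (one common choice; the article does not commit to a particular one) and compares it with the straight line that touches it at the midpoint. That line is an illustration of the linear section, not a fitted regression.

```python
import math

def logistic(x):
    """Logistic curve, used here as one example of a sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def tangent_line(x):
    """Straight line touching the logistic curve at its midpoint (slope 1/4)."""
    return 0.5 + x / 4.0

# log(4) is where the logistic curve reaches .8; -log(4) is where it reaches .2.
xs = [-3.0, -math.log(4), 0.0, math.log(4), 3.0]
for x in xs:
    p = logistic(x)
    line = tangent_line(x)
    print(f"x={x:+.3f}  sigmoid={p:.3f}  line={line:.3f}  gap={abs(line - p):.3f}")
```

Between .2 and .8 the line stays within about .05 of the curve. At x = 3 the line returns 1.25 and at x = -3 it returns -0.25, both impossible values for a proportion.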

A second approach is to treat the proportion as a binary response and then run a logistic or probit regression. This will only work if the proportion can be thought of as a number of successes out of a number of trials, and you have the data for both the number of successes and the total number of trials. For example, the proportion of land area covered with a certain species of plant would be hard to think of this way, but the proportion of correct answers on a 20-answer assessment would.
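As a rough sketch of what this second approach does with successes and trials, the Python below fits a logistic regression with one predictor to grouped binomial data using Newton-Raphson. Everything specific here is an assumption made for illustration: the function name, the single predictor, and the invented data (hours studied and correct answers out of 20). In practice you would use your statistical package’s logistic regression routine rather than hand-rolled code.

```python
import math

def fit_binomial_logit(x, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, si, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = si - ni * p          # observed minus expected successes
            w = ni * p * (1.0 - p)       # binomial variance weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented data: hours studied and correct answers out of 20 for six students.
hours = [1, 2, 3, 4, 5, 6]
correct = [6, 9, 11, 14, 16, 18]
questions = [20] * 6

b0, b1 = fit_binomial_logit(hours, correct, questions)
for h in hours:
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * h)))
    print(f"hours={h}  predicted proportion correct={p:.3f}")
```

Note that the model needs both the successes and the trials for every case. A bare proportion like .45 does not say whether it came from 9 of 20 or 90 of 200, and the fit weights those two cases very differently.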

The third approach is to treat the proportion as a censored continuous variable. The censoring means that you don’t have information below 0 or above 1. For example, perhaps the plant would spread even more if it hadn’t run out of land. If you take this approach, you would run the model as a two-limit tobit model (Long, 1997). This approach works best if there isn’t an excessive amount of censoring (values of 0 and 1).
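To make the censoring idea concrete, here is a minimal sketch of the log-likelihood a two-limit tobit model maximizes, assuming a single predictor and normally distributed errors. Observations at 0 or 1 contribute the probability that the underlying value lies at or beyond the limit, and everything in between contributes an ordinary normal density. The function names, data, and parameter values are invented for illustration, and the code only evaluates the likelihood. It does not estimate the model.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with erfc so the far tails stay above zero."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def two_limit_tobit_loglik(b0, b1, sigma, x, y, lower=0.0, upper=1.0):
    """Log-likelihood of a tobit model censored at `lower` and `upper`."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu = b0 + b1 * xi  # mean of the latent (uncensored) variable
        if yi <= lower:
            # Censored below: we only know the latent value is at or under the limit.
            total += math.log(normal_cdf((lower - mu) / sigma))
        elif yi >= upper:
            # Censored above: P(latent >= upper) = 1 - CDF(z) = CDF(-z) by symmetry.
            total += math.log(normal_cdf(-(upper - mu) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed value.
            total += normal_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total

# Invented data: a predictor and a proportion that hits both limits.
x = [0, 1, 2, 3, 4, 5]
y = [0.0, 0.15, 0.4, 0.6, 0.85, 1.0]

rising = two_limit_tobit_loglik(-0.05, 0.2, 0.1, x, y)  # line that tracks the data
flat = two_limit_tobit_loglik(0.5, 0.0, 0.1, x, y)      # no relationship at all
print(rising, flat)
```

A tobit routine in your software searches for the intercept, slope, and sigma that make this quantity as large as possible. The more observations sit at 0 or 1, the more the fit leans on the two tail terms, which is one way to see why heavy censoring is a problem.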

Reference: Long, J.S. (1997). Regression Models for Categorical and Limited Dependent Variables. Sage Publishing.