Model building, meaning the choice of predictors, is one of those skills in statistics that is difficult to teach. It’s hard to lay out the steps, because at each step you have to evaluate the situation and decide what to do next.
If you’re running purely predictive models, and the relationships among the variables aren’t the focus, it’s much easier. Go ahead and run a stepwise regression model. Let the data give you the best prediction.
But if the point is to answer a research question that describes relationships, you’re going to have to get your hands dirty.
It’s easy to say “use theory” or “test your research question,” but that ignores a lot of practical issues. For example, you may have 10 different variables that all measure the same theoretical construct, and it’s not clear which one to use. (more…)
I was recently asked whether it’s okay to treat a Likert scale as a continuous predictor in a regression model. Here’s my reply. In the question, the researcher asked about logistic regression, but the same answer applies to all regression models.
1. There is a difference between a Likert scale item (a single 1-7 rating, for example) and a full Likert scale, which is composed of multiple items. If it is a full Likert scale, with a combination of multiple items, go ahead and treat it as numerical. (more…)
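To make the item-versus-scale distinction concrete, here is a minimal sketch in Python. The responses are made-up illustration data, and averaging the items is just one common way to build a scale score (summing is another); neither the data nor the variable names come from the original question.

```python
# Hypothetical data: each row is one respondent's answers to five 1-7 items
# that are assumed to measure the same construct.
responses = [
    [6, 7, 5, 6, 6],
    [2, 3, 2, 1, 3],
    [4, 4, 5, 3, 4],
    [7, 6, 7, 7, 5],
]

# A single item can only take one of 7 values.
first_item = [row[0] for row in responses]

# The full scale combines the items; here it is the mean of the five answers.
scale_scores = [sum(row) / len(row) for row in responses]

# Number of distinct totals a k-item scale with 1-7 items can take: 6*k + 1.
n_items = len(responses[0])
possible_totals = 6 * n_items + 1

print("single item:", first_item)
print("scale score:", scale_scores)
print("distinct possible totals:", possible_totals)
```

With five items, the total can land on 31 different values instead of 7, which is part of why a full scale behaves much more like a numerical variable than any one item does.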
I often hear concern about the non-normal distributions of independent variables in regression models, and I am here to ease your mind.
There are NO assumptions in any linear model about the distribution of the independent variables. Yes, you only get meaningful parameter estimates from nominal (unordered categories) or numerical (continuous or discrete) independent variables. But no, the model makes no assumptions about them. They do not need to be normally distributed or continuous.
It is useful, however, to understand the distribution of predictor variables to find influential outliers or concentrated values. A highly skewed independent variable may be made more symmetric with a transformation.
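As a rough illustration of that last point, the sketch below simulates a right-skewed predictor and compares its skewness before and after a log transformation. The variable name `income`, the lognormal distribution, and the sample size are all assumptions chosen for the example; a log transformation only makes sense for strictly positive values.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(42)  # fixed seed so the simulated data are reproducible

# Hypothetical right-skewed predictor (lognormal, so always positive).
income = [random.lognormvariate(0, 1) for _ in range(1000)]

# Log-transform the predictor; this pulls in the long right tail.
log_income = [math.log(v) for v in income]

skew_raw = skewness(income)
skew_log = skewness(log_income)

print(f"skewness before transformation: {skew_raw:.2f}")
print(f"skewness after log transformation: {skew_log:.2f}")
```

The raw variable should show strong positive skew, while the logged version should sit close to zero. Whether to transform is still a modeling decision about outliers and the shape of the relationship, not a requirement of the model.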
Multicollinearity occurs when two or more predictor variables in a regression model are redundant. It is a real problem, and it can do terrible things to your results. However, the dangers of multicollinearity seem to have been so drummed into students’ minds that they have created a panic.
True multicollinearity (the kind that messes things up) is pretty uncommon. High correlations among predictor variables may indicate multicollinearity, but they are NOT a reliable indicator that it exists, and they do not necessarily indicate a problem. How high is too high depends on (more…)
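One way to see why pairwise correlations can mislead is to compare them with variance inflation factors (VIFs), a diagnostic not named in the excerpt above but widely used for this purpose. The sketch below uses simulated data and only the standard library. It relies on the fact that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The predictors `x1` to `x4` and the noise level are assumptions made for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation_and_vif(columns):
    """Return the correlation matrix and the VIF of each predictor."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inverse = invert(corr)
    return corr, [inverse[i][i] for i in range(k)]


random.seed(0)  # fixed seed so the simulated data are reproducible
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
# x4 is almost exactly the sum of the other three: redundant by construction.
x4 = [a + b + c + random.gauss(0, 0.1) for a, b, c in zip(x1, x2, x3)]

corr, vif = correlation_and_vif([x1, x2, x3, x4])
max_pairwise = max(abs(corr[i][j]) for i in range(4) for j in range(i))

print(f"largest pairwise correlation: {max_pairwise:.2f}")
print("VIFs:", [round(v, 1) for v in vif])
```

In this setup no single pairwise correlation should look alarming (each should be near 0.58 or smaller), yet the VIFs should be very large because `x4` is nearly a linear combination of the other three predictors. The reverse can also happen: a fairly high correlation between two predictors does not by itself mean the estimates are unusable.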