When we think about model assumptions, we tend to focus on assumptions like independence, normality, and constant variance. The other big assumption, which is harder to see or test, is that there is no specification error. The assumption of linearity is part of this, but it’s actually a bigger assumption.
What is this assumption of no specification error? (more…)
The practice of choosing predictors for a regression model, called model building, is an area of real craft.
There are many possible strategies and approaches and they all work well in some situations. Every one of them requires making a lot of decisions along the way. As you make decisions, one danger to look out for is overfitting—creating a model that is too complex for the the data. (more…)
When you’re model building, a key decision is which interaction terms to include. And which interactions to remove.
As a general rule, the default in regression is to leave them out. Add interactions only with a solid reason. It would seem like data fishing to simply add in all possible interactions.
And yet, that’s a common practice in most ANOVA models: put in all possible interactions and only take them out if there’s a solid reason. Even many software procedures default to creating interactions among categorical predictors.
One of the many decisions you have to make when model building is which form each predictor variable should take. One specific version of this decision is whether to combine categories of a categorical predictor.
The greater the number of parameter estimates in a model the greater the number of observations that are needed to keep power constant. The parameter estimates in a linear (more…)
One approach to model building is to use all predictors that make theoretical sense in the first model. For example, a first model for determining birth weight could include mother’s age, education, marital status, race, weight gain during pregnancy and gestation period.
The main effects of this model show that a mother’s education level and marital status are insignificant.
There is a bit of art and experience to model building. You need to build a model to answer your research question but how do you build a statistical model when there are no instructions in the box?
Should you start with all your predictors or look at each one separately? Do you always take out non-significant variables and do you always leave in significant ones?