Yesterday I gave a little quiz about interpreting regression coefficients. Today I’m giving you the answers.
If you want to try it yourself before you see the answers, go here. (It’s truly little, but if you’re like me, you just cannot resist testing yourself.)
True or False?
1. When you add an interaction to a regression model, you can still evaluate the main effects of the terms that make up the interaction, just like in ANOVA.
In an ANOVA table (even the one in the regression output), categorical variables are Effect Coded. Because of that, the main effects remain main effects, and are evaluated independent of interactions.
But in the Regression Coefficients table, unless you are explicitly effect coding, they will be Dummy Coded. The coefficient for what looks like a main effect IS NOT a main effect. It’s a marginal effect: the effect of that predictor ONLY when the other predictor in the interaction = 0! I kid you not.
You can get a little more info in this post or a lot more in this video or a whole lot more in the workshop.
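To make the coding point in answer 1 concrete, here is a minimal sketch in plain Python. The cell means are made up for illustration, and the model is assumed to be the saturated 2x2 model (both predictors plus their interaction), so the fitted values are just the cell means and the coefficients can be read off directly without any fitting.

```python
# Hypothetical cell means for a 2x2 design (made-up numbers, not real data).
# Keys are (A, B), each factor taking the levels 0 or 1.
cell_means = {(0, 0): 10.0, (0, 1): 16.0, (1, 0): 12.0, (1, 1): 26.0}


def dummy_coded_coefs(m):
    """Coefficients of y = b0 + bA*A + bB*B + bAB*A*B with A and B coded 0/1."""
    b0 = m[(0, 0)]                      # mean of the reference cell
    bA = m[(1, 0)] - m[(0, 0)]          # effect of A ONLY where B == 0
    bB = m[(0, 1)] - m[(0, 0)]          # effect of B ONLY where A == 0
    bAB = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]
    return b0, bA, bB, bAB


def effect_coded_coefs(m):
    """Coefficients of the same model with A and B coded -1/+1."""
    b0 = sum(m.values()) / 4            # unweighted grand mean of the cells
    bA = ((m[(1, 0)] + m[(1, 1)]) - (m[(0, 0)] + m[(0, 1)])) / 4
    bB = ((m[(0, 1)] + m[(1, 1)]) - (m[(0, 0)] + m[(1, 0)])) / 4
    bAB = (m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]) / 4
    return b0, bA, bB, bAB


d0, dA, dB, dAB = dummy_coded_coefs(cell_means)
e0, eA, eB, eAB = effect_coded_coefs(cell_means)

print("dummy-coded A coefficient (effect of A when B = 0):", dA)
print("effect of A when B = 1:", dA + dAB)
print("effect-coded A coefficient times 2 (A averaged over B):", 2 * eA)
```

With these numbers the dummy-coded coefficient for A describes only the B = 0 cells, while twice the effect-coded coefficient is the difference in A averaged over both levels of B. Same data, same model, different question answered by the coefficient.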
2. The intercept is usually meaningless in a regression model.
This statement is only true if all predictors are continuous and their observed values don’t include 0. If continuous predictors are centered and/or if there are dummy variables in the model, the intercept is meaningful and important.
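Here is a small sketch of what answer 2 is getting at, using a hand-rolled one-predictor least squares fit in plain Python. The ages, scores, and the `treated` dummy are invented for illustration, and `simple_ols` is just a helper name for this example.

```python
from statistics import mean


def simple_ols(x, y):
    """Least squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data: age is never anywhere near 0 in this sample.
age = [25, 30, 35, 40, 45]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

# Raw age: the intercept is the predicted score at age 0, far outside the data.
raw_intercept, raw_slope = simple_ols(age, score)

# Centered age: same slope, but the intercept is now the predicted score
# at the mean age, which is also the mean score.
age_centered = [a - mean(age) for a in age]
centered_intercept, centered_slope = simple_ols(age_centered, score)

# Dummy predictor: the intercept is the mean score of the group coded 0.
treated = [0, 0, 1, 1, 1]
dummy_intercept, dummy_slope = simple_ols(treated, score)

print("raw age:      intercept", raw_intercept, "slope", raw_slope)
print("centered age: intercept", centered_intercept, "slope", centered_slope)
print("dummy:        intercept", dummy_intercept, "slope", dummy_slope)
```

The slope for age is identical in the first two fits. Only the meaning of the intercept changes, from an extrapolation to age 0 into a value that actually describes the sample.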
3. In Analysis of Covariance, the covariate is a nuisance variable, and the real point of the analysis is to evaluate the means after controlling for the covariate.
It can be true, but it doesn’t have to be. Covariates are often important predictors that just happen to be observed and continuous. The only way to evaluate them is to examine their coefficients.
4. Standardized regression coefficients are meaningful for dummy-coded predictors.
This one is never ever true. Just because your software lets you get away with it doesn’t mean it’s meaningful.
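A rough sketch of why, in plain Python with invented data. A fully standardized coefficient is the raw coefficient multiplied by the standard deviation of X and divided by the standard deviation of Y. For a 0/1 dummy, the standard deviation of X depends only on how the sample splits between the two groups, so "one standard deviation of Female" is not a change anyone can actually make. The variable names and numbers below are hypothetical.

```python
from statistics import mean, pstdev, stdev


def ols_slope(x, y):
    """Least squares slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical outcome for four men (female = 0) and four women (female = 1).
female = [0, 0, 0, 0, 1, 1, 1, 1]
y = [50.0, 54.0, 46.0, 50.0, 58.0, 62.0, 54.0, 58.0]

b = ols_slope(female, y)  # raw coefficient: difference between the group means

# What software typically reports: X and Y both standardized.
fully_standardized = b * stdev(female) / stdev(y)

# Standardizing only Y by hand: the group difference in SD units of Y.
y_standardized = b / stdev(y)

# The SD of a dummy is sqrt(p * (1 - p)), so it tracks the group split,
# not anything about the size of the effect.
balanced = [0] * 5 + [1] * 5
lopsided = [0] * 9 + [1] * 1

print("raw group difference:", b)
print("fully standardized:", fully_standardized)
print("Y-standardized only:", y_standardized)
print("SD of a 50/50 dummy:", pstdev(balanced))
print("SD of a 90/10 dummy:", pstdev(lopsided))
```

The Y-standardized version still reads as a difference between the two groups. The fully standardized version mixes that difference with the group proportions, which is the part that makes it hard to interpret.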
5. The only way to evaluate an interaction between two independent variables is to categorize one or both of them.
Sure, it’s tricky to interpret interactions between two continuous variables, but by no means is it impossible or theoretically incorrect. (And centering really helps).
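One way to see why centering helps, sketched in plain Python. The coefficients and means below are hypothetical, chosen only to keep the arithmetic easy. In a model y = b0 + b1*x + b2*z + b3*x*z, the slope of x is b1 + b3*z, so b1 by itself is the slope of x when z = 0. Centering z moves that reference point to the mean of z without changing the model's predictions.

```python
def predict(coefs, x, z):
    """Prediction from y = b0 + b1*x + b2*z + b3*x*z."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * x + b2 * z + b3 * x * z


def simple_slope(coefs, z):
    """Slope of x at a chosen value of z."""
    _, b1, _, b3 = coefs
    return b1 + b3 * z


def recenter(coefs, mx, mz):
    """Coefficients of the same model rewritten in terms of (x - mx) and (z - mz)."""
    b0, b1, b2, b3 = coefs
    return (
        b0 + b1 * mx + b2 * mz + b3 * mx * mz,  # prediction at the means
        b1 + b3 * mz,                           # slope of x at z == mz
        b2 + b3 * mx,                           # slope of z at x == mx
        b3,                                     # interaction is unchanged
    )


# Hypothetical uncentered coefficients and predictor means.
raw = (2.0, 0.5, 1.0, 0.25)
mean_x, mean_z = 10.0, 4.0

centered = recenter(raw, mean_x, mean_z)

print("raw coefficients:     ", raw)
print("centered coefficients:", centered)
print("slope of x at mean z from the raw model:", simple_slope(raw, mean_z))

# Same prediction either way for a point with x = 12, z = 5.
print(predict(raw, 12.0, 5.0), predict(centered, 12.0 - mean_x, 5.0 - mean_z))
```

The interaction coefficient is the same in both versions. What changes is that the lower-order coefficients now describe slopes at the means, which are usually values that exist in the data.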
How did you do? (BTW, it took me years to figure all this stuff out in a way that was really intuitive, even after many stats classes.)
A question, or maybe point of clarification, on number four. By standardized regression coefficients, does this mean both predictors and responses? Couldn’t a standardized response variable be meaningfully regressed on a dummy or dichotomously coded predictor variable for two groups (say, female = 1, male = 0)? It would then be read as being female having a [beta coefficient] standard deviation effect. The dummy predictor variable in this case is not standardized.
Karen Grace-Martin says
Essentially yes. There is a shortcut: take a regular regression coefficient, multiply it by the standard deviation of X, and divide it by the standard deviation of Y to produce what is called a “standardized regression coefficient.” It is interpreted as the number of standard deviation difference in Y, on average, associated with a one standard deviation difference in X.
You’re correct that if you’re doing the standardizing yourself of numerical variables, it’s not a problem. But if you’re using the software-created standardized regression coefficients, it’s not leaving out the dummy coded predictors. It’s standardizing them as well. So that standardized coefficient is for each one standard deviation difference in Female, which is nonsensical.
Bob Nau says
I enjoy reading your newsletters, and normally I think you are spot-on, but I have to disagree a little bit with one of your points in today’s post, namely that it is false that “the intercept is usually meaningless in a regression model.” Of course, a lot depends on the definition of “usually” or “meaningless”, but I usually don’t find the intercept to be worth contemplating, and I would even take mild exception to the counterexamples you mentioned. If the independent variables are centered, then the intercept is just the mean of the dependent variable, which is not news (or is that your point?). And if some but not all of the variables are dummies, a common situation is to have them for n-1 out of n mutually exclusive conditions of some kind or other. In that case the value of the intercept and its t-stat and P-value depend on the arbitrary choice of which one to leave out, which limits its meaning. And in general, if you apply a linear transformation with a non-zero constant to any one of the independent variables (say, Fahrenheit to Celsius), this does not affect the logic of the model, and it does not affect the coefficients of other variables or the significance of any of them, but it does affect the intercept and its apparent significance. “Usually” the role of the intercept is to just raise or lower the regression line or plane so that it passes through the center of mass of all the variables, and this is the most important issue to stress in regard to it, or at least that is what I do with my own students. Of course, if it is possible for all the independent variables to simultaneously be zero, and this has not been forced by centering, then the constant is nontrivially the predicted value of the dependent variable in that case, but it is not the usual case (by my standard, anyway). By the way, you might want to take a look at the latest version of my free Excel add-in (see regressit.com).
It now runs on both PCs and Macs, there is a highly interactive logistic regression version for the PC (good for demonstrating logistic curves, the ROC curve, and cutoff values), all of them have a very novel ribbon interface for navigating the output worksheets and model space (also good for interactive demonstrations), and as of last week they all have a 2-way interface with R that allows R to be used as a back end for running linear or logistic regression models in Excel with a lot of bells and whistles, as well as generating customized table and chart output in RStudio. This does not require users to know anything about R, although it allows them to do some playing around in RStudio without typing code. I would be interested to get your opinion, and it is something you might find useful as a demonstration tool or something to use in a live workshop. Thanks again for your public broadcasts!
Karen Grace-Martin says
Yes, I think you’re right that it comes down to what you consider usual or meaningless. I find very few models in research where there are no categorical predictors or interactions, where the means are important to understand.
And as I think I’ve mentioned before, I can’t recommend people use Excel for statistics, even with add-ins. Excel is not set up for statistical analysis or data management. Reproducibility is too important. It might be fine for playing around with data (I use it for that), but not for anything that will be published.
Carlos Camacho says
In the multilevel analysis, if you study hours and mathematics in different schools, the intercept indicates the level in mathematics when the study hours are zero. That can tell us the family background.