regression models

Why ANOVA is Really a Linear Regression, Despite the Difference in Notation

April 23rd, 2018

When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.

And I couldn’t figure it out.

The model notation is different.

The output looks different.

The vocabulary is different.

The focus of what we’re testing is completely different. How can they be the same model?
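One way to see the equivalence is to run the comparison both ways on the same data. The following sketch uses made-up group data (not from the post): a classic one-way ANOVA and a linear regression with dummy-coded groups produce the identical F statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10, 2, 30)
b = rng.normal(12, 2, 30)
c = rng.normal(11, 2, 30)

# Classic one-way ANOVA F test
f_anova, p_anova = stats.f_oneway(a, b, c)

# The same test as a linear regression with dummy-coded groups
y = np.concatenate([a, b, c])
d_b = np.r_[np.zeros(30), np.ones(30), np.zeros(30)]  # indicator: group b
d_c = np.r_[np.zeros(60), np.ones(30)]                # indicator: group c
X = np.column_stack([np.ones(90), d_b, d_c])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sse = resid @ resid                            # residual (within-group) SS
sst = ((y - y.mean()) ** 2).sum()              # total SS
f_reg = ((sst - sse) / 2) / (sse / (90 - 3))   # overall model F test

print(np.isclose(f_anova, f_reg))  # the two F statistics agree
```

The dummy-coded regression's fitted values are exactly the group means, so its residual sum of squares equals ANOVA's within-group sum of squares, and the overall model F test is the ANOVA F test.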



Can We Use PCA for Reducing Both Predictors and Response Variables?

January 20th, 2017

I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.

If you missed it, you can get the webinar recording here.

Question: Can we use PCA for reducing both predictors and response variables?

In fact, there were a few related but separate questions about using and interpreting the resulting component scores, so I’ll answer them together here.

How could you use the component scores?

A lot of times PCA scores are used for further analysis — say, a regression. How can we interpret the results of that regression?

Let’s say I would like to interpret my regression results in terms of the original variables, but they are hidden inside the principal components. What is the best interpretation we can do in this case?

Answer:

So yes, the point of PCA is to reduce variables — create an index score variable that is an optimally weighted combination of a group of correlated variables.

And yes, you can use this index variable as either a predictor or response variable.

It is often used as a solution for multicollinearity among predictor variables in a regression model. Rather than include multiple correlated predictors, none of which is significant, you can combine them using PCA and include that single component score instead.

It’s also used as a solution to avoid inflated familywise Type I error caused by running the same analysis on multiple correlated outcome variables. Combine the correlated outcomes using PCA, then use that as the single outcome variable. (This is, incidentally, what MANOVA does).

In both cases, you can no longer interpret the individual variables.

You may want to, but you can’t.
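The predictor-side use can be sketched with simulated data (not from the webinar): three correlated measurements of one underlying construct are collapsed into a first-principal-component score, which then serves as the single predictor in a regression.

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=200)
# Three noisy measurements of the same construct -> highly correlated
X = np.column_stack([latent + rng.normal(scale=0.3, size=200)
                     for _ in range(3)])
y = 2.0 * latent + rng.normal(size=200)

# First principal component of the standardized predictors
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
score = Z @ Vt[0]   # the index variable: an optimally weighted combination

# Simple regression on the single index instead of three collinear predictors
slope, intercept = np.polyfit(score, y, 1)
```

Note the interpretation caveat from the answer above: the slope describes the relationship between the outcome and the composite index, not any one of the three original variables.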


Member Training: The LASSO Regression Model

November 1st, 2016

The LASSO model (Least Absolute Shrinkage and Selection Operator) is a relatively recent development that allows you to find a good-fitting model in the regression context. It avoids many of the overfitting problems that plague other model-building approaches.

In this Statistically Speaking Training, guest instructor Steve Simon, PhD, explains what overfitting is — and why it’s a problem.

Then he illustrates the geometry of the LASSO model in comparison to two other regression approaches: ridge regression and stepwise variable selection.

Finally, he shows you how LASSO regression works with a real data set.
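A minimal illustration of what LASSO does, on simulated data. This sketch assumes scikit-learn's `Lasso` (the training itself may use other software): the penalty shrinks coefficients and sets most of the irrelevant ones exactly to zero, performing variable selection and shrinkage at once.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
# Only the first two predictors actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)  # most irrelevant coefficients are exactly 0.0
```

The penalty weight `alpha` here is an arbitrary choice for illustration; in practice it is tuned, for example by cross-validation.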


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.



Member Training: Multicollinearity

March 1st, 2014

Multicollinearity isn’t an assumption of regression models; it’s a data issue.

And while it can be seriously problematic, more often it’s just a nuisance.

In this webinar, we’ll discuss:

  • What multicollinearity is and isn’t
  • What it does to your model and estimates
  • How to detect it
  • What to do about it, depending on how serious it is
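One common detection tool is the variance inflation factor (VIF), where VIF_j = 1 / (1 - R²_j) and R²_j comes from regressing predictor j on all the others; VIF above 10 is a common rule of thumb for serious multicollinearity. A sketch on simulated data (the `vif` helper is mine, not from the webinar):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)  # nearly a copy of x1
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))  # x1 and x2 get large VIFs
```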




7 Practical Guidelines for Accurate Statistical Model Building

June 24th, 2011

Model building (choosing predictors) is one of those skills in statistics that is difficult to teach. It’s hard to lay out the steps, because at each step you have to evaluate the situation and decide what to do next.

If you’re running purely predictive models, and the relationships among the variables aren’t the focus, it’s much easier.  Go ahead and run a stepwise regression model.  Let the data give you the best prediction.

But if the point is to answer a research question that describes relationships, you’re going to have to get your hands dirty.

It’s easy to say “use theory” or “test your research question” but that ignores a lot of practical issues.  Like the fact that you may have 10 different variables that all measure the same theoretical construct, and it’s not clear which one to use.
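The purely predictive case above can be sketched in a few lines. This example uses simulated data and assumes scikit-learn's `SequentialFeatureSelector` as the stepwise tool (one of several implementations): it greedily adds whichever predictor most improves cross-validated fit, which is fine for pure prediction but risky when the goal is describing relationships.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 6))
# Only predictors 0 and 3 actually drive the outcome
y = 2 * X[:, 0] + X[:, 3] + rng.normal(size=120)

# Forward stepwise search: add one predictor at a time
sel = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="forward").fit(X, y)
print(sel.get_support())  # True for the predictors the search kept
```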


The Steps for Running any Statistical Model

September 10th, 2009

No matter what statistical model you’re running, you need to go through the same steps.  The order and the specifics of how you do each step will differ depending on the data and the type of model you use.

These steps fall into 4 phases.  Most people think of only the third phase as modeling.  But the phases before it are fundamental to making the modeling go well. The modeling will be much, much easier, more accurate, and more efficient if you don’t skip them.

And there is no point in running the model if you skip phase 4.

If you think of them all as part of the analysis, the modeling process will be faster, easier, and make more sense.

Phase 1: Define and Design

In the first 5 steps, the object is clarity. You want to make everything as clear as possible to yourself. The clearer things are at this point, the smoother everything will be.