regression models

Why ANOVA is Really a Linear Regression, Despite the Difference in Notation

April 23rd, 2018

When I was in graduate school, stat professors would say “ANOVA is just a special case of linear regression.” But they never explained why.

And I couldn’t figure it out.

The model notation is different.

The output looks different.

The vocabulary is different.

The focus of what we’re testing is completely different. How can they be the same model?
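One way to see the equivalence is to run the comparison both ways on the same data. The following sketch uses made-up group data (not from the post): a classic one-way ANOVA and a linear regression with dummy-coded groups produce the identical F statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10, 2, 30)
b = rng.normal(12, 2, 30)
c = rng.normal(11, 2, 30)

# Classic one-way ANOVA F test
f_anova, p_anova = stats.f_oneway(a, b, c)

# The same test as a linear regression with dummy-coded groups
y = np.concatenate([a, b, c])
d_b = np.r_[np.zeros(30), np.ones(30), np.zeros(30)]  # indicator: group b
d_c = np.r_[np.zeros(60), np.ones(30)]                # indicator: group c
X = np.column_stack([np.ones(90), d_b, d_c])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sse = resid @ resid                            # residual (within-group) SS
sst = ((y - y.mean()) ** 2).sum()              # total SS
f_reg = ((sst - sse) / 2) / (sse / (90 - 3))   # overall model F test

print(np.isclose(f_anova, f_reg))  # the two F statistics agree
```

The dummy-coded regression's fitted values are exactly the group means, so its residual sum of squares equals ANOVA's within-group sum of squares, and the overall model F test is the ANOVA F test.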



Can We Use PCA for Reducing Both Predictors and Response Variables?

January 20th, 2017

I recently gave a free webinar on Principal Component Analysis. We had almost 300 researchers attend and didn’t get through all the questions. This is part of a series of answers to those questions.

If you missed it, you can get the webinar recording here.

Question: Can we use PCA for reducing both predictors and response variables?

In fact, there were a few related but separate questions about using and interpreting the resulting component scores, so I’ll answer them together here.

How could you use the component scores?

A lot of times PCA scores are used for further analysis — say, a regression. How can we interpret the results of that regression?

Let’s say I would like to interpret my regression results in terms of the original variables, but they are hidden inside the principal components. What is the best interpretation we can do in this case?

Answer:

So yes, the point of PCA is to reduce variables — create an index score variable that is an optimally weighted combination of a group of correlated variables.

And yes, you can use this index variable as either a predictor or response variable.

It is often used as a solution for multicollinearity among predictor variables in a regression model. Rather than include multiple correlated predictors, none of which is significant, you can combine them using PCA and include that single component score instead.

It’s also used as a solution to avoid inflated familywise Type I error caused by running the same analysis on multiple correlated outcome variables. Combine the correlated outcomes using PCA, then use that as the single outcome variable. (This is, incidentally, what MANOVA does).

In both cases, you can no longer interpret the individual variables.

You may want to, but you can’t.
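The predictor-side use can be sketched with simulated data (not from the webinar): three correlated measurements of one underlying construct are collapsed into a first-principal-component score, which then serves as the single predictor in a regression.

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=200)
# Three noisy measurements of the same construct -> highly correlated
X = np.column_stack([latent + rng.normal(scale=0.3, size=200)
                     for _ in range(3)])
y = 2.0 * latent + rng.normal(size=200)

# First principal component of the standardized predictors
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
score = Z @ Vt[0]   # the index variable: an optimally weighted combination

# Simple regression on the single index instead of three collinear predictors
slope, intercept = np.polyfit(score, y, 1)
```

Note the interpretation caveat from the answer above: the slope describes the relationship between the outcome and the composite index, not any one of the three original variables.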


Member Training: The LASSO Regression Model

November 1st, 2016

The LASSO model (Least Absolute Shrinkage and Selection Operator) is a relatively recent development that allows you to find a good-fitting model in the regression context. It avoids many of the overfitting problems that plague other model-building approaches.

In this Statistically Speaking Training, guest instructor Steve Simon, PhD, explains what overfitting is — and why it’s a problem.

Then he illustrates the geometry of the LASSO model in comparison to two other regression approaches: ridge regression and stepwise variable selection.

Finally, he shows you how LASSO regression works with a real data set.
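A minimal illustration of what LASSO does, on simulated data. This sketch assumes scikit-learn's `Lasso` (the training itself may use other software): the penalty shrinks coefficients and sets most of the irrelevant ones exactly to zero, performing variable selection and shrinkage at once.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
# Only the first two predictors actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)  # most irrelevant coefficients are exactly 0.0
```

The penalty weight `alpha` here is an arbitrary choice for illustration; in practice it is tuned, for example by cross-validation.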


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.



Member Training: Multicollinearity

March 1st, 2014

Multicollinearity isn’t an assumption of regression models; it’s a data issue.

And while it can be seriously problematic, more often it’s just a nuisance.

In this webinar, we’ll discuss:

  • What multicollinearity is and isn’t
  • What it does to your model and estimates
  • How to detect it
  • What to do about it, depending on how serious it is
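One common detection tool is the variance inflation factor (VIF), where VIF_j = 1 / (1 - R²_j) and R²_j comes from regressing predictor j on all the others; VIF above 10 is a common rule of thumb for serious multicollinearity. A sketch on simulated data (the `vif` helper is mine, not from the webinar):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)  # nearly a copy of x1
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))  # x1 and x2 get large VIFs
```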




7 Practical Guidelines for Accurate Statistical Model Building

June 24th, 2011

Model building (choosing predictors) is one of those skills in statistics that is difficult to teach. It’s hard to lay out the steps, because at each step you have to evaluate the situation and decide what to do next.

If you’re running purely predictive models, and the relationships among the variables aren’t the focus, it’s much easier.  Go ahead and run a stepwise regression model.  Let the data give you the best prediction.

But if the point is to answer a research question that describes relationships, you’re going to have to get your hands dirty.

It’s easy to say “use theory” or “test your research question” but that ignores a lot of practical issues.  Like the fact that you may have 10 different variables that all measure the same theoretical construct, and it’s not clear which one to use.
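The purely predictive case above can be sketched in a few lines. This example uses simulated data and assumes scikit-learn's `SequentialFeatureSelector` as the stepwise tool (one of several implementations): it greedily adds whichever predictor most improves cross-validated fit, which is fine for pure prediction but risky when the goal is describing relationships.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 6))
# Only predictors 0 and 3 actually drive the outcome
y = 2 * X[:, 0] + X[:, 3] + rng.normal(size=120)

# Forward stepwise search: add one predictor at a time
sel = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=2,
                                direction="forward").fit(X, y)
print(sel.get_support())  # True for the predictors the search kept
```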


The Steps for Running any Statistical Model

September 10th, 2009

No matter what statistical model you’re running, you need to go through the same steps.  The order and the specifics of how you do each step will differ depending on the data and the type of model you use.

These steps fall into 4 phases.  Most people think of only the third phase as modeling.  But the phases before it are fundamental to making the modeling go well. The modeling will be much, much easier, more accurate, and more efficient if you don’t skip them.

And there is no point in running the model if you skip phase 4.

If you think of them all as part of the analysis, the modeling process will be faster, easier, and make more sense.

Phase 1: Define and Design

In the first 5 steps, the object is clarity. You want to make everything as clear as possible to yourself. The clearer things are at this point, the smoother everything will be.