R² is such a lovely statistic, isn’t it? Unlike so many of the others, it makes sense: the percentage of variance in Y accounted for by a model.

I mean, you can actually understand that.  So can your grandmother.  And the clinical audience you’re writing the report for.

A big R² is always good and a small one is always bad, right?

Well, maybe. [click to continue…]
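As a concrete reminder of the definition above, here is a minimal sketch of R² as the share of variation in Y that a model's predictions account for. The function name and the data are hypothetical, made up for illustration, and the formula shown is the usual 1 minus (residual sum of squares / total sum of squares).

```python
def r_squared(y, y_hat):
    """Proportion of variation in y accounted for by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    # Total variation of Y around its own mean
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variation left over after using the model's predictions
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_resid / ss_total


# Hypothetical observed values and model predictions (illustration only)
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]

print(f"R-squared: {r_squared(y, y_hat):.2f}")
```

Multiply the result by 100 to read it as a percentage.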

Sample Size Estimates for Multilevel Randomized Trials

But there are many design issues that affect power in a study that go way beyond a z-test. Like:

repeated measures
clustering of individuals
blocking
including covariates in a model

Regular sample size software can accommodate some of these issues, but not all. And there is just something wonderful about finding a tool that does just what you need it to.

Especially when it’s free.
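To see why one of the issues listed above, clustering of individuals, matters for power, here is a rough sketch using the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC. This is not the method of any particular tool mentioned in the post. The function names and input numbers are hypothetical, and the calculation assumes equal cluster sizes and a known intraclass correlation.

```python
import math


def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_independent, cluster_size, icc):
    """Inflate a sample size that was computed assuming independent observations."""
    return math.ceil(n_independent * design_effect(cluster_size, icc))


# Hypothetical inputs: 128 participants would suffice if observations were
# independent, but they come in clusters of 20 with an ICC of 0.05.
deff = design_effect(20, 0.05)
n_needed = clustered_sample_size(128, 20, 0.05)

print(f"Design effect: {deff:.2f}")
print(f"Participants needed with clustering: {n_needed}")
```

Even a small intraclass correlation can nearly double the required sample when clusters are large, which is why a simple power calculation can mislead here.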

Read the full article →

Confusing Statistical Term #6: Factor

Factor is tricky in much the same way as hierarchical and beta, because it too has different meanings in different contexts. Factor might be a little worse, though, because its meanings are related.

In both meanings, a factor is a variable. But a factor has a completely different meaning and implications for use in two different contexts.

Factor analysis

In factor analysis, a factor is an unmeasured, latent variable that expresses itself through its relationship with other measured variables.

Read the full article →

Five Extensions of the General Linear Model

Generalized linear models, linear mixed models, generalized linear mixed models, marginal models, GEE models. You’ve probably heard of more than one of them and you’ve probably also heard that each one is an extension of our old friend, the general linear model.

This is true, and they extend our old friend in different ways, particularly in regard to the measurement level of the dependent variable and the independence of the measurements. So while the names are similar (and confusing), the distinctions are important.

Read the full article →

When to leave insignificant effects in a model

You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model. One effect of leaving in insignificant predictors is on p-values: they use up precious df in small samples. But if your sample isn’t small, the effect is negligible. The bigger effect [...]

Read the full article →

Data Mining Webinar with Peter Bruce, President, Statistics.com

Data Mining methods lie at the center of the constellation of techniques under the umbrella of “business analytics.”  These techniques deal with analysis of large existing datasets (as opposed to controlled experiments, or sample surveys). This webinar will give an overview of data mining techniques, which include: In predictive modeling, we build a model to [...]

Read the full article →

The Difference Between Interaction and Association

Interaction is different. Whether two variables are associated says nothing about whether they interact in their effect on a third variable. Likewise, if two variables interact, they may or may not be associated.
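A small made-up example may help separate the two ideas. In the sketch below, the data, variable names, and helper functions are all hypothetical. The two predictors are perfectly balanced, so they are not associated with each other, yet the effect of x1 on y depends entirely on the level of x2.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def cell_mean(x1, x2, y, level1, level2):
    """Mean of y among cases where x1 == level1 and x2 == level2."""
    vals = [yi for a, b, yi in zip(x1, x2, y) if a == level1 and b == level2]
    return sum(vals) / len(vals)


# Balanced 2x2 layout: every combination of x1 and x2 appears twice
x1 = [0, 0, 1, 1, 0, 0, 1, 1]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
# Hypothetical outcome: x1 raises y only when x2 is 1
y = [10, 10, 10, 20, 10, 10, 10, 20]

association = pearson_r(x1, x2)
effect_x1_when_x2_is_0 = cell_mean(x1, x2, y, 1, 0) - cell_mean(x1, x2, y, 0, 0)
effect_x1_when_x2_is_1 = cell_mean(x1, x2, y, 1, 1) - cell_mean(x1, x2, y, 0, 1)
# The interaction is the difference between the two conditional effects
interaction = effect_x1_when_x2_is_1 - effect_x1_when_x2_is_0

print("Association between x1 and x2:", association)
print("Effect of x1 when x2 = 0:", effect_x1_when_x2_is_0)
print("Effect of x1 when x2 = 1:", effect_x1_when_x2_is_1)
print("Interaction (difference of effects):", interaction)
```

The reverse can also happen: two predictors can be strongly correlated and still have purely additive effects on the outcome.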

Read the full article →

When To Fight For Your Analysis and When To Jump Through Hoops

In the world of data analysis, there’s not always one clearly appropriate analysis for every research question. There are so many factors to take into account, including the research question to be answered, the measurement of the variables, the study design, data limitations and issues, the audience, practical constraints like software availability, and the purpose [...]

Read the full article →

Understanding Probability, Odds, and Odds Ratios in Logistic Regression

Odds ratios are the bane of many data analysts. Interpreting them can be like learning a whole new language. This webinar will go over an example to show how to interpret the odds ratios in binary logistic regression. You will learn: how probability and odds both measure the same thing on different scales; the meaning [...]
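As a quick illustration of probability and odds being two scales for the same thing, here is a minimal sketch of the conversions. It is not the webinar's own example: the function names and the two group probabilities are hypothetical.

```python
import math


def odds_from_probability(p):
    """Odds = probability of the event divided by probability of no event."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Convert odds back to the probability scale."""
    return odds / (1 + odds)


def odds_ratio(p_a, p_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds_from_probability(p_a) / odds_from_probability(p_b)


# Hypothetical event probabilities in two groups (illustration only)
p_treated, p_control = 0.8, 0.5

ratio = odds_ratio(p_treated, p_control)
# With a single binary predictor, the logistic regression coefficient
# is the natural log of this odds ratio.
log_odds_ratio = math.log(ratio)

print(f"Odds (treated): {odds_from_probability(p_treated):.2f}")
print(f"Odds (control): {odds_from_probability(p_control):.2f}")
print(f"Odds ratio: {ratio:.2f}")
print(f"Log odds ratio: {log_odds_ratio:.3f}")
```

Note that the ratio of the probabilities (0.8 / 0.5) is not the same number as the odds ratio, which is one reason odds ratios feel unintuitive at first.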

Read the full article →

When Can Count Data be Considered Continuous?

Q: How high does the count scale have to be before you can consider it continuous?

I suspect you’re getting at the same issue as in the last question. It’s certainly true that when you get into very large numbers, many of the issues with count variables aren’t issues anymore.

Read the full article →