Power and Sample Size

Member Training: Small Sample Statistics

August 1st, 2016

Despite modern concerns about how to handle big data, there persists an age-old question: What can we do with small samples?

Sometimes small sample sizes are planned and expected.  Sometimes not. For example, the cost, ethical, and logistical realities of animal experiments often lead to samples of fewer than 10 animals.

Other times, a solid sample size is intended based on a priori power calculations. Yet recruitment difficulties or logistical problems lead to a much smaller sample. In this webinar, we will discuss methods for analyzing small samples.  Special focus will be on the case of unplanned small sample sizes and the issues and strategies to consider.


Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.



Member Training: An Overview of Effect Size Statistics and Why They are So Important

July 1st, 2015

Whenever we run an analysis of variance or a regression, one of the first things we do is look at the p-values of our predictor variables to determine whether they are statistically significant. When a variable is statistically significant, did you ever stop and ask yourself how significant it is?


Three Issues in Sample Size Estimates for Multilevel Models

November 30th, 2012

If you’ve ever worked with multilevel models, you know that they are an extension of linear models. For a researcher learning them, this is both good and bad news.

The good side is that many of the concepts, calculations, and results are familiar. The downside of the extension is that everything is more complicated in multilevel models.

This includes power and sample size calculations.


Sample Size Estimates for Multilevel Randomized Trials

May 1st, 2012

If you learned much about calculating power or sample sizes in your statistics classes, chances are, it was on something very, very simple, like a z-test.
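
For reference, here is what that textbook case looks like in code. It's a minimal sketch using Python's statsmodels (my choice of tool, not one from the post), with arbitrary illustration inputs:

```python
# Sample size for a two-group z-test, solved with statsmodels'
# power routines. Effect size, alpha, and power are arbitrary
# illustration values, not recommendations.
from statsmodels.stats.power import NormalIndPower

analysis = NormalIndPower()

# Solve for n per group: standardized mean difference d = 0.5,
# two-sided alpha = 0.05, 80% power.
n_per_group = analysis.solve_power(
    effect_size=0.5, alpha=0.05, power=0.8, alternative="two-sided"
)
print(round(n_per_group))  # roughly 63 per group
```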

But many design issues that affect power in a study go way beyond a z-test.

Regular sample size software can accommodate some of these issues, but not all. And there is something wonderful about finding a tool that does exactly what you need it to.

Especially when it’s free.

Enter Optimal Design Plus Empirical Evidence software.


5 Reasons to Run Sample Size Calculations Before Collecting Data

September 9th, 2011

Most of us run sample size calculations when a granting agency or committee requires it.  That’s reason 1.

That is a very good reason.  But there are others, and it can be helpful to keep these in mind when you’re tempted to skip this step or are grumbling through the calculations you’re required to do.

It’s easy to base your sample size on what is customary in your field (“I’ll use 20 subjects per condition”) or to just use the number of subjects in a similar study (“They used 150, so I will too”).

Sometimes you can get away with doing that.

However, there really are some good reasons beyond funding to do some sample size estimates. And since they're not especially time-consuming, it's worth doing them.


A Comparison of Effect Size Statistics

January 13th, 2011

If you’re in a field that uses Analysis of Variance, you have surely heard that p-values don’t indicate the size of an effect. You also need to report effect size statistics.

Why? Because with a big enough sample size, any difference in means, no matter how small, can be statistically significant. P-values are designed to tell you if your result is a fluke, not if it’s big.
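
A quick simulation makes the point. The numbers here are hypothetical, chosen only so that the effect is tiny and the sample is huge:

```python
# With a large enough n, even a negligible mean difference is
# statistically significant. (Hypothetical simulation, not data
# from any study.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 500_000                                        # per group
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.01, scale=1.0, size=n)  # d = 0.01, a trivial effect

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p = {p_value:.2g}")  # well below 0.05, despite the negligible effect
```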

Unstandardized Effect Size Statistics

The simplest and most straightforward effect size measure is the difference between two means. And you’re probably already reporting that. The limitation of this measure as an effect size is not that it’s inaccurate. It’s just hard to evaluate.

If you’re familiar with an area of research and the variables used in that area, you should know if a 3-point difference is big or small, although your readers may not. And if you’re evaluating a new type of variable, it can be hard to tell.

Standardized Effect Size Statistics

Standardized effect size statistics are designed for easier evaluation. They remove the units of measurement, so you don’t have to be familiar with the scaling of the variables.

Cohen’s d is a good example of a standardized effect size measurement. It’s equivalent in many ways to a standardized regression coefficient (labeled beta in some software). Both are standardized measures. They divide the size of the effect by the relevant standard deviations. So instead of being in terms of the original units of X and Y, both Cohen’s d and standardized regression coefficients are in terms of standard deviations.
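
As a concrete sketch of that calculation, here is Cohen’s d computed from two groups of made-up scores, dividing the mean difference by the pooled standard deviation:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    # Pool the two sample variances (ddof=1 gives sample variance)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Hypothetical scores for two groups, in the variable's original units.
treatment = [24, 27, 22, 26, 25, 28]
control   = [21, 23, 20, 24, 22, 25]
print(round(cohens_d(treatment, control), 2))  # the difference in SD units
```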

There are some nice properties of standardized effect size measures. The foremost is that you can compare them across variables. And in many situations, seeing differences in terms of the number of standard deviations is very helpful.

Limitations

But they are most useful if you can also recognize their limitations. Unlike correlation coefficients, both Cohen’s d and beta can be greater than one. So while you can compare them to each other, you can’t just look at one and tell right away what is big or small. You’re just looking at the effect of the independent variable in terms of standard deviations.

This is especially important to note for Cohen’s d, because in his original book, he specified certain d values as indicating small, medium, and large effects in behavioral research. While the statistic itself is a good one, you should take these size recommendations with a grain of salt (or maybe a very large bowl of salt). What is a large or small effect is highly dependent on your specific field of study, and even a small effect can be theoretically meaningful.

Variance Explained

Another set of effect size measures has a more intuitive interpretation and is easier to evaluate: Eta Squared, Partial Eta Squared, and Omega Squared. Like the R Squared statistic, they all have the intuitive interpretation of the proportion of the variance accounted for.

Eta Squared is calculated the same way as R Squared, and has the most equivalent interpretation: out of the total variation in Y, the proportion that can be attributed to a specific X.

Eta Squared, however, is used specifically in ANOVA models. Each effect in the model has its own Eta Squared. So you get a specific, intuitive measure of the effect of that variable.

Eta Squared has two drawbacks, however. One is that as you add more variables to the model, the proportion explained by any one variable will automatically decrease. This makes it hard to compare the effect of a single variable in different studies.

Partial Eta Squared solves this problem, but has a less intuitive interpretation. There, the denominator is not the total variation in Y, but the unexplained variation in Y plus the variation explained just by that X. So any variation explained by other Xs is removed from the denominator. This allows a researcher to compare the effect of the same variable in two different studies, which contain different covariates or other factors.

In a one-way ANOVA, Eta Squared and Partial Eta Squared will be equal. But this isn’t true in models with more than one independent variable.
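
A small sketch makes the distinction concrete. The sums of squares below are hypothetical, standing in for a two-way ANOVA table with factors A and B:

```python
# Eta squared vs. partial eta squared, computed from an ANOVA
# table's sums of squares. All SS values are hypothetical.
def eta_squared(ss_effect, ss_total):
    # proportion of the total variation in Y attributable to this effect
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    # denominator: this effect's variation plus unexplained variation;
    # variation explained by other predictors is removed
    return ss_effect / (ss_effect + ss_error)

ss_a, ss_b, ss_error = 40.0, 110.0, 250.0
ss_total = ss_a + ss_b + ss_error

print(eta_squared(ss_a, ss_total))          # 0.10
print(partial_eta_squared(ss_a, ss_error))  # ~0.14, larger because B's SS drops out
```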

The second drawback of Eta Squared is that it is a biased measure of the variance explained in the population (although it is accurate for the sample). It always overestimates it.

This bias gets very small as sample size increases. For small samples, an unbiased effect size measure is Omega Squared. Omega Squared has the same basic interpretation, but uses unbiased measures of the variance components. Because it is an unbiased estimate of population variances, Omega Squared is always smaller than Eta Squared.
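
Here is a sketch of the common one-way formula for Omega Squared, again with hypothetical numbers (the full equations are in the post linked below):

```python
# Omega squared for a one-way ANOVA:
#   (SS_effect - df_effect * MS_error) / (SS_total + MS_error)
# All numbers are hypothetical.
def omega_squared(ss_effect, df_effect, ss_error, df_error):
    ms_error = ss_error / df_error
    ss_total = ss_effect + ss_error  # one-way case: effect + error
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

ss_effect, df_effect = 40.0, 2    # 3 groups
ss_error,  df_error  = 250.0, 27  # 30 observations total

eta2 = ss_effect / (ss_effect + ss_error)
omega2 = omega_squared(ss_effect, df_effect, ss_error, df_error)
print(round(eta2, 3), round(omega2, 3))  # 0.138 vs. 0.072: omega squared is smaller
```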

See my post containing equations of all these effect size measures and a list of great references for further reading on effect sizes.