A Comparison of Effect Size Statistics

If you’re in a field that uses Analysis of Variance, you have surely heard that p-values don’t indicate the size of an effect. You also need to report effect size statistics.

Why? Because with a big enough sample size, any difference in means, no matter how small, can be statistically significant. P-values are designed to tell you if your result is a fluke, not if it’s big.
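
To see this concretely, here is a rough Python sketch that holds a tiny standardized difference fixed at d = 0.05 and only lets the per-group sample size grow. It assumes two equal-sized groups and uses the normal approximation to the t distribution, so treat the p-values as approximate. The function name approx_two_sample_p is a hypothetical illustration, not from any package.

```python
import math
from statistics import NormalDist


def approx_two_sample_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a difference of d standard deviations.

    Assumes two groups of equal size and uses the normal approximation to the
    t distribution, which is only reasonable for large samples.
    """
    # Test statistic for equal groups: d / sqrt(2 / n) = d * sqrt(n / 2)
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny effect (d = 0.05) at increasing sample sizes
for n in (20, 200, 2_000, 20_000):
    print(f"n per group = {n:>6}: p ~ {approx_two_sample_p(0.05, n):.4f}")
```

The effect never changes in this loop; only n does. The p-value drops from far above .05 to far below it anyway, which is why it can't tell you whether the difference is big.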

Unstandardized Effect Size Statistics

Truly the simplest and most straightforward effect size measure is the difference between two means. And you’re probably already reporting that. But the limitation of this measure as an effect size is not inaccuracy. It’s just hard to evaluate.

If you’re familiar with an area of research and the variables used in that area, you should know if a 3-point difference is big or small, although your readers may not. And if you’re evaluating a new type of variable, it can be hard to tell.

Standardized Effect Size Statistics

Standardized effect size statistics are designed for easier evaluation. They remove the units of measurement, so you don’t have to be familiar with the scaling of the variables.

Cohen’s d is a good example of a standardized effect size measurement. It’s equivalent in many ways to a standardized regression coefficient (labeled beta in some software). Both are standardized measures. They divide the size of the effect by the relevant standard deviations. So instead of being in terms of the original units of X and Y, both Cohen’s d and standardized regression coefficients are in terms of standard deviations.
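
A minimal Python sketch of both ideas, assuming the common pooled-standard-deviation version of Cohen’s d and a simple regression with one numeric predictor. The data and function names are hypothetical. Cohen’s d divides a difference in means by the pooled SD of Y, while the standardized slope rescales the raw slope by sd(X)/sd(Y).

```python
import math
import statistics


def cohens_d(group1, group2):
    """Difference in means divided by the pooled (n - 1) standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


def standardized_slope(x, y):
    """Simple-regression slope re-expressed in standard-deviation units."""
    mx, my = statistics.mean(x), statistics.mean(y)
    raw_slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    # Multiply by sd(x) / sd(y): change in SDs of Y per one SD of X
    return raw_slope * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical scores for two groups
treatment = [5, 7, 8, 6, 9]
control = [3, 4, 6, 5, 2]
print(f"raw difference = {statistics.mean(treatment) - statistics.mean(control)}")
print(f"Cohen's d      = {cohens_d(treatment, control):.3f}")
```

Here a 3-point raw difference becomes about 1.9 pooled standard deviations, which is also a reminder that d is not bounded by 1.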

There are some nice properties of standardized effect size measures. The foremost is you can compare them across variables. And in many situations, seeing differences in terms of number of standard deviations is very helpful.

Limitations

But they are most useful if you can also recognize their limitations. Unlike correlation coefficients, both Cohen’s d and beta can be greater than one. So while you can compare them to each other, you can’t just look at one and tell right away what is big or small. You’re just looking at the effect of the independent variable in terms of standard deviations.

This is especially important to note for Cohen’s d, because in his original book, he specified certain d values as indicating small, medium, and large effects in behavioral research. While the statistic itself is a good one, you should take these size recommendations with a grain of salt (or maybe a very large bowl of salt). What is a large or small effect is highly dependent on your specific field of study, and even a small effect can be theoretically meaningful.

Variance Explained

Another set of effect size measures has a more intuitive interpretation and is easier to evaluate. These include Eta Squared, Partial Eta Squared, and Omega Squared. Like the R Squared statistic, they all have the intuitive interpretation of the proportion of the variance accounted for.

Eta Squared is calculated the same way as R Squared, and has the most equivalent interpretation: out of the total variation in Y, the proportion that can be attributed to a specific X.
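
For a one-way layout, Eta Squared is the between-groups sum of squares divided by the total sum of squares. Here is a small Python sketch with hypothetical data; the function names are illustrative, not from any particular package.

```python
import statistics


def one_way_sums_of_squares(groups):
    """Return (SS_between, SS_within, SS_total) for a one-way ANOVA layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    group_means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    return ss_between, ss_within, ss_total


def eta_squared(ss_effect, ss_total):
    """Proportion of the total variation in Y attributed to the effect."""
    return ss_effect / ss_total


# Hypothetical data: three groups of three observations each
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ss_b, ss_w, ss_t = one_way_sums_of_squares(groups)
print(ss_b, ss_w, ss_t, eta_squared(ss_b, ss_t))
```

SS_between and SS_within add up to SS_total, so Eta Squared reads directly as "the share of all the variation in Y that sits between the groups."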

Eta Squared, however, is used specifically in ANOVA models. Each effect in the model has its own Eta Squared. So you get a specific, intuitive measure of the effect of that variable.

Eta Squared has two drawbacks, however. One is that as you add more variables to the model, the proportion explained by any one variable will automatically decrease. This makes it hard to compare the effect of a single variable in different studies.

Partial Eta Squared solves this problem, but has a less intuitive interpretation. There, the denominator is not the total variation in Y, but the unexplained variation in Y plus the variation explained just by that X. So any variation explained by other Xs is removed from the denominator. This allows a researcher to compare the effect of the same variable in two different studies, which contain different covariates or other factors.

In a one-way ANOVA, Eta Squared and Partial Eta Squared will be equal. But this isn’t true in models with more than one independent variable.
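
A short Python sketch of the two denominators, using hypothetical sums of squares from a two-way ANOVA. It assumes the sums of squares partition the total, as they do in a balanced design; the names are illustrative.

```python
def eta_squared(ss_effect, ss_total):
    """Denominator: all of the variation in Y."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Denominator: unexplained variation plus the variation from this effect only."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares from a balanced two-way ANOVA
two_way = {"A": 40.0, "B": 120.0, "AxB": 10.0, "error": 80.0}
ss_total = sum(two_way.values())  # 250.0, assuming the SS partition the total
for effect in ("A", "B", "AxB"):
    eta = eta_squared(two_way[effect], ss_total)
    partial = partial_eta_squared(two_way[effect], two_way["error"])
    print(f"{effect:>3}: eta^2 = {eta:.3f}, partial eta^2 = {partial:.3f}")

# One-way ANOVA: SS_total = SS_effect + SS_error, so the two measures coincide
print(eta_squared(40.0, 40.0 + 80.0) == partial_eta_squared(40.0, 80.0))
```

With two factors, each partial value is larger than the corresponding Eta Squared because the other factors' variation has been dropped from the denominator; with a single factor there is nothing to drop.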

The other drawback of Eta Squared is that it is a biased estimate of the population variance explained (although it is accurate for the sample). It systematically overestimates it.
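
One way to see the upward bias is a quick simulation in which the population effect is exactly zero. With normal errors, k groups, and N total observations, the sample Eta Squared then averages about (k - 1)/(N - 1) rather than 0, and that gap shrinks as N grows. This Python sketch is illustrative only; the number of groups, sample sizes, replication counts, and seed are arbitrary choices.

```python
import random
import statistics


def eta_squared_one_way(groups):
    """SS_between / SS_total for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    return ss_between / ss_total


def mean_eta_squared_under_null(k=3, n_per_group=10, reps=2000, seed=1):
    """Average sample eta squared when every group has the same population mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # All groups drawn from the same N(0, 1) population: the true effect is zero
        groups = [[rng.gauss(0, 1) for _ in range(n_per_group)] for _ in range(k)]
        estimates.append(eta_squared_one_way(groups))
    return statistics.fmean(estimates)


for n in (10, 50):
    expected = (3 - 1) / (3 * n - 1)
    print(f"n per group = {n}: mean eta^2 = {mean_eta_squared_under_null(n_per_group=n):.4f}, "
          f"(k - 1)/(N - 1) = {expected:.4f}")
```

Every simulated Eta Squared is above zero even though nothing is going on in the population, and the average falls as the groups get larger.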

This bias gets very small as sample size increases. For small samples, an unbiased effect size measure is Omega Squared. Omega Squared has the same basic interpretation, but uses unbiased measures of the variance components. Because it is an unbiased estimate of population variances, Omega Squared is always smaller than Eta Squared.
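
As a sketch, this uses the usual textbook formula for Omega Squared in a one-way, fixed-effects, between-subjects ANOVA: (SS_effect - df_effect × MS_error) / (SS_total + MS_error). Other designs need different versions of the formula, and the numbers below are hypothetical (the same three-groups-of-three layout as a toy one-way example).

```python
def omega_squared(ss_effect, df_effect, ss_total, ms_error):
    """Omega squared for a fixed effect in a one-way between-subjects ANOVA.

    (SS_effect - df_effect * MS_error) / (SS_total + MS_error)
    """
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def eta_squared(ss_effect, ss_total):
    return ss_effect / ss_total


# Hypothetical one-way ANOVA: 3 groups of 3 observations
ss_between, ss_within = 24.0, 6.0
df_between, df_within = 2, 6
ss_total = ss_between + ss_within
ms_error = ss_within / df_within  # 1.0

print(f"eta^2   = {eta_squared(ss_between, ss_total):.3f}")
print(f"omega^2 = {omega_squared(ss_between, df_between, ss_total, ms_error):.3f}")
```

Subtracting df_effect × MS_error shrinks the numerator and adding MS_error grows the denominator, which is why the estimate comes out smaller than Eta Squared. When F is below 1, the numerator is negative and so is Omega Squared.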

See my post containing equations of all these effect size measures and a list of great references for further reading on effect sizes.

 


Comments

  1. Federico says

    Thank you, Karen.
    What do you mean by: “Cohen’s d is a good example of a standardized effect size measurement. It’s equivalent in many ways to a standardized regression coefficient (labeled beta in some software). Both are standardized measures. They divide the size of the effect by the relevant standard deviations”?
    Do you mean that, instead of being in terms of the original units of X and Y, both Cohen’s d and standardized regression coefficients are in terms of standard deviations?
    Does it imply that both Cohen’s d and the standardized Beta coefficient divide the Beta coefficient by a standard deviation, but that the two SDs used as denominators are different? If I understand this answer correctly:
    https://stats.stackexchange.com/a/473813/159259 (I am confused by it), one measure considers only the variance of the outcome and the other considers both the variance of the outcome and that of the predictor. Is that correct?

    • Karen Grace-Martin says

      Hi Federico,

      Yes, the reason a standardized regression coefficient needs to divide by both sd(x) and sd(y) is that x is numeric. With Cohen’s d, only y is numeric. X is a grouping variable, so there really isn’t an sd(x).

  2. sisay says

    Shall I use both Cohen’s d and partial eta squared in a single study? Cohen’s d for group variation and partial eta squared for multiple-group comparison, for example:

    Source of Variation   SS          df    MS        F       Sig.   ηp²
    Between Groups        358.081     2     179.041   9.371   .000   0.28
    Within Groups         12380.812   648   19.106
    Total                 12738.893   650

    Post hoc
    (I) Group of resp.   (J) Group of resp.   N   Mean   Std. Dev.   F   d   p
    Students             Teachers
                         Experts
    Teachers             Students
                         Experts
    Experts              Students
                         Teachers
    Total

  3. JTM says

    Greetings! I urge you to stop recommending omega^2 as an “unbiased” alternative to eta^2, as omega^2 actually over-corrects and has negative bias. The measure of effect size that has the least bias (and is very close to unbiased) is epsilon^2.

    I would also add a discussion of Cohen’s f to this post, but that’s a separate issue.

    Also, I would include a date on these posts, as I have no idea if I am commenting on something recent or something you wrote years ago.

    cheers

  4. José says

    Hi Karen,

    I would like to know how I can compare the difference between two independent effect sizes. If I have two independent Cohen’s d values, how can I tell whether one of them is substantially higher than the other?

    Thanks

    José

  5. Jim says

    I am currently taking a statistics for doctoral learners course and am working on effect size correlations. I am confused about the r-squared and Cohen’s d (the formula that uses the t value and the square root of n). I’m working a problem with one study using 10 subjects with t = 1.9 and comparing it to another study with 100 subjects, also with t = 1.9. In computing the r-squared and Cohen’s d, it appears that as the sample size increases, the effect size gets smaller. For the first sample, with 9 df, the r-squared is .535 and d = 1.267; with 99 df, the r-squared is .187 and d = .3819. I understand that another important factor is testing for power, but looking at this correlation, both the r-squared and Cohen’s d appear to show a smaller effect as sample size increases?

  6. sarah says

    Hi Karen,

    What does it mean if the associated p-value for R2 is not significant? For example:
    R-SQUARE

    Observed                                   Two-Tailed
    Variable     Estimate   S.E.    Est./S.E.  P-Value

    Variable1    0.138      0.074   1.860      0.063 (or higher)

    Does this invalidate the results of the regression model, even if some of my IVs are significant?
    Thank you!

  7. Usha says

    Hi,
    How should I interpret an eta-squared effect size as small, medium, or large? What reference can be cited for this?

  8. Bryna Chrismas says

    Good afternoon,

    Am I correct in thinking that you cannot calculate Cohen’s d for linear mixed models?

    Do you have a reference/paper that discusses the use of effect sizes for mixed models?

    Is the reason you cannot use Cohen’s d due to the way an LMM works, i.e., maximum likelihood?

    Many thanks

    Kind regards

    Bryna

  9. Susan says

    Thought you might be interested in this article: Bias and precision of some classical ANOVA effect sizes when assumptions are violated. It was published in Behavior Research Methods (doi: 10.3758/s13428-012-0257-2). There is a free spreadsheet available @ http://www.shsu.edu/~sts008/ to calculate eta squared, epsilon squared, and omega squared.

  10. Richard says

    Hi Karen,

    I am wondering if there’s a way to obtain measures of effect size when using Stata survey commands in generalized linear models. Based on your post and on my stats textbook, I think that omega squared would be the most appropriate given that it’s a population-based survey.

    Thanks in advance,

    Richard

  11. Karen says

    Hi Jae,

    Almost. The partial eta-squared itself won’t follow an F distribution. The partial eta-squared is SS(effect)/(SS(effect) + SS(error)).

    To get an F distribution, each SS has to be divided by its degrees of freedom.

    Karen

  12. Dan says

    Hello, Karen.
    I wonder if there is a way to compare two partial eta-squared values with each other in order to claim that one effect is stronger than another?

    Thanks,
    Dan

    • Karen says

      Hi Dan,

      I don’t know of a statistic that directly tests the partial eta-squared values. But it doesn’t seem too hard to construct something at least close.

      So partial eta-squared is the ratio of two Sums of Squares. Any ratio of two Mean Squares (which is just Sum of Squares/df) follows an F distribution. So you could create an F-test on your own, if you included the appropriate df.

      It’s really a matter of using a different denominator for your F statistic. This isn’t unheard of; it’s done in simple effects testing.

      Karen

      • Jae says

        Hi Karen,

        I also had the same question.

        Do you mean that the ratio of partial eta-squared, which follows an F distribution, also follows an F distribution?

        Thank you,
        Jae

