If you’re in a field that uses Analysis of Variance, you have surely heard that p-values don’t indicate the size of an effect. You also need to
report effect size statistics.
Why? Because with a big enough sample size, any difference in means, no matter how small, can be statistically significant. P-values are designed to tell you whether your result would be surprising under the null hypothesis, not whether it's big.
Unstandardized Effect Size Statistics
The simplest and most straightforward effect size measure is the difference between two means. And you're probably already reporting that. But the limitation of this measure as an effect size is not inaccuracy. It's just that a raw difference is sometimes hard to evaluate.
If you’re familiar with an area of research and the variables used in that area, you should know if a 3-point difference is big or small, although your readers may not. And if you’re evaluating a new type of variable, it can be hard to tell.
Standardized Effect Size Statistics
Standardized effect size statistics are designed for easier evaluation. They remove the units of measurement, so you don’t have to be familiar with the scaling of the variables.
Cohen’s d is a good example of a standardized effect size measurement. It’s equivalent in many ways to a standardized regression coefficient (labeled beta in some software).
Both are standardized measures. They divide the size of the effect by the relevant standard deviations. So instead of being in terms of the original units of X and Y, both Cohen’s d and standardized regression coefficients are in terms of standard deviations.
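As a minimal sketch of that calculation (the scores below are made up; assuming NumPy is available), Cohen's d for two independent groups divides the difference in means by the pooled standard deviation:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d for two independent groups: mean difference / pooled SD."""
    g1, g2 = np.asarray(group1, dtype=float), np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Hypothetical scores for two groups
treated = [102, 105, 99, 110, 108, 101]
control = [100, 98, 103, 97, 104, 96]
print(round(cohens_d(treated, control), 2))  # about 1.19
```

Note that the value here is greater than 1, which is perfectly legitimate for d (more on that under Limitations below).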
There are some nice properties of standardized effect size measures. The foremost is that you can compare them across variables. And in many situations, seeing differences in terms of number of standard deviations is very helpful.
Limitations
But they are most useful if you also recognize their limitations. Unlike correlation coefficients, both Cohen’s d and beta can be greater than one. So while you can compare them to each other, you can’t just look at one and tell right away what is big or small. You’re just looking at the effect of the independent variable in terms of standard deviations.
This is especially important to note for Cohen's d, because in his original book, Cohen specified d values of 0.2, 0.5, and 0.8 as indicating small, medium, and large effects, and for behavioral research only.
While the statistic itself is a good one, you should take these size recommendations with a grain of salt (or maybe a very large bowl of salt). What is a large or small effect is highly dependent on your specific field of study, and even a small effect can be theoretically meaningful.
Variance Explained
Another set of effect size measures has a more intuitive interpretation and is easier to evaluate. It includes Eta Squared, Partial Eta Squared, and Omega Squared. Like the R Squared statistic, they all have the intuitive interpretation of the proportion of variance accounted for.
Eta Squared is calculated the same way as R Squared, and has the most equivalent interpretation: out of the total variation in Y, the proportion that can be attributed to a specific X.
Eta Squared, however, is used specifically in ANOVA models. Each effect in the model has its own Eta Squared. So you get a specific, intuitive measure of the effect of that variable.
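In terms of the sums of squares on the ANOVA table, that is:

\[ \eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}} \]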
Eta Squared has two drawbacks, however. One is that as you add more variables to the model, the proportion explained by any one variable will automatically decrease. This makes it hard to compare the effect of a single variable in different studies.
Partial Eta Squared solves this problem, but has a less intuitive interpretation. There, the denominator is not the total variation in Y, but the unexplained variation in Y plus the variation explained just by that X. So any variation explained by other Xs is removed from the denominator. This allows a researcher to compare the effect of the same variable in two different studies, even if those studies contain different covariates or other factors.
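In the same notation, the denominator just described gives:

\[ \eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} \]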
In a one-way ANOVA, Eta Squared and Partial Eta Squared will be equal. But this isn’t true in models with more than one independent variable.
The other drawback of Eta Squared is that it is a biased measure of the variance explained in the population (although it is accurate for the sample). It always overestimates it.
This bias gets very small as sample size increases. For small samples, an unbiased effect size measure is Omega Squared. Omega Squared has the same basic interpretation, but uses unbiased measures of the variance components. Because it is an unbiased estimate of population variances, Omega Squared is always smaller than Eta Squared.
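For a between-subjects ANOVA, one common form of Omega Squared adjusts the effect sum of squares using the error mean square:

\[ \omega^2 = \frac{SS_{\text{effect}} - df_{\text{effect}} \cdot MS_{\text{error}}}{SS_{\text{total}} + MS_{\text{error}}} \]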
See my post containing equations of all these effect size measures and a list of great references for further reading on effect sizes.
There are many effect size statistics for ANOVA and regression, and as you may have noticed, journal editors are now requiring that you include one.
Unfortunately, the one your editor wants, or the one most appropriate to your research, may not be the one your software makes available (SPSS, for example, reports only Partial Eta Squared, although early versions labeled it Eta Squared).
Luckily, all the effect size measures are relatively easy to calculate from information in the ANOVA table on your output. Here are a few common ones: (more…)
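As a minimal sketch of how those calculations go, with hypothetical numbers standing in for the sums of squares and degrees of freedom you would read off a one-way ANOVA table:

```python
# Hypothetical values read off a one-way ANOVA table
ss_effect = 24.0  # sum of squares for the effect of interest
ss_error = 96.0   # error (residual) sum of squares
ss_total = 120.0  # total sum of squares
df_effect = 2     # degrees of freedom for the effect
df_error = 27     # error degrees of freedom

ms_error = ss_error / df_error

eta_sq = ss_effect / ss_total
partial_eta_sq = ss_effect / (ss_effect + ss_error)
omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

# In this one-way case, Eta Squared and Partial Eta Squared agree (0.200),
# and Omega Squared is smaller (about 0.137), as expected.
print(eta_sq, partial_eta_sq, omega_sq)
```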
If you’ve compared two textbooks on linear models, chances are, you’ve seen two different lists of assumptions.
I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.
1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.
2. Some authors are writing for introductory classes and, rightly, don't want to confuse students with too many abstract, and sometimes untestable, (more…)
Censored data are inherent in any analysis, like Event History or Survival Analysis, in which the outcome measures the Time to Event (TTE). Censoring occurs when the event doesn't occur for an observed individual during the time we observe them.
Despite the name, the event of "survival" could be any event for which you would like to describe the mean or median TTE. To take the censoring into account, though, you need to make sure your data are set up correctly.
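In practice that usually means one row per individual, with a follow-up time and an event indicator. A minimal sketch (the variable names are hypothetical), where event = 0 flags a censored observation:

```python
# One row per subject: follow-up time plus an event indicator.
# event = 1 means the event occurred at that time; 0 means censored.
subjects = [
    {"id": 1, "time": 12, "event": 1},  # event observed on day 12
    {"id": 2, "time": 30, "event": 0},  # still event-free when observation ended
    {"id": 3, "time": 7,  "event": 1},
    {"id": 4, "time": 21, "event": 0},  # lost to follow-up on day 21
]
```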
Here is a simple example, for a data set that measures days after surgery until an (more…)
Time-to-event analyses (aka Survival Analysis and Event History Analysis) are used often within medical, sales, and epidemiological research. Some examples of time-to-event analysis are measuring the median time to death after being diagnosed with a heart condition, comparing male and female time to purchase after being given a coupon, and estimating time to infection after exposure to a disease.
Survival time has two components that must be clearly defined: a beginning point and an endpoint that is reached either when the event occurs or when the follow-up time has ended.
One basic concept needed to understand time-to-event (TTE) analysis is censoring.
In a simple TTE analysis, you have two types of observations:
1. The event occurred, and we are able to measure when it occurred OR
2. The event did NOT occur during the time we observed the individual, and we only know the total number of days in which it didn’t occur. (CENSORED).
Again you have two groups, one where the time-to-event is known exactly and one where it is not. The latter group is only known to have a certain amount of time where the event of interest did not occur. We don't know if it would have occurred had we observed the individual longer. But knowing that it didn't occur for so long tells us something about the risk of the event for that person.
For example, let the time-to-event be a person’s age at onset of cancer. If you stop following someone after age 65, you may know that the person did NOT have cancer at age 65, but you do not have any information after that age.
You know that their age of getting cancer is greater than 65. But you do not know if they will never get cancer or if they’ll get it at age 66, only that they have a “survival” time greater than 65 years. They are censored because we did not gather information on that subject after age 65.
So one cause of censoring is merely that we can’t follow people forever. At some point you have to end your study, and not all people will have experienced the event.
But another common cause is that people are lost to follow-up during a study. This is called random censoring. It occurs when follow-up ends for reasons that are not under the control of the investigator.
In survival analysis, censored observations contribute to the total number at risk up to the time that they ceased to be followed. One advantage here is that the length of time that an individual is followed does not have to be equal for everyone. All observations could have different amounts of follow-up time, and the analysis can take that into account.
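A minimal sketch of that risk-set logic, using a hand-rolled Kaplan-Meier estimate on hypothetical data (a real analysis would use dedicated survival routines):

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    Censored subjects stay in the risk set up to their censoring time.
    """
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    at_risk = len(times)
    survival, curve = 1.0, []
    for t in sorted(set(times)):
        d = n_events.get(t, 0)
        if d:  # the survival estimate only drops at event times
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ended at t (event or censored) leaves the risk set
        at_risk -= sum(1 for x in times if x == t)
    return curve

times  = [12, 30, 7, 21, 15, 15]
events = [1,  0,  1, 0,  1,  0]
print(kaplan_meier(times, events))  # [(7, 0.833...), (12, 0.666...), (15, 0.5)]
```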
Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts.
Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways. And these different ways of using the term have BIG implications for what your model means.
The most precise definition is its use in Analysis of Covariance, a type of General Linear Model in which the independent variables of interest are categorical, but you also need to adjust for the effect of an observed, continuous variable: the covariate.
In this context, the covariate is always continuous, never the key independent variable, (more…)
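As a sketch, a one-way ANCOVA with a single covariate X can be written:

\[ Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}) + \varepsilon_{ij} \]

where \alpha_i is the effect of group i and \beta is the common slope on the covariate.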