ANOVA

A Comparison of Effect Size Statistics

January 13th, 2011 by Karen Grace-Martin

If you’re in a field that uses Analysis of Variance, you have surely heard that p-values don’t indicate the size of an effect. You also need to report effect size statistics.

Why? Because with a big enough sample size, any difference in means, no matter how small, can be statistically significant. P-values are designed to tell you if your result is a fluke, not if it’s big.
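To see how sample size alone drives significance, here is a minimal sketch. It assumes a simple two-group z test with a known, equal standard deviation in both groups; the 0.5-point difference, the SD of 15, and the function names are all made up for illustration.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference between two group means (known, equal SDs)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return diff / standard_error

# Hypothetical numbers: the same tiny 0.5-point difference on a scale with SD = 15,
# tested at increasingly large sample sizes.
for n in (50, 500, 5000, 50000):
    z = z_for_mean_difference(diff=0.5, sd=15, n_per_group=n)
    print(n, round(z, 2), round(two_sided_p_from_z(z), 4))
```

The difference in means never changes. Only the standard error shrinks, so the p-value eventually drops below any cutoff you like.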

Unstandardized Effect Size Statistics

Truly the simplest and most straightforward effect size measure is the difference between two means. And you’re probably already reporting that. But the limitation of this measure as an effect size is not inaccuracy. It’s just hard to evaluate.

If you’re familiar with an area of research and the variables used in that area, you should know if a 3-point difference is big or small, although your readers may not. And if you’re evaluating a new type of variable, it can be hard to tell.

Standardized Effect Size Statistics

Standardized effect size statistics are designed for easier evaluation. They remove the units of measurement, so you don’t have to be familiar with the scaling of the variables.

Cohen’s d is a good example of a standardized effect size measurement. It’s equivalent in many ways to a standardized regression coefficient (labeled beta in some software). Both are standardized measures. They divide the size of the effect by the relevant standard deviations. So instead of being in terms of the original units of X and Y, both Cohen’s d and standardized regression coefficients are in terms of standard deviations.
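As a rough sketch of the calculation, the function below uses the pooled sample standard deviation as the denominator. That is one common definition of Cohen's d, assumed here rather than taken from any particular software; the data and the name cohens_d are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical scores: a 3-point raw difference in means.
treatment = [14, 15, 16, 17, 18]
control = [11, 12, 13, 14, 15]
print(round(cohens_d(treatment, control), 3))
```

The raw difference here is 3 points, but the result is expressed in standard deviations, so it no longer depends on the scale the scores were measured on.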

There are some nice properties of standardized effect size measures. The foremost is you can compare them across variables. And in many situations, seeing differences in terms of number of standard deviations is very helpful.

Limitations

But they are most useful if you can also recognize their limitations. Unlike correlation coefficients, both Cohen’s d and beta can be greater than one. So while you can compare them to each other, you can’t just look at one and tell right away what is big or small. You’re just looking at the effect of the independent variable in terms of standard deviations.

This is especially important to note for Cohen’s d, because in his original book, he specified certain d values as indicating small, medium, and large effects in behavioral research. While the statistic itself is a good one, you should take these size recommendations with a grain of salt (or maybe a very large bowl of salt). What is a large or small effect is highly dependent on your specific field of study, and even a small effect can be theoretically meaningful.

Variance Explained

Another set of effect size measures has a more intuitive interpretation and is easier to evaluate. They include Eta Squared, Partial Eta Squared, and Omega Squared. Like the R Squared statistic, they all have the intuitive interpretation of the proportion of the variance accounted for.

Eta Squared is calculated the same way as R Squared, and has the most equivalent interpretation: out of the total variation in Y, the proportion that can be attributed to a specific X.

Eta Squared, however, is used specifically in ANOVA models. Each effect in the model has its own Eta Squared. So you get a specific, intuitive measure of the effect of that variable.

Eta Squared has two drawbacks, however. One is that as you add more variables to the model, the proportion explained by any one variable will automatically decrease. This makes it hard to compare the effect of a single variable in different studies.

Partial Eta Squared solves this problem, but has a less intuitive interpretation. There, the denominator is not the total variation in Y, but the unexplained variation in Y plus the variation explained just by that X. So any variation explained by other Xs is removed from the denominator. This allows a researcher to compare the effect of the same variable in two different studies, which contain different covariates or other factors.

In a one-way ANOVA, Eta Squared and Partial Eta Squared will be equal. But this isn’t true in models with more than one independent variable.
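Here is a small sketch of the two denominators, working directly from sums of squares. The numbers are invented, and the two-factor example assumes the sums of squares add up to the total (as in a balanced design with no interaction term).

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of the total variation in Y attributed to one effect."""
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    """Same numerator, but other effects are removed from the denominator."""
    return ss_effect / (ss_effect + ss_error)

# Hypothetical two-factor model: sums of squares for A, B, and error.
ss = {"A": 30.0, "B": 50.0, "error": 120.0}
ss_total = sum(ss.values())  # assumes the pieces add up to the total

print(eta_squared(ss["A"], ss_total))              # B's share stays in the denominator
print(partial_eta_squared(ss["A"], ss["error"]))   # B's share is removed

# Hypothetical one-way model: only A and error, so the two denominators coincide.
one_way_total = ss["A"] + ss["error"]
print(eta_squared(ss["A"], one_way_total) == partial_eta_squared(ss["A"], ss["error"]))
```

Adding factor B shrinks Eta Squared for A but leaves Partial Eta Squared for A unchanged, which is the whole point of the partial version.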

The second drawback of Eta Squared is that it is a biased measure of population variance explained (although it is accurate for the sample). It always overestimates the population value.

This bias gets very small as sample size increases. For small samples, an unbiased effect size measure is Omega Squared. Omega Squared has the same basic interpretation, but uses unbiased measures of the variance components. Because it is an unbiased estimate of population variances, Omega Squared is always smaller than Eta Squared.
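For a one-way design, the comparison can be sketched as follows. The Omega Squared formula used is the usual one-way, fixed-effects version, which I am assuming here; the data and the function name are hypothetical.

```python
import statistics

def one_way_effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way ANOVA on raw data."""
    all_values = [y for g in groups for y in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    ss_total = ss_between + ss_within
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    # Assumed one-way formula: subtract the error expected in the numerator,
    # and add one mean square error to the denominator.
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq

# Hypothetical data: three small groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
eta_sq, omega_sq = one_way_effect_sizes(groups)
print(round(eta_sq, 3), round(omega_sq, 3))
```

With only nine observations the gap between the two is easy to see. Rerun it with much larger groups and the two values move closer together.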

See my post containing equations of all these effect size measures and a list of great references for further reading on effect sizes.

The General Linear Model, Analysis of Covariance, and How ANOVA and Linear Regression Really are the Same Model Wearing Different Clothes

September 17th, 2010 by Karen Grace-Martin

Just recently, a client got some feedback from a committee member that the Analysis of Covariance (ANCOVA) model she ran did not meet all the assumptions.

Specifically, the assumption in question is that the covariate has to be uncorrelated with the independent variable.

This committee member is, in the strictest sense of how analysis of covariance is used, correct.

And yet, they over-applied that assumption to an inappropriate situation.

ANCOVA for Experimental Data

Analysis of Covariance was developed for experimental situations and some of the assumptions and definitions of ANCOVA apply only to those experimental situations.

The key situation is one in which the independent variables are categorical and manipulated, not observed.

The covariate–continuous and observed–is considered a nuisance variable. There are no research questions about how this covariate itself affects or relates to the dependent variable.

The only hypothesis tests of interest are about the independent variables, controlling for the effects of the nuisance covariate.

A typical example is a study to compare the end-of-year math scores of students who were enrolled in three different learning programs.

The key independent variable here is the learning program. Students need to be randomly assigned to one of the three programs.

The only research question is about whether the math scores differed on average among the three programs. It is useful to control for a covariate like IQ scores, but we are not really interested in the relationship between IQ and math scores.

So in this example, in order to conclude that the learning program affected math scores, it is indeed important that IQ scores, the covariate, are unrelated to which learning program the students were assigned to.

You could not make that causal interpretation if it turns out that the IQ scores were generally higher in one learning program than the others.

So this assumption of ANCOVA is very important in this specific type of study in which we are trying to make a specific type of inference.

ANCOVA for Other Data

But that’s really just one application of a linear model with one categorical and one continuous predictor. The research question of interest doesn’t have to be about the causal effect of the categorical predictor, and the covariate doesn’t have to be a nuisance variable.

A regression model with one continuous and one dummy-coded variable is the same model (actually, you’d need two dummy variables to cover the three categories, but that’s another story).

The focus of that model may differ–perhaps the main research question is about the continuous predictor.

But it’s the same mathematical model.

The software will run it the same way. YOU may focus on different parts of the output or select different options, but it’s the same model.
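To make "the same model" concrete, here is a minimal sketch that fits the learning-program example as a regression with two dummy variables and a covariate. Everything in it is hypothetical: the data, the coefficients used to build the scores, and the helper names solve_linear_system and ols, which stand in for whatever your software does internally.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: learning program (A, B, C) and IQ for nine students.
programs = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
iq = [95, 100, 110, 90, 105, 115, 98, 102, 120]
# Scores built without noise so the fit is exact: 20 + 5*[B] + 8*[C] + 0.5*IQ
math_score = [20 + 5 * (p == "B") + 8 * (p == "C") + 0.5 * x for p, x in zip(programs, iq)]

# Design matrix: intercept, two dummies (program A is the reference), covariate.
design = [[1.0, float(p == "B"), float(p == "C"), float(x)] for p, x in zip(programs, iq)]
intercept, b_program_b, b_program_c, b_iq = ols(design, math_score)
print(round(intercept, 6), round(b_program_b, 6), round(b_program_c, 6), round(b_iq, 6))
```

The two dummy coefficients are the program differences adjusted for IQ, which is what an ANCOVA reports. The IQ slope is the part a regression-minded reader would look at first. Same fit, different emphasis.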

And that’s where the model names can get in the way of understanding the relationships among your variables. The model itself doesn’t care if the categorical variable was manipulated. It doesn’t care if the categorical independent variable and the continuous covariate are mildly correlated.

If those ANCOVA assumptions aren’t met, it does not change the analysis at all. It only affects how parameter estimates are interpreted and the kinds of conclusions you can draw.

In fact, those assumptions really aren’t about the model. They’re about the design. It’s the design that affects the conclusions. It doesn’t matter if a covariate is a nuisance variable or an interesting phenomenon to the model. That’s a design issue.

The General Linear Model

So what do you do instead of labeling models? Just call them a General Linear Model. It’s hard to think of regression and ANOVA as the same model because the equations look so different. But it turns out they aren’t so different.

Regression and ANOVA model equations

If you look at the two models, first you may notice some similarities.

Both are modeling Y, an outcome.
Both have a “fixed” portion on the right with some parameters to estimate–this portion estimates the mean values of Y at the different values of X.
Both equations have a residual, which is the random part of the model. It is the variation in Y that is not explained by the Xs.

But wait a minute, Karen, are you nuts?–there are no Xs in the ANOVA model!

Actually, there are. They’re just implicit.

Since the Xs are categorical, they have only a few values, to indicate which category a case is in. Those j and k subscripts? They’re really just indicating the values of X.
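Written out, the two forms might look like the sketch below. This is one conventional way to write them for two two-level factors, assumed here for illustration and not necessarily the exact equations in the original figure. The regression coefficients are written with beta, and the ANOVA effects with alpha and gamma to keep the two sets of symbols apart.

```latex
% Regression form: X_1 and X_2 are 0/1 dummy codes for the two factors
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i

% ANOVA form: j is the level of the first factor, k the level of the second
Y_{ijk} = \mu + \alpha_j + \gamma_k + (\alpha\gamma)_{jk} + \varepsilon_{ijk}
```

Knowing j and k for a case is the same as knowing its values of X_1 and X_2, so the subscripts and the dummy codes carry identical information.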

(And for the record, I think a couple Xs are a lot easier to keep track of than all those subscripts. Ever have to calculate an ANOVA model by hand? Just sayin’.)

So instead of trying to come up with the right label for a model, focus instead on understanding (and describing in your paper) the measurement scales of your variables, if and how much they’re related, and how that affects the conclusions.

In my client’s situation, it was not a problem that the continuous and the categorical variables were mildly correlated. The data were not experimental and she was not trying to draw causal conclusions about only the categorical predictor.

So she had to call this ANCOVA model a multiple regression.

3 Situations When it Makes Sense to Categorize a Continuous Predictor in a Regression Model

July 24th, 2009 by Karen Grace-Martin

In many research fields, a common practice is to categorize continuous predictor variables so they work in an ANOVA. This is often done with median splits. This is a way of splitting the sample into two categories: the “high” values above the median and the “low” values below the median.
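For reference, a median split takes only a couple of lines. This sketch makes one assumption the description above leaves open: values exactly equal to the median are put in the “low” group. The data and the function name are hypothetical.

```python
import statistics

def median_split(values):
    """Label each value 'high' (above the median) or 'low' (at or below it)."""
    cut = statistics.median(values)
    # Assumption: ties with the median are assigned to "low".
    return ["high" if v > cut else "low" for v in values]

scores = [3, 8, 5, 12, 7, 10]
print(median_split(scores))
```

Notice that 8 and 12 end up with the same label, even though they are as far apart as 3 and 7.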

Reasons Not to Categorize a Continuous Predictor

There are many reasons why this isn’t such a good idea: (more…)

Beyond Median Splits: Meaningful Cut Points

June 26th, 2009 by Karen Grace-Martin

I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.

But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily. Essentially, this means you need to be (more…)

Interpreting Interactions: When the F test and the Simple Effects disagree.

May 11th, 2009 by Karen Grace-Martin

The way to follow up on a significant two-way interaction between two categorical variables is to check the simple effects. Most of the time the simple effects tests give a very clear picture about the interaction. Every so often, however, you have a significant interaction, but no significant simple effects. It is not a logical impossibility. They are testing two different, but related hypotheses.

Assume your two independent variables are A and B. Each has two values: 1 and 2, so there are four cell means: A1B1, A1B2, A2B1, and A2B2. The interaction is testing if A1B1 – A1B2 = A2B1 – A2B2 (the null hypothesis), that is, whether the effect of B is the same at both levels of A. The simple effects are testing whether A1B1 – A1B2 = 0 and A2B1 – A2B2 = 0 (null) or not.

If you have a crossover interaction, you can have A1B1 – A1B2 slightly positive and A2B1 – A2B2 slightly negative. While neither is significantly different from 0, they are significantly different from each other.
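A quick numeric sketch shows how this can happen. It assumes large-sample z tests and two simple effects estimated from independent groups with the same standard error; the estimates (+0.5 and -0.5) and the standard error (0.3) are invented for illustration.

```python
import math

def two_sided_p(estimate, standard_error):
    """Two-sided p-value from a z test of estimate = 0."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical crossover: one simple effect slightly positive, one slightly negative.
simple_effect_1, simple_effect_2 = 0.5, -0.5
se_simple = 0.3

# The interaction contrast is the difference between the two simple effects.
interaction = simple_effect_1 - simple_effect_2
# Assumes the two simple effects are independent, so their variances add.
se_interaction = math.sqrt(se_simple ** 2 + se_simple ** 2)

p_simple_1 = two_sided_p(simple_effect_1, se_simple)
p_simple_2 = two_sided_p(simple_effect_2, se_simple)
p_interaction = two_sided_p(interaction, se_interaction)
print(round(p_simple_1, 3), round(p_simple_2, 3), round(p_interaction, 3))
```

The interaction contrast is twice as large as either simple effect, but its standard error grows only by a factor of about 1.4, so it can clear the significance threshold when neither simple effect does.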

And it is highly useful for answering many research questions to know if the differences in the means in one condition equal the differences in the means for the other. It might be true that it’s not testing a hypothesis you’re interested in, but in many studies, all the interesting effects are in the interactions.

Checking Assumptions in ANOVA and Linear Regression Models: The Distribution of Dependent Variables

April 10th, 2009 by Karen Grace-Martin

Here’s a little reminder for those of you checking assumptions in regression and ANOVA:

The assumptions of normality and homogeneity of variance for linear models are not about Y, the dependent variable. (If you think I’m either stupid, crazy, or just plain nit-picking, read on. This distinction really is important). (more…)
