In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.
1. Because she was making you calculate everything by hand. Sums of squares require a different formula if sample sizes are unequal, but SPSS (and other statistical software) will automatically use the right formula.
2. Nice properties in ANOVA such as the Grand Mean being the intercept in an effect-coded regression model don’t hold when data are unbalanced. Instead of the grand mean, you need to use a weighted mean. That’s not a big deal if you’re aware of it.
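To make the second point concrete, here is a minimal sketch, assuming NumPy is available and using made-up numbers. The helper name `effect_coded_intercept` and both toy data sets are hypothetical. It fits an effect-coded regression by least squares and compares the intercept with the grand mean of all observations.

```python
import numpy as np

def effect_coded_intercept(groups):
    """Least-squares intercept of y on effect-coded (sum-to-zero) group codes."""
    k = len(groups)
    rows, y = [], []
    for g, values in enumerate(groups):
        if g == k - 1:
            codes = [-1.0] * (k - 1)  # last group is coded -1 on every column
        else:
            codes = [1.0 if j == g else 0.0 for j in range(k - 1)]
        for v in values:
            rows.append([1.0] + codes)
            y.append(v)
    beta, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    return float(beta[0])

# Hypothetical data: group means are 5, 10 and 15 in both data sets.
balanced = [[4.0, 6.0], [9.0, 11.0], [14.0, 16.0]]
unbalanced = [[4.0, 6.0], [9.0, 11.0], [14.0, 16.0, 15.0, 15.0, 15.0, 15.0]]

for name, groups in [("balanced", balanced), ("unbalanced", unbalanced)]:
    intercept = effect_coded_intercept(groups)
    grand_mean = float(np.mean(np.concatenate(groups)))        # weighted by group size
    mean_of_means = float(np.mean([np.mean(g) for g in groups]))  # unweighted
    print(name, round(intercept, 3), round(grand_mean, 3), round(mean_of_means, 3))
```

With the balanced data the intercept and the grand mean coincide. With the unbalanced data the intercept stays at the simple average of the group means, while the grand mean is pulled toward the largest group.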
The only practical issue in one-way ANOVA is that very unequal sample sizes can affect the homogeneity of variance assumption. ANOVA is considered robust to moderate departures from this assumption, but the departure needs to stay smaller when the sample sizes are very different. According to Keppel (2003), there isn’t a good rule of thumb for the point at which unequal sample sizes make heterogeneity of variance a problem.
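A small simulation can show how unequal sample sizes and unequal variances interact. This is only a sketch, assuming NumPy and SciPy are installed. The group sizes, standard deviations, and the helper `type1_rate` are all hypothetical choices for illustration. Every group has the same true mean, so any rejection is a false positive.

```python
import numpy as np
from scipy import stats

def type1_rate(sizes, sds, n_sims=2000, alpha=0.05, seed=0):
    """Share of simulated one-way ANOVAs that reject when all true means are equal."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, sds)]
        _, p = stats.f_oneway(*groups)
        rejections += int(p < alpha)
    return rejections / n_sims

sizes = (10, 30, 60)  # hypothetical, very unequal group sizes

equal_var = type1_rate(sizes, (1.0, 1.0, 1.0))    # assumption met
small_noisy = type1_rate(sizes, (3.0, 1.0, 1.0))  # smallest group has the largest SD
large_noisy = type1_rate(sizes, (1.0, 1.0, 3.0))  # largest group has the largest SD

print("equal variances:      ", equal_var)
print("small group is noisy: ", small_noisy)
print("large group is noisy: ", large_noisy)
```

In runs like this, the false positive rate tends to sit near the nominal 5% when variances are equal, rise well above it when the smallest group is the noisiest, and fall below it when the largest group is the noisiest.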
Real issues with unequal sample sizes do occur in factorial ANOVA, if the sample sizes are confounded in the two (or more) factors. For example, in a two-way ANOVA, let’s say that your two independent variables (factors) are age (young vs. old) and marital status (married vs. not). If there are twice as many young people as old and the young group has a much larger percentage of singles than the older group, the effect of marital status cannot be distinguished from the effect of age.
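The arithmetic behind that confounding can be sketched with invented numbers. In this hypothetical example the cell counts and cell means are made up: age is the only thing that affects the outcome, and marital status has no effect at all within either age group.

```python
ages = ("young", "old")
statuses = ("married", "single")

# Hypothetical cell counts: twice as many young as old, and the young are mostly single.
counts = {
    ("young", "married"): 20, ("young", "single"): 80,
    ("old", "married"): 40, ("old", "single"): 10,
}

# Hypothetical cell means: only age matters (young = 10, old = 20).
cell_means = {
    ("young", "married"): 10.0, ("young", "single"): 10.0,
    ("old", "married"): 20.0, ("old", "single"): 20.0,
}

def weighted_marginal(status):
    """Raw mean of everyone with this marital status (weights cells by their counts)."""
    total = sum(counts[(a, status)] * cell_means[(a, status)] for a in ages)
    n = sum(counts[(a, status)] for a in ages)
    return total / n

def unweighted_marginal(status):
    """Simple average of the cell means, ignoring how many people are in each cell."""
    return sum(cell_means[(a, status)] for a in ages) / len(ages)

for s in statuses:
    print(s, round(weighted_marginal(s), 2), unweighted_marginal(s))
```

The raw (weighted) means make married people look about 5.6 points higher than singles, even though marital status does nothing here. The gap comes entirely from the married group being mostly old. The unweighted means are identical, which is why the choice of weighting matters so much in unbalanced factorial designs.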
Power is driven mostly by the smallest sample size, so while it doesn’t hurt power to have more observations in the larger group, it doesn’t help much either.
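One way to see this is through the standard error of a difference between two group means, which (assuming a common standard deviation) is proportional to sqrt(1/n1 + 1/n2). The sketch below uses hypothetical sample sizes and a hypothetical helper, `se_of_difference`, to show how quickly the gain from enlarging only the larger group levels off.

```python
import math

def se_of_difference(n1, n2, sd=1.0):
    """Standard error of the difference between two group means with a common SD."""
    return sd * math.sqrt(1.0 / n1 + 1.0 / n2)

small = 20  # hypothetical size of the smaller group

for large in (20, 40, 200, 2000):
    print(small, large, round(se_of_difference(small, large), 4))

# The standard error can never drop below this floor, however big the other group gets.
floor = 1.0 / math.sqrt(small)
print("floor set by the smaller group:", round(floor, 4))

# A balanced design with far fewer observations in total does slightly better.
print("40 vs 40:", round(se_of_difference(40, 40), 4))
```

Under these assumptions, 80 observations split evenly give a slightly smaller standard error than 2,020 observations split 20 versus 2,000.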
A very comprehensive article with more information about ANOVA in general and how sample sizes affect it is at http://www2.chass.ncsu.edu/garson/PA765/anova.htm.





This may be a silly question, but what if you are doing a 2x2x2 and you’re comparing males and females on their reaction times (2 tasks) and their anxiety (high or low)
and there are more females in the study than males.
Would this be a confound?
Hi Jessica,
It’s not a confound just if there are more females than males. It’s a confound only if, say, there are more females AND females are more likely to be anxious.
If your task and anxiety conditions are manipulated, so that you’re assigning people to them, then you have no problem. The example I gave could only occur if you also measured anxiety, not manipulated it.
Karen
I apologize in advance but I have a bunch of questions about unequal sample sizes and one-way ANOVAs in a particular case study.
I am conducting an experiment with very different numbers of sample sites. control: n=60; dose 1 n=114; dose 2 n=175. My main question is whether the responses to dose 1 and dose 2 are significantly different. Response was measured by the difference in the plant’s presence or absence before and after treatment. So if the plant was present at the sample site before treatment and absent at the same site after treatment, it was considered a 1 for response; if it was present before and present after, it was considered a 0 for response; and if the plant was not present before and was present after, it was considered -1. (I already know from previous research that the two doses should be significantly different from the control, but I would like to do an ANOVA test to compare the responses in the control group and the two different doses.)
Q1: Is there a test, like the levene test, for determining the equality of variances for unequal sample sizes?
Q2: Should I not use ANOVA because the sample sizes are too different?
Q3: Would it be better or worse to conduct a series of t-tests?
Q4: If I choose to use ANOVA, should I use a Welch ANOVA followed by Games-Howell pairwise comparisons, as suggested in the PDF below, because the sample sizes are different? http://frank.mtsu.edu/~dkfuller/notes302/anova.pdf
Q5: Should I not use ANOVA or a t-test because I’m pretty sure the data are not Gaussian, given that the data are practically boolean? And if so, is there another test for comparing this kind of data?
Any help you could give me would be greatly appreciated. I feel pretty lost.
Thank you in advance,
Erika
Hi Erika,
Sorry it took me a while to respond. Hope this is still useful. You do have a lot of questions, but I’ll do my best.
1. Levene works with unequal sample sizes. Equal variance is even MORE important if sample sizes are unequal.
2. No. It’s fine to use ANOVA (assuming variances are equal) with unequal sample sizes. But you should NOT use ANOVA in this study because your response is categorical, not continuous.
3. Worse. Always worse.
4. Welch’s test could work in your design (if ANOVA were appropriate), but according to Keppel (1991), it’s “unsatisfactory” when you’re comparing more than 4 means.
5. Exactly. You could just run a Chi-square, or if you want to get really fancy, or you have covariates you want to include, a logistic regression.
Karen
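For readers who want to try the Chi-square suggestion from point 5, here is a minimal sketch, assuming SciPy is installed. The group totals (60, 114, 175) come from Erika’s question, but the counts inside each row are invented purely for illustration and are not her data.

```python
import numpy as np
from scipy import stats

# Rows: control, dose 1, dose 2. Columns: response coded -1, 0, 1.
# Row totals match the question (60, 114, 175); the cell counts are HYPOTHETICAL.
table = np.array([
    [10, 40, 10],
    [10, 54, 50],
    [10, 45, 120],
])

chi2, p, dof, expected = stats.chi2_contingency(table)

print("chi-square:", round(chi2, 2))
print("degrees of freedom:", dof)
print("p-value:", p)
print("smallest expected count:", round(float(expected.min()), 2))
```

The smallest expected count is worth a look because the Chi-square approximation is usually considered shaky when expected counts are very small. A logistic regression would be the route to take if covariates need to be included.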
Thank you Karen! You really helped to clear these things up for me. I really appreciate it. Sorry again for all of the questions.
Thanks Again,
Erika
Hi Karen,
I conducted a two-way ANOVA to test if there are differences in levels of teaching innovation (scores 0-6) between teachers based on school (1=regular school, 2=all-day school) and in-service training (1=none, 2=Basic ICT Skills, 3=Educational applications of ICT). I used unequal sample sizes (75 all-day teachers and 90 regular teachers).
The ANOVA table showed no significant main effects or interaction effect (all p > 0.05). However, the Model p-value was smaller than 0.05, showing that the model as a whole is significant.
I discussed only the interaction and main effects p-values. My chair told me to recheck the data analysis because it does not make sense for the Model to be significant when none of the effects (main or interaction) is. When I deleted the Model row from the table, claiming that the only important p-values to discuss were those for the main and interaction effects, my chair said this was wrong.
The data analysis is correct–I double checked it. My question is: What does this Model p-value mean? Does it have to do with the unequal sample sizes? How should I discuss this Model p-value? Is it really this important to include it in my results?
Thanks in advance,
Adamantia
Hi Adamantia,
Thanks for being patient–I’ve been out of the office and just got back.
I can’t give you a definite answer of what is going on without trying it on data, but this is what is *probably* going on.
The Model p-value evaluates the overall effect of all IVs. IF all the IVs are completely independent and sample sizes are equal, the overall model effect won’t be significant if no IVs are.
IVs are usually only independent when you have randomly assigned subjects to conditions.
The other thing that can happen is if your p-values are close to .05, different tests might be falling on one side of that cutoff or the other. They’re not really changing much, and even just rounding can be creating differences. So if that’s the case, don’t take the .05 cutoff too seriously.
Karen
Hi there,
I have some data that gives the amount of time taken by three different surgeons to undertake a specific procedure. Given that I have a varying number of data points for each surgeon (e.g. 50/40/25) and that there may be unequal variance (e.g. slower surgeons having a greater variety of recorded times), what is the best way to figure out if there are significant differences in the time taken by each surgeon?
Cheers!
Hi Paulo,
I would start by seeing if the unequal variances are large enough to cause problems. If they are, with a one-way analysis like that, you could easily just run a nonparametric test.
Karen
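As a rough sketch of those two steps, assuming NumPy and SciPy are installed: the group sizes (50/40/25) follow the question, but the procedure times are simulated, hypothetical values. The code first looks at the spread in each group, then runs a Kruskal-Wallis test as one example of a nonparametric alternative to one-way ANOVA.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# HYPOTHETICAL procedure times in minutes; slower surgeons are given more spread.
surgeon_a = rng.normal(60, 5, 50)
surgeon_b = rng.normal(65, 8, 40)
surgeon_c = rng.normal(75, 15, 25)
groups = [surgeon_a, surgeon_b, surgeon_c]

# Step 1: how different are the spreads?
sds = [float(np.std(g, ddof=1)) for g in groups]
print("sample SDs:", [round(s, 1) for s in sds])
print("largest / smallest SD:", round(max(sds) / min(sds), 2))

# Levene's test with median centering (the Brown-Forsythe variant).
lev_stat, lev_p = stats.levene(*groups, center="median")
print("Levene p-value:", lev_p)

# Step 2: a rank-based test that does not rely on the equal-variance F test.
h_stat, kw_p = stats.kruskal(*groups)
print("Kruskal-Wallis H:", round(float(h_stat), 2), "p-value:", kw_p)
```

Looking at the ratio of standard deviations alongside the formal test gives a sense of the size of the problem, not just whether a p-value crosses a cutoff.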
Good day! This is very urgent. We have a report to submit tomorrow and our research design is a two-way (2×2) factorial ANOVA. We don’t know how to produce the results in SPSS. Thank you!
Hello again, we have unequal sample sizes. Thank you again!
Hi Vanna,
Hmm, we may be past your deadline anyway, but in any case, I’d need more information about what you need. The fact that you have unequal sample sizes in the ANOVA isn’t problematic. Just run it as you would any 2×2 ANOVA. If you need help running a 2×2 ANOVA in SPSS, I can tell you to use Univariate GLM. If you need more detail than that, I need a better idea of what you understand already and what you need help with.
Karen
Hi, in my test I am comparing a single variable among 3 groups with different sample sizes. Can I do a one-way ANOVA in spite of the unequal sample sizes?
Sure.
Hi Karen, I am interested in your 3rd response to Erika the 7th December 2010. You write Erika should not use ANOVA as the response is categorical, not continuous.
Do you mean that because of Erika’s design, a control: n=60; dose 1 n=114; and dose 2 n=175, it is inappropriate to use ANOVA here?
I am interested in this as I have similar conditions: a grouping variable with three categorical (depending on viewpoint) responses, a (very) unbalanced design, and, for some dependents, unequal variances. Hence, I wonder, what analysis would be appropriate if I conclude my response is categorical rather than continuous?
Best Bud
Hi Bud,
Good question. No, the control/Dose 1/Dose 2 variable is her Independent Variable. It’s totally appropriate to have grouping (i.e., categorical) variables for the independent variable.
In Erika’s study, her Dependent Variable (aka Response Variable or Outcome) is ALSO categorical: Is the Presence of the plant the same after as it was before: Yes or No.
ANOVA is comparing means in the Dependent Variable for the different categories of the Independent Variable. Since there is no way to calculate a mean of Yes/No, you can’t use ANOVA.
So I’m not sure based on how you’ve described your study whether your dependent variable is indeed categorical. You mention unequal variances, which makes me think they really are numerical.
Here are a few posts that might be helpful:
When Dependent Variables Are Not Fit for GLM, Now What?
6 Types of Dependent Variables that will Never Meet the GLM Normality Assumption
Karen
Hi again Karen, and thanks!
I see your point: her (Erika’s) dependent variable was categorical. Mine, however, are not. That I am sure of. However, a greater concern for me is that my sample sizes vary considerably: group 1 = 464, group 2 = 444, and group 3 = 24.
My problem is that even though an ANOVA shows significant differences among the three groups on a specific dependent variable, and the largest calculated mean difference is between the smallest group and one of the others, post hoc tests cannot tell the smallest group apart from the group where the largest mean difference appears.
Standardized mean scores for the groups:
Group 1: -.08 (a)
Group 2: .27 (b)
Group 3: -.11 (ab)
Currently, I use Hochberg’s GT2 post hoc test, as it, I have read, is quite robust to violations of homogeneity of variance. I also, where indicated by the Levene’s test, modify the p-values using the Welch modification.
I know that this may be a lot to ask but I wonder whether you think I could benefit from bootstrapping or if such a procedure will not help me as the ratio among my three groups will not differ?
Best Viktor
Hi Karen,
I have run a two-way ANOVA (2 by 2 factorial design) and obtained a significant Levene’s Test, p = .012. I have adjusted the critical alpha for interpreting the significance of both the main and interaction effects. However, I was wondering what practical methods can be used in future studies so that Levene’s is not violated. Are you able to give me some references?
Also, with another 2 by 2 factorial design that reveals a significant interaction, I am aware that follow-up simple effects are required. I ran these using the split data method in SPSS and recalculated the F statistic using the overall MSE. Is there a need to control for Type 1 error by using Bonferroni’s?
Thanks
Nicole
Hi Nicole,
I never use Levene’s test. With large sample sizes, it’s almost always significant. With small sample sizes, it’s almost never significant.
So it’s not very helpful. Geoffrey Keppel’s book on the design and analysis of experiments has a good section on this.
Or if you want a full explanation and demonstration about assumptions, what they mean and better ways to check them, I would actually recommend my workshop on assumptions in linear models. We have a home study version and you can get more information at: http://www.theanalysisinstitute.com/workshops/GLM-Assumptions/index.html
Karen
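The sample-size dependence of Levene’s test can be illustrated with a short simulation. This is a sketch under assumed settings, not a general proof, and it needs NumPy and SciPy. The standard deviations, sample sizes, and the helper `rejection_rate` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Large samples, nearly equal spread (SD 1.0 vs 1.1): a trivial difference.
big_a = rng.normal(0.0, 1.0, 20000)
big_b = rng.normal(0.0, 1.1, 20000)
_, p_large = stats.levene(big_a, big_b)
print("n = 20000 per group, SD 1.0 vs 1.1, Levene p:", p_large)

def rejection_rate(n, sd_a, sd_b, n_sims=500, alpha=0.05):
    """How often Levene's test flags a variance difference that really exists."""
    hits = 0
    for _ in range(n_sims):
        _, p = stats.levene(rng.normal(0.0, sd_a, n), rng.normal(0.0, sd_b, n))
        hits += int(p < alpha)
    return hits / n_sims

# Small samples, clearly different spread (SD 1.0 vs 1.5): a real difference.
rate_small = rejection_rate(8, 1.0, 1.5)
print("n = 8 per group, SD 1.0 vs 1.5, share flagged:", rate_small)
```

In this setup the tiny difference in spread is flagged decisively with large samples, while the much bigger difference is missed most of the time with 8 observations per group.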
Hi Karen,
I just want to know if I could actually use a two-way factorial ANOVA for this.
I have two groups, Device 1 (n=27) and Device 2 (n=28). In each group, I have 5 subcategories of participants (very low, low, moderate, high and very high experience of playing games). For the Device 1 group I have 9, 8, 5, 2, 2 and 1 for each subcategory. For Device 2, I have 7, 4, 7, 8, 2, 0 for each subcategory. Can I use a two-way ANOVA for this? Or should I just provide a descriptive analysis? The main objective of the experiment is to see if there is any difference in the participants’ total score when playing games on Device 1 or 2.
Hi Lulu,
You could run a two-way ANOVA as is, without the interaction, on this. The problem subcategories are the ones with 1 and 0 people in them.
The other alternative, if the interaction seems necessary, is to collapse the experience variable into fewer categories.
I would suggest graphing the means to see if the interaction is important, and if not, leave it out. If it is, you’d be better off collapsing.
Karen
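Collapsing can be sketched in a few lines. The cell counts below are hypothetical (they only roughly resemble the design in the question, with totals of 27 and 28), and the three-level grouping is just one possible choice.

```python
# HYPOTHETICAL counts of participants per device and experience level.
counts = {
    "Device 1": {"very low": 9, "low": 8, "moderate": 5, "high": 3, "very high": 2},
    "Device 2": {"very low": 7, "low": 5, "moderate": 8, "high": 8, "very high": 0},
}

# One possible way to collapse five experience levels into three.
collapse_map = {
    "very low": "low", "low": "low",
    "moderate": "moderate",
    "high": "high", "very high": "high",
}

def collapse(cell_counts, mapping):
    """Add up the counts of all original levels that map to the same new level."""
    collapsed = {}
    for level, n in cell_counts.items():
        new_level = mapping[level]
        collapsed[new_level] = collapsed.get(new_level, 0) + n
    return collapsed

collapsed_counts = {device: collapse(cells, collapse_map) for device, cells in counts.items()}

smallest_before = min(n for cells in counts.values() for n in cells.values())
smallest_after = min(n for cells in collapsed_counts.values() for n in cells.values())

print(collapsed_counts)
print("smallest cell before:", smallest_before, "after:", smallest_after)
```

Checking the smallest cell before and after collapsing is a quick way to see whether every device-by-experience combination has enough people to estimate an interaction.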