When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA

by Karen

In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.

1. Because she was making you calculate everything by hand.  Sums of squares require a different formula if sample sizes are unequal, but SPSS (and other statistical software) will automatically use the right formula.

2. Nice properties in ANOVA such as the Grand Mean being the intercept in an effect-coded regression model don’t hold when data are unbalanced.  Instead of the grand mean, you need to use a weighted mean.  That’s not a big deal if you’re aware of it.
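To make the second point concrete, here is a minimal sketch in Python with made-up scores for three groups of unequal size. The group names and values are purely illustrative. It shows that the plain average of the group means and the sample-size-weighted average are different numbers when the data are unbalanced, and that the weighted one matches the mean of all observations.

```python
from statistics import mean

# Hypothetical scores for three groups of unequal size (made-up numbers).
groups = {
    "A": [4.0, 6.0],
    "B": [10.0, 12.0, 14.0, 12.0],
    "C": [20.0, 22.0, 18.0, 20.0, 20.0, 20.0],
}

group_means = {name: mean(values) for name, values in groups.items()}
group_sizes = {name: len(values) for name, values in groups.items()}
n_total = sum(group_sizes.values())

# Unweighted mean of the group means: every group counts equally.
unweighted_mean = mean(group_means.values())

# Weighted mean: each group mean counts in proportion to its sample size.
weighted_mean = sum(group_sizes[g] * group_means[g] for g in groups) / n_total

# Mean of all observations pooled together, for comparison.
pooled_mean = mean(x for values in groups.values() for x in values)

print(unweighted_mean, weighted_mean, pooled_mean)
```

With equal group sizes the two averages would coincide, which is why the distinction never comes up in balanced textbook examples.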

The only practical issue in one-way ANOVA is that very unequal sample sizes can affect the homogeneity of variance assumption.  ANOVA is considered robust to moderate departures from this assumption, but the departure needs to stay smaller when the sample sizes are very different.  According to Keppel (1993), there isn’t a good rule of thumb for the point at which unequal sample sizes make heterogeneity of variance a problem.
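One way to see why unequal variances matter more when sample sizes differ is to compare two standard errors for a two-group mean difference: one built from the pooled variance (which assumes equal variances) and one built from each group's own variance. The sketch below uses hypothetical sample sizes and variances, and the function names are my own, not from any library.

```python
import math

def pooled_se(n1, var1, n2, var2):
    """SE of the mean difference using the pooled variance (assumes equal variances)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def separate_se(n1, var1, n2, var2):
    """SE of the mean difference using each group's own variance."""
    return math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical scenarios: (n1, variance1, n2, variance2).
scenarios = {
    "equal n, unequal variances": (30, 9.0, 30, 1.0),
    "small group has the larger variance": (10, 9.0, 50, 1.0),
    "large group has the larger variance": (10, 1.0, 50, 9.0),
}

for label, args in scenarios.items():
    print(label, round(pooled_se(*args), 3), round(separate_se(*args), 3))
```

With equal n the two standard errors agree even though the variances differ. With unequal n, the pooled version is too small when the small group is the noisy one (so the test rejects too often) and too large in the opposite case (so the test is conservative).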

Real issues with unequal sample sizes do occur in factorial ANOVA, when the unequal cell sizes leave the two (or more) factors confounded with each other.  For example, in a two-way ANOVA, let’s say that your two independent variables (factors) are age (young vs. old) and marital status (married vs. not).  If there are twice as many young people as old and the young group has a much larger percentage of singles than the older group, the effect of marital status cannot be cleanly distinguished from the effect of age.
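The age and marital status example can be sketched with made-up numbers. In the hypothetical cells below, the outcome depends on age only, yet a naive comparison of married and single people shows a gap, purely because marital status is tangled up with age in the cell counts. All counts and means are assumptions chosen for illustration.

```python
# Hypothetical cell counts and cell means (made-up numbers).
# Twice as many young as old; 80% of the young are single vs. 20% of the old.
# Within each age group, single and married people have the same mean.
cells = {
    ("young", "single"):  {"n": 80, "mean": 10.0},
    ("young", "married"): {"n": 20, "mean": 10.0},
    ("old", "single"):    {"n": 10, "mean": 20.0},
    ("old", "married"):   {"n": 40, "mean": 20.0},
}

def marginal_mean(status):
    """Mean for one marital status, pooling over age and weighting by cell size."""
    selected = [c for (age, s), c in cells.items() if s == status]
    return sum(c["n"] * c["mean"] for c in selected) / sum(c["n"] for c in selected)

def unweighted_mean(status):
    """Mean of the cell means for one marital status, weighting each age group equally."""
    selected = [c["mean"] for (age, s), c in cells.items() if s == status]
    return sum(selected) / len(selected)

# Gap between married and single when age is ignored.
raw_gap = marginal_mean("married") - marginal_mean("single")

# Gap when each age group gets equal weight.
adjusted_gap = unweighted_mean("married") - unweighted_mean("single")

print(round(raw_gap, 2), adjusted_gap)
```

The raw gap is an age effect in disguise. Which set of means the software tests depends on the type of sums of squares it uses, so it is worth checking that setting when cell sizes are unequal.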

Power is limited mainly by the smallest sample size, so while it doesn’t hurt power to have more observations in the larger group, it doesn’t help much either.
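A rough way to see how the smallest group limits power is to look at the part of the standard error that depends on the sample sizes in a two-group comparison, sqrt(1/n1 + 1/n2). The sketch below is a simplification that assumes equal variances, and the function names are hypothetical. It holds the small group at 20 and grows the large group.

```python
import math

def se_factor(n1, n2):
    """Sample-size part of the standard error of a difference between two means."""
    return math.sqrt(1 / n1 + 1 / n2)

def effective_n(n1, n2):
    """Per-group size of a balanced design with the same standard error (harmonic mean)."""
    return 2 * n1 * n2 / (n1 + n2)

small = 20
for large in (20, 40, 200, 2000):
    print(large, round(se_factor(small, large), 4), round(effective_n(small, large), 1))

# The standard error can never drop below the small group's own contribution.
floor = math.sqrt(1 / small)
```

However large the second group gets, the design behaves like a balanced one with fewer than twice the smaller group's size per group, so the gains flatten out quickly.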

{ 104 comments… read them below or add one }

Anne February 4, 2013 at 9:50 am

Good morning Karen,

Great site!
I have a few questions:
My data: gender comparisons re knowledge, attitudes, beliefs.
Male n=68, female n=263.

1) I am running multiple regression, t-test and MANOVA.
I want to know if I need to run non parametrics to account for the unequal group n’s?
Doesn’t the Central Limit Theorem kick in due to my large sample sizes?

2) In my MANOVA, Levene’s test is significant for two variables at both the .05 and .01 levels.
Should I not use MANOVA and look at other tests instead?

Thanks so much for your advice,
Anne

Karen February 6, 2013 at 2:31 pm

Hi Anne,

1) You can run nonparametrics, but it’s usually not necessary. It’s hard to say what you need to do in any specific situation without all the details.

2) I’m not sure I understand this question, and as for what you should do, see my response to #1. If you want to restate that, I can give you some info so you can decide what you should do. :)

Karen

David Lane February 20, 2013 at 3:47 pm

There is a good discussion of what to do when the variances are unequal here: http://beheco.oxfordjournals.org/content/17/4/688.full and it presents a good solution that holds for unequal n.

I have a simulation that lets you explore the issue for the test that assumes homogeneity of variance here: http://onlinestatbook.com/2/tests_of_means/robust_sim.html

and a discussion of unequal n in multi-factor designs here: http://onlinestatbook.com/2/analysis_of_variance/unequal.html

Anne February 22, 2013 at 4:05 pm

Good afternoon Karen,

I have a question for you. My sample size is 351 (68 male/283 female).
I am comparing males and females on several continuous variables using parametric tests: t-test, MANOVA, etc.
My concern is the large difference in group sizes, and I feel that I should run nonparametric tests. Would this ‘satisfy’ those reading my work? The results are the same with both parametric and nonparametric tests, but I am concerned about the large difference because this comparison is my major hypothesis.

Thanks so much for your advice.
Anne

Karen March 4, 2013 at 11:03 am

Hi Anne,

If you’ve checked assumptions and have no problem with unequal variance, it’s fine.

That said, reviewers don’t always know that, so they may challenge you. If it would make you feel safer, and you are getting the same results anyway, there is nothing wrong with running it as a nonparametric for the t-test. You may have more trouble with the manova though–I don’t know of a nonparametric equivalent.

Karen

Karri Kauppinen March 18, 2013 at 8:45 am

I love you. Thanks! :)

hasna March 18, 2013 at 9:08 pm

How can I run a non-parametric paired test in R if I have unequal sample sizes?

Karen April 2, 2013 at 5:48 pm

Unequal sample sizes ARE a problem if the data are paired. Do you mean that some pairs are missing one half of the pair?

Muj April 3, 2013 at 2:47 pm

Hey Karen,

I have conducted an ANOVA with three between-subjects groups: A (n=26), B (n=19) and a neutral group (n=68). No significant effect was found, but I would like to know if this was likely due to the neutral group. What problems would the large size of this neutral group present in this situation?

Thanks,
Muj

Karen April 3, 2013 at 4:26 pm

Hi Muj,

I’m not sure what you mean by if it was due to that group. Because it has the largest size, it should have the narrowest standard error. It would entirely depend on the order of the three means. It’s the two small groups that would potentially cause problems. That’s where your power is limited.

Karen

Muj April 3, 2013 at 5:34 pm

Thanks for the prompt reply,
I am testing the effects of schizotypy on memory performance, in particular accuracy and reaction time. It’s proposed that there would be a difference between high and low groups, with the high group performing significantly worse. However, no significant effect was found.
The neutral group does have the narrowest standard error (25.95), compared to the low schizotypy group (43.58) and the high schizotypy group (50.98).
Means for RT are: low = 848.58, high = 965.13, neutral = 927.29.

I was asked by my supervisor to comment on the potential problems of the large neutral group, could it be that she means that my other two samples were not as matched and had reduced power and so there was not a significant effect?

Sorry for my essay ^
but many thanks for your help! :D

Chantal April 25, 2013 at 8:55 am

Hi Karen,

I am working on my master’s thesis and am confronted with a dataset with two unequal group sizes (n=48, n=160) at baseline (T1). I have to test whether there is a difference between the two groups at baseline, before the start of the treatment, and also after 3 and 9 months (T2, T3). In addition, the second group gets smaller over time (n=132 at T3), so I am wondering what test to perform to deal with these difficulties.
Hope to hear from you.

Greetz Chantal

Karen April 29, 2013 at 6:37 pm

Hi Chantal,

It’s really not a problem if the groups are unequal sizes. The bigger problem would be why one group is losing subjects over time but the other isn’t (although maybe I’m just assuming that last part)

Marco April 25, 2013 at 1:14 pm

Hi,

I’ve run a 2 (groups) x 3 (modalities) x 3 (intervals) mixed ANOVA.

Now, in group 1 there are 17 subjects, while in group 2 there are 15 subjects.

One reviewer asked if I applied any “correction” to take into account the different sample size.

I did not think this was a problem, above all with this small difference. Do you have any advice? What should I do?

Thanks,

Marco

Karen April 29, 2013 at 6:36 pm

Hi Marco, there is no need to do anything, particularly if at least two of those IVs are manipulated. It’s only a problem if there’s a relationship among the IVs. Even so, those n’s are very similar, even if not equal.

Lia May 1, 2013 at 4:44 am

I have a 2 x 2 x 2 mixed ANOVA design as well.
It’s a 2×2 repeated measures design with one between-subjects factor (gender).
But my group sizes are 59 and 29. Is that too big a difference?

Lia May 1, 2013 at 4:58 am

Also, past research has suggested that females would generally do better, so with females at 59 and males at 29, should I report a possible confound?

Karen May 1, 2013 at 12:08 pm

Hi Lia,

It could. This is exactly the situation where the bigger sample of females could cause problems. Are the results the same within each gender?

Lia May 2, 2013 at 9:21 am

There is a marginally significant result (p=0.058) in only one of the interactions, the one between gender and another IV.
