When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA

by Karen

In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.

1. Because she was making you calculate everything by hand.  Sums of squares require a different formula if sample sizes are unequal, but SPSS (and other statistical software) will automatically use the right formula.

2. Nice properties in ANOVA, such as the grand mean being the intercept in an effect-coded regression model, don’t hold when data are unbalanced.  With unequal group sizes, the grand mean is a weighted mean of the group means (weighted by sample size) rather than their simple average.  That’s not a big deal if you’re aware of it.
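
A small sketch of this second point, using invented numbers and NumPy's least-squares solver (the group labels and values are assumptions for illustration only). With unbalanced groups, the effect-coded intercept lands on the simple average of the group means, while the mean of all observations equals the size-weighted average of the group means.

```python
import numpy as np

# Hypothetical unbalanced data: three groups with 3, 5 and 12 observations.
groups = {
    "A": np.array([4.0, 5.0, 6.0]),
    "B": np.array([7.0, 8.0, 9.0, 10.0, 11.0]),
    "C": np.array([1.0, 2.0, 3.0] * 4),
}

y = np.concatenate(list(groups.values()))
sizes = np.array([len(v) for v in groups.values()])
group_means = np.array([v.mean() for v in groups.values()])

# Effect coding with C as the omitted level:
# A -> (1, 0), B -> (0, 1), C -> (-1, -1), plus a leading 1 for the intercept.
codes = {"A": (1, 0), "B": (0, 1), "C": (-1, -1)}
X = np.array(
    [(1, *codes[name]) for name, v in groups.items() for _ in v],
    dtype=float,
)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = coef[0]

grand_mean = y.mean()                                    # mean of all observations
weighted_mean = np.average(group_means, weights=sizes)   # size-weighted mean of group means
unweighted_mean = group_means.mean()                     # simple mean of group means

print(f"effect-coded intercept      = {intercept:.4f}")
print(f"unweighted mean of means    = {unweighted_mean:.4f}")
print(f"grand mean (all data)       = {grand_mean:.4f}")
print(f"size-weighted mean of means = {weighted_mean:.4f}")
```

With equal group sizes all four numbers would coincide, which is why the distinction never comes up in balanced textbook examples.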

The only practical issue in one-way ANOVA is that very unequal sample sizes can affect the homogeneity of variance assumption.  ANOVA is considered robust to moderate departures from this assumption, but the departure needs to stay smaller when the sample sizes are very different.  According to Keppel (1993), there isn’t a good rule of thumb for the point at which unequal sample sizes make heterogeneity of variance a problem.
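
To get a feel for how sample size and variance interact, here is a rough simulation sketch. It assumes SciPy's `f_oneway` and uses made-up group sizes and standard deviations; every group has the same true mean, so any rejection is a false positive.

```python
import numpy as np
from scipy import stats


def false_positive_rate(sizes, sds, n_sims=2000, alpha=0.05, seed=0):
    """Share of simulations in which a one-way ANOVA rejects a true null."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # All groups share the same true mean (0); only the spread differs.
        samples = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, sds)]
        _, p_value = stats.f_oneway(*samples)
        rejections += int(p_value < alpha)
    return rejections / n_sims


# Same pair of standard deviations, three different sample-size layouts.
equal_n = false_positive_rate(sizes=(30, 30), sds=(3.0, 1.0))
small_group_noisy = false_positive_rate(sizes=(10, 50), sds=(3.0, 1.0))
large_group_noisy = false_positive_rate(sizes=(10, 50), sds=(1.0, 3.0))

print(f"equal n, unequal variance:       {equal_n:.3f}")
print(f"small group has larger variance: {small_group_noisy:.3f}")
print(f"large group has larger variance: {large_group_noisy:.3f}")
```

In runs like this the rejection rate tends to sit well above the nominal 5% when the small group is the noisy one, and below 5% in the reverse case, while equal sample sizes stay close to 5%. The exact figures depend on the assumed sizes and standard deviations.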

Real issues with unequal sample sizes do occur in factorial ANOVA, if the sample sizes are confounded in the two (or more) factors.  For example, in a two-way ANOVA, let’s say that your two independent variables (factors) are age (young vs. old) and marital status (married vs. not).  If there are twice as many young people as old and the young group has a much larger percentage of singles than the older group, the effect of marital status cannot be distinguished from the effect of age.
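
The age and marital status example can be sketched with invented cell counts and cell means (all numbers below are assumptions). The outcome depends only on age, yet the raw marital-status comparison shows a gap because the cell sizes are confounded.

```python
# Hypothetical cell counts: twice as many young as old, and the young are mostly single.
counts = {
    ("young", "single"): 160,
    ("young", "married"): 40,
    ("old", "single"): 20,
    ("old", "married"): 80,
}

# Assumed cell means: the outcome depends on age only, not on marital status.
cell_means = {
    ("young", "single"): 10.0,
    ("young", "married"): 10.0,
    ("old", "single"): 20.0,
    ("old", "married"): 20.0,
}

AGES = ("young", "old")


def weighted_marginal_mean(status):
    """Mean over everyone with this marital status, ignoring age."""
    total = sum(counts[(age, status)] * cell_means[(age, status)] for age in AGES)
    n = sum(counts[(age, status)] for age in AGES)
    return total / n


def unweighted_marginal_mean(status):
    """Simple average of the two age-specific cell means."""
    return sum(cell_means[(age, status)] for age in AGES) / len(AGES)


raw_gap = weighted_marginal_mean("married") - weighted_marginal_mean("single")
adjusted_gap = unweighted_marginal_mean("married") - unweighted_marginal_mean("single")

print(f"raw married - single gap:       {raw_gap:.2f}")       # 5.56
print(f"gap averaging over age equally: {adjusted_gap:.2f}")  # 0.00
```

The raw gap comes entirely from married people being mostly old in this made-up sample; within each age group there is no marital-status difference at all.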

Power is limited mostly by the smallest sample size, so while it doesn’t hurt power to have more observations in the larger group, it doesn’t help much either.
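
One way to see why the smaller group dominates is the standard error of a difference between two means. The sketch below assumes a common standard deviation of 1 and a hypothetical small group of 20, then lets the larger group grow.

```python
import math


def se_of_difference(n_small, n_large, sd=1.0):
    """Standard error of the difference between two group means (common SD assumed)."""
    return sd * math.sqrt(1 / n_small + 1 / n_large)


n_small = 20
# Limit of the standard error as the larger group grows without bound.
floor = 1 / math.sqrt(n_small)

for n_large in (20, 40, 80, 160, 1000):
    se = se_of_difference(n_small, n_large)
    print(f"n_small={n_small:>3}  n_large={n_large:>5}  SE={se:.4f}")

print(f"floor as n_large grows: {floor:.4f}")
print(f"balanced 40 vs 40:      {se_of_difference(40, 40):.4f}")
```

The standard error shrinks quickly at first and then stalls near the floor set by the small group, so a balanced 40 vs 40 design ends up slightly more precise than 20 vs 1000.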


{ 178 comments }

Anne February 4, 2013 at 9:50 am

Good morning Karen,

Great site!
I have a few questions:
My data: gender comparisons re knowledge, attitudes, beliefs.
Male n=68, female n=263.

1) I am running multiple regression, t-test and MANOVA.
I want to know if I need to run non parametrics to account for the unequal group n’s?
Doesn’t the Central Limit Theorem kick in due to my large sample sizes?

2) In my MANOVA, my Levene’s test shows two variables that are significant at both the .05 and .01 levels.
Should I not use MANOVA and look at other tests instead?

Thanks so much for your advice,


Karen February 6, 2013 at 2:31 pm

Hi Anne,

1) You can run nonparametrics, but it’s usually not necessary. It’s hard to say what you need to do in any specific situation without all the details.

2) I’m not sure I understand this question, and as for what you should do, see my response to #1. If you want to restate that, I can give you some info so you can decide what you should do. :)



David Lane February 20, 2013 at 3:47 pm

There is a good discussion of what to do when the variances are unequal here: http://beheco.oxfordjournals.org/content/17/4/688.full and it presents a good solution that holds for unequal n.

I have a simulation that lets you explore the issue for the test that assumes homogeneity of variance here: http://onlinestatbook.com/2/tests_of_means/robust_sim.html

and a discussion of unequal n in multi-factor designs here: http://onlinestatbook.com/2/analysis_of_variance/unequal.html


Anne February 22, 2013 at 4:05 pm

Good afternoon Karen,

I have a question for you….my sample size is 351 (68 male/283 female).
I am comparing male/female on several continuous variables and using parametric tests; t-test and manova, etc.
The issue is the large difference between groups and feeling that I should conduct non parametrics? Would this ‘satisfy’ those reading my work? The results are the same with both para/non para., but I am concerned about the great differences due to the fact that this is my major hypothesis.

Thanks so much for your advice.


Karen March 4, 2013 at 11:03 am

Hi Anne,

If you’ve checked assumptions and have no problem with unequal variance, it’s fine.

That said, reviewers don’t always know that, so they may challenge you. If it would make you feel safer, and you are getting the same results anyway, there is nothing wrong with running it as a nonparametric for the t-test. You may have more trouble with the manova though–I don’t know of a nonparametric equivalent.



Karri Kauppinen March 18, 2013 at 8:45 am

I love you. Thanks! :)


hasna March 18, 2013 at 9:08 pm

How can I run a non-parametric paired test in R if I have unequal sample sizes?


Karen April 2, 2013 at 5:48 pm

Unequal sample sizes ARE a problem if the data are paired. Do you mean that some pairs are missing one half of the pair?


Muj April 3, 2013 at 2:47 pm

Hey Karen,

I have conducted an ANOVA for 3 between factor groups A (n=26) B (n=19) and a neutral group (n=68). no significant effect was found, but i would like to know if this was likely due to the neutral group? what problems would the large size of this neutral group present for this situation?



Karen April 3, 2013 at 4:26 pm

Hi Muj,

I’m not sure what you mean by if it was due to that group. Because it has the largest size, it should have the narrowest standard error. It would entirely depend on the order of the three means. It’s the two small groups that would potentially cause problems. That’s where your power is limited.



Muj April 3, 2013 at 5:34 pm

Thanks for the prompt reply,
I am testing the effects of schizotypy on memory performance, in particular accuracy and reaction time. It’s proposed that there would be a difference between high and low groups, with high groups performing significantly worse… however, no significant effect was found.
The neutral group does have the narrowest standard error (25.95), compared to the low schizotypy group (43.58) and the high schizotypy group (50.98)…
Means for RT are low = 848.58, high = 965.13, neutral = 927.29.

I was asked by my supervisor to comment on the potential problems of the large neutral group, could it be that she means that my other two samples were not as matched and had reduced power and so there was not a significant effect?

Sorry for my essay ^
but many thanks for your help! :D


Chantal April 25, 2013 at 8:55 am

Hi Karen,

I am working on my master’s thesis and am confronted with a dataset with 2 unequal group sizes (n=48, n=160) at baseline (T1). I have to test whether there is a difference between the two groups at baseline before the start of the treatment but also after 3 and 9 months (T2, T3). Besides, the second group size gets smaller over time (n=132 at T3), so I am wondering what test to perform to deal with these difficulties.
Hope to hear from you.

Greetz Chantal


Karen April 29, 2013 at 6:37 pm

Hi Chantal,

It’s really not a problem if the groups are unequal sizes. The bigger problem would be why one group is losing subjects over time but the other isn’t (although maybe I’m just assuming that last part)


Marco April 25, 2013 at 1:14 pm


I’ve run a 2 (groups) x 3 (modalities) x 3 (intervals) mixed ANOVA.

Now, in group 1 there are 17 subjects, while in group 2 there are 15 subjects.

One reviewer asked if I applied any “correction” to take into account the different sample size.

I did not think this was a problem, above all with this small difference. Do you have any advice? What should I do?




Karen April 29, 2013 at 6:36 pm

Hi Marco, there is no need to do anything, particularly if at least two of those IVs are manipulated. It’s only a problem if there’s a relationship among the IVs. Even so, those n’s are very similar, even if not equal.


Lia May 1, 2013 at 4:44 am

i have a 2 x 2 x 2 mixed anova design as well,
it’s a 2×2 repeated measures followed by a between group (gender).
but my sample size difference is 59 and 29, is that too big a difference?


Lia May 1, 2013 at 4:58 am

Also, past research has said females would generally do better, so with females at 59 and males at 29, should I report a possible confound?


Karen May 1, 2013 at 12:08 pm

Hi Lia,

It could. This is exactly the situation where the bigger sample of females could cause problems. Are the results the same within each gender?


Lia May 2, 2013 at 9:21 am

there is a marginal significance p=0.058 in only one of the interaction between gender and another IV


Rebekah May 24, 2013 at 11:59 am

Hi Karen,

I have completed an independent samples t-test and because equal variances are not assumed, I go with the statistics which SPSS provides for that correction. However, my sample sizes are not similar (71/242) and therefore I have been taught to be very leery of the corrected t statistic. One solution I have been told is to select a random sample of the bigger group (so I would select 71 cases randomly out of the 242) and then run the test so that you have equal groups (71 to 71) to run your t test. Have you ever heard of this? Is this the most robust way of dealing with the issue of having both unequal variances and unequal n size?

Any help/suggestions would be much appreciated!


Karen June 6, 2013 at 5:20 pm

I have heard of that (just read it in a book again yesterday). You’re absolutely right that when the sample sizes are that different, you have to be careful about unequal variances.

Another option, btw, would be a nonparametric test, like Wilcoxon Rank Sum.
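
For anyone who wants to compare these options side by side, here is a hedged sketch in Python with SciPy. The data are simulated stand-ins with the group sizes from the question (71 and 242); the means and standard deviations are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated stand-ins for the two groups: 71 vs 242 cases with unequal spread.
small_group = rng.normal(loc=50.0, scale=12.0, size=71)
large_group = rng.normal(loc=50.0, scale=6.0, size=242)

# Welch's t-test: the "equal variances not assumed" version, no pooled variance.
welch = stats.ttest_ind(small_group, large_group, equal_var=False)

# Wilcoxon rank-sum test (also called Mann-Whitney U), two-sided.
rank_sum = stats.mannwhitneyu(small_group, large_group, alternative="two-sided")

# Random subsample of the larger group, cut down to the smaller group's size.
subsample = rng.choice(large_group, size=len(small_group), replace=False)
welch_subsample = stats.ttest_ind(small_group, subsample, equal_var=False)

print(f"Welch t-test, all data:    p = {welch.pvalue:.3f}")
print(f"Rank-sum test, all data:   p = {rank_sum.pvalue:.3f}")
print(f"Welch t-test, 71 vs 71:    p = {welch_subsample.pvalue:.3f}")
```

Note that the subsampling route throws away 171 observations, and its result changes with the random draw, so it is worth comparing it against the full-data tests rather than relying on it alone.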


Richard June 2, 2013 at 5:29 pm

I have 3 sample groups I wish to compare. Sample A = 20, Sample B = 20, Sample C = 40. Do I need to adjust my ANOVA to compare them? If so, how do I calculate the weighted mean? The samples come from 3 different stakeholder groups, i.e. different populations. Does that make a difference when calculating the weighted mean?

All my data is in Excel. Is it possible to carry out an ANOVA with weighted mean in Excel?

Sorry for all the questions. Your help would be greatly appreciated.


Karen June 6, 2013 at 5:17 pm

Hi Richard,

I suspect it is possible to do an ANOVA with weighted means in Excel, but I don’t ever use Excel for data analysis, so I have no idea how.

You would need to do adjustments to means if you’re calculating by hand, but stat software will do it for you automatically.
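
For reference, the hand calculation looks roughly like this. The group sizes come from the question; the group means are made up for illustration.

```python
# Group sizes from the question; the group means are invented for illustration.
sizes = {"A": 20, "B": 20, "C": 40}
means = {"A": 3.0, "B": 4.0, "C": 5.0}

total_n = sum(sizes.values())

# Weighted grand mean: each group mean counts in proportion to its sample size.
weighted_grand_mean = sum(sizes[g] * means[g] for g in sizes) / total_n

# Simple average of the group means, shown only for contrast.
unweighted_mean = sum(means.values()) / len(means)

# Between-groups sum of squares for unequal n.
ss_between = sum(sizes[g] * (means[g] - weighted_grand_mean) ** 2 for g in sizes)

print(f"weighted grand mean = {weighted_grand_mean}")  # 4.25
print(f"unweighted mean     = {unweighted_mean}")      # 4.0
print(f"SS between          = {ss_between}")           # 55.0
```

The same arithmetic can be reproduced in a spreadsheet: multiply each group mean by its n, sum, and divide by the total n.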


Kathy June 12, 2013 at 2:25 pm

I am running both t-tests and logistic regression analyses looking at income differences between two groups. One group has 980 subjects; the other 9800. In another comparison, one group has 980 and the other group has 430,000.

I have run t-tests using the lincom function in stata (with unequal variances). I have also drawn a random sample of 10% of the larger group and re-run some of the analyses. While my means change slightly with the smaller samples, the overall patterns persist and statistical significance does not change.

I have a reviewer who has asked whether I have applied any corrections to take sample size differences into account. Would you suggest any additional corrections, other than what I have already done? The reviewer in particular questioned whether I could trust my results that indicated statistical significance, given the very different sizes between the two groups. Would you agree with this concern?

I appreciate any feedback!


Karen July 1, 2013 at 4:31 pm

Hi Kathy,

I understand doing corrections in a factorial situation, but you don’t have that. It sounds like you already tried the subset of the larger group, and got the same answer. I’m not sure what other corrections you’re supposed to try.


Chantalle June 12, 2013 at 8:26 pm

Hi Karen,

I’m hoping to run a one-way ANOVA with 4 independent factors. The sample sizes are 102, 100, 100 & 59. Levene’s test was significant (0.001) after an arcsin transformation (data were percentages). The distributions are normal.

I read somewhere that if there is less than a 5-fold difference in standard deviations, the ANOVA should still be robust, even with heterogeneity of variance, but the site did not list any references. In my case, there is a 1.49-fold difference between the largest and smallest standard deviation.

I was wondering whether you think I can use an ANOVA?

Also, I’m having trouble tracking down the paper you referenced (Keppel, 1993). In what journal was it published?

Thank you very much! :-)


Karen June 14, 2013 at 3:10 pm

Hi Chantalle,

5-fold sounds higher than I’ve seen, but 1.49 is probably fine. Keppel is a textbook, not a journal article. Design and Analysis: A Researcher’s Handbook is the title.
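
A quick way to look at both numbers together is sketched below with SciPy. The data are simulated stand-ins using the sample sizes from the question; the standard deviations are assumed values, not Chantalle's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated stand-ins: sample sizes from the question, standard deviations assumed.
sizes = (102, 100, 100, 59)
true_sds = (1.0, 1.1, 1.3, 1.49)
groups = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, true_sds)]

# Ratio of the largest to the smallest sample standard deviation.
sample_sds = [g.std(ddof=1) for g in groups]
sd_ratio = max(sample_sds) / min(sample_sds)

# Levene's test; center="median" gives the Brown-Forsythe variant.
levene_stat, levene_p = stats.levene(*groups, center="median")

print("sample SDs:", [round(float(sd), 2) for sd in sample_sds])
print(f"largest / smallest SD = {sd_ratio:.2f}")
print(f"Levene statistic = {levene_stat:.2f}, p = {levene_p:.4f}")
```

With a few hundred observations, Levene's test can flag fairly modest differences in spread, which is one reason to look at the size of the ratio and not only at the p-value.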



Mark Lowe June 13, 2013 at 11:11 pm

Hi Karen,
I am wanting to run a one-way between groups ANOVA, however my groups sizes are 88, 76 and 7. Do you have any suggestions or comments on whether this is going to provide useful information?


Karen June 14, 2013 at 3:21 pm

Hi Mark,

It’s hard to do any sort of comparison with only 7 observations in a group. That said, in some studies that’s all you ever have. This could be useful, but pay very close attention to those assumptions. A non-parametric test, like Kruskal-Wallis, may be a safer approach.
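
As a minimal sketch, a Kruskal-Wallis test on groups of 88, 76 and 7 could look like this in Python with SciPy. The data are simulated from skewed distributions with invented scales.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated, skewed stand-ins for groups of 88, 76 and 7 observations.
layout = ((1.0, 88), (1.2, 76), (2.0, 7))
groups = [rng.exponential(scale=scale, size=n) for scale, n in layout]

# Kruskal-Wallis H test across the three groups.
h_stat, p_value = stats.kruskal(*groups)

# Medians are a common descriptive summary to report alongside the test.
medians = [float(np.median(g)) for g in groups]

print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
print("group medians:", [round(m, 2) for m in medians])
```

The test runs fine with a group of 7, but the small group still limits how much can be concluded about it.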


Kaye June 15, 2013 at 2:00 am

hi Karen,

I’m new in spss and research analysis hope you can help me. I am doing an analysis on the influence of teacher characteristics (ex.academic background) to student scores. i have 135 teachers and more than 4000 students. how should i prepare my data set so i can do a multiple regression? thank you!!!


Karen July 1, 2013 at 1:09 pm

Hi Kaye, if you’re looking at teacher characteristics on their students, you need to account for the fact that the students with the same teacher are not independent. You do this with a multilevel or mixed model. You can get a lot more info here: http://www.theanalysisfactor.com/category/mixed-and-multilevel-models/


Mike July 11, 2013 at 12:37 am

If my samples from two groups were slightly unbalanced (8 vs 9), but the homogeneity of variance assumption was not violated (Levene’s test p > 0.05), does it mean that I could interpret the results as if the data were balanced? Thank you very much.


Karen July 15, 2013 at 3:49 pm



Mike July 16, 2013 at 8:44 am



Alex August 1, 2013 at 1:42 pm

Hi Karen,

I was hoping to use ANCOVA to compare a battery of neuropsychological tests in carriers vs non-carriers, controlling for age, gender and education level and I have three questions about that which I was hoping you could help me with. :)

Firstly, do I have to demean the covariates, before feeding them into (the) SPSS (multivariate general linear model)?

Secondly, is Levene’s Test of Equality of Error Variances the test I need to do to check if the variances are sufficiently similar to perform the ANCOVA on?

Lastly, assuming this is the case, what happens if Levene’s test is significant? Does it matter a lot for ANCOVA (or is it very robust anyway)? Is there a non-parametric alternative that I could use instead?

Thank you very much!



Karen August 7, 2013 at 3:32 pm

Hi Alex,

1. I’m not sure what you mean by demean. I assume you mean “mean center.” If so, it’s not necessary, but can be helpful.
2. Levene’s is popular, but I don’t use it, at least not as a sole criterion.
3. It’s robust, unless sample sizes are quite different.


Grace August 12, 2013 at 4:33 am

Please help me with my assignment. I really don’t know what to do because our prof didn’t teach this yet, and this is some kind of advanced study for us, but it’s so hard :(

HOMEWORK – Introduction to Analysis of Variance
A psychologist conducts research to compare learning performance for three (3) species of monkeys. The animals are tested individually on a delayed-response task. A raisin is hidden in one of three containers while the animal watches from its cage window. A shade is then pulled over the window for 1 minute to block the view. After this delay period, the monkey is allowed to respond by tipping over one container. If its response is correct, the monkey is rewarded with the raisin. The number of trials it takes before the animal makes five (5) consecutive correct responses is recorded. The researcher used all the available animals from each species, which resulted in unequal sample sizes (n). The data are summarized below. Ref. (Gravetter, Frederick J.; Wallnau, Larry B., 2012)


n = 4      n = 10     n = 6      N = 20
M = 9      M = 14     M = 4      G = 200
T = 36     T = 140    T = 24
SS = 200   SS = 500   SS = 320

Summary Table for One-Way ANOVA
Source SS df MS F

Fcrit = ? at alpha 0.05

Guide Questions:
1. Formulate the steps in hypothesis testing (10 pts)
2. Construct the summary table for One-way ANOVA (8 pts)
3. Identify if the problem uses one-tail or two-tail of alpha level? Explain why? (2 pts)


Karen September 5, 2013 at 4:50 pm

Hi Grace, while I appreciate how hard this can be, as a rule, I don’t help with homework. That’s what your TA is paid the big bucks to do. :)


David August 17, 2013 at 12:35 pm

Hi Karen,
I’m running Anova to compare means. Anova sig. = .129 but post hoc test concludes there’s significant difference at 0.05 level. How come?


Karen September 4, 2013 at 2:28 pm

Hi David,

I was going to refer you to another article, but just realized I haven’t written anything on this. It’s so important (and common). Here’s the quick answer:

1. They’re not actually testing the exact same thing.
2. The F test always trumps the post-hoc. If it’s not significant, don’t run a post-hoc. :)


Daniel August 22, 2013 at 9:12 am

Hi Karen,

I would be most grateful if you could help me as I have an ANCOVA question for you.

Two of my independent variables have unequal sample sizes, for example: the first variable (depression) was drawn from a student sample, the depression variable has 6 ordinal levels with: n=55, 16, 6, 5, 4, 1 (in each level of depression). The second variable (anxiety), also from a student sample and has 4 ordinal levels with: n=36, 28, 17, 6. As you probably assumed: when depression and anxiety increases the n for level of the respective group gets smaller (there are few subjects with higher levels of anxiety or depression in the sample).

Question: Should I run the analysis as it is (I have used Levene’s test of equality of error variance and it was non-significant), or should I merge, e.g., levels 3-6 in the depression variable and 3 & 4 in the anxiety variable? What would you do?

Thank you very much for your time,


Karen September 4, 2013 at 2:20 pm

Hi Daniel,

There isn’t one right answer to this one, since you don’t seem to have problems with unequal variance.

But I can tell you a group with n=1 (the highest depression) has no variance, so isn’t useful. It is certainly reasonable to combine those groups, as long as it makes theoretical and logical sense.

And as long as those natural groupings aren’t giving you opposite results, it should help your power as well.


Mona September 1, 2013 at 10:21 am

In my paper, males and females are compared through a MANOVA test. The number of males is 37 and females 86. Does this difference in numbers affect the results? How can I justify this difference?



Ambika K.C. September 3, 2013 at 11:12 pm

Namaste Ma’am,
I have a problem with my statistics. I have two samples, one of size 18 and the other of size 17. When I test normality with the Shapiro test in R, the p-values are 0.007442 for the sample of 17 (p less than 0.05) and 0.3423 for the sample of 18 (p greater than 0.05). From these p-values, one sample appears normally distributed but the other does not. In this situation, which test is suitable? Can I use wilcox.test, the rank sum test (a nonparametric test)?
I have drawn these samples from one community forest (CF), which is divided into two blocks: one unmanaged and one managed.


Karen September 12, 2013 at 2:01 pm

Namaste Ambika,

I don’t like Shapiro Wilk test as a final decision maker about normality. I would first investigate what distributions you do have. If the one doesn’t look normal, why not? Skew? An outlier? Uniform?

That said, the Wilcoxon is considered distribution-free, so it’s safe to use, if it answers your research question.


Keneth Tumwebaze September 10, 2013 at 10:41 am

When I analyse data with ANOVA, I am able to present my p values and means in a table, and this is acceptable. However, I have a study in which I intend to use Kruskal-Wallis, and I would want to have my results in table form. Is it in order to put the medians, or should I use p values only? I have not come across this latter situation. Please advise.


Karen September 12, 2013 at 1:57 pm

Hi Keneth, although technically a Kruskal Wallis is not testing medians, it is pretty common to report medians as a descriptive stat, along with the K-W test statistic and p-value.


Hector September 12, 2013 at 11:39 pm

Hi Karen,

Thank you for sharing your knowledge with us.
I have an ANCOVA question for you. I am trying to compare a treatment and a control group, across 8 different segments of people. My sample sizes for treatment and control groups for each of the 8 segments are not even. The worst uneven sample sizes are n(treatment)=20, n(control)=8. My results are showing significant difference between the treatment and control groups in only one of the eight segments, however the “observed power” for the test is much lower than 0.8. So, I am wondering whether these results are reliable at all?
If I want to increase the power, is there any way other than increasing the sample size (because I can not)? For instance, is there any other test?

Thank you for your help, in advance,


Karen September 25, 2013 at 10:37 am

Hi Hector,

Yes, if a test is insignificant and the true effect size is the effect size you measured, then you have insufficient power to detect that effect. You don’t need observed power to check that.

Here are pretty much the only ways to increase power: http://www.theanalysisfactor.com/5-ways-to-increase-power-in-a-study/


sanaz September 15, 2013 at 2:25 am

I was wondering if you can help me to find an answer for my question?
I have collected 567 data on smoking status. 11 respondents (2.5%) are smokers and 553 (97.5%) are non-smokers. I want to conduct a t-test to compare these two groups regarding their difference in the mean of another variable. Is it doable? I just ignored testing this variable due to the very unbalanced sample size. Is that right?

Thank you


Karen September 25, 2013 at 10:17 am

It’s doable. Just be very careful to check the equal variance assumption. The bigger issue is that 11 is very small, and you may not want to make inferences on the responses from 11 people.


nisha September 29, 2013 at 12:00 pm

hello mam,
My total sample is 218, divided into three different groups: group a: 65, group b: 61, group c: 92. I have to compare these three groups. For that I used ANOVA, and after finding the result (p value) I have to use a post hoc test. Could you please suggest what type of post hoc test I can use in my study, because my sample is large?
thank you. please reply asap.


sufala October 1, 2013 at 1:43 am

Hi, I’m doing a study with six groups, so I have to do ANOVA. But when I check for normality using the Shapiro-Wilk test or the Kolmogorov test, data in two of the six groups are not normally distributed. Can I still continue with ANOVA, or should I use the KW test?


hellen October 2, 2013 at 5:25 am

I am analysing my data using STATISTICA, I have a problem of getting standard error as zero across my dry matter variable yet other variables do not have a zero standard error. what could be the problem? Thank you


Karen October 7, 2013 at 11:27 am

Hi Hellen,

I would need a lot more information, and probably to actually see the analysis to figure this one out. It sounds like you’re overspecifying the model in some way.


Mohammed October 3, 2013 at 2:57 pm


I have 3 subgroups from the main group. The number of samples in each group was 6, 7, 9. Can I use ANOVA or the Kruskal-Wallis H test for the comparison, and why?


yasmine October 14, 2013 at 5:12 pm

Hey Karen

I have a question, when running a one way anova with three levels (60, 62, 63 participants in each group) and one group not having met the normality assumption (although the histogram looks like it satisfies normality) but equal variance was met, what kind of post hoc test should I be using? and why?

thanks!!! :)


Karen October 16, 2013 at 9:54 am

Hi Yasmine,

There isn’t a post hoc for a situation of non-normality. If the normality is close enough for the ANOVA F test, it’s good enough for posthocs.


AMY October 23, 2013 at 10:40 pm


if I have three different sample sizes which are 48 , 46 and 44.

can I use one-way ANOVA.

Thanks. : )


Kevin Kirkpatrick December 11, 2013 at 6:21 am

I’m using ANOVA to compare user preference ratings R within various cities, for groups A, B, and C. Unfortunately, my group sizes are HUGELY skewed – group A will typically have 20,000 or more members per city, group B will have ~1,000, and group C can have as few as 100.

In response, I have been running ANOVA by
1) determining count of C members in each city, call this Cn (let’s say 130 C people in Dallas)
2) randomly pick Cn members from group A within each city, calling this a sample-A group (in contrast to population-A for the city). So in my hypothetical, this might mean picking 130 A ratings out of 25,000.
3) I then perform a one-sample t-test on the sample-A vs population-A within each city – in the Dallas hypothetical, comparing the 130 sample-A to the 25,000 population-A.
4) repeat steps 2 and 3 until I get a sample-A selection with no significant difference from population-A for each city. This might mean I re-pick the 130 Dallas A ratings several times until I’ve picked a representative sample.
5) I repeat 2 – 4 for group B.
6) I perform my ANOVA test on Sample-A, Sample-B, and Sample-C within each city.

This seems to be working quite well; indeed, I’ve clearly identified cities where the ratings of A, B, and C groups truly seem to differ. However, I’m not an experienced statistician, and since this approach feels ad-hoc, I’m curious as to whether the results would stand up to scrutiny.


Karen December 23, 2013 at 2:02 pm

Hi Kevin,

Your sampling seems fine. The one thing I would change, though, is eliminate steps 3-5. Those are still based on the very large pop size. As long as your sampling is truly random, there should theoretically be no difference between the mean of the population and the sample.
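
A single random draw per group, with no re-picking, might look like the sketch below. The ratings are randomly generated placeholders and the helper name `subsample_to_smallest` is hypothetical.

```python
import random


def subsample_to_smallest(groups, seed=0):
    """Randomly cut every group down to the size of the smallest one (one draw each)."""
    rng = random.Random(seed)
    target = min(len(values) for values in groups.values())
    # random.Random.sample draws without replacement.
    return {name: rng.sample(values, target) for name, values in groups.items()}


# Placeholder ratings for one city: 25,000 in A, 1,000 in B, 130 in C.
data_rng = random.Random(123)
city = {
    "A": [data_rng.randint(1, 5) for _ in range(25000)],
    "B": [data_rng.randint(1, 5) for _ in range(1000)],
    "C": [data_rng.randint(1, 5) for _ in range(130)],
}

balanced = subsample_to_smallest(city)
print({name: len(values) for name, values in balanced.items()})
```

Drawing once and keeping that draw avoids the problem with redrawing until a test comes out non-significant, which quietly selects samples that look more typical than a truly random sample would.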


ryan December 11, 2013 at 2:20 pm

Hi Karen,

I get confused with my data analysis. I’m about to study motivation towards grade achievement. The motivation is divided into 2 categories: intrinsic (interest and attitude) and extrinsic (family, social, teaching style, learning style). Grade is defined in terms of A, A-, B+, B, B-, C+, C, C-, D and E. I have run the one-way ANOVA test, and the result shows there are significant differences among those means. But when I try to run the post hoc test, it comes out like this:
Post hoc tests are not performed for Gred because at least one group has fewer than two cases.

Can I know how to solve such a problem please?? I’m new to statistics..

Thanks =)


Karen December 23, 2013 at 1:29 pm

Hi Ryan,

It’s hard to tell exactly what is going on without looking at it, but it sounds like there is one group within your motivation categories with only one person. I would start with some frequency tables.


mauricio December 13, 2013 at 9:16 am

Hello. Thanks for the information. I would like to ask, what is recommended to use as a post hoc when running a one-way ANOVA with different sample sizes?
4 groups n = 10, 1 control group n = 30. thanks a lot :)


Karen December 23, 2013 at 1:24 pm

I would usually use a Tukey. Tukey Kramer is the version for unequal sample sizes.
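
The part that Kramer's version changes is the standard error, which uses both group sizes for each pair. Here is a minimal hand-calculation sketch of the studentized range statistic q, with made-up data and a hypothetical helper name `tukey_kramer_q`.

```python
import math
from itertools import combinations


def tukey_kramer_q(groups):
    """Studentized range statistic q for every pair of groups (Tukey-Kramer)."""
    means = {name: sum(v) / len(v) for name, v in groups.items()}
    total_n = sum(len(v) for v in groups.values())
    # Pooled within-group variance, i.e. the ANOVA mean square error.
    ss_within = sum((x - means[name]) ** 2 for name, v in groups.items() for x in v)
    mse = ss_within / (total_n - len(groups))
    q = {}
    for a, b in combinations(groups, 2):
        # Kramer's adjustment: the standard error uses both group sizes.
        se = math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q


# Made-up example with unequal group sizes.
example = {"control": [2.0, 3.0, 4.0, 5.0], "treated": [1.0, 2.0, 3.0]}
q_values = tukey_kramer_q(example)
print(q_values)
```

Each q is then compared against the studentized range distribution with k groups and N - k degrees of freedom, which is the step statistical software handles automatically.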


Yannis January 14, 2014 at 12:00 pm

Sorry for double posting, I meant to create a new reply but replied to a post instead:

Hi Karen,

Thank you for this article, both the article and the discussions below are enlightening :)

Can I ask your opinion on one related thing; I want to run a two-way ANOVA with unequal sample sizes. The reason for the unequal sizes is that there is a third factor that doesn’t participate to this ANOVA and requires its own data points. What would be the way to go when downsizing the larger sample groups in terms of randomization?

To give an example, let’s say we compare responses from athletes and non-athletes, which are either male or female. So the factors are Gender (Male, Female) and Athlete (Yes, No). This will be analyzed with a two-way ANOVA, let’s call it ANOVA A. So we have:

Male Athletes: n=20
Male Non-Athletes: n=20
Female Athletes: n=40, but we want to make it n=20
Female Non-Athletes: n=40, but we want to make it n=20

The Female subjects are more because in the same study but a different analysis we will do exactly the same comparison, but with an added factor, eg. In-pregnancy (Yes, No), which doesn’t apply to males. So that one will be another two factor ANOVA, let’s call it ANOVA B:

Female Athletes In-Pregnancy: n=20
Female Non-Athletes In-Pregnancy: n=20
Female Athletes Not-In-Pregnancy: n=20
Female Non-Athletes Not-In-Pregnancy: n=20

How do we choose which females to use in the downsized group for ANOVA A? It sounds logical to randomly select 20 Female Athletes and 20 female Non-Athletes, but should we care if they are In-Pregnancy or not? Or should we account for that as well?

Thanks a lot,



Karen January 15, 2014 at 10:37 am

Hi Yannis,

That’s a great question.

I assume that if you had not had the pregnant/non-pregnant groups selected out for the second study, you would have just randomly selected 20 female athletes and 20 female non-athletes. Unless it’s standard or relevant to find out if they’re pregnant, you wouldn’t ever know, right?

So there are two options for the study where pregnancy is not relevant.

1. Figure out what percentage of the female athlete population is usually pregnant at any given time, then sample your two samples at the same rate.
2. Decide that the population of interest is non-pregnant female athletes and just use that sample.


Shari January 22, 2014 at 9:27 pm

Hi Karen,

I’m looking at differences in fish weight between a control group and 4 different treatment groups from experiment start to finish.

I am a Masters thesis student and have run a 2-way ANOVA on my data but have unequal groups (unavoidable and I was told this wouldn’t be a problem by my supervisors). I have 3 independent variables {sample period, treatment and frequency} and 1 dependent {weight}.

So it turns out it is a problem: the Levene’s test is 0.017. My data conform to normality and my model is significant (0.018). My factor (sample period) is significant at the .001 level.

Should I be running another stats test or is there a way to adjust for the lack of homogeneity?

Thanks for help!


Karen January 24, 2014 at 1:22 pm

Hi Shari,

I would investigate those variances more. Levene’s test isn’t very useful for testing assumptions (see Keppel, 1993).


Richa Gupta January 26, 2014 at 12:30 pm

Is it compulsory to have the number of patients equal in both groups for data analysis? If not, can I exclude a single patient at the end of the study to make the samples equal in both groups and remove bias in the analysis?


Karen February 3, 2014 at 4:16 pm

It’s not necessary at all, unless you had some sort of patient matching. It sounds like you don’t, so you’re good to go.


Manoj January 30, 2014 at 5:39 pm

Hi Karen,

Could you please help me with your valuable suggestions in stats?

I have three groups (n1=16, n2=23 and n3=24) with different sample sizes. I want to see the significant difference between these groups based on a parameter in common. Please let me know the best method or tool to analyse.




Karen February 3, 2014 at 5:24 pm

Hi Manoj,

Well it depends on which parameter you want to compare. If it’s the mean of each group on some dependent variable, then you can use one way ANOVA. The different sample sizes are no problem.



gautam February 2, 2014 at 12:32 pm

Hi. I have done an analysis on 3 groups. Group 1 has 24 subjects, group 2 has 398, and group 3 has 755. For the variable vomiting, group 1 had 12 subjects with vomiting out of 24 (50%), group 2 had 169 out of 398 (42.5%), and group 3 had 270 out of 755 (35.8%). On analysis by chi-square (3×3), the p-value was statistically significant (.041). To find out which groups differed from each other, I did pairwise comparisons between groups 1 and 2, groups 1 and 3, and groups 2 and 3. The p-value for the group 2 and 3 comparison was less than .05 and thus statistically significant, but for groups 1 and 2 and groups 1 and 3 the comparison was not statistically significant. My question is: the difference between group 2 (42.5% of cases with vomiting) and group 3 (35.8%) was statistically significant, so why was the difference between group 1 (50%, which is higher than the proportion in group 2) and group 3 (35.8%) not statistically significant? Is it because the very small number of subjects in group 1 kept the difference from being significant, or is it something else?

Thank you.


Karen February 3, 2014 at 5:30 pm

Hi Gautam,

Yes, that’s probably it. With so few people in Group 1, you don’t have much power to find a difference.
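The power point can be checked with a rough sketch using the counts from the question. This assumes n = 755 for group 3 (the group size stated in the question) and uses a pooled two-proportion z-test with no continuity correction, which is not necessarily the exact test the original software ran, so the p-values may differ a little from the reported ones. The function name is hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (two-sided, no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # The standard error grows as either group gets smaller.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, se, p_value

# (subjects with vomiting, group size), taken from the question above
counts = {
    "group 1": (12, 24),
    "group 2": (169, 398),
    "group 3": (270, 755),  # assumption: 755, not 756
}

results = {}
for a, b in [("group 1", "group 2"), ("group 1", "group 3"), ("group 2", "group 3")]:
    z, se, p = two_proportion_z_test(*counts[a], *counts[b])
    results[(a, b)] = (z, se, p)
    print(f"{a} vs {b}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The difference between groups 1 and 3 is the largest of the three, but its standard error is roughly three times that of the group 2 vs. group 3 comparison because group 1 has only 24 subjects. That is why the bigger difference does not reach significance.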


Marko February 9, 2014 at 1:27 am

Hi Karen,

So glad I found this site! I’m having trouble accepting my analysis and perhaps I’m doing it wrong so hopefully you can shed some light.

My master’s thesis is on female choice. I conducted three-choice experiments in which females are presented 3 different acoustic stimuli simultaneously. I record which stimulus they choose as well as the time it took them to make the choice (latency). My issue is with the latency analysis. I assumed that a one-way ANOVA was a proper test because my independent factor is categorical (choice) and my dependent factor is continuous (latency–time).

My sample sizes:
Stimulus 1: 2
Stimulus 2: 10
Stimulus 3: 18

One issue I have is that the variance for the group with two individuals is HUGE, mainly because one female took her time to choose that stimulus, whereas another female chose that same stimulus rather quickly. I found no significance across the board, but is it because of that low sample size of group 1?

Thank you so much for your help. I really appreciate it.



Karen February 14, 2014 at 2:13 pm

Hi Marko,

Theoretically it doesn’t matter that your samples are unequal, but practically, you’re going to have a hard time if a sample is only 2.

Your choices are to run more subjects or drop that stimulus group. Unfortunately, that’s about all you can do. Since none of your groups is very large, running more subjects would be the best, if you can manage it.


Colin Jones February 12, 2014 at 4:20 pm

I am trying to figure out the sample size in an article on socially conscious mutual funds. The article takes a look at industries/sectors that are screened out of these mutual funds in order to evaluate performance. The three independent sectors that are looked at are tobacco, alcohol, and gambling. Each sector is compared to the S&P 500 Index over an 11-year span. Tobacco has 15 stocks in the industry, alcohol has 18, and gambling has 22. Do you know what the sample size would be for this? Would it be 3? Or 1, since they are all exclusive?


Karen February 14, 2014 at 2:15 pm

Hi Colin,

It’s hard for me to say without seeing the paper and exactly which analysis they’re doing and how. It could either be the number of stocks or it could be, as you suggested, the number of industries.


Maria February 21, 2014 at 11:15 am

Hi Karen, I’m running a 2×2 mixed ANOVA (the between factor is gender, male and female; the within factor is measurement at Time 1 and Time 2) with 7 males and 29 females. Is it okay to do that, or are the sample sizes too unequal? The variability in scores (using two different scales) is mostly about twice as large for women as for men, for instance std. (man/woman) = 0.4/0.8, 0.4/0.9, and for the scores from the other scale 5.6/4.9 and 3.7/6.3. Or should I randomly (SPSS can do it) take 7 males and then perform the 2×2 mixed ANOVA?


Maria February 21, 2014 at 11:16 am

Sorry, I mean randomly pick 7 females :)


Karen March 10, 2014 at 5:26 pm

Hi Maria,

This is tricky–unequal sample sizes are definitely a problem with two-way models, but at the same time 7 is a very, very small sample. Is there any way to get more males instead?


Sarah March 2, 2014 at 10:15 pm

I need to run an ANOVA with two samples (n is unequal for the groups) for several measurements. I am not able to carry this out, perhaps because the sample sizes are different? I am comparing 28 different categories between two groups at 3 different ages. How do I do this? I ran student t-tests that gave good information, but am now asked to run an ANOVA. Any help would be appreciated.


Karen March 10, 2014 at 5:10 pm

Hi Sarah,

I’m not sure I understand what your DV is. Is it the 28 categories? Or are you saying you have 28 DVs?


Seaneen March 6, 2014 at 1:35 pm

Hi Karen, I am hoping you might be able to offer some suggestions regarding two questions I am struggling with for my data analysis.

1) I have one study which has shown a statistically significant difference between two sample groups, using a Mann-Whitney test as the data are not normal. However, the groups are unequal in size (Group 1 = 3369, Group 2 = 1524). My supervisor has asked whether I can apply a correction factor to account for the difference in group size, but I was under the impression that the Mann-Whitney already accounts for this. Any ideas?

2) Another study has two sample groups with almost exactly equal means (Group 1 = 5.67, Group 2 = 5.75), which to me intuitively says they are not statistically different. However, again the data are not normally distributed (and the groups are not equal in size either: Group 1 = 103, Group 2 = 221), so I am assuming I have to run a non-parametric test, which results in a statistically significant difference between the groups?

I hope that all makes sense!

Any light at all you can shed on this would be greatly appreciated, I have been struggling for days and have exhausted the textbooks and web pages!!! Thanks in advance!



Karen March 10, 2014 at 5:04 pm

Hi Seaneen,

1) No correction necessary. M-W is fine for unequal samples.
2) It’s possible to have so-small-it’s-not-interesting but statistically significant results. But another possibility is that the nonparametric test isn’t comparing means. If you have an outlier or two, that would affect the means (possibly making them closer than, say, the medians) but would have little effect on the rank-based nonparametric test. So it’s possible those two distributions have nearly the same mean but don’t overlap as much as the close means would suggest. I say graph them.
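As an illustration of point 2, here is a small sketch with made-up numbers (not Seaneen’s data; all names are hypothetical). One outlier pulls the mean of the first sample up until it equals the mean of the second, yet the medians stay far apart and most observations in the second sample are larger than those in the first. The pairwise comparison counted below is the quantity the Mann-Whitney U statistic is built on.

```python
import statistics

# Hypothetical data: sample_a has one large outlier (55).
sample_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 55]
sample_b = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]

mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
median_a, median_b = statistics.median(sample_a), statistics.median(sample_b)

# Share of (a, b) pairs in which b is larger, counting ties as half.
# This is U / (n_a * n_b); a value of 0.5 would mean neither sample
# tends to be larger than the other.
wins = sum(
    1.0 if b > a else 0.5 if b == a else 0.0
    for a in sample_a
    for b in sample_b
)
prob_b_larger = wins / (len(sample_a) * len(sample_b))

print(f"means:   {mean_a} vs {mean_b}")
print(f"medians: {median_a} vs {median_b}")
print(f"share of pairs where b > a: {prob_b_larger}")
```

The means are identical, but a rank-based test would see two quite different distributions, which is also what a plot of the two samples would show.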


Seaneen March 13, 2014 at 11:24 am

Thanks so much Karen, that makes a bit more sense now! Will have a go at graphing them. Thanks again!


Stella March 22, 2014 at 7:10 am

Hello Karen, I’m glad I came across this site! I’m facing a challenge with my research work. I sampled 6 different land use types: 4 of them were replicated 5 times, and the other two were replicated 4 and 2 times (due to their limited size for sampling). Now I want to test for significant differences in a parameter between the different replications and their means using ANOVA. This is an unbalanced sample, and I’ve tried to use the Gabriel test, but my variances are unequal and my data are not normally distributed. Please, how do I go about this analysis? Thanks!


Karen April 4, 2014 at 12:59 pm

Hi Stella,

I’d have to know a lot more about your study and data to make suggestions about an analysis. I’m just not comfortable making suggestions as it’s too easy for someone to have left out crucial info. It seems you have a lot going on there. So I’d suggest a consultation.


Olivia March 22, 2014 at 11:37 pm

Hi, I was wondering what the full reference is for Keppel (1993). I’m interested in looking at that paper. Thanks


Karen April 4, 2014 at 9:40 am

Hi Olivia,

It’s a book, not a paper. “Design and Analysis: A Researcher’s Handbook.”


Josh April 6, 2014 at 12:51 am

Hi Karen,

I’m doing an analysis on mechanical properties with one factor. I have 3 groups: group 1 (n=5), group 2 (n=9) and group 3 (n=8). I have read the questions people asked and the replies you have given. So am I right to say that for one-way ANOVA, it is all right to analyze groups with different sample sizes?


Karen April 7, 2014 at 5:00 pm


