When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA


In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.

1. Because she was making you calculate everything by hand. Sums of squares require a different formula if sample sizes are unequal, but statistical software automatically uses the right formula. So we’re not too concerned. We’re definitely using software.

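2. Nice properties in ANOVA, such as the grand mean being the intercept in an effect-coded regression model, don’t hold when data are unbalanced. With unequal groups, the effect-coded intercept is the unweighted mean of the group means, while the grand mean of all observations is a weighted mean (each group mean weighted by its sample size). That’s not a big deal if you’re aware of it.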
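Both points can be checked on a tiny made-up data set. The sketch below (Python with numpy, assumed available) uses three groups of sizes 3, 5, and 2. It computes the between-groups sum of squares in its unequal-n form, SS_between = Σ n_j (ȳ_j − ȳ)², and then fits an effect-coded regression by least squares to compare its intercept with the grand mean. The data and the coding scheme are illustrative choices, not taken from any particular software package.

```python
import numpy as np

# Made-up data: three groups with unequal sizes (n = 3, 5, 2).
groups = {"A": [4, 5, 6], "B": [7, 8, 9, 10, 11], "C": [1, 3]}

y = np.array([v for vals in groups.values() for v in vals], dtype=float)
grand_mean = y.mean()  # mean of all 10 observations

# Point 1: sums of squares with unequal n.
# The between-groups SS weights each group's squared deviation by its own n_j.
ss_between = sum(len(v) * (np.mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum(((np.array(v) - np.mean(v)) ** 2).sum() for v in groups.values())
df_between = len(groups) - 1
df_within = len(y) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Point 2: effect-coded intercept vs. grand mean.
# Effect (sum-to-zero) coding: A -> (1, 0), B -> (0, 1), C -> (-1, -1).
codes = {"A": (1, 0), "B": (0, 1), "C": (-1, -1)}
# Leading 1.0 is the intercept column; row order matches the order of y.
X = np.array([(1.0, *codes[g]) for g, vals in groups.items() for _ in vals])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = coef[0]
unweighted_mean = np.mean([np.mean(v) for v in groups.values()])

print(f"SS_between = {ss_between:.2f}, SS_within = {ss_within:.2f}, F = {f_stat:.2f}")
print(f"intercept = {intercept:.4f}, unweighted mean of group means = {unweighted_mean:.4f}")
print(f"grand mean = {grand_mean:.4f}")
```

With these numbers, SS_between is 78.4, SS_within is 14, and F is 19.6. The intercept is 5.33, the simple average of the group means 5, 9, and 2, while the grand mean is 6.4 because the larger group B pulls it up.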

But there are a few real issues with unequal sample sizes in ANOVA. They don’t invalidate an analysis, but it’s important to be aware of them as you’re interpreting your output.

Two Practical Issues for Unequal Sample Sizes in One-Way ANOVA

1. Assumption Robustness with Unequal Samples

The main practical issue in one-way ANOVA is that unequal sample sizes affect the robustness of the equal variance assumption.

ANOVA is considered robust to moderate departures from this assumption. But that’s not true when the sample sizes are very different.  According to Keppel (1993), there is no good rule of thumb for how unequal the sample sizes need to be for heterogeneity of variance to be a problem.

So if you have equal variances in your groups and unequal sample sizes, no problem. If you have unequal variances and equal sample sizes, no problem.

The only problem is if you have unequal variances and unequal sample sizes.
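Here is a small Monte Carlo sketch of that point. It assumes normally distributed data, two groups, and the made-up sizes and standard deviations below, and it uses numpy and scipy (assumed available). Every true mean is 0, so any rejection is a false positive; a well-behaved test should reject about 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2020)  # arbitrary seed for reproducibility


def type1_rate(ns, sds, n_sims=2000, alpha=0.05):
    """Share of simulated one-way ANOVAs that reject H0 when all true means are equal."""
    rejections = 0
    for _ in range(n_sims):
        samples = [rng.normal(0.0, sd, size=n) for n, sd in zip(ns, sds)]
        if stats.f_oneway(*samples).pvalue < alpha:
            rejections += 1
    return rejections / n_sims


# Hypothetical scenarios: (group sizes, group standard deviations).
scenarios = {
    "equal var, unequal n": ((10, 40), (1, 1)),
    "unequal var, equal n": ((25, 25), (4, 1)),
    "small group has big SD": ((10, 40), (4, 1)),
    "large group has big SD": ((10, 40), (1, 4)),
}
rates = {label: type1_rate(ns, sds) for label, (ns, sds) in scenarios.items()}
for label, rate in rates.items():
    print(f"{label:24s} false-positive rate = {rate:.3f}")
```

The first two scenarios should stay close to 5%. The last two show the known pattern: when the smaller group has the larger variance, the false-positive rate climbs far above 5%, and when the larger group has the larger variance, the test becomes overly conservative.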

2. Power with Unequal Samples

For a fixed total sample size, the statistical power of a hypothesis test that compares groups is highest when the groups have equal sample sizes.

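Power is driven mainly by the smallest sample size. Adding observations to the larger group doesn’t hurt power, but each extra observation there helps less and less, and no number of extra observations in the larger group can make up for a small group.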

So if you have a specific number of individuals to randomly assign to groups, you’ll have the most power if you assign them equally.

If your grouping is a natural one, you’re not making decisions based on a total number of individuals. It’s very common to just happen to get a larger sample of one group compared to the others.

That doesn’t bias your test or give you incorrect results. It just means your power is limited mainly by the smaller sample.

So if you have 30 individuals with Treatment A, 40 individuals with Treatment B, and 300 controls, that’s fine. It’s just that the extra 270 controls added much less power than their numbers suggest, because the 30 and 40 in the treatment groups still limit this particular test.
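As a rough sketch, take just one comparison from that example, Treatment A (30) versus controls, and compute approximate power with a normal approximation that ignores the t-distribution's degrees-of-freedom correction. The effect size d = 0.5 is a hypothetical choice, and only the Python standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def approx_power(n1, n2, d, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group mean comparison.

    d is the standardized mean difference (mean difference / common SD).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = d / (1 / n1 + 1 / n2) ** 0.5
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


d = 0.5  # hypothetical effect size
for n1, n2 in [(30, 30), (30, 300), (30, 10**6), (165, 165)]:
    print(f"n1={n1:>4}, n2={n2:>7}: approx. power = {approx_power(n1, n2, d):.3f}")
```

Under these assumptions the four rows come out at roughly 0.49, 0.74, 0.78, and 0.995. The term 1/n1 + 1/n2 can never drop below 1/30, so even a million controls can't push power past the cap set by the 30 in Treatment A, and splitting the same 330 people evenly (165/165) does far better than 30/300.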

Yes, this all holds true for independent samples t-tests

Independent samples t-tests are essentially a simplification of a one-way ANOVA for only two groups. In fact, if you run your standard (pooled-variance) t-test as an ANOVA, you’ll get the same p-value, and the between-groups F statistic will be the square of the t statistic you got in your t-test.

(Really, try it… pretty cool, huh?)
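Here’s one way to try it: a minimal sketch in Python with numpy and scipy (assumed available), on made-up data with unequal group sizes of 30 and 45. The equivalence holds for the standard pooled-variance t-test (`equal_var=True`); Welch’s version, which doesn’t pool the variances, won’t match the ANOVA.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
group1 = rng.normal(10, 2, size=30)  # made-up data, n = 30
group2 = rng.normal(11, 2, size=45)  # made-up data, n = 45

# Standard (pooled-variance) independent-samples t-test
t_res = stats.ttest_ind(group1, group2, equal_var=True)

# One-way ANOVA on the same two groups
f_res = stats.f_oneway(group1, group2)

print(f"t = {t_res.statistic:.4f}, t^2 = {t_res.statistic ** 2:.4f}, F = {f_res.statistic:.4f}")
print(f"p (t-test) = {t_res.pvalue:.6f}, p (ANOVA) = {f_res.pvalue:.6f}")
```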

This means they work the same way. Unbalanced t-tests have the same practical issues as unbalanced ANOVAs, but unequal sample sizes don’t otherwise affect the validity of the test or bias its results.

Problems in Factorial ANOVA

Factorial ANOVA includes all those ANOVA models with more than one crossed factor. It generally involves one or more interaction terms.

Real issues with unequal sample sizes do occur in factorial ANOVA in one situation: when the sample sizes are confounded in the two (or more) factors. Let’s unpack this.

For example, in a two-way ANOVA, let’s say that your two independent variables (factors) are Age (young vs. old) and Marital Status (married vs. not).

Let’s say there are twice as many young people as old. So unequal sample sizes.

And say the younger group has a much larger percentage of singles than the older group. In other words, the two factors are not independent of each other. The effect of marital status is then hard to distinguish from the effect of age.

So you may get a big mean difference between the marital statuses, but it’s really being driven by age.
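A small sketch with made-up cell counts shows how this happens. In it, the outcome depends only on age (every young person scores 10 and every old person 20), there are twice as many young people as old, and 80% of the young are single versus 20% of the old. Pooling people by marital status produces a sizable “marital status” difference that vanishes within each age group, and it also vanishes if you average the cell means instead of pooling the people. Plain Python, no libraries.

```python
# Hypothetical cell counts and cell means. The outcome depends ONLY on age.
cells = {
    # (age, marital): (count, cell mean)
    ("young", "single"): (160, 10.0),
    ("young", "married"): (40, 10.0),
    ("old", "single"): (20, 20.0),
    ("old", "married"): (80, 20.0),
}


def marginal_mean(status, weighted=True):
    """Mean outcome for one marital status, across both age groups."""
    pairs = [(n, m) for (_age, s), (n, m) in cells.items() if s == status]
    if weighted:  # what you get by pooling everyone with that status
        return sum(n * m for n, m in pairs) / sum(n for n, _ in pairs)
    return sum(m for _, m in pairs) / len(pairs)  # simple average of cell means


raw_diff = marginal_mean("married") - marginal_mean("single")
within_young = cells[("young", "married")][1] - cells[("young", "single")][1]
within_old = cells[("old", "married")][1] - cells[("old", "single")][1]
unweighted_diff = marginal_mean("married", weighted=False) - marginal_mean("single", weighted=False)

print(f"pooled married - single difference: {raw_diff:.2f}")
print(f"within young: {within_young}, within old: {within_old}")
print(f"difference of unweighted cell-mean averages: {unweighted_diff}")
```

The pooled difference is about 5.56 points, entirely produced by age, while both within-age differences and the unweighted difference are 0.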

What about Chi-Square Tests?

(This article is about ANOVA and t-tests, but I’ve updated it to include Chi-Square tests after getting a lot of questions.)

There are a number of different chi-square tests, but the two that can seem concerning in this context are the Chi-Square Test of Independence and the Chi-Square Test of Homogeneity. Both involve two categorical variables, and both count the frequencies of the combinations of their categories.

They calculate the test statistic the same way. Without getting into the math, it’s basically a comparison of the actual frequencies of the combinations with the frequencies you’d expect under the null hypothesis.

And luckily, unequal sample sizes do not affect the ability to calculate that chi-square test statistic. It’s pretty rare to have equal sample sizes, in fact. The expected values take the sample sizes into account. So no problems at all here.
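Here is a minimal sketch of the Pearson chi-square statistic for a 2×3 table in plain Python, using made-up counts for three groups of very different sizes (30, 120, and 250). The expected count in each cell is the row total times the column total divided by the grand total, which is exactly how the unequal group sizes get taken into account. The p-value shortcut exp(−x/2) is valid only because a 2×3 table has 2 degrees of freedom.

```python
import math


def chi_square_2x3(table):
    """Pearson chi-square for a 2x3 table of counts (rows = outcome, columns = groups)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count uses the margins, so group sizes are built in.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df != 2:
        raise ValueError("this p-value shortcut only works for 2 degrees of freedom")
    # For df = 2, the chi-square upper tail probability is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, df, p_value


# Made-up counts: three groups of very different sizes (30, 120, 250).
same_rate = [[9, 36, 75],    # "yes": 30% in every group
             [21, 84, 175]]  # "no"
diff_rate = [[15, 36, 75],   # "yes": 50% in the smallest group
             [15, 84, 175]]
print(chi_square_2x3(same_rate))
print(chi_square_2x3(diff_rate))
```

With identical 30% rates in all three groups, the statistic is exactly 0 and the p-value is 1, despite the unequal group sizes; only a difference in rates, as in the second table, moves it.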

That said, when a third variable is involved, you can run into Simpson’s Paradox. You may or may not have collected that third variable, so it’s worth thinking about whether something else could be creating an association in the combined data that doesn’t exist within each group of that third variable alone.
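Here is a made-up illustration in plain Python. Within each level of a hypothetical third variable, groups A and B have identical “yes” rates, but group A is concentrated in the level with the higher rate. Pooling over the third variable manufactures a strong association that exists in neither level. The p-value uses the fact that, for 1 degree of freedom, the chi-square upper tail equals erfc(√(x/2)); no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # For df = 1, the chi-square upper tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = outcome (yes, no), columns = group (A, B).
level_1 = [[45, 5], [45, 5]]  # 50% "yes" in both groups; A has 90 people, B has 10
level_2 = [[1, 9], [9, 81]]   # 10% "yes" in both groups; A has 10 people, B has 90
pooled = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(level_1, level_2)]

for name, t in [("level 1", level_1), ("level 2", level_2), ("pooled", pooled)]:
    stat, p = chi_square_2x2(t)
    print(f"{name:8s} chi-square = {stat:6.2f}, p = {p:.2g}")
```

Both levels give a chi-square of exactly 0, while the pooled table (46% versus 14% “yes”) gives about 24.4 with a very small p-value.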

But that’s not really an issue with unequal sample sizes. That’s an issue of omitting an important variable from an analysis.

Updated Dec 18, 2020 to add more detail



Reader Interactions


  1. Nidhi says

    Hi there,
    I have two groups: one has a sample size of 41 and the other has a sample size of 59. I have one IV whose effects I want to see on three different DVs, each DV having four levels. My distribution is non-normal, and an assessment of normality shows it still isn’t normal even after a log transformation. So what kind of analysis should I use, and how do I deal with unequal sample sizes?
    Kindly help me out here
    Thank you,

  2. PK says

    Hi, I was wondering if you have references that I can cite in my paper. This has been extremely useful. Thank you!

  3. Annalise LaPlume says

    This is an excellent article. Thank you.

    What do you suggest doing for the marriage x age example you mentioned, when sample sizes are confounded in two factors?

    Your example “For example, in a two-way ANOVA, let’s say that your two independent variables (factors) are Age (young vs. old) and Marital Status (married vs. not).

    Let’s say there are twice as many young people as old. So unequal sample sizes.

    And say the younger group has a much larger percentage of singles than the older group. In other words, the two factors are not independent of each other. The effect of marital status cannot be distinguished from the effect of age.”

    • Karen Grace-Martin says


      Great question.

      Well the first thing to do is to simply interpret with this in mind. Don’t just assume you can interpret the effect of one variable out of context with the other.

      If you have a large enough sample, you can also take a random subsample from your larger groups to stratify. For example, you could randomly subsample within each of the four combinations of Age and Marital Status so that you have exactly 50 (or whatever number suits your data) young singles, young marrieds, old singles, and old marrieds.

  4. Lee Tinkler says

    Hello, great article! I am looking to compare information between 2 groups from 2007 to 2017. Normally I’d use a t-test, but for some of the groups the totals are different, e.g., in 2007 n = 115 whereas in 2017 n = 84. I want to see if there are any differences between the 2 years and whether any improvements have been made.

    Any advice would be much appreciated!



  5. David M Scott says

    I am running a study on the differences between distance learning, hybrid learning, and face-to-face learning. I have different numbers for the three groups: Hybrid (N = 312), Distance (N = 1131), and Face to Face (N = 1007). The syllabi are the same for each group. I was running a Kruskal-Wallis test to determine whether there are any significant differences among the groups. Then it occurred to me that I had different numbers for each group. Is this a problem for this study?


  6. MAK says

    Dear Sir
    I have two variables with unequal sample sizes. The first has 46 samples and the second has only 13. How can I determine whether there is a significant difference between them?

  7. Dr. Qaiser Abbas says

    Hello everyone,
    I have conducted a clinical trial in four groups of animals with different sample sizes: A (6 animals), B (7), C (6), and D (10). I have checked the effect of different medicines. Observations were taken repeatedly until the desired results were obtained, and each animal showed a different recovery time, so I have different sample sizes with unequal numbers of repeated observations. Please suggest a proper statistical analysis model.

  8. franciele says

    Hi, I am running a factorial ANOVA with 3 factors and 2 groups as IV.
    My groups are divided as 28 and 19 for one group
    and 21 and 25 for the other group.
    Do you think this might be a problem for me when running the ANOVA?


  9. Per says

    Hi Karen,

    I am conducting a linear regression on Valence ratings (continuous response variable) with Condition (4-level factor) and Culture (x-level factor) as explanatory variables, using non-parametric bootstrapping because of non-normality and, most importantly, heteroskedasticity in my data. I am wondering which culture groups I can include in my analyses, given that the number of observations per culture group in each condition is very unequal, ranging from 8 to 217. Do you think that I should aim for a minimum of, say, 20 observations per level of Culture in each level of Condition? Is there a rule of thumb about the ratio of the minimum to the maximum number of observations per cell? Or about the minimum number of observations in general?
    Thank you!

  10. Ayala says

    Hi Karen,

    Thanks for the website!
    I have a bit of a different question related to this topic. I have a repeated measures design with 3 conditions. In condition A there are 150 observations for each participant, in condition B I have 20 observations for each participant and in C I have 150 observations for each participant. The total number of participants is 24 (i.e., each of the 24 subjects did all the three conditions in the experiment).

    How is repeated measures ANOVA affected by these unequal numbers of observations in each condition? Would you happen to know where I can read more about this?

    Thank you!

  11. Kenneth Lewis says

    I have a question.
    What do you suggest I do to compare the means of two groups when group 1 has a sample size of 17 persons and group 2 has a sample size of 82 persons?
    The sample variance values for the two groups are not that different.
    The sample means don’t look all that different.
    The only concern I have is that group 1 has n1 = 17 respondents and group 2 has n2 = 82 respondents.
    Any help is greatly appreciated.

    Kenneth Lewis

  12. Yacine HAJJI says

    Dear Karen,

    Thank you for this article, it is very interesting.
    I have a question linked to this issue.

    When applying a post-hoc test comparing each group of the ANOVA with only one group (say, the vehicle group versus all dose groups of a treatment, with a Dunnett step-down post-hoc comparison), and you choose to increase the sample size of the vehicle group at the cost of the other groups’ sample sizes, are there known scenarios in which the power of the comparisons will be higher than in the balanced design (without inflating the alpha risk)?

    Thank you in advance

  13. Baran says

    I have examinees at 3 different levels of English proficiency taking 6 different tests. However, these groups have different numbers of examinees: the first group has 490 participants, the second group has 1919 participants, and the third group has 529 participants. Thus, I have unequal sample sizes for a mixed ANOVA.
    When I do the analysis by using SPSS, it calculates the sum of squares and degrees of freedom by using the minimum sample size of the first group, which is 490. Is there a way to make SPSS analyze all the data of the unequal groups?

  14. Muhammad I Khan says

    Hello. Some variables in my data set are non-normal, my data are also not independent, and the groups have unequal sample sizes (unbalanced data). Please suggest which statistical test I should use.


  15. A. S. says

    Hi! You talk about real issues with unequal sample sizes/variances in factorial ANOVA. Is this less of an issue when there is only one IV?

  16. Tonette Aclan says

    It’s not a comment but a question. How can I compute a sample size when I have 2 groups to sample from? I have data on the population sizes of both male and female households in a particular site; however, they are unequal. I need respondents from each group because I am doing a comparative analysis.

  17. Yohay says

    Hi Karen, Great site!!

    I was wondering where I can find the formulas for calculating a two-way ANOVA with unbalanced samples.
    Moreover, I’d be happy to know the differences between two-way ANOVA with and without replication (formulas for both cases would be great).

    Best wishes,

  18. Demtiw says

    Very educative discussions. I am working on research that involves a two-way design with unequal sample sizes. I am wondering if SPSS version 20 can perform such a task, because that’s what I have on my system.

    Thanks; I look forward to hearing from you.

  19. Imran says

    Hi Karen

    I need your help regarding my project. I divided the patients into three groups according to severity of disease: Group A = 55, Group B = 29, and Group C = 30. I want to apply one-way ANOVA, but my data are not normally distributed, and I need means and standard deviations. Is it OK if I continue with one-way ANOVA? In my second project I have two groups, Group A = 79 and Group B = 35, and I want to apply an independent t-test, but again the problem is that the data are not normally distributed. Please advise me.

    I will be really grateful to you

    Dr.Mudassar Imran

    • ria sadhu says

      For each population, if the response variable that you want to measure is not normally distributed, then with a large enough sample there is no need for normality, because the 3 sample means and 3 sample standard deviations will be close to the corresponding population parameters, which is what is required if the null hypothesis is true.

  20. ioana says

    Great discussion!
    I was wondering whether repeated-measures ANOVA in STATISTICA adjusts the sums of squares for unequal sample sizes the way SPSS does.

  21. chinedu says

    I am doing a study on the prevalence and patterns of urinary tract infection among pregnant women attending a particular hospital in my country, comparing them to non-pregnant controls. I determined my study sample size based on the prevalence. How do I find a formula to calculate the sample size now that I have been asked to stratify my pregnant patients into 1st, 2nd, and 3rd trimesters?

  22. Helena says

    Hi Karen,

    I have a question related to unequal sample sizes. I have a 2 (language background: first-language (L1) speaker / second-language (L2) speaker) × 3 (visual status: early blind / late blind / sighted) design. I investigate whether it is an advantage to have become blind as a child when it comes to second language acquisition.

    In total I have N = 80: 40 L1 speakers and 40 L2 speakers (equal sizes), and each of these two groups has 11 early blind, 9 late blind, and 20 sighted participants. Are these unequal sample sizes across visual status a problem when using a 2×3 ANOVA? What do you suggest?

    Many many thanks!

  23. manisha says

    Hello Sir,
    Sir, in my research study I worked with three groups, Group A (n = 50), Group B (n = 50), and Group C (n = 25), and I used one-way ANOVA. Is there any problem with the smaller sample size of Group C, or may it affect the statistical analysis? Please advise me, sir.
    Thanking you

  24. Mon says

    Hi, I’m doing a study for my Bachelor of Science thesis too. Currently, I’m having a problem with data analysis. My experiment is a 3×3 factorial design with two independent variables (frying temperature and frying duration). However, the duration factor is a bit special because its levels differ by temperature: samples at 140 °C were fried for 4, 5, and 6 minutes, while samples at 160 and 180 °C were fried for 1, 2, and 3 minutes. Should I use a one-way or a two-way ANOVA to analyse the effect on my samples?


  25. Vicki says

    Hi Karen,

    I’m looking at the spatial variation of fish parasites for my Bachelor of Science thesis. I want to compare mean parasite abundance between male (n = 71) and female (n = 105) fish. I log-transformed the parasite data, and it has a normal distribution and equal variances. I was just wondering if I can use a one-way ANOVA to compare the mean abundance between sexes, or whether it would be safer to apply a non-parametric Mann-Whitney U or Kruskal-Wallis test. Hope to hear from you.

  26. Yang says

    Hi Karen,

    My experimental model has two factors: temperature and time point. I performed a two-way GLM given the unequal sample sizes I have. However, there seems to be no effect of the interaction of the two factors or of temperature itself. My question is: will comparisons between the two temperatures at different time points be valid if I perform them using one-way GLMs after the non-significant findings in the initial two-way GLM?


  27. Kalyani says

    Hi Karen,

    I hope you can help me. I’m trying to finish a paper for this term. I’ve just run two ANCOVAs. There were no problems with outliers, some problems with normality (|skew| and |kurtosis| < 3, although formal tests were significant), and no problems with collinearity, correlation between covariates, or homogeneity of slopes. Levene's test was significant for both ANCOVAs. The cell sizes and SDs are as follows:

    ANCOVA with DV "A"
    N=30, SD=1.31
    N=78, SD=1.16
    N=55, SD=.88
    N=171, SD =1.21

    ANCOVA with DV "B"
    N=30, SD=0.91
    N=78, SD=0.89
    N=55, SD=.74
    N=171, SD =.72

    I realize that the smallest group has the highest variance in both cases. I hate to transform variables since it makes interpretation so complicated. What other options do I have?



  28. Muluken Tigistu says

    In my paper I am comparing the psychological well-being of orphan and non-orphan children. The sample sizes are n1 = 166 and n2 = 333. Is there a problem with computing an independent t-test?

  29. trish says


    I have data from 2 years, 2004 and 2008, and I realized the sample sizes are not equal for the two years. How can I do data cleaning in Stata in this case?

  30. Somayyeh says

    Dear Karen

    Thanks for the information that you provided here. I have the same issue. I have a caregiver group of 96 and 42 control participants, and I compare them on one variable. I checked the variances and there was no significant difference between them, so I guess I can refer to that. However, do you know of any published book that I can cite?


  31. TJ says

    Hi Karen,

    I am working with a data set that has n~200, n~13, n~20. I would like to do an ANOVA but I am not sure how to approach this. What was the sample 20/200 you mentioned? Would a weighted mean account for these differences? The number of samples is also related to the number of interesting components for that group (not due to poor sampling).

  32. TW says

    Hi, I found your website very helpful and have a few questions:

    1) I have data on an entire population and am comparing the means of three groups. Do I still need to do significance testing, since this isn’t really sampling?

    2) The 3 groups have very different sizes (1200, 12000, 40000). I found the data not normal, so can I just use the Kruskal-Wallis test?

    3) I understand ANOVA is popular, but I have never found a data set that is normal: the Shapiro-Wilk or Kolmogorov-Smirnov test always comes out significant, so I kept on using the Kruskal-Wallis test. Is that OK?

    Many thanks!

  33. Josh says

    Hi Karen,

    I’m doing an analysis of mechanical properties with one factor. I have 3 groups: group 1 (n = 5), group 2 (n = 9), and group 3 (n = 8). I have read the questions people asked and the replies you have given. So am I right to say that, for one-way ANOVA, it is all right to analyze groups with different sample sizes?

  34. Olivia says

    Hi, I was wondering what the full reference is for Keppel (1993). I’m interested in looking at that paper. Thanks

  35. Stella says

    Hello Karen, I’m glad I came across this site! I’m facing a challenge with my research work. I sampled 6 different land use types: I replicated 4 land use types 5 times and the other two 4 and 2 times (due to their limited size for sampling). Now I want to test for significant differences in a parameter between the different replications and their means using ANOVA. This is unbalanced sampling, and I’ve tried to use the Gabriel test, but my variances are unequal and my data are not normally distributed. Please, how do I go about this analysis? Thanks!

    • Karen says

      Hi Stella,

      I’d have to know a lot more about your study and data to make suggestions about an analysis. I’m just not comfortable making suggestions as it’s too easy for someone to have left out crucial info. It seems you have a lot going on there. So I’d suggest a consultation.

  36. Lou says

    Hi Karen,
    I am in the process of collecting data and plan to use a 2 (gender, between subjects) x 3 (condition, between subjects) x 3 (time of testing, within subjects) ANOVA to analyse my data.

    I want to run an a priori power analysis to check how many participants I should have in each cell. I am unsure if I am using Gpower correctly (particularly if an effect size of .3 is ok), but it gives me a sample of 102 overall (17 per cell?). I wonder if this seems right and if having vastly mismatched cells will matter? (some cells currently have 49 participants).

    Thank you in advance!

  37. Seaneen says

    Hi Karen, I am hoping you might be able to offer some suggestions regarding two questions I am struggling with for my data analysis.

    1) I have one study which has shown a statistically significant difference between two sample groups, using a Mann-Whitney test as the data are not normal; however, the groups are unequal in size (Group 1 = 3369, Group 2 = 1524). My supervisor has asked whether I can apply a correction factor to account for the difference in group size, but I was under the impression that the Mann-Whitney test already accounts for this. Any ideas?

    2) Another study has two sample groups with almost exactly equal means (Group 1 = 5.67, Group 2 = 5.75), which to me intuitively says they are not statistically different. However, again the data are not normally distributed (and not equal in size either: Group 1 = 103, Group 2 = 221), so I assumed I have to run a non-parametric test, which shows a statistically significant difference between the groups?

    I hope that all makes sense!

    Any light at all you can shed on this would be greatly appreciated, I have been struggling for days and have exhausted the textbooks and web pages!!! Thanks in advance!


    • Karen says

      Hi Seaneen,

      1) No correction necessary. M-W is fine for unequal samples.
      2) It’s possible to have so-small-it’s-not-interesting but statistically significant results. But another possibility is that the nonparametric test isn’t comparing means. If you have an outlier or two, that would affect means (possibly making them closer than say, the medians) but would not affect the nonparametric test. So it’s possible those two distributions have the same mean, but aren’t generally overlapping as much as the close means would indicate. I say graph them.

  38. Sarah says

    I need to run an ANOVA with two samples (n is unequal for the groups) for several measurements. I am not able to carry this out, perhaps because the sample sizes are different? I am comparing 28 different categories between two groups at 3 different ages. How do I do this? I ran Student’s t-tests that gave good information, but am now asked to run an ANOVA. Any help would be appreciated.

  39. Maria says

    Hi Karen, I’m running a 2×2 mixed ANOVA (the between factor is gender, male and female; the within factor is measurement at Time 1 and Time 2) with 7 males and 29 females. Is it okay to do that, or are the sample sizes too unequal? The variances in scores (using two different scales) are mostly about twice as large for women as for men; for instance, SD (men/women) = 0.4/0.8 and 0.4/0.9, and for the other scale 5.6/4.9 and 3.7/6.3. Or should I randomly (SPSS can do it) take 7 males and then perform the 2×2 mixed ANOVA?

    • Karen says

      Hi Maria,

      This is tricky. Unequal sample sizes are definitely a problem with two-way models, but at the same time 7 is a very, very small sample. Is there any way to get more males instead?

  40. Nidhi says

    I’m using multiple regression for my research project. My sample sizes are unequal: students 720, parents 135, and teachers 80. I want to find the effect of parents and teachers on students. I have used SPSS to calculate it, but still want to confirm with you whether you can do multiple regression with unequal sample sizes. Please help me, as I am confused and stuck on this. Thanks.

  41. Ali says

    Hi Karen,

    I am a business student and I don’t have a strong statistics background, but I’m not afraid of learning, so if there are any articles that can help, please let me know. I have three variables: one is independent, the second is a mediator, and the third is dependent. Data will be collected from managers and employees: IV and DV data from managers and mediator data from employees. The problem is that there may be 20 managers and 100 employees. I was following Baron and Kenny’s (1986) approach and Judd and Kenny’s (1981b) recommendation to run regression models to analyze the data. Now I’m looking at other techniques due to the unequal sample sizes. Can I analyze the data with ANOVA? If there is any article on this sort of problem, please let me know. I appreciate any help I get. Thanks

  42. Colin Jones says

    I am trying to figure out the sample size in an article on socially conscious mutual funds. The article looks at industries/sectors that are screened out of these mutual funds in order to evaluate performance. The three independent sectors examined are tobacco, alcohol, and gambling. Each sector is compared to the S&P 500 Index over an 11-year span. Tobacco has 15 stocks in the industry, alcohol 18, and gambling 22. Do you know what the sample size would be for this? Would it be 3? Or 1, since they are all exclusive?

    • Karen says

      Hi Colin,

      It’s hard for me to say without seeing the paper and exactly which analysis they’re doing and how. It could either be the number of stocks or it could be, as you suggested, the number of industries.

  43. Marko says

    Hi Karen,

    So glad I found this site! I’m having trouble accepting my analysis and perhaps I’m doing it wrong so hopefully you can shed some light.

    My master’s thesis is on female choice. I conducted three-choice experiments in which females are presented 3 different acoustic stimuli simultaneously. I record which stimulus they choose as well as the time it took them to make the choice (latency). My issue is with the latency analysis. I assumed that a one-way ANOVA was a proper test because my independent factor is categorical (choice) and my dependent factor is continuous (latency–time).

    My sample sizes:
    Stimulus 1: 2
    Stimulus 2: 10
    Stimulus 3: 18

    One issue I have is that the variance for the group with two individuals is HUGE, mainly because one female took her time to choose that stimulus, whereas another female chose that same stimulus rather quickly. I found no significance across the board, but is it because of that low sample size of group 1?

    Thank you so much for your help. I really appreciate it.


    • Karen says

      Hi Marko,

      Theoretically it doesn’t matter that your samples are unequal, but practically, you’re going to have a hard time if a sample is only 2.

      Your choices are to run more subjects or drop that stimulus group. Unfortunately, that’s about all you can do. Since none of your groups is very large, running more subjects would be the best, if you can manage it.

  44. gautam says

    Hi. I have done an analysis on 3 groups. Group 1 has 24 subjects, group 2 has 398, and group 3 has 755 subjects. On analyzing the variable vomiting, group 1 had 12 subjects with vomiting out of 24 (50%), group 2 had 169 out of 398 (42.5%), and group 3 had 270 out of 756 (35.8%). On analysis by chi-square (3×3), the p-value was statistically significant (.041). To find out which groups differed from each other, I did pairwise comparisons between groups 1 and 2, groups 1 and 3, and groups 2 and 3. The p-value for groups 2 and 3 was less than .05, thus statistically significant, but for groups 1 and 2 and groups 1 and 3 the analysis was not statistically significant. My question is: the difference between group 2 with 42.5% of cases and group 3 with 35.8% of cases with vomiting was statistically significant, so why was the difference between group 1 with 50% (which is higher than the proportion of cases in group 2) and group 3 with 35.8% not statistically significant? Is it because of the very small number of subjects in group 1, or something else?

    Thank you.

  45. Manoj says

    Hi Karen,

    Could you please help me with your valuable suggestions in stats?

    I have three groups (n1=16, n2=23 and n3=24) with different sample sizes. I want to see the significant difference between these groups based on a parameter in common. Please let me know the best method or tool to analyse.



    • Karen says

      Hi Manoj,

      Well it depends on which parameter you want to compare. If it’s the mean of each group on some dependent variable, then you can use one way ANOVA. The different sample sizes are no problem.


  46. Richa Gupta says

    Is it compulsory to have equal numbers of patients in both groups for data analysis? If not, can I exclude a single patient at the end of the study to make the samples equal in both groups and remove bias?

  47. Shari says

    Hi Karen,

    I’m looking at differences in fish weight between a control group and 4 different treatment groups from experiment start to finish.

    I am a Master’s thesis student and have run a 2-way ANOVA on my data, but I have unequal groups (unavoidable, and I was told by my supervisors this wouldn’t be a problem). I have 3 independent variables (sample period, treatment, and frequency) and 1 dependent variable (weight).

    So it turns out it is a problem: Levene’s test is significant (p = 0.017). My data conform to normality and my model is significant (p = 0.018). My factor sample period is significant at the .001 level.

    Should I be running another stats test or is there a way to adjust for the lack of homogeneity?

    Thanks for help!

  48. Yannis says

    Sorry for double posting, I meant to create a new reply but replied to a post instead:

    Hi Karen,

    Thank you for this article, both the article and the discussions below are enlightening 🙂

    Can I ask your opinion on one related thing? I want to run a two-way ANOVA with unequal sample sizes. The reason for the unequal sizes is that there is a third factor that isn’t part of this ANOVA and requires its own data points. How should we handle randomization when downsizing the larger sample groups?

    To give an example, let’s say we compare responses from athletes and non-athletes, which are either male or female. So the factors are Gender (Male, Female) and Athlete (Yes, No). This will be analyzed with a two-way ANOVA, let’s call it ANOVA A. So we have:

    Male Athletes: n=20
    Male Non-Athletes: n=20
    Female Athletes: n=40, but we want to make it n=20
    Female Non-Athletes: n=40, but we want to make it n=20

    There are more female subjects because, in a different analysis within the same study, we will do exactly the same comparison with an added factor, e.g. In-Pregnancy (Yes, No), which doesn’t apply to males. So that one will be another two-factor ANOVA; let’s call it ANOVA B:

    Female Athletes In-Pregnancy: n=20
    Female Non-Athletes In-Pregnancy: n=20
    Female Athletes Not-In-Pregnancy: n=20
    Female Non-Athletes Not-In-Pregnancy: n=20

    How do we choose which females to use in the downsized group for ANOVA A? It sounds logical to randomly select 20 Female Athletes and 20 female Non-Athletes, but should we care if they are In-Pregnancy or not? Or should we account for that as well?

    Thanks a lot,


    • Karen says

      Hi Yannis,

      That’s a great question.

      I assume that if you had not had the pregnant/non-pregnant groups selected out for the second study, you would have just randomly selected 20 female athletes and 20 female non-athletes. Unless it’s standard or relevant to find out if they’re pregnant, you wouldn’t ever know, right?

      So there are two options for the study where pregnancy is not relevant.

      1. Figure out what percentage of the female athlete population is usually pregnant at any given time, then draw your two samples at the same rate.
      2. Decide that the population of interest is non-pregnant female athletes and just use that sample.
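Option 1 might be sketched in Python 3 as follows, assuming you already have lists of subject IDs split by pregnancy status. The function name, the ID lists, and the 10% rate in the example are hypothetical placeholders, not real figures; you would substitute the population rate you find.

```python
import random

def proportional_subsample(pregnant, not_pregnant, n, pregnancy_rate, seed=None):
    """Randomly draw n subjects so the pregnant share matches pregnancy_rate.

    pregnancy_rate is an assumed population proportion supplied by the analyst.
    Sampling is without replacement within each stratum.
    """
    rng = random.Random(seed)
    n_pregnant = round(n * pregnancy_rate)
    chosen = rng.sample(pregnant, n_pregnant)
    chosen += rng.sample(not_pregnant, n - n_pregnant)
    return chosen

# Hypothetical IDs: 20 pregnant and 20 non-pregnant female athletes.
pregnant_ids = [f"P{i}" for i in range(20)]
not_pregnant_ids = [f"N{i}" for i in range(20)]
sample = proportional_subsample(pregnant_ids, not_pregnant_ids, 20, 0.10, seed=1)
```

With a 10% rate, the 20 chosen athletes include 2 pregnant and 18 non-pregnant subjects; the same call would be repeated on the non-athlete lists.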

  49. mauricio says

    Hello. Thanks for the information. I would like to ask what post hoc test is recommended when running a one-way ANOVA with different sample sizes:
    4 groups with n = 10 and 1 control group with n = 30. Thanks a lot 🙂

  50. ryan says

    Hi Karen,

    I’m confused about my data analysis. I’m about to study motivation towards grade achievement. Motivation is divided into 2 categories: intrinsic (interest and attitude) and extrinsic (family, social, teaching style, learning style). Grade is defined in terms of A, A-, B+, B, B-, C+, C, C-, D and E. I have run a one-way ANOVA, and the result shows there are significant differences among the means. But when I try to run the post hoc test, it comes out like this:
    Post hoc tests are not performed for Gred because at least one group has fewer than two cases.

    Can you tell me how to solve this problem, please? I’m new to statistics.

    Thanks =)

    • Karen says

      Hi Ryan,

      It’s hard to tell exactly what is going on without looking at it, but it sounds like there is one group within your motivation categories with only one person. I would start with some frequency tables.
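Outside SPSS, a quick frequency check for categories too small for post hoc tests could be sketched in Python 3 like this. The function name and the grade labels are hypothetical; the threshold of 2 mirrors the SPSS message quoted above.

```python
from collections import Counter

def small_groups(labels, min_cases=2):
    """Return {category: count} for categories with fewer than min_cases cases."""
    counts = Counter(labels)
    return {category: n for category, n in counts.items() if n < min_cases}

# Hypothetical grades for six students.
grades = ["A", "A-", "A-", "B+", "B+", "B+"]
print(small_groups(grades))  # prints {'A': 1}
```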

  51. Kevin Kirkpatrick says

    I’m using ANOVA to compare user preference ratings R within various cities, for groups A, B, and C. Unfortunately, my group sizes are HUGELY skewed – group A will typically have 20,000 or more members per city, group B will have ~1,000, and group C can have as few as 100.

    In response, I have been running the ANOVA as follows:
    1) determining count of C members in each city, call this Cn (let’s say 130 C people in Dallas)
    2) randomly pick Cn members from group A within each city, calling this a sample-A group (in contrast to population-A for the city). So in my hypothetical, this might mean picking 130 A ratings out of 25,000.
    3) I then perform a one-sample t-test on the sample-A vs population-A within each city – in the Dallas hypothetical, comparing the 130 sample-A to the 25,000 population-A.
    4) repeat steps 2 and 3 until I get a sample-A selection with no significant difference from population-A for each city. This might mean I re-pick the 130 Dallas A ratings several times until I’ve picked a representative sample.
    5) I repeat 2 – 4 for group B.
    6) I perform my ANOVA test on Sample-A, Sample-B, and Sample-C within each city.

    This seems to be working quite well; indeed, I’ve clearly identified cities where the ratings of A, B, and C groups truly seem to differ. However, I’m not an experienced statistician, and since this approach feels ad hoc, I’m curious whether the results would stand up to scrutiny.

    • Karen says

      Hi Kevin,

      Your sampling seems fine. The one thing I would change, though, is to eliminate steps 3-5. Those are still based on the very large population size. As long as your sampling is truly random, there should theoretically be no systematic difference between the mean of the population and the mean of the sample.
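The simplified version (randomly downsample once, with no re-picking until a t-test passes) might look like this for one city in Python 3. The dictionary layout, the function name, and the generated ratings are assumptions for illustration.

```python
import random

def subsample_to_smallest(groups, seed=None):
    """Randomly downsample every group to the size of the smallest group.

    groups maps a group name to that group's ratings for one city.
    Each group is sampled once, without replacement; there is no re-drawing.
    """
    rng = random.Random(seed)
    n_min = min(len(ratings) for ratings in groups.values())
    return {name: rng.sample(ratings, n_min) for name, ratings in groups.items()}

# Hypothetical ratings for one city (sizes echo the Dallas example).
rng = random.Random(0)
dallas = {
    "A": [rng.randint(1, 5) for _ in range(25_000)],
    "B": [rng.randint(1, 5) for _ in range(1_000)],
    "C": [rng.randint(1, 5) for _ in range(130)],
}
balanced = subsample_to_smallest(dallas, seed=42)
```

The three balanced groups would then go straight into the ANOVA (step 6).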

  52. yasmine says

    Hey Karen

    I have a question: when running a one-way ANOVA with three levels (60, 62, and 63 participants per group), where one group does not meet the normality assumption (although its histogram looks like it satisfies normality) but equal variance is met, what kind of post hoc test should I be using, and why?

    thanks!!! 🙂

    • Karen says

      Hi Yasmine,

      There isn’t a post hoc for a situation of non-normality. If the normality is close enough for the ANOVA F test, it’s good enough for posthocs.

  53. Mohammed says


    I have 3 subgroups from the main group. The number of samples in each group was 6, 7, and 9. Should I use ANOVA or the Kruskal-Wallis H test for the comparison, and why?

  54. hellen says

    I am analysing my data using STATISTICA, and I keep getting a standard error of zero for my dry matter variable, yet the other variables do not have a zero standard error. What could be the problem? Thank you

    • Karen says

      Hi Hellen,

      I would need a lot more information, and probably to actually see the analysis to figure this one out. It sounds like you’re overspecifying the model in some way.

  55. sufala says

    Hi, I’m doing a study with six groups, so I have to do an ANOVA. But when I check for normality using the Shapiro-Wilk test or the Kolmogorov-Smirnov test, the data in two of the six groups are not normally distributed. Can I still continue with ANOVA, or should I use the KW test?

  56. nisha says

    Hello ma’am,
    My total sample is 218, divided into three groups: group a: 65, group b: 61, group c: 92. I have to compare these three groups. I used ANOVA for the comparison, and after finding the p-value I have to use a post hoc test. Could you please suggest what type of post hoc test I can use in my study, since my sample is large?
    Thank you. Please reply ASAP.

  57. sanaz says

    I was wondering if you can help me find an answer to my question.
    I have collected 567 responses on smoking status. 11 respondents (2.5%) are smokers and 553 (97.5%) are non-smokers. I want to conduct a t-test to compare these two groups on the mean of another variable. Is it doable? I skipped testing this variable because of the very unbalanced sample sizes. Is that right?

    Thank you

    • Karen says

      It’s doable. Just be very careful to check the equal variance assumption. The bigger issue is that 11 is very small, and you may not want to make inferences on the responses from 11 people.
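One common way to avoid leaning on the equal variance assumption with groups as lopsided as 11 vs. 553 is Welch’s t-test, which is what SPSS reports as “equal variances not assumed.” A minimal Python 3 sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the function name is hypothetical, and a p-value would still need a t distribution from a stats package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group's variance is divided by its own n, so no pooled
    (equal) variance is assumed.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

With similar variances the result is close to the ordinary pooled t-test; the two differ most when the larger group also has the smaller (or larger) variance.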

  58. Hector says

    Hi Karen,

    Thank you for sharing your knowledge with us.
    I have an ANCOVA question for you. I am trying to compare a treatment group and a control group across 8 different segments of people. My treatment and control sample sizes are unequal in each of the 8 segments; the most uneven are n(treatment)=20, n(control)=8. My results show a significant difference between the treatment and control groups in only one of the eight segments, but the “observed power” for the test is much lower than 0.8. So I am wondering whether these results are reliable at all.
    If I want to increase the power, is there any way other than increasing the sample size (because I cannot)? For instance, is there another test?

    Thank you in advance for your help,

  59. Keneth Tumwebaze says

    When I analyse data with ANOVA, I present my p-values and means in a table, and this is acceptable. However, I have a study in which I intend to use Kruskal-Wallis, and I would like to present my results in table form. Is it in order to report the medians, or should I use p-values only? I have not come across this latter situation before. Any advice?

    • Karen says

      Hi Keneth, although technically a Kruskal-Wallis test is not testing medians, it is pretty common to report medians as a descriptive stat, along with the K-W test statistic and p-value.
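As a rough illustration of reporting medians alongside the test statistic, here is a Python 3 sketch that computes the Kruskal-Wallis H from ranks (tied values share their average rank) and returns each group’s median. The function name is hypothetical, and it omits the tie correction that statistical packages typically apply, so its H can differ slightly from software output when there are many ties.

```python
from statistics import median

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction) plus each group's median."""
    pooled = sorted(x for g in groups for x in g)
    # Average rank for each distinct value, so tied values share a rank.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    rank_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h, [median(g) for g in groups]
```

For example, `kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H of about 7.2 and medians `[2, 5, 8]`, which is the kind of pair you might put in a results table.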

  60. Ambika K.C. says

    Namaste Ma’am,
    I have a problem with my statistics. I have two samples, one of size 18 and the other of size 17. When I test normality with the Shapiro test in R, the p-value for the sample of 17 is 0.007442, i.e. p is less than 0.05, and for the sample of 18 it is 0.3423, i.e. p is greater than 0.05. Based on these p-values, one sample appears normally distributed and the other does not. In this situation, which test is suitable? Can I use the Wilcoxon rank sum test (wilcox.test, a nonparametric test)?
    I drew these samples from one community forest (CF) that is divided into two blocks: one unmanaged and the other managed.

    • Karen says

      Namaste Ambika,

      I don’t like the Shapiro-Wilk test as a final decision maker about normality. I would first investigate what distributions you do have. If the one doesn’t look normal, why not? Skew? An outlier? Uniform?

      That said, the Wilcoxon is considered distribution-free, so it’s safe to use, if it answers your research question.

  61. Mona says

    In my paper, males and females are compared using MANOVA. There are 37 males and 86 females. Does this difference in numbers affect the results? How can I justify this difference?


  62. Daniel says

    Hi Karen,

    I would be most grateful if you could help me as I have an ANCOVA question for you.

    Two of my independent variables have unequal sample sizes. For example, the first variable (depression) was drawn from a student sample and has 6 ordinal levels with n=55, 16, 6, 5, 4, 1 (in each level of depression). The second variable (anxiety), also from a student sample, has 4 ordinal levels with n=36, 28, 17, 6. As you probably assumed, as depression and anxiety increase, the n in each level gets smaller (there are few subjects with higher levels of anxiety or depression in the sample).

    Question: Should I run the analysis as it is (I used Levene’s test of equality of error variances and it was non-significant), or should I merge, e.g., levels 3-6 of the depression variable and levels 3 & 4 of the anxiety variable? What would you do?

    Thank you very much for your time,

    • Karen says

      Hi Daniel,

      There isn’t one right answer to this one, since you don’t seem to have problems with unequal variance.

      But I can tell you a group with n=1 (the highest depression) has no variance, so isn’t useful. It is certainly reasonable to combine those groups, as long as it makes theoretical and logical sense.

      And as long as those natural groupings aren’t giving you opposite results, it should help your power as well.
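The point about the n=1 group can be seen directly in Python 3: the sample variance is undefined for a single observation. The sketch below, with hypothetical scores, level labels, and function names, reports which levels have no usable variance and pools ordinal levels into one group. Whether to merge remains a theoretical decision; the code only does the bookkeeping.

```python
from statistics import StatisticsError, variance

def group_variances(groups):
    """Sample variance per level, or None where it is undefined (n < 2)."""
    result = {}
    for level, scores in groups.items():
        try:
            result[level] = variance(scores)
        except StatisticsError:  # variance() needs at least two data points
            result[level] = None
    return result

def merge_levels(groups, levels, new_name):
    """Pool the listed ordinal levels into a single combined group."""
    merged = {k: v for k, v in groups.items() if k not in levels}
    merged[new_name] = [x for k in levels for x in groups[k]]
    return merged

# Hypothetical outcome scores by depression level (level 6 has n=1).
scores = {5: [12.0, 13.0, 14.0], 6: [18.0]}
print(group_variances(scores))  # prints {5: 1.0, 6: None}
combined = merge_levels(scores, [5, 6], "5-6")
```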

  63. David says

    Hi Karen,
    I’m running Anova to compare means. Anova sig. = .129 but post hoc test concludes there’s significant difference at 0.05 level. How come?

    • Karen says

      Hi David,

      I was going to refer you to another article, but just realized I haven’t written anything on this. It’s so important (and common). Here’s the quick answer:

      1. They’re not actually testing the exact same thing.
      2. The F test always trumps the post-hoc. If it’s not significant, don’t run a post-hoc. 🙂

  64. Grace says

    Please help me with my assignment. I really don’t know what to do because our prof hasn’t taught this yet, and this is some kind of advanced study for us, but it’s so hard 🙁

    HOMEWORK – Introduction to Analysis of Variance
    A psychologist conducts research to compare learning performance for three (3) species of monkeys. The animals are tested individually on a delayed-response task. A raisin is hidden in one of three containers while the animal watches from its cage window. A shade is then pulled over the window for 1 minute to block the view. After this delay period, the monkey is allowed to respond by tipping over one container. If its response is correct, the monkey is rewarded with the raisin. The number of trials it takes before the animal makes five (5) consecutive correct responses is recorded. The researcher used all the available animals from each species, which resulted in unequal sample sizes (n). The data are summarized below. Ref.: Gravetter, Frederick J.; Wallnau, Larry B. (2012).


    Species 1: n = 4, M = 9, T = 36, SS = 200
    Species 2: n = 10, M = 14, T = 140, SS = 500
    Species 3: n = 6, M = 4, T = 24, SS = 320
    Overall: N = 20, G = 200

    Summary Table for One-Way ANOVA
    Source SS df MS F

    Fcrit = ? at alpha 0.05

    Guide Questions:
    1. Formulate the steps in hypothesis testing (10 pts)
    2. Construct the summary table for One-way ANOVA (8 pts)
    3. Does the problem use a one-tailed or two-tailed alpha level? Explain why. (2 pts)

    • Karen says

      Hi Grace, while I appreciate how hard this can be, as a rule, I don’t help with homework. That’s what your TA is paid the big bucks to do. 🙂

  65. Alex says

    Hi Karen,

    I was hoping to use ANCOVA to compare a battery of neuropsychological tests in carriers vs non-carriers, controlling for age, gender and education level and I have three questions about that which I was hoping you could help me with. 🙂

    Firstly, do I have to demean the covariates, before feeding them into (the) SPSS (multivariate general linear model)?

    Secondly, is Levene’s Test of Equality of Error Variances the test I need to do to check if the variances are sufficiently similar to perform the ANCOVA on?

    Lastly, assuming this is the case, what happens if Levene’s test is significant? Does it matter a lot for ANCOVA (or is it very robust anyway)? Is there a non-parametric alternative that I could use instead?

    Thank you very much!


    • Karen says

      Hi Alex,

      1. I’m not sure what you mean by demean. I assume you mean “mean center.” If so, it’s not necessary, but can be helpful.
      2. Levene’s is popular, but I don’t use it, at least not as a sole criterion.
      3. It’s robust, unless sample sizes are quite different.
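On point 1, mean centering just subtracts the sample mean from every value of a continuous covariate. A Python 3 sketch with hypothetical ages and a hypothetical function name:

```python
from statistics import mean

def mean_center(values):
    """Subtract the sample mean so the centered covariate has mean zero."""
    m = mean(values)
    return [x - m for x in values]

ages = [30, 40, 50]  # hypothetical covariate values
print(mean_center(ages))  # prints [-10, 0, 10]
```

In a model with no covariate-by-group interaction, centering changes only the intercept, not the test of the group effect, which is why it is optional. It mainly makes the intercept refer to a participant of average age rather than age zero.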

  66. Mike says

    If my samples from two groups are slightly unbalanced (8 vs 9) but homogeneity of variance is not violated (Levene’s test p > 0.05), can I interpret the results as if the data were balanced? Thank you very much.

  67. Kaye says

    hi Karen,

    I’m new to SPSS and research analysis; I hope you can help me. I am analyzing the influence of teacher characteristics (e.g., academic background) on student scores. I have 135 teachers and more than 4,000 students. How should I prepare my data set so I can do a multiple regression? Thank you!!!

  68. Mark Lowe says

    Hi Karen,
    I want to run a one-way between-groups ANOVA, but my group sizes are 88, 76 and 7. Do you have any suggestions or comments on whether this is going to provide useful information?

    • Karen says

      Hi Mark,

      It’s hard to do any sort of comparison with only 7 observations in a group. That said, in some studies that’s all you ever have. This could be useful, but pay very close attention to those assumptions. A non-parametric test, like Kruskal-Wallis, may be a safer approach.

  69. Chantalle says

    Hi Karen,

    I’m hoping to run a one-way ANOVA on one factor with 4 groups. The sample sizes are 102, 100, 100 & 59. Levene’s test was significant (0.001) after an arcsine transformation (the data were percentages). The distributions are normal.

    I read somewhere that if there is less than a 5-fold difference in standard deviations, the ANOVA should still be robust, even with heterogeneity of variance, but the site did not list any references. In my case, there is a 1.49-fold difference between the largest and smallest standard deviation.

    I was wondering whether you think I can use an ANOVA?

    Also, I’m having trouble tracking down the paper you referenced (Keppel, 1993). In what journal was it published?

    Thank you very much! 🙂

    • Karen says

      Hi Chantalle,

      5-fold sounds higher than I’ve seen, but 1.49 is probably fine. Keppel is a textbook, not a journal article. The title is Design and Analysis: A Researcher’s Handbook.
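The fold-difference Chantalle describes is simply the largest group standard deviation divided by the smallest. A Python 3 sketch with a hypothetical function name and made-up data; squaring the result gives the ratio of variances, which is how some rules of thumb are stated instead.

```python
from statistics import stdev

def sd_ratio(*groups):
    """Largest group standard deviation divided by the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)

print(sd_ratio([1, 2, 3], [2, 4, 6]))  # prints 2.0
```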


  70. Kathy says

    I am running both t-tests and logistic regression analyses looking at income differences between two groups. One group has 980 subjects; the other 9800. In another comparison, one group has 980 and the other group has 430,000.

    I have run t-tests using the lincom function in Stata (with unequal variances). I have also drawn a random sample of 10% of the larger group and re-run some of the analyses. While my means change slightly with the smaller samples, the overall patterns persist and statistical significance does not change.

    I have a reviewer who has asked whether I have applied any corrections to take sample size differences into account. Would you suggest any additional corrections, other than what I have already done? The reviewer in particular questioned whether I could trust my results that indicated statistical significance, given the very different sizes between the two groups. Would you agree with this concern?

    I appreciate any feedback!

    • Karen says

      Hi Kathy,

      I understand doing corrections in a factorial situation, but you don’t have that. It sounds like you already tried the subset of the larger group, and got the same answer. I’m not sure what other corrections you’re supposed to try.

  71. Richard says

    I have 3 sample groups I wish to compare: Sample A = 20, Sample B = 20, Sample C = 40. Do I need to adjust my ANOVA to compare them? If so, how do I calculate the weighted mean? The samples come from 3 different stakeholder groups, i.e. different populations. Does that make a difference when calculating the weighted mean?

    All my data is in Excel. Is it possible to carry out an ANOVA with weighted mean in Excel?

    Sorry for all the questions. Your help would be greatly appreciated.

    • Karen says

      Hi Richard,

      I suspect it is possible to do an ANOVA with weighted means in Excel, but I don’t ever use Excel for data analysis, so I have no idea how.

      You would need to do adjustments to means if you’re calculating by hand, but stat software will do it for you automatically.
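To see what the weighted mean means for a design like Richard’s, here is a Python 3 sketch with hypothetical scores for groups of 20, 20, and 40. The weighted grand mean (each group mean weighted by its n, which equals the mean of all observations) differs from the unweighted average of the group means whenever the group sizes differ. The function names are made up for illustration.

```python
from statistics import mean

def weighted_grand_mean(groups):
    """Mean of all observations: each group mean weighted by its n."""
    total_n = sum(len(g) for g in groups)
    return sum(len(g) * mean(g) for g in groups) / total_n

def unweighted_grand_mean(groups):
    """Simple average of the group means, ignoring group sizes."""
    return mean(mean(g) for g in groups)

# Hypothetical constant scores so the arithmetic is easy to follow.
groups = [[2.0] * 20, [4.0] * 20, [6.0] * 40]
print(weighted_grand_mean(groups))    # prints 4.5
print(unweighted_grand_mean(groups))  # prints 4.0
```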

  72. Rebekah says

    Hi Karen,

    I have completed an independent samples t-test and because equal variances are not assumed, I go with the statistics which SPSS provides for that correction. However, my sample sizes are not similar (71/242) and therefore I have been taught to be very leery of the corrected t statistic. One solution I have been told is to select a random sample of the bigger group (so I would select 71 cases randomly out of the 242) and then run the test so that you have equal groups (71 to 71) to run your t test. Have you ever heard of this? Is this the most robust way of dealing with the issue of having both unequal variances and unequal n size?

    Any help/suggestions would be much appreciated!

    • Karen says

      I have heard of that (just read it in a book again yesterday). You’re absolutely right that when the sample sizes are that different, you have to be careful about unequal variances.

      Another option, btw, would be a nonparametric test, like Wilcoxon Rank Sum.

  73. Lia says

    I have a 2 x 2 x 2 mixed ANOVA design as well:
    a 2 x 2 repeated-measures design plus a between-subjects factor (gender).
    But my group sizes are 59 and 29. Is that too big a difference?

    • Lia says

      Also, past research has found that females generally do better, so with females at 59 and males at 29, should I report a possible confound?

      • Karen says

        Hi Lia,

        It could. This is exactly the situation where the bigger sample of females could cause problems. Are the results the same within each gender?

        • Lia says

          There is marginal significance (p = 0.058) in only one of the interactions between gender and another IV.

  74. Marco says


    I’ve run a 2 (groups) x 3 (modalities) x 3 (intervals) mixed ANOVA.

    Now, in group 1 there are 17 subjects, while in group 2 there are 15 subjects.

    One reviewer asked if I applied any “correction” to take into account the different sample size.

    I did not think this was a problem, above all with this small difference. Do you have any advice? What should I do?



    • Karen says

      Hi Marco, there is no need to do anything, particularly if at least two of those IVs are manipulated. It’s only a problem if there’s a relationship among the IVs. Even so, those n’s are very similar, even if not equal.

  75. Chantal says

    Hi Karen,

    I am working on my master’s thesis and am confronted with a dataset with 2 groups of unequal size (n=48, n=160) at baseline (T1). I have to test whether there is a difference between the two groups at baseline, before the start of the treatment, and also after 3 and 9 months (T2, T3). In addition, the second group gets smaller over time (n=132 at T3), so I am wondering what test to perform to deal with these difficulties.
    Hope to hear from you.

    Greetz Chantal

    • Karen says

      Hi Chantal,

      It’s really not a problem if the groups are unequal sizes. The bigger problem would be why one group is losing subjects over time but the other isn’t (although maybe I’m just assuming that last part).

  76. Muj says

    Hey Karen,

    I have conducted an ANOVA for 3 between-subjects groups: A (n=26), B (n=19) and a neutral group (n=68). No significant effect was found, but I would like to know whether this was likely due to the neutral group. What problems would the large size of this neutral group present in this situation?


    • Karen says

      Hi Muj,

      I’m not sure what you mean by if it was due to that group. Because it has the largest size, it should have the narrowest standard error. It would entirely depend on the order of the three means. It’s the two small groups that would potentially cause problems. That’s where your power is limited.
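The point that the largest group has the narrowest standard error follows from SE = SD / √n. A quick Python 3 illustration using a hypothetical SD of 100 shared by all groups (not Muj’s data) and the three group sizes from the question; the function name is made up.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a group mean: the SD divided by the square root of n."""
    return sd / sqrt(n)

for label, n in [("A", 26), ("B", 19), ("neutral", 68)]:
    print(label, round(standard_error(100, n), 2))
```

With equal SDs, the n=68 group’s standard error is about 1.9 times smaller than the n=19 group’s, so the two small groups are where precision, and power, run short.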


      • Muj says

        Thanks for the prompt reply,
        I am testing the effects of schizotypy on memory performance, in particular accuracy and reaction time. It’s proposed that there would be a difference between high and low groups, with high groups performing significantly worse… however, no significant effect was found.
        the neutral group does have the narrowest standard error (25.95), compared to a low schizotypy group (43.58) and a high schizotypy group (50.98)…
        The mean RTs are: low = 848.58, high = 965.13, neutral = 927.29.

        I was asked by my supervisor to comment on the potential problems of the large neutral group, could it be that she means that my other two samples were not as matched and had reduced power and so there was not a significant effect?

        Sorry for my essay ^
        but many thanks for your help! 😀

  77. Anne says

    Good afternoon Karen,

    I have a question for you….my sample size is 351 (68 male/283 female).
    I am comparing male/female on several continuous variables and using parametric tests; t-test and manova, etc.
    The issue is the large difference between the group sizes, and I feel I should conduct nonparametric tests. Would this ‘satisfy’ those reading my work? The results are the same with both parametric and nonparametric tests, but I am concerned about the large difference because this is my major hypothesis.

    Thanks so much for your advice.

    • Karen says

      Hi Anne,

      If you’ve checked assumptions and have no problem with unequal variance, it’s fine.

      That said, reviewers don’t always know that, so they may challenge you. If it would make you feel safer, and you are getting the same results anyway, there is nothing wrong with running a nonparametric test in place of the t-test. You may have more trouble with the MANOVA, though; I don’t know of a nonparametric equivalent.


  78. David Lane says

    There is a good discussion of what to do when the variances are unequal here: http://beheco.oxfordjournals.org/content/17/4/688.full and it presents a good solution that holds for unequal n.

    I have a simulation that lets you explore the issue for the test that assumes homogeneity of variance here: http://onlinestatbook.com/2/tests_of_means/robust_sim.html

    and a discussion of unequal n in multi-factor designs here: http://onlinestatbook.com/2/analysis_of_variance/unequal.html

  79. Anne says

    Good morning Karen,

    Great site!
    I have a few questions:
    My data: gender comparisons re knowledge, attitudes, beliefs.
    Male n=68, female n=263.

    1) I am running multiple regression, t-test and MANOVA.
    I want to know if I need to run non parametrics to account for the unequal group n’s?
    Doesn’t the Central Limit Theorem kick in due to my large sample sizes?

    2) In my MANOVA, Levene’s test is significant for two variables at both the .05 and .01 levels.
    Should I not use MANOVA and look at other tests instead?

    Thanks so much for your advice,

    • Karen says

      Hi Anne,

      1) You can run nonparametrics, but it’s usually not necessary. It’s hard to say what you need to do in any specific situation without all the details.

      2) I’m not sure I understand this question, and as for what you should do, see my response to #1. If you want to restate that, I can give you some info so you can decide what you should do. 🙂


  80. Alex says

    Dear Karen,
    I’m doing a paper on unequal sample sizes and I was wondering if you could give me the reference you’re citing… Keppel (1993) I’m looking for it, but I don’t seem to be able to locate it.


    • Karen says

      Hi Alex, It’s Design and Analysis: A Researcher’s Handbook, 3rd ed. There is a more recent edition, coauthored with someone I can’t remember.

  81. Leona says

    Dear Karen,

    I’m hoping for help on dealing with unequal sample sizes for logistic regression analysis. Any assistance is greatly appreciated.

    I’m currently working with data from two types of media user (single media user: N=182 and multiple media user: N=1963). We will be using demographics (such as age, gender, and race) to predict these two types of media user.

    So basically, my DV is 0=single media user and 1= multiple media user, and my IVs are age (continuous), gender (dichotomous) and race (dichotomous). However, due to the extremely unequal size of the two categories of my DV, I doubt if I can still use SPSS to run logistic regression.

    Thanks so much for answering in advance!

    • Karen says

      You can still use logistic regression in SPSS.

      The only problem may be an issue called zero cell counts. It occurs when one category of a categorical IV never co-occurs with a category of the DV. It’s common when both the IV and the DV have lopsided sample sizes. So for example, if you have no men who were single media users, you’d have a problem. Other than that, it’s fine.
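A quick way to look for zero cell counts is to cross-tabulate each categorical IV against the DV and list the combinations that never occur. A Python 3 sketch with a hypothetical function name and made-up data:

```python
from collections import Counter

def zero_cells(iv, dv):
    """List (iv_category, dv_category) pairs that never occur together."""
    observed = Counter(zip(iv, dv))
    return [(a, b) for a in sorted(set(iv)) for b in sorted(set(dv))
            if observed[(a, b)] == 0]

# Hypothetical data: 0 = single media user, 1 = multiple media user.
gender = ["male", "male", "female", "female"]
user_type = [1, 1, 0, 1]
print(zero_cells(gender, user_type))  # prints [('male', 0)]
```

Here no man is a single media user, which is exactly the kind of empty cell that causes trouble in logistic regression.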


  82. Fatima says

    Hi, never mind my last post. I had my data wrong; now I have found the error, and the ANOVA with Type III SS works fine with my unbalanced design.

    Thank you anyway for all your previous posts

  83. Fatima says

    Hi Karen.
    I have been reading the previous posts and I have cleared many doubts. Thank you.
    However, I still cannot perform a 2-way ANOVA with unequal sample sizes… I tried with Minitab and I am now using SPSS. My experiment analysed growth in 2 algae depending on community density and neighbor identity.
    Factor density has 2 levels (n = 13 for each).
    Factor neighbor identity has 4 levels (level 1, n =6; level 2, n =7, level3, n = 6, level4, n =7).
    In SPSS I ran a univariate GLM, but the output is not complete… It doesn’t give me the interaction, nor the density F-value.
    Am I doing something wrong? I thought there would be no problem doing an unbalanced 2-way ANOVA…Or is it only possible to do a 1-way ANOVA?
    Thank you for this blog

  84. Banira says

    Hi karen,

    I am very impressed by your suggestions to researchers!!!
    I am also doing research for my master’s study and have little knowledge of statistics, especially these tests. My problem is whether I can run a one-way ANOVA, since I have three groups of respondents with 10, 80 and 18 samples. Also, my questionnaire is based on a Likert scale; can I do factor analysis for this research? If possible, could you give me any references on factor analysis and one-way ANOVA?

    Thanks for this useful blog…

    Again thanking you!!!
    Kind regards

  85. Kancha says

    I am really surprised reading the content, questions and responses on this post, which led me to ask about my own problem here. I am working in the agriculture sector. I am looking at the health effects of pesticide use and its associated costs, i.e., the negative costs of pesticide use. Now I want to see whether these vary by land size, so I categorized holdings into three groups: small, medium, and large. I know the appropriate analysis could be a one-way ANOVA, but the challenge for me is the post hoc test. How do I decide whether the data violate the equal variance assumption? If they do, should I go for ANOVA or the Welch test? I sometimes read about robust tests of equality of means. Waiting for your response.

    • Karen says

      Hi Kancha,

      There are a number of ways to check the homogeneity of variance assumption, and remember none of them is a definitive test. I tend to favor graphs over tests, because the tests are problematic.
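Since Kancha asks about the Welch test: Welch’s ANOVA compares group means without assuming equal variances, weighting each group by its n divided by its variance. A minimal Python 3 sketch of the F statistic and its degrees of freedom follows; the function name is hypothetical, it assumes every group has at least two distinct values, and a p-value would still come from an F distribution in a stats package. It complements, rather than replaces, looking at plots of each group’s spread.

```python
from statistics import mean, variance

def welch_anova(*groups):
    """Welch's F test for equal means without assuming equal variances.

    Returns (F, df1, df2); compare F to an F(df1, df2) distribution.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    weights = [n / variance(g) for n, g in zip(sizes, groups)]  # n / s^2
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    correction = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return between / correction, k - 1, (k ** 2 - 1) / (3 * lam)
```

With only two groups, Welch’s F equals the square of Welch’s t, and df2 equals the Welch-Satterthwaite degrees of freedom.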


  86. Jenny says

    Hi Karen,

    First of all, thank you for this post. I do, however, have a data analysis problem that hopefully you can help me with:

    My research question is whether or not prescribed burning as a site preparation treatment affects growth of chestnut trees. I have 10 control and 10 burned plots, each with 3 subplots; each subplot was planted with 3 seedlings of a specific variety of chestnut tree (Chinese, American, Backcross 1, Backcross 2, Backcross 3). All 20 plots contained a subplot of Backcross 3 (60 seedlings total), while half of the plots in each treatment contained Chinese and American, and the other half contained the backcrosses. So my inherent study design (when controlling for variety) results in my growth data being unbalanced. After 3 years of study, some of the seedlings died, adding to the imbalance.

    I used least squares means in conjunction with an analysis of variance to analyze the first year’s data; now that I have 3 years of data, I no longer have access to the program (SAS) I used to analyze the data this way.

    Ideas for how I can analyze this dataset? Does the unbalance affect my analysis of variance? Can I use a linear mixed effects model to analyze the data and will it account for the unbalance?


    • Karen says

      Hi Jenny,

      As a general rule, linear mixed models are better than ANOVA in this kind of design, especially when the data are unbalanced. Linear mixed models can deal with unbalanced data much better.

      You will need some sort of statistical software to analyze it. If you don’t have access to SAS, you can always use R. It’s free.


  87. Dim says

    Dear Karen,

    Thank you for all this information you have made available to everyone. I have some trouble finding a solution to a set of measurements I have to make.

    I have one grouping variable with 5 levels. I also have a number of continuous variables that I want to examine. The histograms show approximately normal distributions for all but one of them, which looks a bit positively skewed. The first problem is that the sizes of the groups differ a lot (33, 70, 324, 258, 245). The second problem is that for two of the continuous variables, Levene’s test shows that the variances differ highly significantly across the groups.

    So the test I would like to run is a one-way ANOVA with a post hoc Scheffe test, but I am pretty sure that ANOVA assumptions are being violated here.

    What I’ve tried:
    I tried separate t-tests, but only between the 3 biggest groups and only for the continuous variables that have similar variances across the groups. I get significance in all cases (for “equal variances not assumed” too). The problem is: a) is a t-test appropriate for n > 200? b) can I run a t-test for n1 > 200 and n2 = 30?

    For the cases where the variances were significantly unequal, I tried a Mann-Whitney test. The problem (?) with this is that I get very, very high U-values (n1=254, n2=342, U=27909, p < .001). Is such a U-value normal, or have I done something wrong?

    Finally, I tried a Kruskal-Wallis test, but I can't find a way to get the information that a Scheffe test would give me…

    So to conclude:
    1) How should I deal with highly unequal samples like 30×254?
    2) If ANOVA is inappropriate how should I replace the "missing" scheffe test?
    3) Should I prefer separate t-tests or U-tests instead of parametric or nonparametric ANOVAs?

    Thank you very much in advance!

    • Karen says

      Hi Dim,

      It’s a little hard for me to give specific advice when results seem “funny” without seeing the data, but a t-test wouldn’t be any better than an ANOVA. I would suggest the Kruskal-Wallis test, followed by pairwise comparisons with a Bonferroni correction. Yes, it’s more conservative than Scheffe, but you’re not doing a lot of comparisons, so it shouldn’t be too bad.
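
To make the Bonferroni step concrete, here is a minimal sketch. It assumes you already have the raw pairwise p-values (for example, from Mann-Whitney tests run after the Kruskal-Wallis) and multiplies each by the number of comparisons, capping the result at 1. That is equivalent to testing each raw p-value at alpha divided by the number of comparisons. The function names and the p-values in the example are hypothetical. With Dim's 5 groups there are 5 × 4 / 2 = 10 pairwise comparisons, so each one would be tested at .05 / 10 = .005.

```python
def n_pairwise(k):
    """Number of pairwise comparisons among k groups: k(k - 1) / 2."""
    return k * (k - 1) // 2


def bonferroni_adjust(p_values, alpha=0.05):
    """Bonferroni-adjust a dict of raw pairwise p-values (hypothetical helper).

    p_values maps a pair of group labels to its unadjusted p-value.
    Each p-value is multiplied by the number of comparisons m and capped at 1,
    which is equivalent to comparing each raw p-value with alpha / m.
    Returns {pair: (adjusted_p, significant_after_adjustment)}.
    """
    m = len(p_values)
    results = {}
    for pair, p in p_values.items():
        adjusted = min(1.0, p * m)
        results[pair] = (adjusted, adjusted <= alpha)
    return results


# Hypothetical raw p-values from three pairwise Mann-Whitney tests
raw = {("A", "B"): 0.010, ("A", "C"): 0.030, ("B", "C"): 0.400}
for pair, (p_adj, significant) in bonferroni_adjust(raw).items():
    print(pair, round(p_adj, 3), "significant" if significant else "not significant")

print(n_pairwise(5))  # number of pairwise comparisons among 5 groups
```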


  88. Patrick says

    Dear Karen,

    I am looking for a good way to test for homogeneity of variance before conducting a one-way ANOVA, but I have run into some trouble. My sample size is 233 and the group sizes are unequal (66, 76, 91). The result of Levene’s test was p = .028, but I read that it yields significant results rather quickly when large samples (N = 233) are used. Therefore, I was going to use Hartley’s Fmax to double-check for homogeneity, but I believe it requires equal sample sizes. Do you know a good test of homogeneity of variance for this particular case?

    Kind regards

    • Karen says

      Hi Patrick,

      Keppel (1993), Design and Analysis, says Hartley’s test is problematic because *it* is affected by heterogeneity of variance and non-normality. Instead of a test, he suggests simply checking whether Fmax (the largest group variance divided by the smallest) is greater than 3. If it’s lower, simulation studies have shown there’s no effect on the p-values. When Fmax is greater than 9, the ANOVA F test becomes highly problematic.
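
The Fmax check Karen describes is simple to compute by hand. Here is a minimal sketch: divide the largest sample variance by the smallest, then compare the ratio with the two cut-offs quoted above. The function names and example data are hypothetical. The label for ratios between 3 and 9 is my own placeholder, because the rule as quoted covers only the two ends.

```python
from statistics import variance


def f_max(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]  # sample variances (n - 1 denominator)
    return max(variances) / min(variances)


def keppel_check(ratio):
    """Rough reading of the rule of thumb quoted above (Keppel, 1993).

    The middle band is a hypothetical label: the quoted rule only says
    below 3 is harmless and above 9 is highly problematic.
    """
    if ratio < 3:
        return "below 3: little effect on p-values"
    if ratio > 9:
        return "above 9: ANOVA F test highly problematic"
    return "between 3 and 9: look more closely"


# Hypothetical data for three groups of unequal size
groups = [[1, 2, 3], [2, 4, 6], [3, 4, 5, 6, 7]]
ratio = f_max(groups)  # variances are 1, 4 and 2.5, so the ratio is 4.0
print(ratio, keppel_check(ratio))
```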

      All tests of assumptions tend to be over-sensitive in large samples.


  89. Anushka says

    Hi Karen,
    I am new to using ANOVA in Minitab. Please help me with a problem I am facing.
    The problem is that I need to conclude in my document that lot size (in the production department) does not affect the critical quality attribute of a product (here the product is a pouch).
    The critical quality attribute is the burst test of a given pouch.
    I have taken one lot of 20,000 pouches and a second lot of 6,000 units. From the first lot, 35 burst values were calculated, and from the second lot there were 30 burst values.
    May I know how to do an ANOVA step by step and how to interpret its values?
    I am not sure whether I can post my values for both lot sizes.
    Please let me know.
    Your suggestions and help are greatly appreciated.
    Thank you, Karen, for this blog.

    • Karen says

      Hi Anushka,

      I’m not entirely sure I understand your analysis. When you say there were 30 burst values, does that mean 30 of the pouches burst? Or is there some burst value that you measure whose mean you are calculating? This is important because ANOVA is not appropriate in the first situation, and only possibly appropriate in the second if you’re trying to compare those means across the lots (or something else). You may want to sign up for a Quick Question Consultation–I’d be happy to help you once I understand the situation thoroughly.


  90. Ric says

    Dear Karen,

    Thanks for taking the time to share your obvious expertise.

    My question relates to planned contrasts in one way ANOVAs. My data has 11 groups and is unbalanced. The groups range in size from n=5 to n=11. My homogeneity of variance looks good across groups.

    My uncertainty is this: I would like to run a planned contrast comparing one group with the mean of the remaining 10 groups, but given that the groups are unbalanced, would a planned orthogonal contrast work in such a circumstance? I am only interested in this one planned contrast rather than a post hoc analysis of all groups. I can’t find any literature on this issue.

    Thanks for whatever help you can provide in helping me move forward.


    • Karen says

      Hi Ric,

      It would be fine–this isn’t uncommon. When you have only one contrast, there is nothing for it to be orthogonal to (only a set of contrasts can be orthogonal).
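
To show where the unequal group sizes enter, here is a minimal sketch of a single planned contrast. It uses the standard textbook form. The estimate is psi = sum(c_i * mean_i). Its standard error is sqrt(MSE * sum(c_i^2 / n_i)), where MSE is the pooled error mean square from the ANOVA. t = psi / SE is then referred to a t distribution with N - k degrees of freedom. The function name and the numbers in the example are hypothetical.

```python
from math import sqrt


def planned_contrast(means, ns, weights, mse):
    """Estimate, standard error and t for one planned contrast (hypothetical helper).

    weights (the c_i) must sum to zero; unequal group sizes enter only
    through the c_i**2 / n_i terms in the standard error.
    """
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    estimate = sum(c * m for c, m in zip(weights, means))
    se = sqrt(mse * sum(c * c / n for c, n in zip(weights, ns)))
    return estimate, se, estimate / se


# Hypothetical example: group 1 versus the average of groups 2 and 3
est, se, t = planned_contrast(means=[10, 8, 6], ns=[5, 10, 10],
                              weights=[1, -0.5, -0.5], mse=4.0)
df_error = (5 + 10 + 10) - 3  # N - k degrees of freedom for the t test
print(est, se, t, df_error)
```

For Ric's design, the weights would be 1 for the single group of interest and -0.1 for each of the other 10 groups. The means and sizes would be his own 11 groups, and the MSE would come from his one-way ANOVA.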


  91. Joel says

    Dear Karen,
    First of all, I want to thank you for your help! I really appreciate what you’re doing here!
    I also have a question how to proceed statistically with my data. Here some information about my data:

    -I isolated cells from 16 individual patients (so I have 16 groups)
    -I let the cells migrate and measured the migration distance after 24 h for each group.
    -unequal group sizes (ranging from n=20 to n=46)
    -no homogeneity of variance (Levene’s test: H0 is significantly rejected)

    So I started with a nonparametric ANOVA (Kruskal-Wallis chi-squared test). H0 is rejected.

    -Q1: Can I therefore assume that some of the groups are not from the same population?

    -Q2: What would be the most suitable nonparametric post hoc test here?

    -Q3: I’m still not sure about the distribution of my data. I tried to find out with histograms, but I don’t have enough groups to confirm a non-parametric distribution. How can I find out? (I know, I should have asked this first.)

    Again, thank you very much for your help. Any other advice to proceed is very welcome!!!
    Joel, Switzerland

    • Karen says

      Hi Joel,

      Ah, this might be a question that needs a consultation. It sounds like you’re trying to compare patients to each other. Is that right? If so, is that really what you want to do?

      And to answer #3 specifically, the whole idea of nonparametric tests is the distribution doesn’t matter. They’re sometimes called distribution-free tests.


  92. Tamara says

    You mentioned Keppel (2003). Could you give us the full citation for this work as well? I’ve searched for it, but unfortunately I couldn’t find anything. Thank you!

  93. Tien says

    Hi Karen,

    I am running a mixed design ANOVA:
    IV – participant nationality (2), poser nationality (2), configuration (3), emotion (4).
    DV – Reaction time, accuracy

    To test my predictions, I have run two 3-way ANOVAs:
    participant nationality x poser nationality x configuration
    participant nationality x poser nationality x emotion

    The procedure is that all participants view the same stimuli and respond with a key press, but when analysing the data, I’m interested in seeing how participant nationality interacts with the other variables.

    I have found significant Levene’s results in my analysis, and I’m not quite sure what to do. I have been trying to consult my SPSS textbook, but it is not very helpful. I know that with a one-way ANOVA, I can just make sure that Welch’s F ratio is included in the analysis; however, with the repeated measures design, I do not have this option.

    Do you know what else I can do?


    • Karen says

      Hi Tien,

      I don’t know how large your samples are, but Levene’s test alone can be inaccurate. I would start by doing some graphs to see how far off the variances are and check the rule of thumb of largest variance/smallest variance < 9.

      You're right that Welch is only available with one-way ANOVA, at least in SPSS. Your options are a rank transformation (see http://www.bio.ri.ccf.org/robrien/IntroBiostat/RankAsBridge.pdf) or weighted least squares.
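
Welch’s F comes up several times in these comments. Here is a minimal sketch of what it computes for a one-way design, written from the standard textbook formula; the function and variable names are my own, hypothetical choices. Each group is weighted by n_i / s_i^2, the inverse of the squared standard error of its mean. That inverse-variance weighting is closely related to the weighted least squares idea. The statistic is referred to an F distribution with k - 1 and an adjusted df2. The p-value is not computed here. Assuming SciPy is installed, scipy.stats.f.sf(F, df1, df2) would give it.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA for unequal variances: returns (F, df1, df2)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n_i / s_i^2, the inverse squared standard error of its mean
    weights = [n / variance(g) for n, g in zip(ns, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    # Weighted between-group variability, per numerator degree of freedom
    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, ns))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical data: three groups with unequal sizes and spreads
f_stat, df1, df2 = welch_anova([[1, 2, 3, 4], [2, 4, 6, 8, 10], [5, 6, 7]])
print(round(f_stat, 3), df1, round(df2, 2))
```

As a quick check, with only two groups Welch’s F equals the square of the unequal-variance (Welch) t statistic, and df2 equals its Welch-Satterthwaite degrees of freedom.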


  94. hina says

    I have two problems.
    1. I have two age groups, early adulthood (n=45) and middle adulthood (n=45). Can I run an independent-samples t test to compare social support? I have read that to apply a t test the sample size must be less than 30, and my sample size is 45 in each group.
    2. I want to run a one-way ANOVA to see how education affects coping. I have divided education into 7 groups, and the sample size in each group is different. Should I apply the Gabriel or the Tukey post hoc test to see which groups differ?
    Please reply soon. Thanks.

  95. Chinesewoman says

    Thank you so much, Karen. I searched the internet and found your post; I'm so glad someone knows my problem:

    Could you please take a look at my example:

    I measured Blood Pressure of 31 subjects.

    I grouped this 31 subjects according to their Sex (Male/Female = 21/10) and Smoking Status (Yes/No = 6/25).

    Thus I have 4 groups:
    Male smokers: 4
    Male non-smokers: 17
    Female smokers: 2
    Female non-smokers: 8

    I wonder if I can use the “General Linear Model” to analyse the fixed effects of Sex and Smoking Status (and their interaction) on Blood Pressure?

    I ask because I have small, unequal sample sizes in each group.

    Or, if I can’t use the GLM, what other methods would be good?

    Thousands of thanks from South Korea~! ^____^

    • Karen says


      Well, you can run it, but I would take the results with a grain of salt. Your sample sizes are pretty low in general. I suspect you’d be fine with only main effects, but you’ll have trouble with an interaction term.

      A nonparametric test might be a better option.


  96. Amanda says

    Hi Karen,
    I have a dataset with species richness separated by forest type. The sample sizes are (49, 61, 256). The data are normally distributed, but Levene’s test shows that the variances are unequal. I’ve run the Kruskal-Wallis test on the data, which showed a significant difference. However, I wanted to see exactly which forest types were different from each other. After I ran Mann-Whitney tests, the results showed that none of the pairs of types were significantly different. I’m confused because this contradicts the Kruskal-Wallis test. Should I run a Kruskal-Wallis test on each of the pairs, or is there a better post hoc analysis?

    • Karen says

      Hi Amanda,

      Did you add a Bonferroni correction to the Mann-Whitneys? It shouldn’t contradict the Kruskal-Wallis if you didn’t, but it’s always possible, especially if the original p-value wasn’t especially low and the MW p-values aren’t too high.

      Here are some options for post hoc tests on nonparametrics: http://www.talkstats.com/showthread.php/4634-non-parametric-post-hoc-tests

      Otherwise, you might want to try a Weighted Least Squares ANOVA, using the inverse of the variance of each group as the weight. It’s a good option when variances are unequal but normality is met.


  97. hellen says

    hi karen,
    I have samples of ticks and humans that I will test for Babesia, Q-fever, and Rickettsia. What is the appropriate statistical test? I have been reading for the last 5 days and still can’t figure out what to use. Thanks.

    • Karen says

      Hi Hellen,

      You haven’t given me enough information to answer. I would need to know the research question, how the variables are measured, and the study design to even get started.

      Please feel free to give me more details.


  98. Jelle says

    Hi Karen,

    Just a simple question with regard to unequal group sizes, and which statistical test to choose.

    I have two samples, and I’m trying to find out whether I should use a Mann-Whitney U test or an independent-samples t-test. I was wondering if you could point me in the right direction. Sample 1: n=62. Sample 2: n=38.

    Are the group sizes so unequal that I should use the Mann-Whitney U test, or is the difference small enough that I could use a t-test? Answers are given on a 5-point Likert scale.

    I look forward to hearing from you.


  99. Karen says

    Hmm, that’s a big question. At best, it won’t answer the research question. At worst, it will answer it, but wrongly.

    Was there a specific situation you’re referring to?


  100. Sarah says

    In your post above, you mention using the Tukey-Kramer as a post hoc test for unequal n’s… would Fisher’s protected t-test also work? I need to conduct a post hoc analysis on the interaction effects of a two-way ANOVA. Since SPSS can only do post hoc analyses for main effects, I have to do it by hand. My stats book recommends Fisher’s for a one-way ANOVA, but it doesn’t say whether or not it will also work for a two-way ANOVA. If it is suitable, do I still use the within-groups degrees of freedom when determining my critical value?

    I appreciate your guidance!
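
The two statistics Sarah mentions can be written down for unequal cell sizes. Here is a minimal sketch. It assumes each comparison uses the pooled MSE from the full ANOVA model and its error (within) degrees of freedom, which is the usual convention for these tests. The Fisher LSD statistic is t = (mean_i - mean_j) / sqrt(MSE * (1/n_i + 1/n_j)). The Tukey-Kramer statistic is q = (mean_i - mean_j) / sqrt((MSE / 2) * (1/n_i + 1/n_j)). The t is compared with a t critical value and q with a studentized range critical value, both at the error df. Those critical values are left to your software or tables. The function name and example numbers are hypothetical.

```python
from math import sqrt


def pairwise_stats(mean_i, mean_j, n_i, n_j, mse):
    """Fisher LSD t and Tukey-Kramer q for two means with unequal n (hypothetical helper)."""
    diff = mean_i - mean_j
    inv_n_sum = 1 / n_i + 1 / n_j  # unequal n enter only through this term
    t_lsd = diff / sqrt(mse * inv_n_sum)
    q_tukey_kramer = diff / sqrt(mse / 2 * inv_n_sum)  # equals sqrt(2) * t_lsd
    return t_lsd, q_tukey_kramer


# Hypothetical cell means and sizes, with MSE taken from the ANOVA table
t, q = pairwise_stats(mean_i=12, mean_j=9, n_i=4, n_j=8, mse=6.0)
print(round(t, 3), round(q, 3))
```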


  101. Ali says

    Hi Karen,

    I have a quick question regarding unequal sample sizes in ANOVA. I am running a test on a continuous outcome score in 5 different groups. The group sizes are very unbalanced (Group 1=98, Group 2=366, Group 3=180, Group 4=22, Group 5=10). I want to know if the mean outcome scores are significantly different between the groups, AND which ones are different, using SPSS. Am I wrong to assume that I can use ANOVA with a Tukey’s post hoc test for this? Thanks!!!

  102. Richard says

    Hi Karen,

    I am using the GLM procedure in SPSS to examine the association between daily physical activity and different measures of physical fitness while adjusting for a few variables (age, gender and school) in a sample of about 600 children from whom we have adequate physical activity data. For some of my models, Levene’s test is significant. I was wondering if this should be a concern given the relatively large sample size. If it is a concern, I have heard that it might be a good idea to compute the Welch statistic. So, I was wondering how to get that statistic in IBM SPSS 19.

    Many thanks,


    • Karen says

      Hi Richard,

      With a sample of 600, I’m sure Levene will find even very small differences in variances significant. I don’t know of a way to compute a Welch adjustment in GLM, but one-way anova has it.


  103. Michael says

    I am currently conducting research for my master’s thesis and have run into a problem as to which statistical test I should run! My research compares distances before and after. I am comparing the recruiting distance of schools that have won an NCAA football national championship, from three years prior to three years after. I am trying to see if there is a significant increase in recruiting distance because a team won a national title. Obviously there is an unequal sample size due to the difference in the number of recruits for each year, both pre-championship and post-championship! I have used both the paired-samples t-test and the Wilcoxon signed-rank test! It seems that these tests do not account for unequal sample sizes, which I feel may be throwing off my data!
    Do you have any suggestions?
    I am comparing those distances of before and after for 5 teams

    • Karen says

      Hi Michael,

      All 5 of these teams won a championship? i.e. is there a control group, say the teams who made it to the championship, but lost?

      And are you taking year into account, or is it just split into before and after?

      I’m going to assume you have no control group and one total. And because this is a masters thesis, I’m going to try to keep this as simple as possible.

      What it sounds like to me is a paired test won’t work. Let’s say school #1 has 80 recruits before and 120 after the championship. Despite being from the same school, the actual recruits don’t match up. In other words, recruit #1 before has nothing to do with recruit #1 after. Each recruit has their own distance. What you’ve got is what’s called a randomized block design. The recruits are blocked by school.

      What it means is you need to do an ANOVA, using school as a random factor. And because of the unbalanced data, you want to use a linear mixed model procedure, rather than an ANOVA. It has better algorithms for unbalanced data.

      One resource I can recommend is a webinar I did on Fixed and Random Factors. If you follow that link, the recording is free. There’s some background there.

      Any good Design of Experiments book will have info as well. I like Geoffrey Keppel’s.

      Actually running a Mixed model in Mixed can get tricky, but you have a very straightforward design, so this won’t be a hard one for someone with experience.


  104. Dave says

    Hi Karen,
    I have conducted a socio-economic study where I have collected information from 140 people near a main road and 50 people who are away from the main road. I have done K-Wallis tests (because of non-normality) to examine differences in e.g. total income, education between the two categories so as to say something about the effect of the road. I checked for the homogeneity of variance assumption and everything seems ok. I have received some comments that my data are biased and am seeking your view. From reading your responses in the above, I would think what I have done is fine. Kindly comment. Thank you.

    • Karen says

      Hi Dave,

      It’s hard to comment on exactly what they mean without seeing the comments of why they’re biased.

      However, I suspect it’s not a problem with your choice of statistical analysis method. It’s a problem with your sampling, and therefore your conclusions. Clearly, people can’t be randomly assigned to living near or far from a main road.

      So yes, it’s fine to use a K-W test to show that distributions in income, education, etc. differ in the two areas. What isn’t fine is saying that the differences are due to an effect of the road. There could be other differences between the two groups, or the effect could be in the reverse direction. Perhaps people who have more education, higher incomes, etc. tend to choose to live away from a main road because they can.

      You just can’t tell given your research design.


      • Dave says

        Thank you Karen. This has been very useful. I really like the clarity in your responses. Makes the statistics understandable. Thank you again.

  105. Stephanie says

    Hi Karen,

    I have a data set with one control and two treatments. Basically, three groups of cows were fed a control diet, a contaminated diet, and a contaminated diet with an additive. I have the following sample sizes: control = 2 cows, trt1 = 5 cows, and trt2 = 5 cows. These are pretty small numbers to begin with, but we were limited by money (yay research!). I’m using Proc Mixed in SAS for the ANOVA, but after reading some of the comments above, I’m not sure I’ve done this correctly. Can you offer some advice on the proper way to analyze this data?


    • Karen says

      Hi Stephanie,

      Proc Mixed might be fine–you haven’t given me enough information. Is there a reason for mixed, like repeated measures on each cow or randomized blocks?


  106. Remy says

    Hey Karen, thank you for posting this article and for taking the time to respond to so many of your readers’ questions. I’m really impressed with that and will be checking out more of your website. Cheers

  107. Ming says

    Hi, Karen,

    I noticed that you told one reader you never use Levene’s test. So I wondered how you test homogeneity of variance as a statistical consultant, since Levene’s test is known to be affected by sample size.

    Another question: I’m working on a project involving one-way ANOVA. Basically, we want to compare students’ outcomes under seven different instruction methods. Since we have unequal sample sizes, we chose to analyze the data this way: we test homogeneity of variance first; if the assumption is met, we go with the standard ANOVA F and Tukey as the post hoc test. If the assumption is not met, we go with Welch’s F plus Games-Howell as the post hoc test. Is this approach correct?
    Any thoughts are greatly appreciated! Thank you.


  108. Fay Sarah says

    Hello Karen,

    I wonder if you can help:

    I’ve conducted eight 2x2x2x2 between-subjects ANOVAs. The sample size is 179. There are approximately similar numbers of participants in each level of the independent variables and across the 16 combinations of the IVs.

    I have 8 dependent variables. For some of the ANOVAs, Levene’s test is significant; for others, it is not. When it is significant, I have tried all the usual ways of transforming the DV, without success.

    Because the Levene’s test is sometimes not significant on the same sample, does this mean that the comment you made above about Levene’s sometimes being significant in large samples does not apply to my data? (Would 179 participants be considered a large sample?)

    There is no alternative non-parametric test for a 4-way ANOVA, so I’m unsure what to do. Any advice you could offer would be greatly appreciated.

    Kind regards


    • Karen says

      Hi Fay,

      I would suggest using ways other than Levene’s to check for non-constant variance. I don’t think you have an issue of a sample being too big–you’ve got only slightly more than 10 per condition.

      And transformations are really only useful for non-constant variance if you also have non-normality. Otherwise the normality will be messed up. (that’s a technical term).

      I have a whole workshop on this, which you might want to look into: Assumptions of Linear Models


  110. sindre says

    Hi Karen, I was just wondering if you have any suggestion as to how to further interpret findings if the variance is unequal (Levene is highly significant, groups are large >300) when conducting an ANCOVA in SPSS. There seems to be no way to obtain Welch or Hochberg when a covariate is included (age)…Do you have any suggestions?

    Kind regards Sindre

  111. Karen says

    Hi Jay,

    The way it works is that any means that are NOT significantly different in the post-hoc tests get the same letter superscript.

    Let’s say the post-hoc results were simple, where M1 indicates the mean of group 1:

    M3 < M1=M2=M4=M6 < M5=M7

    They would be labelled this way in the table (sorry, I can't get the superscripts in the comments, so just pretend the letters are up):

    M1a M2a M3b M4a M5c M6a M7c

    It gets tricky when there's overlap, which is very common with 7 groups. So let's say, for example, we have this more complicated example:

    M3 < M5=M7
    M3 = M1=M2=M4=M6
    M1=M2=M4=M6 = M5=M7

    So the highest and lowest means are significantly different from each other, but the ones in the middle don't differ significantly from anything. The means would be labelled like this:

    M1a,b M2a,b M3a M4a,b M5b M6a,b M7b

    So M3 is in a different group than M5 and M7. But M1, for example, has the same superscript as both M3 and M5 because it overlaps them.

    Hope that helps!
    Karen
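
Karen's labelling scheme can also be generated mechanically. Here is a minimal sketch. It assumes the pattern of non-significant differences follows the ordering of the means. That holds in both of Karen's examples but is not guaranteed, especially with unequal sample sizes; more general letter-display algorithms exist for that case. The sketch sorts the groups by mean, finds each maximal run of adjacent groups that are all mutually non-significant, and gives every run its own letter. The function name and example means are hypothetical. Letters start with "a" at the lowest mean, so the letters can differ from Karen's first example while the grouping is the same.

```python
from itertools import combinations


def letter_display(means, significant_pairs):
    """Assign letters so that groups sharing a letter do not differ significantly.

    means: dict mapping group label -> mean.
    significant_pairs: set of frozensets {g1, g2} that differed in the post hoc test.
    Assumes non-significance follows the ordering of the means (see lead-in).
    Supports up to 26 letter groups in this sketch.
    """
    order = sorted(means, key=means.get)  # lowest mean first

    def all_similar(block):
        return all(frozenset(p) not in significant_pairs for p in combinations(block, 2))

    runs = []
    for i in range(len(order)):
        j = i
        # Extend the run while every pair inside it is non-significant
        while j + 1 < len(order) and all_similar(order[i:j + 2]):
            j += 1
        run = order[i:j + 1]
        # Keep only maximal runs (not contained in an earlier, longer run)
        if not any(set(run) <= set(r) for r in runs):
            runs.append(run)

    letters = {g: "" for g in order}
    for letter, run in zip("abcdefghijklmnopqrstuvwxyz", runs):
        for g in run:
            letters[g] += letter
    return letters


# Karen's second example, with hypothetical means that respect her ordering
means = {"M1": 2.0, "M2": 2.1, "M3": 1.0, "M4": 2.2, "M5": 3.0, "M6": 2.3, "M7": 3.1}
significant = {frozenset({"M3", "M5"}), frozenset({"M3", "M7"})}
print(letter_display(means, significant))
# M3 -> a, M1/M2/M4/M6 -> ab, M5/M7 -> b, matching Karen's second example
```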

  112. JAY says

    Hi KAREN.

    I would like to know how the different letter superscripts are used in a post hoc test. Can you suggest reading material with examples of when and how to use different letter superscripts when 7 treatments are considered and the level of significance varies in at least 4 of the paired means?



  113. Karen says

    Hi Lulu,

    You could run a two-way anova as is without the interaction on this. The problem subcategories are the ones with 1 and 0 people in them.

    The other alternative, if the interaction seems necessary, is to collapse the experience variable into fewer categories.

    I would suggest graphing the means to see if the interaction is important, and if not, leave it out. If it is, you’d be better off collapsing.


  114. Lulu says

    HI Karen,

    I just want to know if i could actually use two way factorial anova for this.

    I have two groups: Device 1 (n=27) and Device 2 (n=28). In each group, I have 5 subcategories of participants (very low, low, moderate, high, and very high experience of playing games). For the Device 1 group I have 9, 8, 5, 2, 2 and 1 in each subcategory. For Device 2, I have 7, 4, 7, 8, 2, 0 in each subcategory. Can I use a two-way ANOVA for this? Or should I just provide a descriptive analysis? The main objective of the experiment is to see if there is any difference in the participants’ total scores when playing games on Device 1 or 2.

  115. Karen says

    hi Nicole,

    I never use Levene’s test. With large sample sizes, it’s almost always significant. With small sample sizes, it’s almost never significant.

    So it’s not very helpful. Geoffrey Keppel’s book Design and Analysis of Experiments has a good section on this.

    Or if you want a full explanation and demonstration about assumptions, what they mean and better ways to check them, I would actually recommend my workshop on assumptions in linear models. We have a home study version and you can get more information at: http://www.theanalysisinstitute.com/workshops/GLM-Assumptions/index.html


  116. nicole says

    Hi Karen,
    I have run a two-way ANOVA (2 by 2 factorial design) and obtained a significant Levene’s test, p = .012. I have adjusted the critical alpha for interpreting the significance of both the main and interaction effects; however, I was wondering what practical methods can be used in future studies so that Levene’s test is not violated. Are you able to give me some references?

    Also, with another 2 by 2 factorial design that reveals a significant interaction, I am aware that follow-up simple effects are required. I used the split data method in SPSS and recalculated the F statistic using the overall MSE. Is there a need to control for Type 1 error by using a Bonferroni correction?



  117. Bud says

    Hi again Karen, and thanks!

    I see your point: her (Erika’s) dependent variable was categorical. Mine, however, are not. That I am sure of. However, a greater concern for me is that my sample sizes vary considerably: group 1 = 464, group 2 = 444, and group 3 = 24.

    My problem is that even though an ANOVA shows significant differences among the three groups on a specific dependent variable, and the largest mean difference is between the smallest group and one of the others, post hoc tests cannot distinguish the smallest group from the group involved in that largest mean difference.

    Standardized mean scores for the groups:
    Group 1: -.08 (a)
    Group 2: .27 (b)
    Group 3: -.11 (ab)

    Currently, I use Hochberg’s GT2 post hoc test, as it, I have read, is quite robust to violations of homogeneity of variance. I also, where indicated by the Levene’s test, modify the p-values using the Welch modification.

    I know that this may be a lot to ask but I wonder whether you think I could benefit from bootstrapping or if such a procedure will not help me as the ratio among my three groups will not differ?

    Best Viktor

  118. Bud says

    Hi Karen, I am interested in your 3rd response to Erika the 7th December 2010. You write Erika should not use ANOVA as the response is categorical, not continuous.

    Do you mean that because of Erika’s design, a control: n=60; dose 1 n=114; and dose 2 n=175, it is inappropriate to use ANOVA here?

    I am interested in this as I have similar conditions: a grouping variable with three categorical (depending on viewpoint) responses, a (very) unbalanced design, and, for some dependents, unequal variances. Hence, I wonder what analysis would be appropriate if I conclude my response is categorical rather than continuous?

    Best Bud

    • Karen says

      Hi Bud,

      Good question. No, the control/Dose 1/Dose 2 variable is her Independent Variable. It’s totally appropriate to have grouping (i.e., categorical) variables as the independent variable.

      In Erika’s study, her Dependent Variable (aka Response Variable or Outcome) is ALSO categorical: Is the Presence of the plant the same after as it was before: Yes or No.

      ANOVA is comparing means in the Dependent variable for the different categories of the Independent Variable. Since there is no way to calculate a mean of Yes/No, you can’t use anova.

      So I’m not sure based on how you’ve described your study whether your dependent variable is indeed categorical. You mention unequal variances, which makes me think they really are numerical.

      Here are a few posts that might be helpful:

      When Dependent Variables Are Not Fit for GLM, Now What?

      6 Types of Dependent Variables that will Never Meet the GLM Normality Assumption


  119. vasi says

    Hi, in my test I am comparing a single variable among 3 groups with different sample sizes. Can I do a one-way ANOVA in spite of the unequal sample sizes?

    • Karen says

      Hi Vanna,

      Hmm, we may be past your deadline anyway, but in any case, I’d need more information about what you need. The fact that you have unequal sample sizes in the ANOVA isn’t problematic. Just run it as you would any 2×2 ANOVA. If you need help running a 2×2 ANOVA in SPSS, I can tell you to use Univariate GLM. If you need more detail than that, I need a better idea of what you understand already and what you need help with. 🙂


  120. vanna says

    Good day! This is very urgent…. We have a report due tomorrow, and our research design is a two-way (2×2) factorial ANOVA. We don’t know how to get the results in SPSS. Thank you!

  121. Paulo says

    Hi there,

    I have some data that gives the amount of time taken by three different surgeons to undertake a specific procedure. Given that I have a varying number of data points for each surgeon (e.g. 50/40/25) and that there may be unequal variance (e.g. slower surgeons having a greater variety of recorded times), what is the best way to figure out if there are significant differences in the time taken by each surgeon?


    • Karen says

      Hi Paulo,

      I would start by seeing if the unequal variances are large enough to cause problems. If they are, with a one-way analysis like that, you could easily just run a nonparametric test.
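
For a one-way layout like Paulo's, the usual nonparametric choice is the Kruskal-Wallis test. Here is a minimal sketch of the H statistic. All observations are ranked together, with tied values given their average rank, and the usual tie correction is applied. It returns only H and its degrees of freedom (k - 1). The p-value comes from a chi-square distribution, which your statistical software will report; assuming SciPy is installed, scipy.stats.kruskal does this as well. The function names and the example times are hypothetical.

```python
def average_ranks(values):
    """1-based ranks with tied values given their average rank; also returns tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every observation tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1).

    Fails (division by zero) if every observation is identical.
    """
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    sum_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Hypothetical procedure times (minutes) for three surgeons with unequal numbers of cases
surgeon_times = [[42, 45, 39, 50, 47], [55, 61, 48, 70], [38, 41, 44, 40, 43, 46]]
h, df = kruskal_wallis_h(surgeon_times)
print(f"H = {h:.3f} on {df} df")
```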


  122. Adamantia says

    Hi Karen,

    I conducted a two-way ANOVA to test if there are differences in levels of teaching innovation (scores 0-6) between teachers based on school (1=regular school, 2=all-day school) and in-service training (1=none, 2=Basic ICT Skills, 3=Educational applications of ICT). I used unequal sample sizes (75 all-day teachers and 90 regular teachers).

    The ANOVA table showed no differences for either the main effects or the interaction effect (p > 0.05). However, the Model p-value was smaller than 0.05, showing that there are significant differences in the model.

    I discussed only the interaction and main effects p-values. My chair told me to recheck the data analysis because it does not make sense for the Model to be significant when none of the effects (main and interaction) were significant. When I deleted the Model row from the table, arguing that the only p-values worth discussing were those for the main and interaction effects, my chair said this was wrong.

    The data analysis is correct–I double checked it. My question is: What does this Model p-value mean? Does it have to do with the unequal sample sizes? How should I discuss this Model p-value? Is it really this important to include it in my results?

    Thanks in advance,


    • Karen says

      Hi Adamantia,

      Thanks for being patient–I’ve been out of the office and just got back.

      I can’t give you a definite answer of what is going on without trying it on data, but this is what is *probably* going on.

      The Model p-value evaluates the overall effect of all IVs together. If all the IVs are completely independent and sample sizes are equal, the overall model effect is unlikely to be significant if no individual IV is.

      IVs are usually only independent when you have randomly assigned subjects to conditions.

      The other thing that can happen is that if your p-values are close to .05, different tests might fall on one side of that cutoff or the other. The results aren't really changing much, and even rounding alone can create differences. So if that's the case, don't take the .05 cutoff too seriously.
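
Karen's last point can be made concrete. The Model row tests all effects together with a single F that has more numerator degrees of freedom, so when the individual effects sit near .05 the pooled test can fall on the other side of the cutoff. The Python sketch below (SciPy assumed) uses made-up F values, not Adamantia's results, for a hypothetical balanced 2 × 3 design with 28 teachers per cell (N = 168, so the error df is 168 − 6 = 162). In a balanced design the effect sums of squares add up to the model sum of squares, which makes the model F the df-weighted average of the effect F values.

```python
from scipy import stats

# Hypothetical balanced 2 x 3 design (school x training), 28 per cell:
# N = 168, error df = 168 - 6 = 162. F values are made up for illustration.
df_error = 162
effects = {
    "school": (3.0, 1),             # (F, numerator df)
    "training": (2.7, 2),
    "school x training": (2.6, 2),
}

# Balanced design: SS_model = SS_school + SS_training + SS_interaction,
# so the model F is the df-weighted average of the effect F values.
df_model = sum(df for _, df in effects.values())
f_model = sum(f * df for f, df in effects.values()) / df_model

p_values = {name: stats.f.sf(f, df, df_error) for name, (f, df) in effects.items()}
p_model = stats.f.sf(f_model, df_model, df_error)

for name, (f, df) in effects.items():
    print(f"{name:18s} F({df}, {df_error}) = {f:.2f}, p = {p_values[name]:.3f}")
print(f"{'model':18s} F({df_model}, {df_error}) = {f_model:.2f}, p = {p_model:.3f}")
```

Here none of the three effects reaches p < .05, yet the Model F of 2.72 does, because the pooled F is judged against the F(5, 162) distribution instead of F(1, 162) or F(2, 162). With unequal cell sizes, as in Adamantia's data, the effect sums of squares no longer add up exactly, which is the first explanation Karen gives.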


  123. Erika says

    Thank you Karen! You really helped to clear these things up for me. I really appreciate it. Sorry again for all of the questions.

    Thanks again,

  124. Erika says

    I apologize in advance, but I have a bunch of questions about unequal sample sizes and one-way ANOVAs in a particular case study.

    I am conducting an experiment with very different numbers of sample sites: control n = 60; dose 1 n = 114; dose 2 n = 175. My main question is whether the responses to dose 1 and dose 2 are significantly different. Response was measured as the difference in the plant's presence or absence before and after treatment. If the plant was present at a sample site before treatment and absent at the same site after treatment, the response was scored 1; if it was present both before and after, the response was scored 0; and if it was absent before and present after, the response was scored -1. (I already know from previous research that the two doses should be significantly different from the control, but I would like to do an ANOVA test to compare the responses in the control group and the two different doses.)

    Q1: Is there a test, like the Levene test, for determining the equality of variances with unequal sample sizes?

    Q2: Should I not use ANOVA because the sample sizes are too different?

    Q3: Would it be better or worse to conduct a series of t-tests?

    Q4: If I choose to use ANOVA, should I use a Welch ANOVA followed by Games-Howell pairwise comparisons, as suggested in the PDF below, because the sample sizes are different? http://frank.mtsu.edu/~dkfuller/notes302/anova.pdf

    Q5: Should I avoid ANOVA and t-tests altogether because I'm pretty sure the data are not Gaussian, given that they are practically Boolean? If so, is there another test for comparing this kind of data?

    Any help you could give me would be greatly appreciated. I feel pretty lost.

    Thank you in advance,

    • Karen says

      Hi Erika,

      Sorry it took me a while to respond. Hope this is still useful. You do have a lot of questions, but I’ll do my best.

      1. Levene works with unequal sample sizes. Equal variance is even MORE important if sample sizes are unequal.
      2. No. It’s fine to use ANOVA (assuming variances are equal) with unequal sample sizes. But you should NOT use ANOVA in this study because your response is categorical, not continuous.
      3. Worse. Always worse.
      4. Welch’s test could work in your design (if ANOVA were appropriate), but according to Keppel (1991), it’s “unsatisfactory” when you’re comparing more than 4 means.
      5. Exactly. You could just run a chi-square test or, if you want to get really fancy or you have covariates you want to include, a logistic regression.
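
For Erika's design, the chi-square test Karen mentions would be a test of independence on a 3 × 3 table of group by response category. A minimal Python sketch, assuming SciPy's `chi2_contingency`: the row totals (60, 114, 175) are Erika's, but how each row splits across the −1, 0 and 1 responses is invented purely for illustration.

```python
import numpy as np
from scipy import stats

# Rows: control, dose 1, dose 2 (row totals 60, 114, 175 as in Erika's study)
# Columns: response -1, 0, 1. The split within each row is hypothetical.
counts = np.array([
    [10, 40, 10],    # control, n = 60
    [8, 46, 60],     # dose 1,  n = 114
    [12, 58, 105],   # dose 2,  n = 175
])

# Overall test: is the response distribution the same in all three groups?
chi2, p, dof, expected = stats.chi2_contingency(counts)
print(f"all groups: chi-square = {chi2:.2f}, df = {dof}, p = {p:.3g}")

# The test relies on adequate expected counts (a common rule of thumb is >= 5)
print("smallest expected count:", expected.min().round(2))

# Erika's main question: dose 1 vs dose 2 only (a 2 x 3 table)
chi2_d, p_d, dof_d, _ = stats.chi2_contingency(counts[1:])
print(f"dose 1 vs dose 2: chi-square = {chi2_d:.2f}, df = {dof_d}, p = {p_d:.3g}")
```

If several follow-up comparisons like the dose 1 versus dose 2 test are run, some adjustment for multiple testing is worth considering. Because the response has three categories, the logistic-regression route Karen mentions would be a multinomial or ordinal model rather than a binary one.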


  125. Jessica says

    This may be a silly question, but what if you are doing a 2x2x2 design, comparing males and females on their reaction times (2 tasks) and their anxiety (high or low), and there are more females in the study than males? Would this be a confound?

    • Karen says

      Hi Jessica,

      It’s not a confound just if there are more females than males. It’s a confound only if, say, there are more females AND females are more likely to be anxious.

      If your task and anxiety conditions are manipulated, so that you’re assigning people to them, then you have no problem. The example I gave could only occur if you also measured anxiety, not manipulated it.
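
When anxiety is measured rather than manipulated, one way to check for the confound Karen describes is to cross-tabulate sex by anxiety group and test the association. A minimal Python sketch assuming NumPy and SciPy; the counts are hypothetical, with twice as many females as males but a similar rate of high anxiety in each.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows are sex (female, male), columns are measured
# anxiety group (low, high). More females than males, as in Jessica's study.
table = np.array([
    [45, 35],  # female, n = 80
    [22, 18],  # male,   n = 40
])

# Proportion in the high-anxiety group within each sex
prop_high = table[:, 1] / table.sum(axis=1)
print("proportion high anxiety (female, male):", prop_high.round(4))

# Test of association between sex and anxiety group.
# SciPy applies Yates' continuity correction by default for a 2 x 2 table.
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3g}")
```

Here the proportions of high-anxiety participants are nearly identical (0.4375 vs. 0.45), so the unequal numbers of females and males do not line up with anxiety. A clear difference in those proportions is what would signal the confound.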

