*Updated Dec 18, 2020 to add more detail*

In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.

1. Because she was making you calculate everything by hand. Sums of squares require a different formula if sample sizes are unequal, but statistical software will automatically use the right formula. So we’re not too concerned. We’re definitely using software.

2. Nice properties in ANOVA such as the Grand Mean being the intercept in an effect-coded regression model don’t hold when data are unbalanced. Instead of the grand mean, you need to use a weighted mean. That’s not a big deal if you’re aware of it.
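To see the difference between the two means, here is a minimal sketch using made-up data (the group names and scores are hypothetical). It only shows that the simple average of the group means and the sample-size-weighted mean stop agreeing once group sizes differ.

```python
from statistics import mean

# Hypothetical scores for three unbalanced groups (illustration only).
groups = {
    "A": [4, 6],
    "B": [8, 10, 12, 10],
    "C": [1, 2, 3, 2, 1, 3],
}

group_means = {name: mean(values) for name, values in groups.items()}

# Unweighted mean of the group means: every group counts equally.
unweighted_mean = mean(group_means.values())

# Weighted mean: each group mean is weighted by its sample size,
# which equals the mean of all observations pooled together.
total_n = sum(len(values) for values in groups.values())
weighted_mean = sum(
    len(values) * group_means[name] for name, values in groups.items()
) / total_n

pooled_mean = mean(v for values in groups.values() for v in values)

print(unweighted_mean, weighted_mean, pooled_mean)
```

If you give every group the same number of observations, the two numbers coincide, which is why the distinction never comes up in balanced designs.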

But there are a few real issues with unequal sample sizes in ANOVA. They don’t invalidate an analysis, but it’s important to be aware of them as you’re interpreting your output.

### Two Practical Issues for Unequal Sample Sizes in One-Way ANOVA

#### 1. Assumption Robustness with Unequal Samples

The main practical issue in one-way ANOVA is that unequal sample sizes affect the robustness of the equal variance assumption.

ANOVA is considered robust to moderate departures from this assumption. But that’s not true when the sample sizes are very different. According to Keppel (1993), there is no good rule of thumb for how unequal the sample sizes need to be for heterogeneity of variance to be a problem.

So if you have equal variances in your groups and unequal sample sizes, no problem. If you have unequal variances and equal sample sizes, no problem.

The only problem is if you have unequal variances *and* unequal sample sizes.
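A small simulation can make this concrete. The sketch below is an illustration under stated assumptions, not a general proof: it uses the two-group case (a pooled-variance t-test, which is equivalent to a one-way ANOVA with two groups), normally distributed data, no true difference between groups, and hypothetical sample sizes and standard deviations. The function names are made up for this example.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def pooled_t(x, y):
    """Classic equal-variance (pooled) two-sample t statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * sample_variance(x) + (n2 - 1) * sample_variance(y)) / (n1 + n2 - 2)
    return (sample_mean(x) - sample_mean(y)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

def false_positive_rate(n1, sd1, n2, sd2, n_sims=2000, seed=1, t_crit=2.0106):
    """Share of simulated null datasets (equal true means) rejected at alpha = .05.

    t_crit is the two-sided 5% critical value of t with 48 df, so it is
    only valid here because every scenario below has n1 + n2 = 50.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, sd1) for _ in range(n1)]
        y = [rng.gauss(0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > t_crit:
            rejections += 1
    return rejections / n_sims

equal_var_unequal_n = false_positive_rate(10, 1, 40, 1)
unequal_var_equal_n = false_positive_rate(25, 3, 25, 1)
unequal_var_unequal_n = false_positive_rate(10, 3, 40, 1)  # small group is the noisy one

print(equal_var_unequal_n, unequal_var_equal_n, unequal_var_unequal_n)
```

In the first two scenarios the false positive rate should stay near the nominal 5%. In the third, where the small group has the larger variance, it should come out well above 5%. If you flip it so the large group has the larger variance, the test errs in the other direction and becomes too conservative.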

#### 2. Power with Unequal Samples

The statistical power of a hypothesis test that compares groups is highest when groups have equal sample sizes.

Power is limited mainly by the smallest sample size, so while it doesn’t hurt power to have more observations in the larger group, it helps far less than adding the same number of observations to the smallest group would.

So if you have a specific number of individuals to randomly assign to groups, you’ll have the most power if you assign them equally.
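Here is a rough sketch of that point for the two-group case. It uses a normal approximation to power rather than the exact noncentral t calculation your software would use, and the effect size (Cohen's d = 0.5), the total of 100 participants, and the function name `approx_power` are all assumptions chosen for illustration.

```python
from statistics import NormalDist

def approx_power(n1, n2, effect_size=0.5, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison of means.

    effect_size is Cohen's d (mean difference in standard-deviation units).
    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n1 * n2 / (n1 + n2)) ** 0.5
    return z.cdf(noncentrality - z_crit)

# The same 100 participants split three different ways (hypothetical allocations).
for n1, n2 in [(50, 50), (30, 70), (10, 90)]:
    print(n1, n2, round(approx_power(n1, n2), 3))
```

The term `n1 * n2 / (n1 + n2)` is largest when the two groups are the same size, so for a fixed total the even split always gives the highest approximate power.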

If your grouping is a natural one, you’re not making decisions based on a total number of individuals. It’s very common to just happen to get a larger sample of one group compared to the others.

That doesn’t bias your test or give you incorrect results. It just means the power you have is based on the smaller sample.

So if you have 30 individuals with Treatment A and 40 individuals with Treatment B and 300 controls, that’s fine. It’s just that the extra 270 controls added much less power to this particular test than their number suggests, because the small treatment groups are the limiting factor.

### Yes, this all holds true for independent samples t-tests

Independent samples t-tests are essentially a simplification of a one-way ANOVA for only two groups. In fact, if you run your t-test as an ANOVA, you’ll get the same p-value. And the between-groups F statistic will be the square of the t statistic you got in your t-test.

(Really, try it…. pretty cool, huh?)
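If you want to try it without opening your stats package, here is a minimal sketch that computes both statistics by hand on made-up, unbalanced data (the two groups and their values are hypothetical). It assumes the equal-variance (pooled) t-test, not Welch's version, since that is the one that matches ANOVA.

```python
def mean(values):
    return sum(values) / len(values)

def sum_sq(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(x, y):
    """Equal-variance independent samples t statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = (sum_sq(x) + sum_sq(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

def one_way_f(*groups):
    """Between-groups F statistic from a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data with unequal group sizes (5 vs. 8).
group_a = [5.1, 6.3, 4.8, 7.0, 5.5]
group_b = [6.9, 8.2, 7.4, 6.1, 9.0, 7.7, 8.4, 7.2]

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(t ** 2, f)  # the two numbers agree up to floating-point rounding
```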

This means they work the same way. Unbalanced t-tests have the same practical issues with unequal samples, but unequal sample sizes don’t otherwise affect the validity or bias of the test.

### Problems in *Factorial* ANOVA

Factorial ANOVA includes all those ANOVA models with more than one crossed factor. It generally involves one or more interaction terms.

Real issues with unequal sample sizes **do** occur in factorial ANOVA in one situation: when the sample sizes are confounded in the two (or more) factors. Let’s unpack this.

For example, in a two-way ANOVA, let’s say that your two independent variables (factors) are Age (young vs. old) and Marital Status (married vs. not).

Let’s say there are twice as many young people as old. So unequal sample sizes.

And say the younger group has a much larger *percentage* of singles than the older group. In other words, the two factors are not independent of each other. The effect of marital status cannot be distinguished from the effect of age.

So you may get a big mean difference between the marital statuses, but it’s really being driven by age.
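To make the confounding visible, here is a sketch with made-up cell sizes and cell means (every number is hypothetical). The cell means are built so that only age matters, yet the raw comparison of marital statuses still shows a gap because the cell sizes are lopsided.

```python
# Hypothetical cell sizes: twice as many young people as old, and the young
# are mostly unmarried while the old are mostly married.
cell_n = {
    ("young", "married"): 20, ("young", "unmarried"): 80,
    ("old", "married"): 40, ("old", "unmarried"): 10,
}
# Hypothetical cell means in which ONLY age matters (marital status has no effect).
cell_mean = {
    ("young", "married"): 50.0, ("young", "unmarried"): 50.0,
    ("old", "married"): 70.0, ("old", "unmarried"): 70.0,
}

def raw_marginal_mean(status):
    """Mean of everyone with this marital status, ignoring age."""
    cells = [c for c in cell_n if c[1] == status]
    return sum(cell_n[c] * cell_mean[c] for c in cells) / sum(cell_n[c] for c in cells)

def unweighted_marginal_mean(status):
    """Average of the cell means, giving each age group equal weight."""
    cells = [c for c in cell_n if c[1] == status]
    return sum(cell_mean[c] for c in cells) / len(cells)

raw_gap = raw_marginal_mean("married") - raw_marginal_mean("unmarried")
adjusted_gap = unweighted_marginal_mean("married") - unweighted_marginal_mean("unmarried")
print(raw_gap, adjusted_gap)
```

The raw gap is entirely a by-product of married people being older on average in this made-up sample. Once each age group gets equal weight, the marital status difference disappears.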

### What about Chi Square Tests?

(This article is about ANOVA (and t-tests), but I’ve updated it to include Chi-Square tests after getting a lot of questions.)

There are a number of different chi-square tests, but the two that can seem concerning in this context are the Chi-Square Test of Independence and the Chi-Square Test of Homogeneity. Both have two categorical variables. Both count the frequencies of the combinations of these categories.

They calculate the test statistic the same way. Without getting into the math, it’s basically a comparison of the actual frequencies of the combinations with the frequencies you’d expect under the null hypothesis.

And luckily, unequal sample sizes do not affect the ability to calculate that chi-square test statistic. It’s pretty rare to have equal sample sizes, in fact. The expected values take the sample sizes into account. So no problems at all here.
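Here is a short sketch of how the expected values absorb the sample sizes, using a made-up 2×2 table (the counts are hypothetical, and the two rows deliberately have very different totals).

```python
# Hypothetical 2x2 table of observed counts; the rows have very different totals.
observed = [
    [15, 15],    # small group: 30 people
    [200, 400],  # large group: 600 people
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under the null: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(expected, chi_square)
```

Each row of expected counts adds up to that row's own total, so a group of 30 is compared against what you'd expect from 30 people, and a group of 600 against what you'd expect from 600.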

That said, when there is a third variable involved, you can have an issue with Simpson’s Paradox. You may or may not have collected that third variable, so it’s worth thinking about whether something else could be creating an association in the combined data that doesn’t exist within each group of that third variable alone.

But that’s not really an issue with unequal sample sizes. That’s an issue of omitting an important variable from an analysis.

Stella says

Hello Karen, I’m glad I came across this site! I’m facing a challenge with my research work. I sampled 6 different land use types, replicating 4 of them 5 times and the other two 4 and 2 times (due to their limited size for sampling). Now I want to test for significant differences in a parameter between the different replications and their means using ANOVA. This is unbalanced sampling, and I’ve tried to use the Gabriel test, but my variances are unequal and my data are not normally distributed. Please, how do I go about this analysis? Thanks!

Karen says

Hi Stella,

I’d have to know a lot more about your study and data to make suggestions about an analysis. I’m just not comfortable making suggestions as it’s too easy for someone to have left out crucial info. It seems you have a lot going on there. So I’d suggest a consultation.

Seaneen says

Thanks so much Karen, that makes a bit more sense now! Will have a go at graphing them. Thanks again!

Lou says

Hi Karen,

I am in the process of collecting data and plan to use a 2 (gender, between subjects) x 3 (condition, between subjects) x 3 (time of testing, within subjects) ANOVA to analyse my data.

I want to run an a priori power analysis to check how many participants I should have in each cell. I am unsure if I am using Gpower correctly (particularly if an effect size of .3 is ok), but it gives me a sample of 102 overall (17 per cell?). I wonder if this seems right and if having vastly mismatched cells will matter? (some cells currently have 49 participants).

Thank you in advance!

Seaneen says

Hi Karen, I am hoping you might be able to offer some suggestions regarding two questions I am struggling with for my data analysis.

1) I have one study which has shown a statistically significant difference between two sample groups, using a Mann-Whitney test as the data is not normal, however the groups are unequal in size (Group 1 = 3369, Group 2 = 1524). My supervisor has asked whether I can apply a correction factor to account for the difference in group size, however I was under the impression that the Mann-Whitney already accounts for this? Any ideas??

2) Another study has two sample groups with almost exactly equal means (Group 1=5.67, Group 2=5.75), which to me intuitively says they are not statistically different, however again the data are not normally distributed (and not equal in size either Group 1=103, Group 2 = 221), so I am assuming I have to run a non-parametric test, which results in a statistically significant difference between the groups??

I hope that all makes sense!

Any light at all you can shed on this would be greatly appreciated, I have been struggling for days and have exhausted the textbooks and web pages!!! Thanks in advance!

Seaneen

Karen says

Hi Seaneen,

1) No correction necessary. M-W is fine for unequal samples.

2) It’s possible to have so-small-it’s-not-interesting but statistically significant results. But another possibility is that the nonparametric test isn’t comparing means. If you have an outlier or two, that would affect means (possibly making them closer than say, the medians) but would not affect the nonparametric test. So it’s possible those two distributions have the same mean, but aren’t generally overlapping as much as the close means would indicate. I say graph them.

Sarah says

I need to run an ANOVA with two samples (n is unequal for the groups) for several measurements. I am not able to carry this out, perhaps because the sample sizes are different? I am comparing 28 different categories between two groups at 3 different ages. How do I do this? I ran student t-tests that gave good information, but am now asked to run an ANOVA. Any help would be appreciated.

Karen says

Hi Sarah,

I’m not sure I understand what is your DV. Is it the 28 categories? Or you’re saying you have 28 DVs?

Maria says

sorry I mean, pick randomly 7 females 🙂

Karen says

Hi Maria,

This is tricky–unequal sample sizes are definitely a problem with two-way models, but at the same time 7 is a very, very small sample. Is there any way to get more males instead?

Maria says

Hi Karen, I’m running a 2×2 mixed ANOVA (the between factor is gender, male and female; the within factor is measurement at Time 1 and Time 2) with 7 males and 29 females. Is it okay to do that, or are the sample sizes too unequal? The variances in score (using two different scales) are mostly twice as large for women as for men, for instance std. (man/woman) = 0.4/0.8, 0.4/0.9, and for the scores from the other scale 5.6/4.9 and 3.7/6.3. Or should I randomly (SPSS can do it) take 7 males and then perform the 2×2 mixed ANOVA?

Nidhi says

Hello,

I’m using multiple regression for my research project. My sample sizes are unequal: students 720, parents 135 and teachers 80. I want to find the effect of parents and teachers on students. I have used SPSS to calculate it, but I still want to confirm with you whether you can do multiple regression with unequal sample sizes. Please help me, as I am confused and stuck on this. Thanks.

Ali says

Hi Karen,

I am a business student and I don’t have a strong statistics background, but I’m not afraid of learning, so if there are any articles that can help, please let me know. I have three variables: one is independent, the second is a mediator, and the third is dependent. Data will be collected from managers and employees: IV and DV data from managers and mediator data from employees. The problem is that there are 20 managers and 100 employees. I was following the Baron and Kenny (1986) approach and the Judd and Kenny (1981b) recommendation to run regression models to analyze the data. Now I’m looking at other techniques due to the unequal sample sizes. Can I analyze the data with ANOVA? If there is any article on this sort of problem, please let me know. I appreciate any help I get. Thanks

Colin Jones says

I am trying to figure out sample size of an article on socially conscious mutual funds. The article takes a look at industries/sectors that are screened out of these mutual funds in order to evaluate performance. The three independent sectors that are looked at are tobacco, alcohol, and gambling. Each sector is compared to the S&P 500 Index over an 11 year span. Tobacco has 15 stocks in the industry, alcohol-18, and gambling-22. Do you know what the number of the sample size would be for this? Would it be 3? Or 1, since they are all exclusive?

Karen says

Hi Colin,

It’s hard for me to say without seeing the paper and exactly which analysis they’re doing and how. It could either be the number of stocks or it could be, as you suggested, the number of industries.

Marko says

Hi Karen,

So glad I found this site! I’m having trouble accepting my analysis and perhaps I’m doing it wrong so hopefully you can shed some light.

My master’s thesis is on female choice. I conducted three-choice experiments in which females are presented 3 different acoustic stimuli simultaneously. I record which stimulus they choose as well as the time it took them to make the choice (latency). My issue is with the latency analysis. I assumed that a one-way ANOVA was a proper test because my independent factor is categorical (choice) and my dependent factor is continuous (latency–time).

My sample sizes:

Stimulus 1: 2

Stimulus 2: 10

Stimulus 3: 18

One issue I have is that the variance for the group with two individuals is HUGE, mainly because one female took her time to choose that stimulus, whereas another female chose that same stimulus rather quickly. I found no significance across the board, but is it because of that low sample size of group 1?

Thank you so much for your help. I really appreciate it.

Marko

Karen says

Hi Marko,

Theoretically it doesn’t matter that your samples are unequal, but practically, you’re going to have a hard time if a sample is only 2.

Your choices are to run more subjects or drop that stimulus group. Unfortunately, that’s about all you can do. Since none of your groups is very large, running more subjects would be the best, if you can manage it.

gautam says

Hi. I have done an analysis on 3 groups. Group 1 has 24 subjects, group 2 has 398 and group 3 has 755 subjects. On analysing the variable vomiting: group 1 had 12 subjects with vomiting out of 24 (50%); group 2 had 169 subjects out of 398 (42.5%) and group 3 had 270 out of 755 (35.8%) with vomiting. On analysis by chi square (3×3) the p-value was statistically significant (.041). To find out which groups differed from each other I did pairwise comparisons between group 1 and 2, group 1 and 3, and group 2 and 3. The p-value for the group 2 and 3 analysis was less than .05, thus statistically significant, but for group 1 and 2 and group 1 and 3 the analysis was not statistically significant. My question is: the difference between group 2 with 42.5% of cases and group 3 with 35.8% of cases with vomiting was statistically significant, so why was the difference between group 1 with 50% (which is higher than the proportion of cases seen in group 2) and group 3 with 35.8% not statistically significant? Is it because of the very small number of subjects in group 1 that the difference was not significant, or something else?

Thank u.

Karen says

Hi Gautam,

Yes, that’s probably it. With so few people in Group 1, you don’t have much power to find a difference.

Manoj says

Hi Karen,

Could you please help me with your valuable suggestions in stats?

I have three groups (n1=16, n2=23 and n3=24) with different sample sizes. I want to see the significant difference between these groups based on a parameter in common. Please let me know the best method or tool to analyse.

Thanks,

Manoj

Karen says

Hi Manoj,

Well it depends on which parameter you want to compare. If it’s the mean of each group on some dependent variable, then you can use one way ANOVA. The different sample sizes are no problem.

Karen

Richa Gupta says

Is it compulsory to have no of patients equal in both group for data analysis?? If not then can i exclude a single patient to remove bias at the end of study for analysis to make equal sample in both groups?

Karen says

It’s not necessary at all, unless you had some sort of patient matching. It sounds like you don’t, so you’re good to go.

Shari says

Hi Karen,

I’m looking at differences in fish weight between a control group and 4 different treatment groups from experiment start to finish.

I am a Masters thesis student and have run a 2-way ANOVA on my data but have unequal groups (unavoidable and I was told this wouldn’t be a problem by my supervisors). I have 3 independent variables {sample period, treatment and frequency} and 1 dependent {weight}.

So it turns out it is a problem: the Levene’s test is 0.017. My data conforms to normality and my model is significant at 0.018. My factor (sample period) is significant at the .001 level.

Should I be running another stats test or is there a way to adjust for the lack of homogeneity?

Thanks for help!

Karen says

Hi Shari,

I would investigate those variances more. Levene’s test isn’t very useful for testing assumptions (see Keppel, 1993).

Yannis says

Sorry for double posting, I meant to create a new reply but replied to a post instead:

Hi Karen,

Thank you for this article, both the article and the discussions below are enlightening 🙂

Can I ask your opinion on one related thing; I want to run a two-way ANOVA with unequal sample sizes. The reason for the unequal sizes is that there is a third factor that doesn’t participate in this ANOVA and requires its own data points. What would be the way to go when downsizing the larger sample groups in terms of randomization?

To give an example, let’s say we compare responses from athletes and non-athletes, which are either male or female. So the factors are Gender (Male, Female) and Athlete (Yes, No). This will be analyzed with a two-way ANOVA, let’s call it ANOVA A. So we have:

Male Athletes: n=20

Male Non-Athletes: n=20

Female Athletes: n=40, but we want to make it n=20

Female Non-Athletes: n=40, but we want to make it n=20

The Female subjects are more because in the same study but a different analysis we will do exactly the same comparison, but with an added factor, eg. In-pregnancy (Yes, No), which doesn’t apply to males. So that one will be another two factor ANOVA, let’s call it ANOVA B:

Female Athletes In-Pregnancy: n=20

Female Non-Athletes In-Pregnancy: n=20

Female Athletes Not-In-Pregnancy: n=20

Female Non-Athletes Not-In-Pregnancy: n=20

How do we choose which females to use in the downsized group for ANOVA A? It sounds logical to randomly select 20 Female Athletes and 20 female Non-Athletes, but should we care if they are In-Pregnancy or not? Or should we account for that as well?

Thanks a lot,

Yannis

Karen says

Hi Yannis,

That’s a great question.

I assume that if you had not had the pregnant/non-pregnant groups selected out for the second study, you would have just randomly selected 20 female athletes and 20 female non-athletes. Unless it’s standard or relevant to find out if they’re pregnant, you wouldn’t ever know, right?

So there are two options for the study where pregnancy is not relevant.

1. Figure out what percentage of the female athlete population is usually pregnant at any given time, then sample your two samples at the same rate.

2. Decide that the population of interest is non-pregnant female athletes and just use that sample.

mauricio says

Hello. Thanks for the information. I would like to ask what is recommended as a post hoc test when running a one-way ANOVA with different sample sizes.

4 groups n = 10, 1 control group n = 30. thanks a lot 🙂

Karen says

I would usually use a Tukey. Tukey Kramer is the version for unequal sample sizes.

ryan says

Hi Karen,

I’m confused about my data analysis. I’m studying motivation towards grade achievement. Motivation is divided into 2 categories: intrinsic (interest and attitude) and extrinsic (family, social, teaching style, learning style). Grade is defined in terms of A, A-, B+, B, B-, C+, C, C-, D and E. I ran a one-way ANOVA, and the result shows there are significant differences among the means. But when I try to run the post hoc test, it comes out like this:

Warnings

Post hoc tests are not performed for Gred because at least one group has fewer than two cases.

Can I know how to solve this problem, please? I’m new to statistics.

Thanks =)

Karen says

Hi Ryan,

It’s hard to tell exactly what is going on without looking at it, but it sounds like there is one group within your motivation categories with only one person. I would start with some frequency tables.

Kevin Kirkpatrick says

I’m using ANOVA to compare user preference ratings R within various cities, for groups A, B, and C. Unfortunately, my group sizes are HUGELY skewed – group A will typically have 20,000 or more members per city, group B will have ~1,000, and group C can have as few as 100.

In response, I have been running ANOVA by

1) determining count of C members in each city, call this Cn (let’s say 130 C people in Dallas)

2) randomly pick Cn members from group A within each city, calling this a sample-A group (in contrast to population-A for the city). So in my hypothetical, this might mean picking 130 A ratings out of 25,000.

3) I then perform a one-sample t-test on the sample-A vs population-A within each city – in the Dallas hypothetical, comparing the 130 sample-A to the 25,000 population-A.

4) repeat steps 2 and 3 until I get a sample-A selection with no significant difference from population-A for each city. This might mean I re-pick the 130 Dallas A ratings several times until I’ve picked a representative sample.

5) I repeat 2 – 4 for group B.

6) I perform my ANOVA test on Sample-A, Sample-B, and Sample-C within each city.

This seems to be working quite well; indeed, I’ve clearly identified cities where the ratings of A, B, and C groups truly seem to differ. However, I’m not an experienced statistician, and since this approach feels ad-hoc, I’m curious as to whether the results would stand up to scrutiny.

Karen says

Hi Kevin,

Your sampling seems fine. The one thing I would change, though, is eliminate steps 3-5. Those are still based on the very large pop size. As long as your sampling is truly random, there should theoretically be no difference between the mean of the population and the sample.

AMY says

Hi

if I have three different sample sizes which are 48 , 46 and 44.

can I use one-way ANOVA.

Thanks. : )

yasmine says

Hey Karen

I have a question, when running a one way anova with three levels (60, 62, 63 participants in each group) and one group not having met the normality assumption (although the histogram looks like it satisfies normality) but equal variance was met, what kind of post hoc test should I be using? and why?

thanks!!! 🙂

Karen says

Hi Yasmine,

There isn’t a post hoc for a situation of non-normality. If the normality is close enough for the ANOVA F test, it’s good enough for posthocs.

Mohammed says

Hi.

I have 3 subgroups from the main group. The number of samples in each group was 6, 7, 9. Can I use ANOVA or the Kruskal-Wallis H test for the comparison, and why?

hellen says

I am analysing my data using STATISTICA, I have a problem of getting standard error as zero across my dry matter variable yet other variables do not have a zero standard error. what could be the problem? Thank you

Karen says

Hi Hellen,

I would need a lot more information, and probably to actually see the analysis to figure this one out. It sounds like you’re overspecifying the model in some way.

sufala says

Hi, I’m doing a study with six groups, so I have to do ANOVA. But when I check for normality using the Shapiro-Wilk test or the Kolmogorov test, the data in two of the six groups are not normally distributed. Can I still continue with ANOVA, or should I use the KW test?

nisha says

hello mam,

My total sample is 218, divided into three different groups, and the counts are: group a: 65, group b: 61, group c: 92. I have to compare these three groups. For that I used ANOVA, and after finding the result (p-value) I have to use a post hoc test. Could you please suggest what type of post hoc test I can use in my study, because my sample is large?

thank you. please reply asap.

sanaz says

Hi,

I was wondering if you can help me to find an answer for my question?

I have collected data on smoking status from 567 respondents. 11 respondents (2.5%) are smokers and 553 (97.5%) are non-smokers. I want to conduct a t-test to compare these two groups on the mean of another variable. Is it doable? I had just skipped testing this variable due to the very unbalanced sample sizes. Is that right?

Thank you

Karen says

It’s doable. Just be very careful to check the equal variance assumption. The bigger issue is that 11 is very small, and you may not want to make inferences on the responses from 11 people.

Hector says

Hi Karen,

Thank you for sharing your knowledge with us.

I have an ANCOVA question for you. I am trying to compare a treatment and a control group, across 8 different segments of people. My sample sizes for treatment and control groups for each of the 8 segments are not even. The worst uneven sample sizes are n(treatment)=20, n(control)=8. My results are showing significant difference between the treatment and control groups in only one of the eight segments, however the “observed power” for the test is much lower than 0.8. So, I am wondering whether these results are reliable at all?

If I want to increase the power, is there any way other than increasing the sample size (because I can not)? For instance, is there any other test?

Thank you for your help, in advance,

/Hector

Karen says

Hi Hector,

Yes, if a test is insignificant and the true effect size is the effect size you measured, then you have insufficient power to detect that effect. You don’t need observed power to check that.

Here are pretty much the only ways to increase power: https://www.theanalysisfactor.com/5-ways-to-increase-power-in-a-study/

Keneth Tumwebaze says

When I analyse data with ANOVA, I am able to present my p-values and means in a table, and this is acceptable. However, I have a study in which I intend to use Kruskal-Wallis, and I would like to have my results in table form. Is it in order to report the medians, or do I use p-values only? I have not come across this latter situation. Please advise.

Karen says

Hi Keneth, although technically a Kruskal Wallis is not testing medians, it is pretty common to report medians as a descriptive stat, along with the K-W test statistic and p-value.

Ambika K.C. says

Namaste Ma’am

I have a problem with my statistics. I have two samples, one of size 18 and the other of size 17. When I test normality with the Shapiro test (R), the p-value for the sample of 17 is 0.007442, i.e. p is less than 0.05, and for the sample of 18 it is 0.3423, i.e. p is greater than 0.05. From the p-values, one sample appears normally distributed but the other does not. In this situation, which test is suitable? Can I use the Wilcoxon rank sum test (wilcox.test, a nonparametric test)?

I have drawn this sample from one community Forest which is divided into two blocks one is unmanaged and other is managed block of CFs

Karen says

Namaste Ambika,

I don’t like the Shapiro-Wilk test as a final decision maker about normality. I would first investigate what distributions you do have. If the one doesn’t look normal, why not? Skew? An outlier? Uniform?

That said, the Wilcoxon is considered distribution-free, so it’s safe to use, if it answers your research question.

Mona says

Hi

In my paper, males and females are compared using a MANOVA test. The number of males is 37 and the number of females is 86. Does this difference in numbers affect the results? How can I justify this difference?

Best

Daniel says

Hi Karen,

I would be most grateful if you could help me as I have an ANCOVA question for you.

Two of my independent variables have unequal sample sizes, for example: the first variable (depression) was drawn from a student sample, the depression variable has 6 ordinal levels with: n=55, 16, 6, 5, 4, 1 (in each level of depression). The second variable (anxiety), also from a student sample and has 4 ordinal levels with: n=36, 28, 17, 6. As you probably assumed: when depression and anxiety increases the n for level of the respective group gets smaller (there are few subjects with higher levels of anxiety or depression in the sample).

Question: Should I run the analysis as it is (I have used Levene’s test of equality of error variance and it was non-significant), or should I merge, for example, levels 3-6 in the depression variable and 3 & 4 in the anxiety variable? What would you do?

Thank you very much for your time,

Daniel

Karen says

Hi Daniel,

There isn’t one right answer to this one, since you don’t seem to have problems with unequal variance.

But I can tell you a group with n=1 (the highest depression) has no variance, so isn’t useful. It is certainly reasonable to combine those groups, as long as it makes theoretical and logical sense.

And as long as those natural groupings aren’t giving you opposite results, it should help your power as well.

David says

Hi Karen,

I’m running Anova to compare means. Anova sig. = .129 but post hoc test concludes there’s significant difference at 0.05 level. How come?

Karen says

Hi David,

I was going to refer you to another article, but just realized I haven’t written anything on this. It’s so important (and common). Here’s the quick answer:

1. They’re not actually testing the exact same thing.

2. The F test always trumps the post-hoc. If it’s not significant, don’t run a post-hoc. 🙂

Grace says

Please help me with my assignment. I really don’t know what to do because our prof hasn’t taught this yet, and this is some kind of advanced study for us, but it’s so hard 🙁

HOMEWORK – Introduction to Analysis of Variance

A psychologist conducts research to compare learning performance for three (3) species of monkeys. The animals are tested individually on a delayed-response task. A raisin is hidden in one of three containers while the animal watches from its cage window. A shade is then pulled over the window for 1 minute to block the view. After this delay period, the monkey is allowed to respond by tipping over one container. If its response is correct, the monkey is rewarded with the raisin. The number of trials it takes before the animal makes five (5) consecutive correct responses is recorded. The researcher used all the available animals from each species, which resulted in unequal sample sizes (n). The data is summarized below. Ref. (Gravetter, Frederick J.; Wallnau, Larry B., 2012)

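| | Vervet | Rhesus | Baboon | |
|---|---|---|---|---|
| n | 4 | 10 | 6 | N = 20 |
| M | 9 | 14 | 4 | G = 200 |
| T | 36 | 140 | 24 | |
| SS | 200 | 500 | 320 | |

Summary Table for One-Way ANOVA

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | | | | |
| Within | | | | |
| Total | | | | |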

Fcrit = ? at alpha 0.05

Guide Questions:

1. Formulate the steps in hypothesis testing (10 pts)

2. Construct the summary table for One-way ANOVA (8 pts)

3. Identify if the problem uses one-tail or two-tail of alpha level? Explain why? (2 pts)

Karen says

Hi Grace, while I appreciate how hard this can be, as a rule, I don’t help with homework. That’s what your TA is paid the big bucks to do. 🙂

Alex says

Hi Karen,

I was hoping to use ANCOVA to compare a battery of neuropsychological tests in carriers vs non-carriers, controlling for age, gender and education level and I have three questions about that which I was hoping you could help me with. 🙂

Firstly, do I have to demean the covariates, before feeding them into (the) SPSS (multivariate general linear model)?

Secondly, is Levene’s Test of Equality of Error Variances the test I need to do to check if the variances are sufficiently similar to perform the ANCOVA on?

Lastly, assuming this is the case, what happens if Levene’s test is significant? Does it matter a lot for ANCOVA (or is it very robust anyway)? Is there a non-parametric alternative that I could use instead?

Thank you very much!

-Alex

Karen says

Hi Alex,

1. I’m not sure what you mean by demean. I assume you mean “mean center.” If so, it’s not necessary, but can be helpful.

2. Levene’s is popular, but I don’t use it, at least not as a sole criterion.

3. It’s robust, unless sample sizes are quite different.

Mike says

Hi,

If my samples from two groups were slightly unbalanced (8 vs 9), but the homogeneity of variance was not violated (Levene’s test > 0.05). Does it mean that I could interpret the results as if the data were balanced? Thank you very much.

Mike

Karen says

Yes!

Mike says

Thanks:)

Kaye says

Hi Karen,

I’m new to SPSS and research analysis and hope you can help me. I am doing an analysis of the influence of teacher characteristics (e.g., academic background) on student scores. I have 135 teachers and more than 4000 students. How should I prepare my data set so I can do a multiple regression? Thank you!!!

Karen says

Hi Kaye, if you’re looking at teacher characteristics on their students, you need to account for the fact that the students with the same teacher are not independent. You do this with a multilevel or mixed model. You can get a lot more info here: https://www.theanalysisfactor.com/category/mixed-and-multilevel-models/

Mark Lowe says

Hi Karen,

I want to run a one-way between-groups ANOVA; however, my group sizes are 88, 76 and 7. Do you have any suggestions or comments on whether this is going to provide useful information?

thanks

Mark

Karen says

Hi Mark,

It’s hard to do any sort of comparison with only 7 observations in a group. That said, in some studies that’s all you ever have. This could be useful, but pay very close attention to those assumptions. A non-parametric test, like Kruskal-Wallis, may be a safer approach.
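For readers curious about what the Kruskal-Wallis test actually computes, here is a rough sketch of the H statistic in plain Python. The data and the helper names (`average_ranks`, `kruskal_wallis_h`) are made up for illustration, and this version skips the tie correction that statistical packages normally apply, so treat it as a teaching aid rather than a replacement for your software.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Made-up data with unequal group sizes (5, 4 and 2 observations)
groups = [[12, 15, 11, 18, 14], [20, 17, 22, 19], [25, 27]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.3f} on {len(groups) - 1} df")
```

H is usually compared to a chi-square distribution with (number of groups − 1) degrees of freedom. That approximation is rough when a group is very small.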

Chantalle says

Hi Karen,

I’m hoping to run a one-way ANOVA with 4 independent groups. The sample sizes are 102, 100, 100 & 59. Levene’s test was significant (p = 0.001) after an arcsin transformation (data were percentages). The distributions are normal.

I read somewhere that if there is less than a 5-fold difference in standard deviations, the ANOVA should still be robust, even with heterogeneity of variance, but the site did not list any references. In my case, there is a 1.49-fold difference between the largest and smallest standard deviation.

I was wondering whether you think I can use an ANOVA?

Also, I’m having trouble tracking down the paper you referenced (Keppel, 1993). In what journal was it published?

Thank you very much! 🙂

Karen says

Hi Chantalle,

5-fold sounds higher than I’ve seen, but 1.49 is probably fine. Keppel is a textbook, not a journal article. Design and Analysis: A Researcher’s Handbook is the title.

Karen
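The "fold difference" Chantalle mentions is just the largest group standard deviation divided by the smallest. A quick way to check it, sketched in plain Python with made-up numbers (the group labels, values, and the `sd_ratio` helper are assumptions for illustration only):

```python
from statistics import stdev

def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = {name: stdev(values) for name, values in groups.items()}
    return max(sds.values()) / min(sds.values()), sds

# Made-up data for four groups of unequal size
groups = {
    "g1": [10, 12, 14, 16],
    "g2": [10, 14, 18, 22],
    "g3": [11, 13, 15],
    "g4": [9, 12, 13, 15, 16],
}

ratio, sds = sd_ratio(groups)
for name, sd in sds.items():
    print(f"{name}: n={len(groups[name])}, sd={sd:.2f}")
print(f"largest / smallest sd = {ratio:.2f}")
```

It's worth looking at which group has the larger spread, not just the ratio, since the combination of unequal variances and unequal sample sizes is what causes trouble.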

Kathy says

I am running both t-tests and logistic regression analyses looking at income differences between two groups. One group has 980 subjects; the other 9800. In another comparison, one group has 980 and the other group has 430,000.

I have run t-tests using the lincom function in Stata (with unequal variances). I have also drawn a random sample of 10% of the larger group and re-run some of the analyses. While my means change slightly with the smaller samples, the overall patterns persist and statistical significance does not change.

I have a reviewer who has asked whether I have applied any corrections to take sample size differences into account. Would you suggest any additional corrections, other than what I have already done? The reviewer in particular questioned whether I could trust my results that indicated statistical significance, given the very different sizes between the two groups. Would you agree with this concern?

I appreciate any feedback!

Karen says

Hi Kathy,

I understand doing corrections in a factorial situation, but you don’t have that. It sounds like you already tried the subset of the larger group, and got the same answer. I’m not sure what other corrections you’re supposed to try.

Richard says

I have 3 sample groups I wish to compare. Sample A = 20, Sample B = 20, Sample C = 40. Do I need to adjust my ANOVA to compare them? If so, how do I calculate the weighted mean? The samples come from 3 different stakeholder groups, i.e. different populations. Does that make a difference when calculating the weighted mean?

All my data is in Excel. Is it possible to carry out an ANOVA with weighted mean in Excel?

Sorry for all the questions. Your help would be greatly appreciated.

Karen says

Hi Richard,

I suspect it is possible to do an ANOVA with weighted means in Excel, but I don’t ever use Excel for data analysis, so I have no idea how.

You would need to do adjustments to means if you’re calculating by hand, but stat software will do it for you automatically.
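For anyone wondering what the weighted mean looks like, here is a small sketch in plain Python using Richard's group sizes (20, 20, 40). The group means are made up for illustration, and `weighted_grand_mean` is a hypothetical helper name. Each group mean is weighted by its sample size, which gives the same answer as averaging all 80 observations directly.

```python
def weighted_grand_mean(means, ns):
    """Grand mean with each group mean weighted by its sample size."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)

ns = [20, 20, 40]        # sample sizes from the question
means = [3.0, 4.0, 5.0]  # made-up group means

weighted = weighted_grand_mean(means, ns)
unweighted = sum(means) / len(means)

print(f"weighted grand mean:   {weighted:.2f}")    # 4.25
print(f"unweighted mean of means: {unweighted:.2f}")  # 4.00
```

The larger group C pulls the weighted mean toward its own mean. With equal sample sizes the two numbers would be identical.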

Rebekah says

Hi Karen,

I have completed an independent samples t-test and because equal variances are not assumed, I go with the statistics which SPSS provides for that correction. However, my sample sizes are not similar (71/242) and therefore I have been taught to be very leery of the corrected t statistic. One solution I have been told is to select a random sample of the bigger group (so I would select 71 cases randomly out of the 242) and then run the test so that you have equal groups (71 to 71) to run your t test. Have you ever heard of this? Is this the most robust way of dealing with the issue of having both unequal variances and unequal n size?

Any help/suggestions would be much appreciated!

Karen says

I have heard of that (just read it in a book again yesterday). You’re absolutely right that when the sample sizes are that different, you have to be careful about unequal variances.

Another option, btw, would be a nonparametric test, like Wilcoxon Rank Sum.
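As background on the "corrected" t statistic: the unequal-variance version of the t-test is generally known as Welch's t-test. Here is a minimal sketch of it in plain Python with made-up data (the `welch_t` helper and both samples are assumptions for illustration, not Rebekah's data). It does not pool the variances, and it adjusts the degrees of freedom with the Welch-Satterthwaite formula.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up data: the smaller group is also the more variable one
small_group = [14.1, 9.8, 17.5, 6.2, 20.3, 11.0]
large_group = [12.0, 12.4, 11.7, 12.9, 12.2, 11.5,
               12.6, 12.1, 11.9, 12.3, 12.8, 11.6]

t, df = welch_t(small_group, large_group)
pooled_df = len(small_group) + len(large_group) - 2
print(f"Welch t = {t:.3f}, df = {df:.1f} (pooled df would be {pooled_df})")
```

The Welch degrees of freedom always fall between the smaller group's n − 1 and the pooled n1 + n2 − 2. When the small group has the larger variance, as here, they land near the low end, which is the price paid for not assuming equal variances.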

Lia says

I have a 2 x 2 x 2 mixed ANOVA design as well: a 2×2 repeated measures design plus a between-groups factor (gender). My group sizes are 59 and 29. Is that too big a difference?

Lia says

Also, past research has suggested that females generally do better, so with females at 59 and males at 29, should I report a possible confound?

Karen says

Hi Lia,

It could. This is exactly the situation where the bigger sample of females could cause problems. Are the results the same within each gender?

Lia says

There is marginal significance (p = 0.058) in only one of the interactions, between gender and another IV.

Marco says

Hi,

I’ve run a 2 (groups) x 3 (modalities) x 3 (intervals) mixed ANOVA.

Now, in group 1 there are 17 subjects, while in group 2 there are 15 subjects.

One reviewer asked if I applied any “correction” to take into account the different sample size.

I did not think this was a problem, above all with this small difference. Do you have any advice? What should I do?

Thanks,

Marco

Karen says

Hi Marco, there is no need to do anything, particularly if at least two of those IVs are manipulated. It’s only a problem if there’s a relationship among the IVs. Even so, those n’s are very similar, even if not equal.

Chantal says

Hi Karen,

I am working on my master’s thesis and am confronted with a dataset with 2 unequal group sizes (n=48, n=160) at baseline (T1). I have to test whether there is a difference between the two groups at baseline before the start of the treatment but also after 3 and 9 months (T2, T3). Besides, the second group gets smaller over time (n=132 at T3), so I am wondering what test to perform to deal with these difficulties.

Hope to hear from you.

Greetz Chantal

Karen says

Hi Chantal,

It’s really not a problem if the groups are unequal sizes. The bigger problem would be why one group is losing subjects over time but the other isn’t (although maybe I’m just assuming that last part).

Muj says

Hey Karen,

I have conducted an ANOVA with 3 between-subjects groups: A (n=26), B (n=19) and a neutral group (n=68). No significant effect was found, but I would like to know if this was likely due to the neutral group. What problems would the large size of this neutral group present in this situation?

Thanks,

Muj

Karen says

Hi Muj,

I’m not sure what you mean by if it was due to that group. Because it has the largest size, it should have the narrowest standard error. It would entirely depend on the order of the three means. It’s the two small groups that would potentially cause problems. That’s where your power is limited.

Karen
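A quick sketch of why the small groups are the limiting factor, in plain Python. It uses Muj's group sizes but a made-up common standard deviation (`assumed_sd` is purely hypothetical), and applies the usual formula SE = SD / √n:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a group mean."""
    return sd / sqrt(n)

sizes = {"A": 26, "B": 19, "neutral": 68}  # group sizes from the comment
assumed_sd = 200.0  # hypothetical common SD, for illustration only

se = {group: standard_error(assumed_sd, n) for group, n in sizes.items()}

# Standard error of a difference between two independent group means
se_diff_a_b = sqrt(se["A"] ** 2 + se["B"] ** 2)
se_diff_a_neutral = sqrt(se["A"] ** 2 + se["neutral"] ** 2)

for group, value in se.items():
    print(f"{group}: n={sizes[group]}, SE={value:.1f}")
print(f"SE of A - B difference:       {se_diff_a_b:.1f}")
print(f"SE of A - neutral difference: {se_diff_a_neutral:.1f}")
```

Under these assumptions the big neutral group has the tightest standard error, and the comparison between the two small groups is the noisiest one.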

Muj says

Thanks for the prompt reply,

I am testing the effects of schizotypy on memory performance, in particular accuracy and reaction time. It’s proposed that there would be a difference between high and low groups, with high groups performing significantly worse. However, no significant effect was found.

The neutral group does have the narrowest standard error (25.95), compared to the low schizotypy group (43.58) and the high schizotypy group (50.98).

Means for RT are: low = 848.58, high = 965.13, neutral = 927.29.

I was asked by my supervisor to comment on the potential problems of the large neutral group. Could it be that she means that my other two samples were not as well matched and had reduced power, and so there was not a significant effect?

Sorry for my essay ^

but many thanks for your help! 😀

hasna says

How can I run a non-parametric version of a paired t-test in R if I have unequal sample sizes?

Karen says

Unequal sample sizes ARE a problem if the data are paired. Do you mean that some pairs are missing one half of the pair?

Karri Kauppinen says

I love you. Thanks! 🙂

Anne says

Good afternoon Karen,

I have a question for you….my sample size is 351 (68 male/283 female).

I am comparing male/female on several continuous variables and using parametric tests; t-test and manova, etc.

The issue is the large difference between groups and feeling that I should conduct non parametrics? Would this ‘satisfy’ those reading my work? The results are the same with both para/non para., but I am concerned about the great differences due to the fact that this is my major hypothesis.

Thanks so much for your advice.

Anne

Karen says

Hi Anne,

If you’ve checked assumptions and have no problem with unequal variance, it’s fine.

That said, reviewers don’t always know that, so they may challenge you. If it would make you feel safer, and you are getting the same results anyway, there is nothing wrong with running it as a nonparametric for the t-test. You may have more trouble with the MANOVA, though. I don’t know of a nonparametric equivalent.

Karen

David Lane says

There is a good discussion of what to do when the variances are unequal here: http://beheco.oxfordjournals.org/content/17/4/688.full and it presents a good solution that holds for unequal n.

I have a simulation that lets you explore the issue for the test that assumes homogeneity of variance here: http://onlinestatbook.com/2/tests_of_means/robust_sim.html

and a discussion of unequal n in multi-factor designs here: http://onlinestatbook.com/2/analysis_of_variance/unequal.html

Anne says

Good morning Karen,

Great site!

I have a few questions:

My data: gender comparisons re knowledge, attitudes, beliefs.

Male n=68, female n=263.

1) I am running multiple regression, t-test and MANOVA.

I want to know if I need to run non parametrics to account for the unequal group n’s?

Doesn’t the Central Limit Theorem kick in due to my large sample sizes?

2) In my MANOVA, my Levene’s test shows two variables that are significant at both the .05 and .01 levels.

Should I not use MANOVA and look at other tests instead?

Thanks so much for your advice,

Anne

Karen says

Hi Anne,

1) You can run nonparametrics, but it’s usually not necessary. It’s hard to say what you need to do in any specific situation without all the details.

2) I’m not sure I understand this question, and as for what you should do, see my response to #1. If you want to restate that, I can give you some info so you can decide what you should do. 🙂

Karen