Stratified Sampling for Oversampling Small Sub-Populations

by Ritu Narayan

Sampling is a critical issue in any research study design. Most of us have grappled with balancing costs, time and of course, statistical power when deciding our sampling strategies.

How do we know when to go for a simple random sample or to go for stratification or for clustering? Let’s talk about stratified sampling here and one research scenario when it is useful.

One Scenario for Stratified Sampling

Suppose you are studying minority groups and their behavior, say Yiddish speakers in the U.S. and their voting patterns. Yiddish speakers are a small subset of the U.S. population, just 0.6%. So even a large simple random sample of 1000 residents would, on average, include only about 6 Yiddish speakers.

Obviously, it is difficult to draw any meaningful inferences about the group’s behavior based on 6 respondents.

Further, suppose you need at least 60 respondents from this group for sufficient statistical power. That would entail collecting data from 10,000 respondents – a number for which you might neither have the financial resources nor the time.
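The arithmetic behind these numbers can be checked with a short sketch. The function names are illustrative and not taken from any library, and the calculation assumes a simple random sample in which each respondent has a 0.6% chance of being a Yiddish speaker.

```python
import math
from fractions import Fraction

def expected_count(sample_size, proportion):
    """Expected number of sub-group members in a simple random sample."""
    return sample_size * proportion

def required_sample_size(target_count, proportion):
    """Smallest total sample whose expected sub-group count reaches the target."""
    return math.ceil(Fraction(target_count) / proportion)

# Fraction keeps 0.6% exact, so no floating-point rounding creeps in.
yiddish_share = Fraction("0.006")

print(expected_count(1000, yiddish_share))      # expected Yiddish speakers in n = 1000
print(required_sample_size(60, yiddish_share))  # total n needed to expect 60 of them
```

Note that these are averages: any single random sample of 1000 could contain somewhat more or fewer than 6 Yiddish speakers.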

What if you were able to get meaningful results from a total sample of just 200 respondents of which 100 are Yiddish speakers? Now, that sounds more manageable, doesn’t it? And this is where stratified sampling comes in.

How to do it

In stratified sampling, the population is divided into different sub-groups or strata, and then the subjects are randomly selected from each of the strata. So, in the above example, you would divide the population into different linguistic sub-groups (one of which is Yiddish speakers). Here are two simple steps you should follow:

Step 1: Divide the population into sub-groups (strata)

Commonly used stratification variables are age, gender, ethnicity, socio-economic class and religion. You should ensure that the strata meet the following criteria:

  1. The sub-groups should be exhaustive, i.e. the entire population should be covered within the strata. For example, the different strata for age could be child (<=12 years), teenager (13-19 years), adult (20-59 years), and senior (>=60 years).
  2. There should be no overlaps between sub-groups, i.e. each subject or element should fall in only one sub-group. This is evident in the example above.
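Both criteria can be checked mechanically. The sketch below encodes the age strata from the example and flags any age that falls into no stratum (a gap) or into more than one (an overlap). It assumes ages are recorded in whole years; the names `STRATA`, `matching_strata` and `check_partition` are made up for this illustration.

```python
# Age strata from the example, written as membership rules.
STRATA = {
    "child": lambda age: age <= 12,
    "teenager": lambda age: 13 <= age <= 19,
    "adult": lambda age: 20 <= age <= 59,
    "senior": lambda age: age >= 60,
}

def matching_strata(age):
    """Names of all strata whose rule accepts this age."""
    return [name for name, rule in STRATA.items() if rule(age)]

def check_partition(ages):
    """Ages that match zero strata (gap) or several strata (overlap)."""
    return [age for age in ages if len(matching_strata(age)) != 1]

# An empty list means the strata are exhaustive and non-overlapping
# for every whole-year age from 0 to 120.
problems = check_partition(range(0, 121))
print(problems)
```

If ages were recorded with decimals, a value such as 12.5 would fall between "child" and "teenager", so the boundaries would need to be restated.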

Step 2: Sample the strata using proportionate or disproportionate allocation

1. In proportionate allocation, each stratum is sampled in proportion to its share of the population. In a sample of 1000, you would draw 6 Yiddish speakers.

2. Alternatively, you could draw 100 Yiddish speakers in a total sample size of 200. In other words, you disproportionately sample more subjects from the stratum of interest: Yiddish speakers make up 50% of the sample, far more than their representation in the population (0.6%). With such a sample you can draw meaningful inferences about Yiddish speakers and how they compare with the rest of the population.

In this scenario, disproportionate allocation would make the most sense, since the point is to ensure an adequate number of Yiddish speakers in the sample.
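The two allocation schemes can be sketched in a few lines. This is an illustration under stated assumptions, not a survey-ready implementation: the sampling frame is a hypothetical population of 100,000 ID numbers in which 600 (0.6%) are Yiddish speakers, and the function names are invented for the example.

```python
import random

def proportionate_allocation(stratum_sizes, total_sample):
    """Sample size per stratum, proportional to its share of the population.
    Rounding means the parts may not always add up exactly to total_sample."""
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

def stratified_sample(frame, allocation, seed=None):
    """Draw a simple random sample, without replacement, inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(frame[name], allocation[name])
            for name in allocation}

# Hypothetical frame: IDs 0-599 are Yiddish speakers, the rest are not.
frame = {"yiddish": list(range(600)), "other": list(range(600, 100_000))}
sizes = {name: len(units) for name, units in frame.items()}

proportionate = proportionate_allocation(sizes, 1000)  # mirrors the population
disproportionate = {"yiddish": 100, "other": 100}      # oversamples the small stratum

sample = stratified_sample(frame, disproportionate, seed=42)
print(proportionate, {name: len(units) for name, units in sample.items()})
```

One consequence of oversampling is that the combined sample of 200 no longer mirrors the population. Estimates for the population as a whole would need each stratum weighted back to its true share, while comparisons between the two strata can use the stratum samples directly.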

Proportionate allocation makes more sense in other scenarios. One example is when the strata themselves are not of interest in the research question, but they improve access to potential research participants.

To summarize, one good reason to use stratified sampling is when the sub-group you want to study is a small proportion of the population. In that case, you can sample a disproportionately high number of subjects from this sub-group. This will enable you to compare your sub-group with the rest of the population with greater accuracy, and at lower cost.
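To get a rough sense of the gain in accuracy, the sketch below compares the standard error of a difference between two proportions under both designs. It uses the standard formula for two independent samples and makes simplifying assumptions that are not from the article: the proportion of interest is 0.5 in both groups (the worst case for precision), sampling within each group is simple random, and no finite population correction is applied.

```python
import math

def se_difference(n_group, n_rest, p_group=0.5, p_rest=0.5):
    """Standard error of (p_group - p_rest) for two independent samples."""
    return math.sqrt(p_group * (1 - p_group) / n_group
                     + p_rest * (1 - p_rest) / n_rest)

# Simple random sample of 1000: about 6 Yiddish speakers and 994 others.
se_random = se_difference(6, 994)

# Disproportionate stratified sample of 200: 100 from each stratum.
se_stratified = se_difference(100, 100)

print(round(se_random, 2), round(se_stratified, 2))  # about 0.20 versus about 0.07
```

Under these assumptions, the sample of 200 gives a standard error for the comparison that is roughly a third of the one from the sample of 1000, because the precision of the comparison is limited almost entirely by the size of the smaller group.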

Ritu Narayan, M.S., M.B.A provides services to clients in conceptualization, design and implementation of research studies. Her expertise in quantitative data analysis methods and background in business consulting and information systems help her provide insights during all stages of research, from questionnaire design to report writing. She has published in peer reviewed journals on what motivates people to post online reviews, and is interested in research focused on social media.


Carla July 28, 2015 at 5:49 am

I found it clear and useful. Thanks!
C.

peter hadjis February 1, 2015 at 6:08 pm

An otherwise useful explanation, is muddled up by omitted text or suspended editing.

Fourth paragraph down:
Obviously, it is difficult to draw any meaningful inferences about the group’s behavior based on 6 respondents.
only makes sense if you take into consideration:

Step 2: Sample the strata using proportionate or disproportionate allocation
1. In proportionate allocation, in a sample of 1000, you would draw 6 Yiddish speakers.

which is further down in the text.

Karen February 6, 2015 at 5:12 pm

Thanks, Peter. I fixed it. 🙂
