The Analysis Factor

What is a Complex Sample Part 3: Stratified Sampling

by Karen Grace-Martin

In our last two posts, we explained (1) that in a simple random sample every member of the population has an equal probability of selection and (2) that there are some really good reasons why complex samples can work better, despite being more complex.

Today, we’re going to talk a bit about one complex sampling technique: stratified sampling.

What is Stratified Sampling?

In stratified sampling, the target population is first classified into subgroups, or strata. (Grammar note: “strata” is the plural of “stratum,” just as “data” is the plural of “datum.”)

A simple random sample is then selected within every stratum.

That’s it.

For example, let’s say you’re doing a linguistics study within the US. You want to make sure that your sample has enough speakers of most of the major dialects of American English. You know there are regional differences in pronunciation and word use, and you want to ensure your sample includes people who say “crawfish” as well as people who say “crayfish,” and people who say “tennis shoes” as well as people who say “sneakers.”

So rather than taking one simple random sample across the US, you first create four regional strata: Northeast, Southeast, Midwest, and West. These regions are based on other studies that generally define four common dialectal patterns.

You then randomly sample from each of the four regions.
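The two steps above (classify into strata, then draw a simple random sample within each) can be sketched in a few lines of Python. This is only an illustration: the sampling frame, the stratum sizes, and the function name `stratified_sample` are all made up for the example, and a real study would draw from an actual sampling frame.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample (without replacement) inside each stratum."""
    rng = random.Random(seed)
    # Step 1: classify every member of the population into its stratum.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    # Step 2: take an independent simple random sample within each stratum.
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}


# Hypothetical sampling frame: (person_id, region) pairs with made-up sizes.
frame_sizes = {"Northeast": 500, "Southeast": 700, "Midwest": 600, "West": 800}
population = [(f"{region}-{i}", region)
              for region, size in frame_sizes.items()
              for i in range(size)]

sample = stratified_sample(
    population,
    stratum_of=lambda unit: unit[1],
    n_per_stratum={region: 50 for region in frame_sizes},
)

for region, people in sample.items():
    print(region, len(people))
```

Here every region contributes the same number of people. That is one possible allocation, not the only one; the sample sizes per stratum are a design decision.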

What are the Advantages of Stratified Sampling?

  • Administration of field work is more convenient and less costly. You can, for example, assign a separate research team to each region. Each team has to do less travelling, which reduces expenses.
  • If people really are more similar within each stratum, stratified sampling will lead to improved precision of estimates. In other words, standard errors will be lower: not just the standard errors within a stratum, but also the standard errors for estimates of the entire US. This is incredibly important, as it’s one of the few things you can do to improve power in a study without huge increases in sample size.
  • It allows oversampling of small populations of interest. This is also incredibly important, as sometimes those small populations are vital to your research questions.
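To make the precision point concrete, here is a rough sketch comparing the standard error of an estimated mean under a simple random sample and under a stratified sample of the same total size. Everything in it is an assumption made for illustration: the “dialect score” values are invented, the stratified design uses proportional allocation, and the finite population correction is ignored to keep the formulas short.

```python
import statistics

# Hypothetical population: an invented "dialect score" per person.
# People within a region are similar; regions differ a lot from each other.
population = {
    "Northeast": [20, 22, 24, 26, 28] * 40,
    "Southeast": [60, 62, 64, 66, 68] * 40,
    "Midwest": [40, 42, 44, 46, 48] * 40,
    "West": [80, 82, 84, 86, 88] * 40,
}
n_total = 80  # total sample size under either design
N = sum(len(values) for values in population.values())

# Simple random sample: SE of the mean is sqrt(S^2 / n),
# where S^2 is the variance of the whole population.
all_values = [x for values in population.values() for x in values]
se_srs = (statistics.variance(all_values) / n_total) ** 0.5

# Stratified sample with proportional allocation: n_h = n * W_h,
# and Var(mean) = sum over strata of W_h^2 * S_h^2 / n_h.
var_strat = 0.0
for values in population.values():
    W_h = len(values) / N          # stratum share of the population
    n_h = n_total * W_h            # sample size allocated to the stratum
    var_strat += W_h ** 2 * statistics.variance(values) / n_h
se_strat = var_strat ** 0.5

print(f"SE under simple random sampling: {se_srs:.3f}")
print(f"SE under stratified sampling:    {se_strat:.3f}")
```

The stratified standard error depends only on the variation within strata, so the more homogeneous the strata are, the bigger the gain. If the strata were no more homogeneous than the population as a whole, the two standard errors would be about the same.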

Let’s say that there is a fifth known, but relatively small, dialect in Alaska. You have a particular interest in being able to compare this group to others in the study.

Perhaps Alaskans have their own words for crayfish and gym shoes that no one else uses. (And yes, I’m totally making up this example.) Lumping them in with the populations of the other, more populous western states means you may end up with only 5 or 10 Alaskans, even if the entire western sample is pretty large.

(FYI, if you’re not familiar with US geography: although Alaska is huge in area, it has a small population. Other western states, like California, have enormous populations.)

If all you care about is representing the US, that’s fine. But if you’d like to also do some analyses on just Alaskans, you won’t have enough in the sample without specifically sampling Alaskans at a higher rate than residents of other western states.

So, one option is to make Alaska its own stratum, then make sure the sample in that stratum is large enough to use in your statistical tests.
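The arithmetic behind this is simple enough to sketch. The population figures below are rough round numbers assumed for illustration, not official counts, and the sample sizes are hypothetical.

```python
# Assumed round figures for illustration only (not official counts).
west_population = 78_000_000    # all western states, including Alaska
alaska_population = 730_000
west_sample_size = 1_000        # hypothetical sample for the West stratum

# If Alaska is lumped in with the West, Alaskans turn up in the sample
# roughly in proportion to their share of the western population.
expected_alaskans = west_sample_size * alaska_population / west_population
print(f"Expected Alaskans in the West sample: {expected_alaskans:.1f}")

# If Alaska is its own stratum, you choose its sample size directly.
alaska_sample_size = 200        # hypothetical target for Alaska-only analyses
alaska_rate = alaska_sample_size / alaska_population
rest_rate = west_sample_size / (west_population - alaska_population)
oversampling_ratio = alaska_rate / rest_rate
print(f"Alaskans are sampled at about {oversampling_ratio:.0f} times "
      "the rate of other western residents")
```

The ratio of sampling rates is exactly what the weights described in the next section have to undo.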

The Consequences

In order to have accurate estimates for the US in general, though, you need to account for the fact that Alaskans make up a larger proportion of the sample than they do of the population.

You do this through weighting and through incorporating the stratification into the statistical analysis.

The weighting ensures that parameter estimates like means and regression coefficients are accurate and unbiased.  Incorporating the stratification ensures the standard errors are accurate.
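As a minimal sketch of what the weighting and the stratification each contribute, the example below computes a weighted mean and a stratified standard error by hand. The two strata, their population sizes, and the sampled values are all invented; each sampled person gets a design weight of N_h / n_h (the number of people in the population that person represents), and the finite population correction is ignored. In practice you would let a complex survey procedure do this rather than code it yourself.

```python
import statistics

# Hypothetical strata: (population size N_h, sampled values).
# Alaska is heavily oversampled: 8 of 16 sampled people, but 1% of the population.
strata = {
    "Rest of US": (990_000, [50, 52, 48, 51, 49, 50, 53, 47]),
    "Alaska": (10_000, [80, 82, 78, 81, 79, 80, 83, 77]),
}
N = sum(N_h for N_h, _ in strata.values())

# Ignoring the design: a plain average over everyone sampled.
unweighted_mean = statistics.mean(
    [y for _, values in strata.values() for y in values]
)

# Using the design weights: each sampled person counts for N_h / n_h people.
weighted_total = sum((N_h / len(values)) * sum(values)
                     for N_h, values in strata.values())
weighted_mean = weighted_total / N

# Using the stratification: the variance of the estimated mean is built up
# stratum by stratum, as the sum of W_h^2 * s_h^2 / n_h.
variance_of_mean = sum((N_h / N) ** 2 * statistics.variance(values) / len(values)
                       for N_h, values in strata.values())
stratified_se = variance_of_mean ** 0.5

print(f"Unweighted mean: {unweighted_mean}")
print(f"Weighted mean:   {weighted_mean}")
print(f"Stratified SE:   {stratified_se:.3f}")
```

The unweighted mean is pulled far toward the Alaskan values because Alaskans are half the sample; the weighted mean puts them back at their 1% share of the population.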

Unfortunately, although most procedures in general statistical software can incorporate weights, you need to use software designed for complex surveys to include the stratification.

Luckily, all the major stat packages (and a few specialized ones) now have complex survey procedures available.

Tagged With: oversampling, Stratified Sampling

Comments

  1. Samson Odira Omolo says

    September 30, 2020 at 4:34 am

    I really enjoyed what you posted on statistical analyses. My study is on
    *Impact on abundance, diversity, distribution and Public Health implications of disease vectors in solid waste disposal system in Mombasa County*. How would I go about stratified sampling, and what advantages would it have for this study? Mombasa County has six sub-counties, each with different solid waste disposal sites. I would appreciate your ideas so that I can develop a good proposal.

Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.