Author: Trent Buskirk, PhD.
As in history, literature, criminology and many other areas, context is important in statistics. Knowing where your data come from gives clues about what you can do with the data and what inferences you can make from them.
In survey sampling, context is critical because it tells you how the sample was selected and from what population.
Not every sample is a simple random sample, so knowing the sampling design provides the context researchers need to create proper estimates and draw correct, projectable inferences.
One of the very first steps, then, in working with survey data is to understand the sampling design. There are a few key concepts that you need not only to understand in general but also to define for your sample, in order to provide the proper context for computing estimates and drawing inferences.
The first is the Sampling Unit.
This is the actual unit that we include in our sample. Usually this unit refers to an individual person, but it could be a company, a school, or a neighborhood, depending on what you’re measuring and how you’re measuring it.
Now put Sampling Units into their proper context and you have the Sampling Frame, which consists of a listing of all possible Sampling Units.
The target population provides the overall context and represents the collection of people, housing units, schools, etc. about which inferences and estimates are desired.
Ideally, the sampling frame perfectly coincides with the target population. Of course, the ideal is not always possible. Sometimes the frame will be larger or smaller than the target population, depending on the practical ways available for getting in touch with its members.
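To make the relationship between frame and target population concrete, here is a minimal Python sketch that treats both as sets of units. The unit IDs are invented for illustration, and the labels "undercoverage" and "overcoverage" are the usual survey terms for the two kinds of mismatch rather than names used in this article.

```python
# Hypothetical unit IDs, invented for illustration only.
target_population = {"A", "B", "C", "D", "E"}  # units we want to make inferences about
sampling_frame = {"B", "C", "D", "E", "F"}     # units we can actually list and sample

# Units that appear in both the target population and the frame.
covered = target_population & sampling_frame

# Undercoverage: in the target population but missing from the frame.
undercovered = target_population - sampling_frame

# Overcoverage: on the frame but not part of the target population.
overcovered = sampling_frame - target_population

# Share of the target population that the frame can reach.
coverage_rate = len(covered) / len(target_population)

print("covered:", sorted(covered))
print("undercovered:", sorted(undercovered))
print("overcovered:", sorted(overcovered))
print("coverage rate:", coverage_rate)
```

When the frame coincides perfectly with the target population, both difference sets are empty and the coverage rate is 1.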
Why Sampling Frames are so Important
Let’s say you’re doing a study on the opinions of US adults on current politicians.
Of course, you don’t have phone numbers for *all* adults in the US. But you are able to get a master frame of all available cell phone numbers, which you can sample using random digit dialing.
The target population (US adults) will in large part be “covered” by the sampling frame (those in the cell phone banks).
However, some cell phone numbers in the bank are owned by children, who are not part of the target population.
Likewise, adults with only a landline telephone or no telephone at all will not be covered by this sampling frame.
If these adults differ on our survey outcomes from those who own a cell phone, then selection bias may result. In this particular case, it’s called coverage bias.
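As a rough illustration of how large coverage bias can get, the sketch below uses the standard decomposition for a mean: the bias of the covered group's mean equals the share of the target population that is not covered, multiplied by the difference between the covered and non-covered means. The function names and all the numbers are hypothetical; they are not estimates from any real survey.

```python
def target_mean(mean_covered, mean_not_covered, share_not_covered):
    """Mean over the full target population, as a weighted mix of the two groups."""
    return (1 - share_not_covered) * mean_covered + share_not_covered * mean_not_covered


def coverage_bias(mean_covered, mean_not_covered, share_not_covered):
    """Bias from using the covered group's mean in place of the target population's mean."""
    return share_not_covered * (mean_covered - mean_not_covered)


# Hypothetical inputs: 60% approval among adults on the frame,
# 40% among adults off the frame, and 10% of adults off the frame.
mean_c, mean_nc, share_nc = 0.60, 0.40, 0.10

bias = coverage_bias(mean_c, mean_nc, share_nc)
truth = target_mean(mean_c, mean_nc, share_nc)

print(f"target population mean: {truth:.3f}")
print(f"covered-only mean:      {mean_c:.3f}")
print(f"coverage bias:          {bias:.3f}")
```

The bias shrinks toward zero when very few adults are off the frame or when the two groups hold similar opinions, which is why both pieces of information matter when judging a frame.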
You may have no better option for a sampling frame or the frame may have been decided by someone else before the data were available to you. In any case, it’s vital that you know how the sample was obtained and how the sampling frame may not have covered the entire target population.
Knowing this information allows you to derive reasonable statistical estimates and, perhaps more importantly, to make inferences that can be put in proper context.