Sampling

How the Population Distribution Influences the Confidence Interval

September 6th, 2022 by Jeff Meyer

Spoiler alert, real data are seldom normally distributed. How does the population distribution influence the estimate of the population mean and its confidence interval?

To figure this out, we randomly draw 100 observations 100 times from three distinct populations and plot the mean and corresponding 95% confidence interval of each sample.
(more…)

1 comment

How Confident Are You About Confidence Intervals?

August 12th, 2019 by Jeff Meyer

Any time you report estimates of parameters in a statistical analysis, it’s important to include their confidence intervals.

How confident are you that you can explain what they mean? Even those of us who have a solid understand of confidence intervals get tripped up by the wording.

The Wording for Describing Confidence Intervals

Let’s look at an example. (more…)

8 comments

Member Training: A Quick Introduction to Weighting in Complex Samples

October 3rd, 2017 by guest contributer

A few years back the winning t-shirt design in a contest for the American Association of Public Opinion Research read “Weighting is the Hardest Part.” And I don’t think the t-shirt was referring to anything about patience!

Most statistical methods assume that every individual in the sample has the same chance of selection.

Complex Sample Surveys are different. They use multistage sampling designs that include stratification and cluster sampling. As a result, the assumption that every selected unit has the same chance of selection is not true.

To get statistical estimates that accurately reflect the population, cases in these samples need to be weighted. If not, all statistical estimates and their standard errors will be biased.

But selection probabilities are only part of weighting. (more…)

1 comment

Outliers and Their Origins

November 11th, 2016 by Karen Grace-Martin

Outliers are one of those realities of data analysis that no one can avoid.

Those pesky extreme values cause biased parameter estimates, non-normality in otherwise beautifully normal variables, and inflated variances.

Everyone agrees that outliers cause trouble with parametric analyses. But not everyone agrees that they’re always a problem, or what to do about them even if they are.

Sometimes a nonparametric or robust alternative is available — and sometimes not.

There are a number of approaches in statistical analysis for dealing with outliers and the problems they create. It’s common for committee members or Reviewer #2 to have very strong opinions that there is one and only one good approach.

Two approaches that I’ve commonly seen are: 1) delete outliers from the sample, or 2) winsorize them (i.e., replace the outlier value with one that is less extreme).

The problem with both of these “solutions” is that they also cause problems — biased parameter estimates and underweighted or eliminated valid values. (more…)

1 comment

Member Training: Small Sample Statistics

August 1st, 2016 by Audrey Schnell

Despite modern concerns about how to handle big data, there persists an age-old question: What can we do with small samples?

Sometimes small sample sizes are planned and expected. Sometimes not. For example, the cost, ethical, and logistical realities of animal experiments often lead to samples of fewer than 10 animals.

Other times, a solid sample size is intended based on a priori power calculations. Yet recruitment difficulties or logistical problems lead to a much smaller sample. In this webinar, we will discuss methods for analyzing small samples. Special focus will be on the case of unplanned small sample sizes and the issues and strategies to consider.

Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.

(more…)

Comments closed

Get your Sampling Out of My Survey Errors…

February 17th, 2015 by guest contributer

Author: Trent Buskirk, PhD.

In my last article, we got a bit comfortable with the notion of errors in surveys. We discussed sampling errors, which occur because we take a random sample rather than a complete census.

If you ever had to admit error, sampling error is the type to admit. Polls admit this sort of error frequently by reporting the margin of error. Margin of error is the sampling error multiplied by a distributional value that can be used to create a confidence interval.

But there are some other types of error that can occur in the survey context that, while influential, are a bit more invisible. They are generally referred to as non-sampling error.

These types of errors are not associated with sample-to-sample variability but to sources like selection biases, frame coverage issues, and measurement errors. These are not the kind of errors you want in your survey.

In theory, it is possible to have an estimator that has little sampling error associated with it. That looks good on the surface, but this estimator may yield poor information due to non-sampling errors.

For example, a high rate of non-response may mean that some participants are opting out and biasing estimates.

Likewise, a scale or set of items on the survey could have known measurement error. They may be imprecise in their measurement of the construct of interest or they may measure that construct better for some populations than others. Again, this can bias estimates.

Frame coverage error occurs when the sampling frame does not quite match the target population. This leads to the sample including individuals who aren’t in the target population, missing individuals who are, or both.

A perspective called the Total Survey Error Framework allows researchers to evaluate estimates on errors that come from sampling and those that don’t. It can be very useful in choosing a sampling design that minimizes errors as a whole.

So when you think about errors and how they might come about in surveys, don’t forget about the non-sampling variety – those that could come as a result of non-response, measurement, or coverage.

No comments yet