Ever consider skipping the important step of cleaning your data? It’s tempting but not a good idea. Why? It’s a bit like baking.
I like to bake. There’s nothing nicer than a rainy Sunday with no plans, and a pantry full of supplies. I have done my shopping, and now it’s time to make the cake. Ah, but the kitchen is a mess. I don’t have things in order. This is no way to start.
First, I need to clear the counter, wash the breakfast dishes, and set out my tools. I need to take stock, read the recipe, and measure out my ingredients. Then it’s time for the fun part. I’ll admit, in my rush to get started I have at times skipped this step.
(more…)
When you draw a graph, whether a bar chart, a scatter plot, or even a pie chart, you can choose from a broad range of colors. R, for example, has 657 named colors, from aliceblue to yellowgreen. SAS has 13 shades of orange, 33 shades of blue, and 47 shades of green. It even has different shades of black.
You have a wealth of colors, but you can’t use all of them in the same graph. The ideal number of colors is 2.
(more…)
In statistical practice, there are many situations where best practices are clear. There are many, though, where they aren’t. The granddaddy of these practices is adjusting p-values when you make multiple comparisons. There are good reasons to do it and good reasons not to. It depends on the situation.
At the heart of the issue is a concept called Family-wise Error Rate (FWER). FWER is the probability that
(more…)
Standard deviation and standard error are statistical concepts you probably learned well enough in Intro Stats to pass the test. Conceptually, you understand them, yet the difference doesn’t make a whole lot of intuitive sense.
So in this article, let’s explore the difference between the two. We will look at an example, in the hopes of making these concepts more intuitive. You’ll also see why sample size has a big effect on standard error. (more…)
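As a minimal sketch of that last point (not the article's own example), the snippet below draws samples of increasing size from one simulated population and computes each sample's standard deviation and standard error, using SE = SD / sqrt(n). The population values (mean 100, SD 15), the seed, and the helper name `standard_error` are assumptions chosen purely for illustration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

random.seed(1)  # arbitrary seed so the run is repeatable

results = {}
for n in (25, 100, 400):
    # Simulated population: normal with mean 100 and SD 15 (assumed values)
    sample = [random.gauss(100, 15) for _ in range(n)]
    sd = statistics.stdev(sample)
    se = standard_error(sample)
    results[n] = (sd, se)
    print(f"n={n:4d}  SD={sd:6.2f}  SE={se:5.2f}")
```

The standard deviation should stay close to the population value however large the sample gets, while the standard error should shrink by roughly half each time the sample size quadruples.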
If you’ve ever run a one-way analysis of variance (ANOVA), you’re familiar with post-hoc tests. The ANOVA omnibus test only tells you whether any groups differ in their means. But if you want to explore which specific group mean is different from which, you need to follow up with a post-hoc test. (more…)
When you need to compare a numeric outcome for two groups, what analysis do you think of first? Chances are, it’s the independent samples t-test. But it’s not the only option, and it’s not always the best one. In many situations, the Mann-Whitney U test is a better choice.
The non-parametric Mann-Whitney U test is also called the Mann-Whitney-Wilcoxon test, or the Wilcoxon rank sum test. Non-parametric means that the hypothesis it’s testing is not about the parameter of a particular distribution.
It is part of a subgroup of non-parametric tests that are rank based. That means that the specific values of the outcomes are not important, only their order. In other words, we will be ranking the outcomes.
Like the t-test, this analysis tests whether two independent groups have similar typical outcomes. You can use it with numeric data, but unlike the t-test, it also works with ordinal data. Like the t-test, it is designed for comparisons, and not for estimation or prediction.
The biggest difference from the t-test is that it does not compare means. The Mann-Whitney U test determines whether a random observation from one group tends to be higher (or lower) than a random observation from the other group. Imagine choosing two observations, one from each group, over and over again. This test will determine whether one group is more likely to have the higher values.
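The "over and over again" idea can be sketched directly. The snippet below is a minimal illustration, not a full implementation of the test: it compares every observation in one group with every observation in the other, counting ties as half. That count is the U statistic for the first group, and dividing it by the number of pairs estimates the probability that a random observation from the first group is the higher one. The function name and the data are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: count of (a, b) pairs with a > b; ties count as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Made-up scores, for illustration only
treatment = [7, 9, 12, 15, 18]
control = [3, 5, 8, 10, 11]

u = mann_whitney_u(treatment, control)
prob_higher = u / (len(treatment) * len(control))
print(u, prob_higher)
```

Because only the order of the values matters, replacing the 18 with 1800 would leave U unchanged. For a real analysis, a library routine such as `scipy.stats.mannwhitneyu` also reports the p-value.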
The t-test has many advantages: it is a straightforward comparison of means, there are versions for equal and unequal variances in the two groups, and many people are familiar with it.
(more…)