When we think about model assumptions, we tend to focus on independence, normality, and constant variance. The other big assumption, which is harder to see or test, is that there is no specification error. Linearity is part of this, but the assumption of no specification error is broader than that.
What is this assumption of no specification error? (more…)
It’s easy to make things complex without meaning to. Especially in statistical analysis.
Sometimes that complexity is unavoidable. You have ethical and practical constraints on your study design and variable measurement. Or the data just don’t behave as you expected. Or the only research question of interest is one that demands many variables.
But sometimes it isn’t. Seemingly innocuous decisions lead to complicated analyses. These decisions occur early, in the study design, the research questions, or the choice of variables.
(more…)
Just about everyone who does any data analysis has used a chi-square test. Probably because there are quite a few of them, and they’re all useful.
But it gets confusing because very often you’ll just hear them called “Chi-Square test” without their full, formal name. And without that context, it’s hard to tell exactly what hypothesis that test is testing. (more…)
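As a rough illustration of why the full name matters, here is a minimal Python sketch of two common chi-square tests: the goodness-of-fit test and the test of independence. Both use the same Pearson statistic, but they test different null hypotheses and have different degrees of freedom. The function names and the counts are made up for this example, and the sketch stops at the statistic and degrees of freedom rather than computing a p-value.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit(observed, expected_props):
    """H0: one categorical variable follows the stated proportions."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = chi_square_statistic(observed, expected)
    df = len(observed) - 1
    return stat, df


def independence(table):
    """H0: the row and column variables of a two-way table are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Flatten observed counts row by row; expected counts follow the same order.
    observed = [cell for row in table for cell in row]
    expected = [r * c / total for r in row_totals for c in col_totals]
    stat = chi_square_statistic(observed, expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts, for illustration only.
gof_stat, gof_df = goodness_of_fit([30, 50, 20], [0.25, 0.50, 0.25])
ind_stat, ind_df = independence([[20, 30], [30, 20]])
print("goodness of fit:", gof_stat, "on", gof_df, "df")
print("independence:", ind_stat, "on", ind_df, "df")
```

The arithmetic is identical in both functions. What changes is where the expected counts come from, and that is exactly what the full name of the test tells you.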
by Danielle Bodicoat
Statistics can tell us a lot about our data, but it’s also important to consider where the underlying data came from when interpreting results, whether they’re our own or somebody else’s.
Not all evidence is created equal, and we should place more trust in some types of evidence than others.
(more…)

Open data, particularly government open data, is a rich source of information that can be helpful to researchers in almost every field. But what is open data? How do we find what we’re looking for? What are some of the challenges with using data directly from city, county, state, and federal government agencies?
(more…)
One activity in data analysis that can seem impossible is the quest to find the right analysis. I applaud the conscientiousness and integrity that underlie this quest.
The problem: in many data situations there isn’t one right analysis.
(more…)