It’s easy to make things complex without meaning to. Especially in statistical analysis. Sometimes that complexity is unavoidable. You have ethical and practical constraints on your study design and variable measurement. Or the data just don’t behave as you expected. Or the only research question of interest is one that demands many variables. But sometimes […]
Missing data is a common problem in data analysis. One successful approach is k-Nearest Neighbor (kNN) imputation, a simple method that uses the observed values of similar cases to impute missing values with a relatively high degree of accuracy.
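To make the kNN idea concrete, here is a minimal sketch in plain Python. It is not taken from the article: the function name `knn_impute`, the toy data, and the choice of a root-mean-square distance over the columns both rows have observed are all assumptions made for illustration. Each missing value is filled with the average of that column among the k most similar rows.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by the mean of that
    column among the k nearest rows (hypothetical helper, for illustration)."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                # Compare only on columns observed in both rows.
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data (made up): the last row is missing its third value.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 11.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],
]
filled = knn_impute(rows, k=2)
print(filled[3])
```

The sketch uses raw, unscaled values. With real data, variables on different scales would normally be standardized first so that no single variable dominates the distance, and a tested library implementation would usually be preferable to hand-rolled code.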
Even if you’ve never heard the term Generalized Linear Model, you may have run one. It’s a term for a family of models that includes logistic and Poisson regression, among others. It’s a small leap to generalized linear models, if you already understand linear models. Many, many concepts are the same in both types of […]
By Steve Simon, PhD. Hazard functions are a key tool in survival analysis. But they’re not always easy to interpret. In this article, we’re going to explore the definition, purpose, and meaning of hazard functions. Then we’ll explore a few different shapes to see what they tell us about the data. Motivating example: This is […]
As the saying goes, a picture is worth a thousand words, and the same is true of graphs. A well-constructed graph can summarize information collected from tens, hundreds, or even thousands of data points. But not every graph has the same power to convey complex information clearly.
Just about everyone who does any data analysis has used a chi-square test. Probably because there are quite a few of them, and they’re all useful. But it gets confusing because very often you’ll just hear them called “Chi-Square test” without their full, formal name. And without that context, it’s hard to tell exactly what […]
Missing data are a widespread problem, as most researchers can attest. Whether data are from surveys, experiments, or secondary sources, missing data abound. But what’s the impact on the results of statistical analysis? That depends on two things: the mechanism that led the data to be missing and the way in which the data analyst […]
One component often overlooked in the ‘Define & Design’ phase of a study is writing the analysis plan. The statistical analysis plan integrates a lot of information about the study, including the research question, study design, variables and data used, and the type of statistical analysis that will be conducted.
What’s the difference between Multilevel Models, Mixed Models, and Hierarchical Models? I get this question a lot. The answer: very little.
Centering a covariate (a continuous predictor variable) can make regression coefficients much more interpretable. That’s a big advantage, particularly when you have many coefficients to interpret. Or when you’ve included terms that are tricky to interpret, like interactions or quadratic terms. For example, say you had one categorical predictor with 4 categories and one continuous covariate, […]