There are a number of simplistic methods available for tackling the problem of missing data. Unfortunately, each of these methods is very likely to introduce bias into our model results. Multiple imputation is considered to be the superior method of working with missing data. It eliminates the bias introduced by the […]

## Latest Blog Posts

The concept of “hazard” in Survival Analysis is similar to, but not exactly the same as, its meaning in everyday English. If you’re not familiar with Survival Analysis, it’s a set of statistical methods for modelling the time until an event occurs. Let’s use an example you’re probably familiar with: the time until a PhD candidate completes their dissertation.
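As a rough illustration of the idea, here is a minimal Python sketch of a discrete-time hazard: in each year, the share of candidates still working on their dissertation at the start of that year who finish during it. The cohort size and yearly completion counts are hypothetical numbers invented for this sketch, and it assumes no one drops out or is otherwise censored.

```python
def discrete_hazard(at_risk_start, events_per_interval):
    """Return the hazard for each interval: events / number still at risk.

    Assumes no censoring, so the only way to leave the risk set
    is to experience the event.
    """
    hazards = []
    at_risk = at_risk_start
    for events in events_per_interval:
        # Hazard is conditional on having "survived" (not finished) so far.
        hazards.append(events / at_risk if at_risk > 0 else float("nan"))
        at_risk -= events
    return hazards


# Hypothetical cohort: 100 PhD candidates, completions in years 1 to 5.
completions = [0, 5, 20, 30, 15]
hazards = discrete_hazard(100, completions)

for year, h in enumerate(hazards, start=1):
    print(f"Year {year}: hazard = {h:.3f}")
```

Note that the denominator shrinks each year. That is what separates a hazard from a simple proportion of the original cohort: it is conditional on not having finished yet.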

One issue with using tests of significance is that black-and-white cut-off points such as 5 percent or 1 percent may be difficult to justify. Significance tests on their own do not reveal much about the nature or magnitude of any effect to which they apply. One way of shedding more light on […]

Oops—you ran the analysis you planned to run on your data, carefully chosen to answer your research question, but your residuals aren’t normally distributed. Maybe you’ve tried transforming the outcome variable, or playing around with the independent variables, but still no dice. That’s ok, because you can always turn to a non-parametric analysis, right? Well, […]

by Kim Love, PhD

What are the best methods for checking a generalized linear mixed model (GLMM) for proper fit? This question comes up frequently. Unfortunately, it isn’t as straightforward as it is for a general linear model. In linear models the requirements are easy to outline: linear in the parameters, normally distributed and independent […]

Survey questions are often structured without regard for ease of use within a statistical model. Take, for example, a survey done by the Centers for Disease Control and Prevention (CDC) regarding births in the U.S. One of the variables in the data set is “interval since last pregnancy”. Here is a histogram of the results.

A great tool to have in your statistical tool belt is logistic regression. It comes in many varieties and many of us are familiar with the variety for binary outcomes. But multinomial and ordinal varieties of logistic regression are also incredibly useful and worth knowing. They can be tricky to decide between in practice, however. […]

You probably learned about the four levels of measurement in your very first statistics class: nominal, ordinal, interval, and ratio. Knowing the level of measurement of a variable is crucial when working out how to analyze the variable. Failing to correctly match the statistical method to a variable’s level of measurement leads either to nonsense […]

Multicollinearity can affect any regression model with more than one predictor. It occurs when two or more predictor variables overlap so much in what they measure that their effects are indistinguishable. When the model tries to estimate their unique effects, it goes wonky (yes, that’s a technical term). So for example, you may be interested in […]

At times it is necessary to convert a continuous predictor into a categorical predictor. For example, income per household is shown below. This data is censored: all family income above $155,000 is stated as $155,000. A further explanation about censored and truncated data can be found here. It would be incorrect to use this variable […]