Previous Posts
An incredibly useful tool for evaluating and comparing predictive models is the ROC curve. Its name is admittedly strange: ROC stands for receiver operating characteristic, a term that dates back to sonar work in the 1940s, when ROCs were used to measure how well a sonar signal (e.g., from a submarine) could be distinguished from noise (e.g., a school of fish).

In its current usage, an ROC curve is a nice way to see how well any predictive model can distinguish between true positives and true negatives. To do this, a model needs to correctly predict not only a positive as a positive, but also a negative as a negative. The ROC curve captures both by plotting sensitivity, the probability of predicting that a real positive will be a positive, against 1-specificity, the probability of predicting that a real negative will be a positive. (A previous newsletter article covered the specifics of sensitivity and specificity, in case you need a review of what they mean, and why it's important to know how accurately the model predicts positives and negatives separately.)

The best decision rule is high on sensitivity and low on 1-specificity: it predicts that most true positives will be positives and few true negatives will be positives. I've been talking about decision rules, but what about models?
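To make the plotting idea concrete, here is a minimal Python sketch (not code from the original post) that sweeps a decision threshold over made-up predicted scores and plots sensitivity against 1-specificity; the data and variable names are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical example data: true labels and model-predicted scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.6, 0.3])

# Sweep every possible decision threshold, from strict to lenient
thresholds = np.sort(np.unique(y_score))[::-1]
sensitivity = []            # P(predicted positive | actually positive)
one_minus_specificity = []  # P(predicted positive | actually negative)

for t in thresholds:
    pred_pos = y_score >= t
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    sensitivity.append(tp / np.sum(y_true == 1))
    one_minus_specificity.append(fp / np.sum(y_true == 0))

# The ROC curve: sensitivity plotted against 1-specificity
plt.plot(one_minus_specificity, sensitivity, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line for reference
plt.xlabel("1 - specificity (false positive rate)")
plt.ylabel("Sensitivity (true positive rate)")
plt.title("ROC curve")
plt.show()
```

Each threshold corresponds to one decision rule; the curve traces out all of them, which is what lets you judge the model as a whole rather than a single rule.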
In this webinar, we’ll provide a gentle introduction to generalized linear mixed models (or GLMMs). You’ll become familiar with the major issues involved in working with GLMMs so you can more easily transition to using these models in your work.
In a previous post we discussed the difficulties of spotting meaningful information when we work with a large panel data set. Observing the data collapsed into groups, such as quartiles or deciles, is one approach to tackling this challenging task. We showed how this can be done easily in Stata using just 10 lines of code.
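The original post works in Stata; purely as a rough analogue of the same idea, here is a pandas sketch that collapses a made-up panel into decile means per time period. The column names (`id`, `year`, `outcome`) and the simulated data are assumptions for illustration, not taken from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: 500 subjects observed over five years
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "id": np.repeat(np.arange(500), 5),
    "year": np.tile(np.arange(2010, 2015), 500),
    "outcome": rng.normal(size=2500),
})

# Assign each observation to a decile of the outcome within its year,
# then collapse to one mean per decile per year
panel["decile"] = panel.groupby("year")["outcome"].transform(
    lambda x: pd.qcut(x, 10, labels=False) + 1
)
collapsed = panel.groupby(["year", "decile"])["outcome"].mean().unstack("decile")
print(collapsed.round(2))
```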
Panel data provides us with observations over several time periods per subject. In this first of two blog posts, I’ll walk you through the process. (Stick with me here. In Part 2, I’ll show you the graph, I promise.)
The concept of a statistical interaction is one of those things that seems very abstract. If you’re like me, you’re wondering: What in the world is meant by “the relationship among three or more variables”?
When you have data measuring the time to an event, you can examine the relationship between various predictor variables and the time to the event using a Cox proportional hazards model. In this webinar, you will learn what a hazard function is and how to interpret increasing, decreasing, and constant hazards. Then you will examine the log rank test, a simple test closely tied to the Kaplan-Meier curve, and the Cox proportional hazards model.
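The webinar itself is not tied to any particular software; as a hedged sketch only, here is how the three tools it mentions might look in Python's lifelines package, using the recidivism dataset bundled with that library.

```python
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from lifelines.datasets import load_rossi

# Bundled example data: 'week' is time to re-arrest, 'arrest' is the event indicator
df = load_rossi()

# Kaplan-Meier curve of time to the event
km = KaplanMeierFitter()
km.fit(durations=df["week"], event_observed=df["arrest"])
km.plot_survival_function()

# Log rank test comparing two groups (here, financial aid vs. none)
aid, no_aid = df[df["fin"] == 1], df[df["fin"] == 0]
result = logrank_test(aid["week"], no_aid["week"],
                      event_observed_A=aid["arrest"],
                      event_observed_B=no_aid["arrest"])
print(result.p_value)

# Cox proportional hazards model relating the predictors to time to event
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()
```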
In the past few months, I've gotten the same question from a few clients about using linear mixed models for repeated measures data. They want to take advantage of the models' ability to give unbiased results in the presence of missing data. In each case, the study has two groups complete a pre-test and a post-test measure, and both of these have a lot of missing data...
Despite modern concerns about how to handle big data, there persists an age-old question: What can we do with small samples?
Odds ratios and relative risks are often confused despite being distinct concepts. Why?
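As a quick made-up illustration of how the two diverge, here is a tiny Python calculation from a hypothetical 2x2 table; the counts are invented, and the point is simply that the odds ratio can exceed the relative risk when the outcome is common.

```python
# Hypothetical 2x2 table: outcome is fairly common in both groups
exposed_events, exposed_total = 40, 100      # risk = 0.40
unexposed_events, unexposed_total = 20, 100  # risk = 0.20

risk_exposed = exposed_events / exposed_total
risk_unexposed = unexposed_events / unexposed_total
relative_risk = risk_exposed / risk_unexposed          # 0.40 / 0.20 = 2.0

odds_exposed = risk_exposed / (1 - risk_exposed)        # 0.40 / 0.60
odds_unexposed = risk_unexposed / (1 - risk_unexposed)  # 0.20 / 0.80
odds_ratio = odds_exposed / odds_unexposed              # about 2.67

print(f"Relative risk: {relative_risk:.2f}, odds ratio: {odds_ratio:.2f}")
```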
There are not a lot of statistical methods designed just for ordinal variables. But that doesn't mean you're stuck with few options. There are more than you'd think...