Previous Posts
Most statistical software packages use a spreadsheet format for viewing the data. This helps you get a feeling for what you will be working with, especially if the data set is small. But what if your data set contains numerous variables and hundreds or thousands of observations? There is no way you can get warm and fuzzy by browsing through a large data set. To get a good feel for your data, you will need to use your software’s command or syntax editor to write code for reviewing your data. Sounds complicated...
ROC Curves are incredibly useful in evaluating any model or process that predicts group membership of individuals. ROC stands for Receiver Operating Characteristic. This strange name goes back to its original use of assessing the accuracy of sonar readings. An ROC curve can tell you how well a process or model distinguishes between true and false positives and negatives.
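Each point on an ROC curve is just a (true positive rate, false positive rate) pair at one classification threshold. As a minimal sketch of that idea (the scores and labels here are invented for illustration, not from the post):

```python
def roc_point(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at a given threshold.

    scores: predicted probabilities of membership in the positive group.
    labels: 1 for actual positives, 0 for actual negatives.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

# Made-up predictions for six individuals
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

tpr, fpr = roc_point(scores, labels, 0.5)
```

Sweeping the threshold from 1 down to 0 and plotting each (fpr, tpr) pair traces out the full curve.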
So I was glad that SPSS became an option for generalized linear mixed models. But the Model Viewer nearly led me to give up on that option. It's that annoying. (Google it if you're curious about the hate for the Model Viewer). Anyway, there is now a way to get rid of it.
How should I build my model? I get this question a lot, and it's difficult to answer at first glance--it depends too much on your particular situation. There are really three parts to the approach to building a model: the strategy, the technique to implement that strategy, and the decision criteria used within the technique.
Like many people with graduate degrees, I have used a number of statistical software packages over the years. Through work and school I have used Eviews, SAS, SPSS, R and Stata. Some were more difficult to use than others, but if you used them often enough you would become proficient enough to take on the task at hand (though some packages required greater usage of George Carlin’s 7 dirty words). There was always one caveat that determined which package I used...
If you have worked on or know of a paper that used mixed models, please give us the reference in the comments. Links to online versions are great too, if you have one. Trust me, many people in your field are looking for an example and will be happy to cite it.
Repeated measures ANOVA is the approach most of us learned in stats classes, and it works very well in certain designs. But it’s a bit limited in what it can do. Sometimes trying to fit a data set into a repeated measures ANOVA requires too much data gymnastics—averaging across repetitions or pretending a continuous predictor isn’t really continuous.
All resampling techniques are based on the idea of repeatedly estimating a statistic based on subsets of the sample. There are many practical applications, including estimating standard errors when they can’t be based on a theoretical distribution (i.e., when distributional assumptions are not met).
You can re-code an entire vector or array at once. To illustrate, let’s set up a vector that has missing values.
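The post itself works in R; as a rough sketch of the same idea in Python (using `None` for missing values and an invented coding scheme), recoding a whole vector at once looks like this:

```python
# A made-up vector with missing values, analogous to NAs in R
raw = [1, 2, None, 3, 1, None, 2]

# Recode every element in one pass, leaving missing values missing
codes = {1: "low", 2: "mid", 3: "high"}
recoded = [codes.get(x) if x is not None else None for x in raw]
```

The point is that the whole vector is transformed in a single expression, with no element-by-element loop written out by hand.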
Sometimes you need to know if your data set contains elements that meet some criterion or a particular set of criteria. For example, you may need to know if you have missing data (NAs) lurking somewhere in a large data set...
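The post discusses this in R; a quick Python sketch of the same check (the data set and field names here are invented) shows how one pass can both answer "any missing?" and locate the offenders:

```python
# A tiny made-up data set with one missing value (None standing in for NA)
dataset = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": 48000},
]

# Does any missing value lurk anywhere in the data set?
has_missing = any(v is None for row in dataset for v in row.values())

# If so, where? Collect (row index, field name) pairs for each NA
missing_fields = [(i, k)
                  for i, row in enumerate(dataset)
                  for k, v in row.items()
                  if v is None]
```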
