Here’s a little tip.
When you construct dummy variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results.
Build the coding right into the dummy variable’s name.
So instead of a variable named Gender with values of 1=Female and 0=Male, call the variable Female.
Instead of a pair of dummy variables named MaritalStatus1 (1=Married, 0=Single) and MaritalStatus2 (1=Divorced, 0=Single), name those same variables Married and Divorced.
And if you’re new to dummy coding, this has the extra bonus of making the dummy coding intuitive. It’s just a set of yes/no variables about all but one of your categories.
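To make that concrete, here is a minimal sketch in plain Python. The helper name `dummy_code` and the sample data are made up for illustration; they don’t come from any particular package. It builds one yes/no variable per category, skips the reference category, and names each variable after the category it flags.

```python
def dummy_code(values, reference):
    """Return {category_name: [0/1, ...]} for every category except the reference.

    Each dummy is named after the category it flags, so a 1 always means
    "yes, this case is in that category".
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical data: five people, with Single as the reference category.
marital_status = ["Married", "Single", "Divorced", "Married", "Single"]
dummies = dummy_code(marital_status, reference="Single")

# One variable per non-reference category: Married and Divorced.
for name, column in dummies.items():
    print(name, column)

# The same idea for a two-category variable: call it Female, not Gender.
gender = dummy_code(["Female", "Male", "Female"], reference="Male")
```

A case in the reference category (Single here) simply gets a 0 on every dummy.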
Someone who registered for my upcoming Interpreting (Even Tricky) Regression Models workshop asked if the content applies to logistic regression as well.
The short answer: Yes.
The long-winded, detailed explanation of why this is true, and the one caveat:
One of the greatest things about regression models is that they all have the same set up: (more…)
This one is relatively simple. Very similar names for two totally different concepts.
Hierarchical Models (aka Hierarchical Linear Models or HLM) are a type of linear regression model in which the observations fall into hierarchical, or completely nested, levels.
Hierarchical Models are one type of Multilevel Model.
So what is a hierarchical data structure, which requires a hierarchical model?
The classic example is data from children nested within schools. The dependent variable could be something like math scores, and the predictors a whole host of things measured about the child and the school.
Child-level predictors could be things like GPA, grade, and gender. School-level predictors could be things like total enrollment, private vs. public, and mean SES.
Because multiple children are measured from the same school, their measurements are not independent. Hierarchical modeling takes that into account.
Hierarchical regression, by contrast, is a model-building technique that can be used with any regression model. It is the practice of building successive linear regression models, each adding more predictors.
For example, one common practice is to start by adding only demographic control variables to the model. In the next model, you can add predictors of interest, to see if they predict the DV above and beyond the effect of the controls.
You’re actually building separate but related models in each step. But SPSS has a nice feature that compares the models and tests whether each successive model fits better than the previous one.
So hierarchical regression is really a series of regular old OLS regression models–nothing fancy, really.
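If it helps to see the mechanics, here is a rough sketch in plain Python of what a two-step hierarchical regression amounts to. Everything in it is invented for illustration: the data, the variable names, and the helper `ols_r_squared`. It is not how SPSS implements the comparison, just the standard R-squared change and its F statistic computed by hand for two nested OLS models.

```python
def ols_r_squared(X, y):
    """Fit OLS with an intercept via the normal equations and return R^2."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data: age is the demographic control, hours is the predictor of interest.
age = [20, 25, 30, 35, 40, 45, 50, 55]
hours = [5, 3, 8, 2, 7, 4, 9, 1]
score = [62, 58, 75, 55, 74, 64, 83, 54]
n = len(score)

# Step 1: controls only. Step 2: controls plus the predictor of interest.
r2_step1 = ols_r_squared([[a] for a in age], score)
r2_step2 = ols_r_squared([[a, h] for a, h in zip(age, hours)], score)

# F test for the change in R^2 between the nested models.
k1, k2 = 1, 2                                    # number of predictors per step
f_change = ((r2_step2 - r2_step1) / (k2 - k1)) / ((1 - r2_step2) / (n - k2 - 1))

print("R^2 step 1:", round(r2_step1, 3))
print("R^2 step 2:", round(r2_step2, 3))
print("F change:", round(f_change, 2), "on", k2 - k1, "and", n - k2 - 1, "df")
```

Each step is just an ordinary regression; the only extra piece is comparing the R-squared values of the two nested models.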
Confusing Statistical Terms #1: Independent Variable
Confusing Statistical Terms #2: Alpha and Beta
Confusing Statistical Terms #3: Levels
Oh so many years ago I had my first insight into just how ridiculously confusing all the statistical terminology can be for novices.
I was TAing a two-semester applied statistics class for graduate students in biology. It started with basic hypothesis testing and went on through to multiple regression.
It was a cross-listed class, meaning there were a handful of courageous (or masochistic) undergrads in the class, and they were having trouble keeping (more…)
One of the biggest challenges in learning statistics and data analysis is learning the lingo. It doesn’t help that half of the notation is in Greek (literally).
The terminology in statistics is particularly confusing because often the same word or symbol is used to mean completely different concepts.
I know it feels that way, but it really isn’t a master plot by statisticians to keep researchers feeling ignorant.
Really.
It’s just that a lot of the methods in statistics were created by statisticians working in different fields–economics, psychology, medicine, and yes, straight statistics. Certain fields often have specific types of data that come up a lot and that require specific statistical methodologies to analyze.
Economics needs time series, psychology needs factor analysis. Et cetera, et cetera.
But separate fields developing statistics in isolation has some ugly effects.
Sometimes different fields develop the same technique, but use different names or notation.
Other times, different fields use the same name or notation for different techniques they developed.
And of course, there are those terms with slightly different names, often used in similar contexts, but with different meanings. These are never used interchangeably, but they’re easy to confuse if you don’t use this stuff every day.
And sometimes, there are different terms for subtly different concepts, but people use them interchangeably. (I am guilty of this myself.) It’s not a big deal if you understand those subtle differences. But if you don’t, it’s a mess.
And it’s not just fields–it’s software, too.
SPSS uses different names for the exact same thing in different procedures. In GLM, a continuous independent variable is called a Covariate. In Regression, it’s called an Independent Variable.
Likewise, SAS has a Repeated statement in its GLM, Genmod, and Mixed procedures. They all get at the same concept there (repeated measures), but they deal with it in drastically different ways.
So by the time the fields come together and realize they’re all doing the same thing, people in different fields, or using different software procedures, are already used to their own terminology. So we’re stuck with different names for the same method, and the same name for different methods.
So anyway, I am beginning a series of blog posts to help clear this up. Hopefully it will be a good reference you can come back to when you get stuck.
We’ve expanded on this list with a member training, if you’re interested.
If you have good examples, please post them in the comments. I’ll do my best to clear things up.
If you’ve ever tried sharing SPSS output with your collaborators, advisor, or statistical consultant, you have surely noticed that the output is often not compatible across different versions of SPSS.
If you work in a company where everyone is on the same site license, it’s not a problem. But if you’re collaborating with colleagues at different universities on different upgrade schedules, you might run into some problems.
It’s true that most software has this problem to some degree: an older version usually can’t read documents created in a newer version.
But SPSS’s sharing capabilities are more, um, interesting.
The syntax and data files are backward- and forward-compatible across many versions, at least since v9 or so. (I don’t (more…)