If you have a categorical variable that you plan to use in a regression analysis in SPSS, there are a couple of ways to do it. You can use the SPSS Regression procedure, which I will talk about more in another post. Or you can use SPSS GLM, which I discuss here and in a follow-up post.
The big question in SPSS GLM is what goes where. As I’ve detailed in another post, any continuous independent variable goes into covariates. And don’t use random factors at all unless you really know what you’re doing.
So the question is what to do with your categorical variables. You have two choices, and each has advantages and disadvantages.
The easiest is to put categorical variables in Fixed Factors. SPSS will dummy code those variables for you, which is quite convenient if your categorical variable has more than two categories. However, there are some defaults you need to be aware of that may or may not make this a good choice.
SPSS GLM always makes the reference group the category that sorts last. If the values you input are strings, it will be the one that comes last alphabetically. If those values are numbers, it will be the highest one.
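To make that rule concrete, here is a minimal sketch in Python (not SPSS syntax) that mimics the behavior described above: sort the category values, treat the last one as the reference, and build 0/1 indicators for the rest. The function name `dummy_code` and the example group labels are made up for illustration, and Python's string sorting may not match SPSS's ordering in every case (mixed upper and lower case, for example).

```python
def dummy_code(values):
    """Dummy code category labels, using the last sorted level as the reference.

    Hypothetical helper that imitates the default described in the post;
    it is not how SPSS is implemented internally.
    """
    levels = sorted(set(values))      # strings sort alphabetically, numbers numerically
    reference = levels[-1]            # the last level becomes the reference group
    indicator_levels = levels[:-1]    # every other level gets its own 0/1 column
    rows = [[1 if v == lvl else 0 for lvl in indicator_levels] for v in values]
    return reference, indicator_levels, rows


groups = ["control", "drug_a", "drug_b", "control"]  # made-up example data
reference, columns, rows = dummy_code(groups)

print("reference group:", reference)
print("indicator columns:", columns)
for label, row in zip(groups, rows):
    print(label, row)
```

Notice that the reference group ends up with zeros in every indicator column, which is why all the other coefficients are read as comparisons to it.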
In some studies it really doesn’t matter which is the reference group. But in others, interpreting regression coefficients will be a whole lot easier if you choose a group that makes a good comparison, such as a control group or the most common group in the data. If you want that to be the reference, make it come last alphabetically. I’ve been known to do things like change my data so that the control group becomes something like ZControl. (But create a new variable, and never overwrite the original data.)
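The relabeling trick looks something like this, again as a rough Python sketch rather than SPSS syntax. The helper `relabel_for_reference` and the variable names `group` and `group_ref` are hypothetical. The point is only that the relabeled values go into a new variable and the original stays untouched.

```python
def relabel_for_reference(values, target, new_label):
    """Return a NEW list in which `target` is renamed to `new_label`.

    Hypothetical helper: the input list is left unchanged, mirroring the
    advice to create a new variable instead of overwriting the data.
    """
    return [new_label if v == target else v for v in values]


group = ["Control", "Drug A", "Drug B", "Control"]  # made-up example data
group_ref = relabel_for_reference(group, "Control", "ZControl")

print("original last level: ", sorted(set(group))[-1])
print("relabeled last level:", sorted(set(group_ref))[-1])
print("original data intact:", group)
```

After relabeling, the control group sorts last, so it is the one that would be picked up as the reference.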
It really can get confusing, though, if the variable was already dummy coded, with values of 0 and 1. Because 1 is the higher value, SPSS will make that group the reference group. The parameter estimate you see is then for the 0 group compared to the 1 group, which is the reverse of what you would get by entering the same variable as a covariate. It’s not impossible to interpret if you’re paying attention, but you do have to pay attention.
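Here is a small worked example of that sign flip, sketched in Python with made-up numbers. It assumes the simplest possible model, with one 0/1 predictor and nothing else, where the coefficients reduce to differences in group means. The variable names `treated` and `outcome` are hypothetical.

```python
from statistics import mean

# Made-up data: a variable that is already dummy coded as 0/1.
treated = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 11.0, 15.0, 16.0, 17.0]

mean_0 = mean(y for t, y in zip(treated, outcome) if t == 0)
mean_1 = mean(y for t, y in zip(treated, outcome) if t == 1)

# Entered as a covariate: 0 is the baseline, so the slope is
# (mean of group 1) minus (mean of group 0).
intercept_covariate = mean_0
slope_covariate = mean_1 - mean_0

# Entered as a fixed factor with 1 as the reference group: the intercept
# is the mean of group 1, and the estimate for group 0 is
# (mean of group 0) minus (mean of group 1).
intercept_factor = mean_1
estimate_group_0 = mean_0 - mean_1

print("covariate:    intercept", intercept_covariate, "slope", slope_covariate)
print("fixed factor: intercept", intercept_factor, "estimate for 0", estimate_group_0)
```

Same data, same fit, but the coefficient has the opposite sign and the intercept refers to a different group. That is all the confusion amounts to, as long as you know which group is the reference.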
In tomorrow’s post I’ll discuss another default in SPSS that will affect your decision.
If you want more information on using and interpreting parameter estimates in regression using SPSS, get the recording from my free webinar: Interpreting Regression Coefficients: A Walk Through Output.
Editor’s Update: If you want to learn in depth about dummy coding in SPSS GLM and Regression, as well as the other options in SPSS GLM, check out our workshop on Running Regressions and ANCOVAs in SPSS GLM. It’s now available in a home study version: http://theanalysisinstitute.com/workshops/SPSS-GLM/index.html