spss syntax

Another Great SPSS book: SPSS Programming and Data Management

March 3rd, 2010 by

Have you ever needed to do some major data management in SPSS and ended up with a syntax program that’s pages long?  This is the kind of job you couldn’t even do with the menus, because you’d tear your hair out in frustration when it took you four weeks to create a few new variables.

I hope you’ve gotten started using Syntax, which gives you a record of not only how you’ve recoded and created all those new variables, but also exactly which options you chose in the data analysis you’ve done.

But once you get started, you start to realize that some things feel a little clunky.  You have to run the same descriptive analysis on 47 different variables.  And while cutting and pasting is a heck of a lot easier than doing that in the menus, you wonder if there isn’t a better way.

There is.

SPSS syntax actually has a number of ways to increase programming efficiency, including macros, loops, and DO REPEAT structures.
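
To give a flavor of two of these, here is a minimal sketch. The variable names (q1 to q5, age, income) and the macro name !descr are made up for illustration, and I’m assuming q1 to q5 sit next to each other in the data set and are scored 1 to 5.

```spss
* Reverse-code five hypothetical items with DO REPEAT.
DO REPEAT old = q1 TO q5 / new = q1r q2r q3r q4r q5r.
  COMPUTE new = 6 - old.
END REPEAT.
EXECUTE.

* A hypothetical macro that runs the same descriptives on any variable list.
DEFINE !descr (vars = !CMDEND)
DESCRIPTIVES VARIABLES = !vars
  /STATISTICS = MEAN STDDEV MIN MAX.
!ENDDEFINE.

!descr vars = age income q1 TO q5.
```

As far as I know, DO REPEAT only wraps transformations such as COMPUTE and RECODE, so a macro is the usual route when you want to repeat a procedure.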

I admit I haven’t used this stuff a lot, but I’m increasingly seeing just how useful it can be.  I’m much better trained in doing these kinds of things in SAS, so I admit I have been known to just import data into SAS to run manipulations.

But I just came across a great resource on doing sophisticated SPSS Syntax Programming, and it looks like some fabulous bedtime reading.  (Seriously).

And the best part is you can download it (or order it, if you’d like a copy to take to bed) from the author’s website, Raynald’s SPSS Tools, itself a great source of info on mastering SPSS.

So once you’ve gotten into the habit of hitting Paste instead of OK, and gotten a bit used to SPSS syntax, and you’re ready to step your skills up a notch, this looks like a fabulous book.

[Edit]: As per Jon Peck in the comments below, the most recent version is now available at www.ibm.com/developerworks/spssdevcentral under Books and Articles.

Want to learn more? If you’re just getting started with data analysis in SPSS, or would like a thorough refresher, please join us in our online workshop Introduction to Data Analysis in SPSS.

 


3 Pieces of SPSS Syntax to Keep Handy

October 23rd, 2009 by

I hope you’re getting started using SPSS Syntax by hitting that Paste button when you use the menus.

But there are a few parts of SPSS you can’t do that with. Specifically, there are syntax commands for doing all the variable definitions that you usually fill out in the “Variable View” window. But there are no Paste buttons there, so you have to know how to write the syntax from scratch.

I find the three variable definitions that I use the most are defining Variable Labels, Value Labels and Missing Data codes. The syntax is simple and logical for all three, so I’m going to just give you the basic code, which you can keep on hand and edit as you need.

For a data set with the variables Gender, Smoke, and Exercise, with the following definitions:

Gender: 0=Male, 1=Female
Smoke: 1=Never 2=Sometimes 3=Daily
Exercise: 1=Never 2=Sometimes 3=Daily

For all three variables, 999 = a user-defined missing value

We could use the following code to give descriptive variable labels, encode the value labels, and define the missing data:

VARIABLE LABELS
  GENDER 'Participant Gender'
  SMOKE 'Does Participant ever Smoke Cigarettes?'
  EXERCISE 'How Often Does Participant Exercise for a 30 Minute Period?'.

Notice two things:
1. I could put all three variable labels in the same VARIABLE LABELS command.
2. There is a period at the end of the command. This is required.

VALUE LABELS
  GENDER 0 'Male' 1 'Female'
  /SMOKE EXERCISE
  1 'Never'
  2 'Sometimes'
  3 'Daily'.

MISSING VALUES
GENDER SMOKE EXERCISE (999).

Since all three variables have the same missing data code, I could include them all in the same statement.
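
If the variables did not share a code, the same command can take a separate list for each group. Here is a small sketch, assuming (hypothetically) that Smoke and Exercise also used 998 as a second missing code. The DISPLAY DICTIONARY command at the end is just one way to check that the labels and missing codes took.

```spss
* Hypothetical variant with different missing codes per variable.
MISSING VALUES
  GENDER (999)
  /SMOKE EXERCISE (998, 999).

* Print the dictionary information for the three variables.
DISPLAY DICTIONARY
  /VARIABLES = GENDER SMOKE EXERCISE.
```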

There are, of course, syntax rules for all of these commands, but you can easily look them up in the Command Syntax Reference.



A Great (and Free) Resource for SPSS Syntax: the Command Syntax Reference

October 22nd, 2009 by

I find SPSS manuals, as a rule, marginally useful. Sure, they may tell you which options are available when doing Statistic X, but not what those options mean or when to use them.

I still use them, of course, but only when I have no other options.

There is one exception, though, and that is the Command Syntax Reference. This is the manual that explains all the SPSS Syntax commands.

SPSS started as a syntax-only program. I first learned SPSS before Windows existed. I don’t think you could even get it for a PC back then. We had to use the College’s VAX mainframe computer. This was back in the days when you had to go pick up your printouts down the hall in the computing center. But no cards. I’m not THAT old.

Anyway, I think in those days SPSS must have put a lot of resources into really good manual writing. So the Command Syntax Reference, which was the entire manual, rocked. It still does, since for the most part, the syntax doesn’t change that much with new versions.

The great thing about it is that it’s now available right in SPSS. From the Help menu, choose Command Syntax Reference instead of Search. It includes every possible option, explains when and how to use it, and what it means. It’s an extremely handy resource, comes free with SPSS, and you don’t have to spend hours searching the internet for an answer.

The only hard part is that it is organized by command name, and the names are not always intuitive. So if you don’t know that the syntax equivalent of the Univariate GLM menu is “UNIANOVA,” you’ll have a hard time using it.

This is another good time to use the Paste button. Just use the menus to create some semblance of the analysis you want to do and hit Paste. You’ll get the basic command, which you can now look up and refine.
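
For example, pasting a Univariate GLM from the menus gives you something along these lines. The variable names (score, group, age) are placeholders, and the exact subcommands you see will depend on your version of SPSS and the options you checked.

```spss
UNIANOVA score BY group WITH age
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CRITERIA=ALPHA(0.05)
  /DESIGN=age group.
```

The first word, UNIANOVA, is the command name to look up, and each line starting with a slash is a subcommand you can read about and refine.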



How to Effortlessly Create SPSS Syntax and Automatically Add it to your Output

October 16th, 2009 by

So hopefully I’ve extolled the benefits of using SPSS Syntax enough that you’re convinced it is something you should regularly use.

Even if you don’t start programming, there are two things you can do to begin learning Syntax and get the communication and tracking benefits.

1. From now on, when you use menus for an analysis, instead of clicking the “OK” button, click “Paste.”*

When you use the menus and click OK, SPSS is translating your menu choices into syntax.  You just don’t see it.

When you click Paste, though, SPSS opens a syntax window and writes a copy of this syntax.  To run it, simply go to the Syntax window, highlight the procedure you want to run, and click the Run button, which looks like a triangle facing right.

This will get you used to the kind of language SPSS Syntax uses. You can, if you wish, start to edit it.
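
As a rough illustration, pasting a simple Frequencies run from the menus produces syntax like the first command below, and the second command is a lightly edited copy. The variable names are hypothetical, and what you get may differ a little by version.

```spss
* Roughly what Paste writes for Analyze, Descriptive Statistics, Frequencies.
FREQUENCIES VARIABLES=gender smoke
  /ORDER=ANALYSIS.

* The same command after a small edit: one more variable and a bar chart.
FREQUENCIES VARIABLES=gender smoke exercise
  /BARCHART FREQ
  /ORDER=ANALYSIS.
```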

But even if you don’t, over time you’ll start to notice how logical it is and how the menu choices correspond to phrases in the syntax. And you’ll (more…)


5 Reasons to use SPSS Syntax

October 7th, 2009 by

You don’t rely on only SPSS menus to run your analysis, right?  (Please, please tell me you don’t).

There’s really nothing wrong with using the menus.  It’s a great way to get started using SPSS and it saves you the hassle of remembering all that code.

But there are some really, really good reasons to use the syntax as well.

 

1. Efficiency

If you’re figuring out the best model and have to refine which predictors to include, running the same descriptive statistics on a bunch of variables, or defining the missing values for all 286 variables in the data set, you’re essentially running the same analysis over and over.

Picking your way through the menus gets old fast.  In syntax, you just copy and paste and change or add variable names.

A trick I use is to run through the menus for one variable, paste the code, then add the other 285. You can even copy the names out of the Variable View and paste them into the code. Very easy.
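
When the variables sit next to each other in the file, the TO keyword saves even the pasting. A minimal sketch, assuming hypothetical variables named item1 through item286 that are contiguous in the data set and all share 999 as a missing code:

```spss
* Assumes hypothetical variables item1 through item286 are contiguous in the file.
MISSING VALUES item1 TO item286 (999).

DESCRIPTIVES VARIABLES = item1 TO item286
  /STATISTICS = MEAN STDDEV MIN MAX.
```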

2. Memory

I know that while you’re immersed in your data analysis, you can’t imagine ever forgetting a step you did.

But you will.  And sooner than you think.

Syntax gives you a “paper” trail of what you did, so you don’t have to remember. If you’re in a regulated industry, you know why you need this trail. But anyone who needs to defend their research needs it.

3. Communication

When your advisor, coauthor, colleague, statistical consultant, or Reviewer #2 asks you which options you used in your analysis or exactly how you recoded that variable, you can clearly communicate it by showing the syntax.  Much harder to explain with menu options.

When I hold a workshop or run an analysis for a client, I always use syntax.  I  send it to them to peruse, tweak, adapt, or admire.  It’s really the only way for me to show them exactly what I did and how to do it.

If your client, advisor, or colleague doesn’t know how to read the syntax, that’s okay. Because you have a clear answer of what you did, you can explain it.

4. Efficiency again

When the data set gets updated, or a reviewer (or your advisor, coauthor, colleague, or statistical consultant) asks you to add another predictor to a model, it’s a simple matter to edit and rerun a syntax program.

In menus, you have to start all over. Hopefully you’ll remember exactly which options you chose last time and/or exactly how you made every small decision in your data analysis (see #2: Memory).

5. Control

There are some SPSS options that are available in syntax, but not in the menus.

And others that just aren’t what they seem in the menus.

The menus for the Mixed procedure are about the most unintuitive I’ve ever seen.  But the syntax for Mixed is really logical and straightforward.  And it’s very much like the GLM syntax (UNIANOVA), so if you’re familiar with GLM, learning Mixed is a simple extension.
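
Here is a rough side-by-side to show the family resemblance. The variable names (score, group, age, school) are invented, and the random intercept for school is only there to show what MIXED adds.

```spss
* Fixed-effects model in GLM.
UNIANOVA score BY group WITH age
  /PRINT = PARAMETER
  /DESIGN = group age.

* Same fixed effects in MIXED, plus a random intercept for each school.
MIXED score BY group WITH age
  /FIXED = group age
  /RANDOM = INTERCEPT | SUBJECT(school) COVTYPE(VC)
  /PRINT = SOLUTION.
```

The BY and WITH lists work the same way in both, and the /FIXED subcommand plays the role that /DESIGN plays in UNIANOVA.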

Bonus Reason to use SPSS Syntax: Cleanliness

Luckily, SPSS makes it exceedingly easy to create syntax.  If you’re more comfortable with menus, run it in menus the first time, then hit PASTE instead of OK.  SPSS will automatically create the syntax for you, which you can alter at will.  So you don’t have to remember every programming convention.

When refining a model, I often run through menus and paste it.  Then I alter the syntax to find the best-fitting model.

At this point, the output is a mess, filled with so many models I can barely keep them straight.  Once I’ve figured out the model that fits best, I delete the entire output, then rerun the syntax for only the best model.  Nice, clean output.

The Take-away: Reproducibility

What this all really comes down to is your ability to confidently, easily, and accurately reproduce your analysis. When you rely on menus, you are relying on your own memory to reproduce it. There are too many decisions and judgment calls, and too many places to make easy mistakes without noticing, to ever rely totally on your memory.

The tools are there to make this easy. Use them.

 


Averaging and Adding Variables with Missing Data in SPSS

August 29th, 2008 by

SPSS has a nice little feature for adding and averaging variables with missing data that many people don’t know about.

It allows you to add or average variables, while specifying how many are allowed to be missing.

For example, a very common situation is a researcher needs to average the values of the 5 variables on a scale, each of which is measured on the same Likert scale.

There are two ways to do this in SPSS syntax.

COMPUTE Newvar = (X1 + X2 + X3 + X4 + X5)/5.

or

COMPUTE Newvar = MEAN(X1, X2, X3, X4, X5).

In the first method, if any of the variables is missing, Newvar will also be missing, because an arithmetic expression in SPSS returns system missing whenever one of its operands is missing.

In the second method, if any of the variables is missing, SPSS will still calculate the mean of the ones that are observed.  While this seems great at first, the researcher may wish to set a minimum on how many of the 5 variables need to be observed in order to calculate the mean.  If only one or two variables are present, the mean may not be a reasonable estimate of the mean of all 5 variables.

SPSS has an option for dealing with this situation.  Running it the following way will only calculate the mean if at least 4 of the 5 variables are observed.  If fewer than 4 of the variables are observed, Newvar will be system missing.

COMPUTE Newvar = MEAN.4(X1, X2, X3, X4, X5).

You can specify any number of variables that need to be observed.

(This same distinction holds for the SUM function in SPSS, but the scale of the sum changes based on how many variables are being added.  A better approach is to calculate the mean, then multiply by 5.)
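
Here is a short sketch of the mean-then-multiply approach. Newsum is a name I made up for the rescaled total. For a case with values 4, 5, missing, 3, and 4, a plain SUM would give 16, while the mean of the four observed values is 4, so the rescaled total is 20, which is on the same scale as a complete case.

```spss
* Mean of the 5 items, requiring at least 4 observed values.
COMPUTE Newvar = MEAN.4(X1, X2, X3, X4, X5).

* Total rescaled to all 5 items: mean of the observed items times 5.
COMPUTE Newsum = 5 * MEAN.4(X1, X2, X3, X4, X5).
EXECUTE.
```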

This works the same way in the syntax or in the Transform–>Compute menu dialog.

First Published  12/1/2016;
Updated  7/20/21 to give more detail.