SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two.

by Karen

In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use?

The default is to use whatever software was used in your statistics class. At least you know the basics.

And this might turn out pretty well, but chances are it will fail you at some point. Many times the stat package used in a class is chosen for its shallow learning curve, not its ability to handle advanced analyses that are encountered in research.

I think I’ve used at least a dozen different statistics packages since my first stats class. And here are my observations:

1. The first one you learn is the hardest to learn. There are many similarities in the logic and wording they use, even if the interface is different. So once you’ve learned one, it will be easier to learn the next one.

2. You will have to learn another one. Just accept it. If you have the self-discipline to do it, I suggest learning two at the beginning. This will come in handy for a number of reasons:

- My favorite stat package for a while was BMDP. Until the company was bought up by SPSS. I’m not sure if they stopped producing or updating it, but my university cancelled their site license.

- Many schools offer a site license for only one package, and it may not be the one you’re used to. When I was at Cornell, they offered site licenses for 5 packages. But when a new stats professor decided to use JMP instead of Minitab, guess what happened to the Minitab site license? Unless you’re sure you’ll never leave your current university, you may have to start over.

- In case you decide to outwit the powers-that-be in IT who control the site licenses and buy your own (or use R, which is free), no software package does every type of analysis. There is huge overlap, to be sure, and the major ones are much more comprehensive than they were even 5 years ago. Even so, the gaps are in the most complicated analyses: some mixed models, GEE, complex sampling, etc. And the moment when you’re trying to learn a new, highly complicated statistical method is not the time to learn a new, highly complicated stats package.

For these reasons, I recommend that everyone who plans to do research for the foreseeable future learn two packages.

I know, it’s hard enough to find the time to start over and learn one, much less the self-discipline. But if you can, it will save you grief later on. There are many great books, online tutorials, and workshops for learning all the major stats packages.

But I also recommend you choose one as your primary package and learn it really, really well. The defaults and assumptions and wording are not the same across packages. Knowing how yours handles dummy coding or missing data is imperative to doing correct statistics.
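
To make the dummy coding point concrete, here is a minimal sketch in Python (not one of the packages discussed here, just a neutral way to show the arithmetic). The data and the `dummy_coefficients` helper are made up for illustration. With one categorical predictor, the intercept is the reference group’s mean and each dummy coefficient is a difference from that mean, so the default reference level changes every number in the output. R’s default treatment contrasts, for example, use the first level as the reference, while some procedures in other packages use the last.

```python
from statistics import mean

# Hypothetical data: outcome scores for three groups (made-up numbers).
scores = {
    "control": [4.0, 5.0, 6.0],
    "drug_a": [7.0, 8.0, 9.0],
    "drug_b": [1.0, 2.0, 3.0],
}


def dummy_coefficients(groups, reference):
    """Coefficients of a one-factor model under dummy coding.

    The intercept is the mean of the reference group; every other
    coefficient is that group's mean minus the reference mean.
    """
    ref_mean = mean(groups[reference])
    coefs = {"intercept": ref_mean}
    for name, values in groups.items():
        if name != reference:
            coefs[name] = mean(values) - ref_mean
    return coefs


levels = sorted(scores)  # alphabetical order: control, drug_a, drug_b

# Same data, two different default choices of reference level.
first_ref = dummy_coefficients(scores, levels[0])   # reference = control
last_ref = dummy_coefficients(scores, levels[-1])   # reference = drug_b

print("first level as reference:", first_ref)
print("last level as reference: ", last_ref)
```

Both sets of coefficients describe the same group means, but a “drug_a” coefficient means “versus control” in one and “versus drug_b” in the other. If you don’t know which default your package uses, you can easily misread the output.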

Which one? It depends mostly on the field you’re in. Social scientists should generally learn SPSS as their main package, because that is what their colleagues are using. You can then choose something else as a backup (SAS, R, or Stata) based on availability and which makes the most sense to you logically.

{ 2 comments }

peng January 29, 2010 at 10:08 am

hi friends,
I am new to R. I would like to learn R-PLUS. Does anyone know where I can get free training for R-PLUS?

Regards,
Peng.

Dennis October 11, 2011 at 11:58 pm

Good advice, all around. But… if you choose SPSS as your primary package, SAS has little to offer you, and vice versa. The overlap is just too great to make either a good complement to the other.

A factor to consider in choosing between the Big Two is your preferred user interface. If you don’t want to program (much) and you adore point-and-shoot interfaces, go with SPSS. If you don’t mind programming explicitly, and despise point-and-shoot interfaces, SAS will make you happier.

Another factor in choosing between the Big Two is your use of structural equation models (SEMs). If you don’t use them, it’s a non-issue. If you use them extensively, you should choose between EQS-like syntax (in SAS PROC CALIS) and SPSS’s AMOS. SEMs are confusing enough without worrying about converting from your preferred expression of the models into the expression your software wants.

Much better choices as a complement to one of the Big Two are Stata and some dialect of S (R, S, S-plus). Stata users say it has some very slick programming facilities. (I’m not among them, so I can’t say from experience.) The S dialects are killers for simulation studies. I benchmarked R against SAS/IML (in version 9.1) and found R was an order of magnitude faster. R is built entirely around an object-oriented programming interface. Language extensions are a snap. In my opinion bootstrap estimation is easier in R than in other languages. High resolution graphics are native to R, and (despite a lot of improvement from versions 6 to 7 to 9.1 and 9.2) not native to SAS.
