Someone asked me this recently.
Many R advocates would absolutely say yes to everyone who asks.
I don’t.
(I actually gave her a pretty long answer, summarized here.)
It depends on what kind of work you do and the context in which you’re working.
I can say that R is a very handy tool to have in your pocket.
It’s powerful. It’s flexible. It’s cost-effective.
And it’s a lot easier to approach than it used to be, with options like RStudio and RCommander.
But not everyone needs it.
If you have access to another software program (or two) that handles all the statistics you need, you’re well versed in it, and you have secure legal access to a site license, you probably don’t need to learn R. This is particularly true if none of your colleagues in your field use R to any large extent.
And yet…
It’s always good to have options. A second software package in which you can check your work.
That may or may not be R. Still, there are times when R is the best tool for the job. Or the only tool.
More often, I suspect, is this situation: there are other tools that could work just as well, if you had access to them. And how many packages can you justify purchasing?
This exact situation came up for me just last month.
My Recent Experience with Embracing R
I was working with a client who needed a pretty specific statistic calculated–a sample size estimate for a kappa statistic (for inter-rater reliability). Naturally, they had a tight deadline.
I checked all my sample size software, and none had that statistic. (Of course.)
I didn’t look, but it’s quite possible I could have purchased a third sample size software package that had this specific statistic. There are others out there that I don’t have.
A year ago, I would have been a bit stuck and may have had to do just that.
But I’ve been taking some of our own R workshops. And I’ve started to see the options opening up. So I thought I’d check to see if there’s an R package for sample size estimates on kappa.
Lo and behold, there is.
It wasn’t hard to download or use, and I got the answer I needed pretty quickly.
My day was saved.
Why Learn a Second Statistical Software
So really it comes down to what I’ve been saying for years–it’s always good to have more statistical software options when you need them.
R is becoming an obvious choice, not just because of the $0 price tag, but because so many people are creating packages that perform ridiculously specific tests–like a sample size calculation for a kappa statistic.
I realize that R looks extremely intimidating from the outside. I was required to use its parent SPlus in one of my grad programs, and I didn’t like it one bit.
But once you get started, you realize it’s pretty logical. I don’t know if R makes more sense than SPlus (I don’t think so) or if our instructor Kim Love is just a better teacher than my grad school TA (very likely).
The other thing I’ve been saying for years is that you don’t have to learn every difficult or ridiculously specific analysis up front. You don’t need to master every option on a tool to be able to use it.
Build for yourself a solid foundation, from basics up through linear models, and build it well. Once you have those skills, you’ll be able to add new ones from there as you need them.
I discovered R about 4 or 5 years ago. My company had a SAS license but only a few people had access and, frankly, most of them only knew how to plug their data into a script someone else had written. I learned R on my own to have control over my data, increase my knowledge and confidence in using stats and to gain flexibility in experimental design. That journey has not been a straight line! I think the efforts of R contributors like Hadley Wickham to add a “grammar” to R through the Tidyverse are remarkable and have made the software much more approachable. When you leave the Tidyverse, things can get scary. R offers tremendous power and flexibility but like everything else in life, flexibility is directly proportional to complexity.
I learned SPlus in graduate school, of course switched to R, and it is what I know. I’m in a biomedical workgroup using a statistical graphics program and some bespoke specialty software with our own (copious) data. No one else knows R. I’m not hands-on with data analysis these days (I do produce much of it). If a statistical problem is mentioned that I know R can resolve and if I suggest using it, the suggestion usually goes nowhere. The latest had to do with some data grooming, which R does well. It’ll probably get done instead by cut-pasting in a spreadsheet.
First off, assume for a second that you are not a student and don’t work for a company that purchases software, but you are sitting at your desk and want to learn _____ (fill in the blank with your favorite stats package). Now, find out the cost of a license. SAS? Thousands, given that you need 6 modules for it to become useful. Stata or Minitab? At least $1,000. SPSS? Expensive, last time I looked. Want to do data mining? Well, Statistica Data Miner is probably the best. It will only set you back $15,000. R? Basically free.
Customer work on Linux? Too bad if you are an SPSS or Statistica user, since they run on Windows. R? Works everywhere. In fact, I do all my R work on an Apple iMac, and every model I have delivered ran, unchanged, on systems other than Apple.
Add superb community support that costs nothing and the many user-supplied functions, and you have quite a robust environment.
Last but not least, just look at job boards, especially jobs with “big data” in the description. A vast majority want R. In fact, at a data mining conference in New York City a few months back, the speaker from Columbia University said he expects R to be the de facto statistics standard within 3 years.
Personally, I have not renewed my SAS license (way too expensive), and I’m not sure if I’m going to upgrade Stata. Reason: since 2007, the only thing clients have requested is R.
So, the question should really be: why DON’T you know R?
Hi Joe,
Well, there are definitely people who don’t need to worry about the cost of a license for other software packages. (And fwiw, Stata is a relative bargain compared to many others.) And there are fields where other software is entrenched. People working in those fields, especially those who don’t use stats often, may not need R. At least not yet.
I still believe it’s useful to be able to use the field-entrenched software if you do any collaborative work, and this includes all grad students. You’re going to get stuck if you’re the only R user.
But unless you use stats only very rarely, knowing at least two packages is extremely helpful. So I agree–anyone who does a lot of stats ought to learn R.
I think R is becoming the statistics software equivalent of speaking English. Not everyone in the world needs it and it’s not the only important language, but many, many people benefit immensely from it. In some situations this benefit is huge. In others there’s no benefit at all, and in some, it’s absolutely crucial.