*Do I really need to learn R?*

Someone asked me this recently.

Many R advocates would absolutely say yes to everyone who asks.

I don’t.

(I actually gave her a pretty long answer, summarized here).

It depends on what kind of work you do and the context in which you’re working.

I can say that R is a very handy tool to have in your pocket.

It’s powerful. It’s flexible. It’s cost-effective.

But not everyone needs it.

If you have access to another software program (or two) that handles all the statistics you need, you’re well versed in it, and you have secure legal access to a site license, you probably don’t need to learn R. This is particularly true if few members of your field are using R.

#### And yet…

It’s *always* good to have options. A second software package in which you can check your work. (Though that may or may not be R).

Still, there are times when R is the best tool for the job. Or the only tool.

More often, I suspect, the situation is this: there are other tools that would work just as well, just not the ones you already have. And how many packages can you justify purchasing?

This exact situation came up for me just last month.

I was working with a client who needed a pretty specific statistic calculated: a sample size estimate for a kappa statistic (for inter-rater reliability). Naturally, they had a tight deadline.

I checked all my sample size software, and none had that statistic. (Of course).

I didn’t look, but it’s quite possible I could have purchased a third sample size software package that had this specific statistic. There are others out there that I don’t have.

A year ago, I would have been a bit stuck and may have had to do just that.

But I’ve been taking some of our own R workshops over the past 6 months, taught by David Lillis. And I’ve started to see the options opening up. So I thought I’d check to see if there’s an R package for sample size estimates on kappa.

Lo and behold, there is.

It wasn’t hard to download or use, and I got the answer I needed pretty quickly.

My day was saved.

So really it comes down to what I’ve been saying for years–it’s always good to have more statistical software options when you need them.

R is becoming an obvious choice, not just because of the $0 price tag, but because so many people are creating packages that perform ridiculously specific tests, like a sample size calculation for a kappa statistic.

I realize that R looks extremely intimidating from the outside. I was required to use its close relative SPlus (both implement the S language) in one of my grad programs, and I didn’t like it one bit.

But once you get started, you realize it’s pretty logical. I don’t know if R makes more sense than SPlus (I don’t think so) or if David is just a better teacher than my grad school TA (very likely).

The other thing I’ve been saying for years is that you don’t have to learn every difficult or ridiculously specific analysis up front. You don’t need to master every option on a tool to be able to use it.

Build for yourself a solid foundation, from basics up through linear models, and build it well. Once you have those skills, you’ll be able to add new ones from there as you need them.

#### 2 comments

First off, assume for a second that you are not a student and don’t work for a company that purchases software, but you are sitting at your desk and want to learn _____ (fill in the blank with your favorite stats package). Now, find out the cost of a license. SAS? Thousands, given that you need 6 modules for it to become useful. Stata or Minitab? At least $1,000. SPSS? Expensive, last time I looked. Want to do data mining? Well, Statistica Data Miner is probably the best. It will only set you back $15,000. R? Basically free.

Customer works on Linux? Too bad if you are an SPSS or Statistica user, since they run on Windows. R? Works everywhere. In fact, I do all my R work on an Apple iMac and every model I have delivered ran, unchanged, on systems other than Apple.

Add superb community support that costs nothing, plus the many user-supplied functions, and you have quite the robust environment.

Last but not least, just look at job boards, especially jobs with “big data” in the description. A vast majority want R. In fact, at a data mining conference in New York City a few months back, the speaker from Columbia University said he expects R to be the de facto statistics standard within 3 years.

Personally, I have not renewed my SAS license (way too expensive) and I’m not sure if I’m going to upgrade Stata. Reason: since 2007, the only thing clients have requested is R.

So, the question should really be: why DON’T you know R?

Hi Joe,

Well, there are definitely people who don’t need to worry about the cost of a license for other software packages. (And fwiw, Stata is a relative bargain compared to many others.) And there are fields where other software is entrenched. People working in those fields, especially those who don’t use stats often, may not need R. At least not yet.

I still believe it’s useful to be able to use the field-entrenched software if you do any collaborative work, and this includes all grad students. You’re going to get stuck if you’re the only R user.

But unless you are a very infrequent user of stats, knowing at least two packages is extremely helpful. So I agree: anyone who does a lot of stats ought to learn R.

I think R is becoming the statistics software equivalent of speaking English. Not everyone in the world needs it and it’s not the only important language, but many, many people benefit immensely from it. In some situations the benefit is huge, in others there is none at all, and in some it’s absolutely crucial.