*by David Lillis, Ph.D.*

Many of you have heard of R (the R statistics language and environment for scientific and statistical computing and graphics). Perhaps you know that it uses command line input rather than pull-down menus. Perhaps you feel that this makes R hard to use and somewhat intimidating!

OK. Indeed, R has a steeper learning curve than many other systems, but don’t let that put you off! Once you master the syntax, you have control of an immensely powerful statistical tool.

Actually, much of the syntax is not all that difficult. Don’t believe me? To prove it, let’s look at some syntax for providing summary statistics on a continuous variable.

First, go to the following website, which hosts the R installer for Windows:

http://cran.r-project.org/bin/windows/base/

Download the installer using the link on that page, then run it to install R.

OK, so you have successfully installed R. You should now have the R icon on your desktop. Double-click this icon to open R.

Now for a simple exercise. Let’s take height to be a variable that describes the heights (in cm) of ten people. Copy and paste the following code to the R command line to create this variable.

height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)

The variable is now stored in the R workspace. To view this variable, enter:

height

Now, enter the following code at the command line and hit return after each piece of code. You will be surprised how easy this really is!

summary(height)

range(height)

mean(height)

sd(height)

max(height)

min(height)

length(height)

height[1]

height[3]

height[10]
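To see what some of these functions compute, here is a minimal sketch that reproduces a few of them by hand from the same ten heights. The names `hand_mean`, `hand_sd`, `hand_range` and `first_and_last` are made up for this illustration and are not part of R. Note that `sd()` gives the sample standard deviation, which divides by n - 1 rather than n.

```r
# The same ten heights (in cm) as above
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)

n <- length(height)                 # number of observations

# mean() is the sum divided by the number of observations
hand_mean <- sum(height) / n

# sd() is the sample standard deviation: it divides by n - 1, not n
hand_sd <- sqrt(sum((height - hand_mean)^2) / (n - 1))

# range() returns the minimum and maximum as a two-element vector
hand_range <- c(min(height), max(height))

# Square brackets can take a vector of positions, not just one number
first_and_last <- height[c(1, n)]

# Compare the hand calculations with the built-in functions
print(all.equal(hand_mean, mean(height)))    # TRUE
print(all.equal(hand_sd, sd(height)))        # TRUE
print(all.equal(hand_range, range(height)))  # TRUE
print(first_and_last)                        # 176 175
```

The comparisons use `all.equal()` rather than `==` because results of floating-point arithmetic can differ by tiny rounding amounts.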

So – it really wasn’t that difficult after all. More about R later.

**About the Author:** *David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.*


Trying to dive into the command line via R is not easy for a generation of students who were brought up on GUIs, pointing and clicking with a mouse.

Some years ago I taught an introductory statistics course and managed to get the R Commander package to pop up automatically when students clicked on the R icon. With directed instruction, I found R Commander worked like a training-wheels program for R beginners. Sure, it's not like RStudio, but it gave my bunch of students a set of training wheels, which after a while they could wean themselves off to start using the command line.

I have found R a fantastic resource. Combine it with R Commander and it's a good way to start learning R.

Hi!

Your post shows exactly why R is difficult to learn. The people who want to learn R are mainly statisticians, and the first thing a statistician wants to do with his data is never to compute a mean. For that purpose, you have a calculator or something like that (Excel, Matlab or Scipy).

R syntax has too many odd characters (the “.” that means nothing, $, c, T, F, <-, <<-, etc.), the syntax is inconsistent, the online tutorials rarely or never start by recoding or creating a variable, computing a table, or using weights (we are in statistics!), and the help files in R are no help to a statistician (they are a kind of reference doc that shows you the complexity of each function rather than a help doc that can help you understand something).

And there is one more thing: categorical variables (factors in R language). It is very difficult to find help about them, simply because R doesn't handle them very well. And this kind of variable is what makes the difference between a numerical analysis package such as Matlab and a statistical package such as SPSS or Stata.

I spent many years learning and using R, and I could write a list of many things that made learning R and using it for serious statistical work very difficult.

But what is good is that some people are willing to work on those flaws in the R ecosystem and are trying to resolve them. One good example is the book "R in Action", which shows you how to use R for statistics.

Sorry for the English; it is my third language.

I agree that help files in R are difficult to understand without already having a sense of the program and what the functions do. It would be extremely difficult to learn R from those. On the other hand, there are some very good introductory R books. I think “Beginning R: The Statistical Programming Language” by Mark Gardener is very accessible. “Hands-On Programming with R” by Garrett Grolemund is another great choice. (They both have programming in the title. I am not a programmer, but I think learning to use R means you have to understand the syntax and how it is structured. Both books provide that at a very basic level.)

I would say that at times the syntax is overlapping – you can do the same thing in different ways, but not inconsistent. I’m sure you are aware of this, but … T is just the abbreviation for TRUE, same for F. “c” is concatenate. It’s what you use to create a vector. “<-” is just the symbol for assigning a value to a name. If you look around on some basic websites rather than R help I think a lot of what you mentioned you can figure out pretty easily. “Quick-R” is a great site.
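As a rough illustration of the symbols discussed here, the following minimal sketch shows `c()`, `<-`, and `T`/`TRUE` in action. The variable names (`short`, `longer`, `tall`, and so on) are invented for this example. R's own help page describes `c()` as combining values into a vector.

```r
# c() combines its arguments into one vector
short <- c(176, 154, 138)

# c() also accepts existing vectors, so vectors can be extended
longer <- c(short, 196, 132)        # five values in total

# "<-" assigns a value to a name; "=" does the same at the command line
a <- 5
b = 5

# Comparisons return logical values, TRUE or FALSE
tall <- short > 150                 # TRUE TRUE FALSE

# Logical vectors can be counted and used to pick out elements
n_tall <- sum(tall)                 # TRUE counts as 1, so this is 2
tall_heights <- short[tall]         # 176 154

# T and F are ordinary variables that start out as TRUE and FALSE.
# Unlike TRUE and FALSE, they can be overwritten, so spelling the
# words out in full is the safer habit.
same_flag <- identical(T, TRUE)     # TRUE in a fresh session

print(longer)
print(tall_heights)
```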

I’m not sure most people trying to learn R are statisticians. There are probably a lot of students for whom R is part of statistics coursework or coursework in related fields, and a lot of working researchers and other people who need to evaluate what they are doing on the fly and are precluded by funding or time from summoning a statistician every time they need to work with or represent their data.