by Kim Love
If you are like I was for a long time, you have avoided learning R.
You’ve probably heard that there’s a steep learning curve, and that the available documentation is not necessarily user-friendly.
Frankly, both things are true, to some extent.
The best and worst thing about R is that it is open source: no single company is responsible for R or for your ability to use it.
While a developer community maintains a set of standards and official documentation, anyone can add new functionality to R through user-created “packages.”
This gives R users a large, flexible range of options (once you know how to install the packages, of course!), which can be a major advantage.
On the other hand, these packages are as diverse as the users who create them, and they may emphasize different model features, output displays, and even basic methodological principles.
Underlying all of this, though, is what I feel is the truly intimidating part of R: how R thinks. For those of us who are used to SAS, SPSS, and most other commercial statistical software products, the way we interact with R feels dauntingly unfamiliar.
Consider running a linear model in SAS or SPSS.
We write some code, or click some buttons and follow some menus, and there’s our output. We might get slightly different output, depending on what options we include or check off, but that’s the basic story every time. We run a model, and our results appear.
Not so with R.
Let’s take a look at the syntax you might use to run a basic one-way ANOVA in R, using a dataset called data1 that contains an outcome variable, yvar, and a grouping variable, factorvar. (Notice I say “might,” because there is more than one way to do this!)
model1 <- lm(yvar ~ factorvar, data=data1)
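If you want to try this yourself, here is a minimal runnable sketch. Since we don't have the real data1, the block invents a hypothetical version (three groups of ten simulated values, made up purely for illustration). It fits the model with the syntax above and also uses aov(), one of the other base R ways to run a one-way ANOVA.

```r
# Hypothetical example data: a numeric outcome (yvar) and a
# three-level grouping factor (factorvar). Values are simulated.
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)

# The syntax from the text: fit the one-way ANOVA as a linear model
model1 <- lm(yvar ~ factorvar, data = data1)

# Another base R route: aov() fits the same model and returns an
# object whose class is "aov" layered on top of "lm"
model1_aov <- aov(yvar ~ factorvar, data = data1)
class(model1_aov)  # [1] "aov" "lm"
```

Notice that neither model-fitting line prints anything. The results are stored in model1 and model1_aov.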
We run the syntax, and…
> model1 <- lm(yvar ~ factorvar, data=data1)
Did it work? And if it did work, where are the results?
It turns out that R stored them in an object called model1. If we want to see the results, we have to ask for them, and we have to know how.
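To see that the fit really is sitting inside an object, this sketch uses the same kind of hypothetical, simulated data1 and asks R what model1 is and what it contains. class() and names() are base R functions. The exact list of components is simply whatever lm() stores.

```r
# Hypothetical data, simulated for illustration only
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)

model1 <- lm(yvar ~ factorvar, data = data1)  # assignment prints nothing

class(model1)        # [1] "lm"
names(model1)        # the pieces stored inside, such as "coefficients" and "residuals"
model1$coefficients  # pull one piece out directly with $
model1               # typing the name alone prints a short display (call and coefficients)
```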
If we want to see the ANOVA table, for example, one option is to run a function called anova on that object, as in anova(model1).
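Here is a self-contained version of that step. It again assumes a hypothetical data1 with simulated values, so the numbers in the table mean nothing. The point is that anova() takes the stored model object and builds the table from it.

```r
# Hypothetical data, simulated for illustration only
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)
model1 <- lm(yvar ~ factorvar, data = data1)

# anova() reads the fitted model and returns the ANOVA table:
# Df, Sum Sq, Mean Sq, F value and Pr(>F) for factorvar and the residuals
anova_table <- anova(model1)
anova_table
```

The table is itself an object, so you could store it, as done here, and pull values out of it later (for instance, anova_table$Df).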
If we want to see the actual solution to the model (the estimated coefficients), along with some other basic statistics, we might run a different function on that object.
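One function that does this for models fit with lm() is summary(). The sketch below uses a hypothetical, simulated data1. With R's default treatment coding, the intercept is the estimated mean of the first group (A), and the other coefficients are the differences of groups B and C from A.

```r
# Hypothetical data, simulated for illustration only
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)
model1 <- lm(yvar ~ factorvar, data = data1)

model_summary <- summary(model1)
model_summary            # coefficient table, residual standard error, R-squared, overall F-test
coef(model_summary)      # just the coefficient table, as a matrix
model_summary$r.squared  # a single statistic pulled out of the summary object
```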
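While this might seem burdensome and unnecessary at first, the more you program in R, the clearer the advantages of this system become. Because the fitted model is stored as an object, you can hand that same object to as many functions as you like (for tables, plots, or predictions) without refitting it. This is exactly what gives R the wonderful flexibility and range that experienced R programmers always seem to be talking about.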
Understanding this “object-based” style of programming opens many doors.
Most importantly, a deeper understanding of R objects and the functions we use on them is the key to understanding the documentation that seems so out of reach when we first start learning R.
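A small illustration of why objects matter for reading the documentation: the same function name does different work depending on the class of the object it receives, and R's help pages are organized around those class-specific versions (called methods). The data here are again hypothetical and simulated.

```r
# Hypothetical data, simulated for illustration only
set.seed(1)
data1 <- data.frame(
  factorvar = factor(rep(c("A", "B", "C"), each = 10)),
  yvar      = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)
model1 <- lm(yvar ~ factorvar, data = data1)

# One function name, two kinds of objects, two different results
summary(data1)          # a data frame: descriptive statistics for each column
summary(model1)         # an "lm" object: the model summary
class(summary(data1))   # [1] "table"
class(summary(model1))  # [1] "summary.lm"

# List the functions that have a method for "lm" objects
methods(class = "lm")

# The help page for the lm-specific version of summary():
# ?summary.lm
```

Once you know that model1 has class "lm", you know to look up summary.lm rather than the generic summary page to learn what the output means.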