R




online workshops

Introduction to R: A Step-by-Step Approach to the Fundamentals

As a researcher, it’s critical for you to know more than one statistical software package. R is a great “backup” program because it’s versatile, flexible, and FREE. R is virtually limitless… once you learn its quirks and ins and outs. learn more



articles at the analysis factor

R tutorial series

R is Not So Hard! A Tutorial
Part 1: Syntax

Many of you have heard of R (the R statistics language and environment for scientific and statistical computing and graphics). Perhaps you know that it uses command line input rather than pull-down menus. Perhaps you feel that this makes R hard to use and somewhat intimidating! learn more

R is Not So Hard! A Tutorial
Part 2: Variable Creation

In Part 1 we installed R and used it to create a variable and summarize it using a few simple commands. Let’s re-create that variable and also create a second variable, and see what we can do with them. As before, we take height to be a variable that describes the heights (in cm) of ten people. learn more

R Is Not So Hard! A Tutorial
Part 3: Regressions and Plots

In Part 2 of this series, we created two variables and used the lm() command to perform a least squares regression on them, treating one of them as the dependent variable and the other as the independent variable. Here we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. learn more

R Is Not So Hard! A Tutorial
Part 4: Fitting a Quadratic Model

In Part 3 we used the lm() command to perform least squares regressions. In Part 4 we will look at more advanced aspects of regression models and see what R has to offer. One way of checking for non-linearity in your data is to fit a polynomial model and check whether the polynomial model fits the data better than a linear model. learn more
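
As a minimal sketch of the check described above, the following fits a linear and a quadratic model to the same data and compares them. The data values and the object names (linear_fit, quadratic_fit) are made up for illustration and are not taken from the tutorial.

```r
# Hypothetical data with a curved relationship (not from the tutorial)
x <- 1:10
y <- c(2.1, 4.9, 10.2, 16.8, 26.1, 37.0, 50.2, 64.9, 82.1, 101.0)

linear_fit    <- lm(y ~ x)            # straight-line model
quadratic_fit <- lm(y ~ x + I(x^2))   # same model plus a squared term

# Compare the nested models: a small p-value suggests the
# quadratic term improves the fit
comparison <- anova(linear_fit, quadratic_fit)
print(comparison)
```

The I() wrapper is needed so that x^2 is treated as arithmetic rather than as formula syntax.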

R Is Not So Hard! A Tutorial
Part 5: Fitting an Exponential Model

In Part 3 and Part 4 we used the lm() command to perform least squares regressions. We saw how to check for non-linearity in our data by fitting polynomial models and checking whether they fit the data better than a linear model. Now let’s see how to fit an exponential model in R. learn more
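
One common way to fit an exponential model, sketched here under the assumption that the response is strictly positive, is to regress the log of the response on the predictor with lm(). The data and variable names below are hypothetical, and the tutorial itself may take a different route.

```r
# Hypothetical count data that roughly doubles at each time step
time   <- 1:8
counts <- c(3, 6, 11, 25, 47, 98, 190, 390)

# Fit log(counts) as a linear function of time
exp_fit <- lm(log(counts) ~ time)

# Back-transform: counts is approximately a * exp(b * time)
a <- exp(coef(exp_fit)[1])
b <- coef(exp_fit)[2]

# Fitted values on the original (count) scale
fitted_counts <- exp(predict(exp_fit))
```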

R Is Not So Hard! A Tutorial
Part 6: Basic Plotting in R

In Part 6, let’s look at basic plotting in R. Try entering the following three commands together (the semi-colon allows you to place several commands on the same line). learn more

R Is Not So Hard! A Tutorial
Part 7: More Plotting in R

In Part 7, let’s look at further plotting in R. Try entering the following three commands together (the semi-colon allows you to place several commands on the same line). Let’s take an example with two variables and enhance it. learn more

R Is Not So Hard! A Tutorial
Part 8: Basic Commands

Let’s look at some basic commands in R. Set up the following vector. Now figure out what each of the following commands does. You should not need me to explain each command, but I will explain a few. learn more

R Is Not So Hard! A Tutorial
Part 9: Sub-setting

In Part 9, let’s look at sub-setting in R. I want to show you two approaches. Let’s provide summary tables on the following data set of tourists from different nations, their gender and numbers of children. learn more

R Is Not So Hard! A Tutorial
Part 10: Creating Summary Tables with aggregate()

In Part 10, let’s look at the aggregate command for creating summary tables using R. You may have a complex data set that includes categorical variables of several levels, and you may wish to create summary tables for each level of the categorical variable. learn more
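
A small sketch of the idea, using an invented data frame (the column names and values are assumptions, not the tutorial's data set): aggregate() applies a summary function to a numeric variable within each level of a categorical variable.

```r
# Hypothetical data set (names and values are illustrative only)
tourists <- data.frame(
  nation   = c("NZ", "NZ", "UK", "UK", "USA", "USA"),
  gender   = c("F", "M", "F", "M", "F", "M"),
  children = c(2, 0, 1, 3, 4, 2)
)

# Mean number of children for each nation
by_nation <- aggregate(children ~ nation, data = tourists, FUN = mean)
print(by_nation)
```

Adding a second grouping variable to the formula, for example children ~ nation + gender, gives one row per combination of levels.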

R Is Not So Hard! A Tutorial
Part 11: Creating Bar Charts

Let’s create a simple bar chart in R using the barplot() command, which is easy to use. First, we set up a vector of numbers. Then we count them using the table() command, and then we plot them. The table() command creates a simple table of counts of the elements in a data set. learn more
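
The three steps above might look like this; the vector is made up for illustration.

```r
# A hypothetical vector of numbers
values <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Count how often each distinct value occurs
counts <- table(values)

# Plot the counts as a bar chart
barplot(counts, xlab = "Value", ylab = "Frequency")
```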

R is Not So Hard! A Tutorial
Part 12: Creating Histograms & Setting Bin Widths

I’m sure you’ve heard that R creates beautiful graphics. It’s true, and it doesn’t have to be hard to do so. Let’s start with a simple histogram using the hist() command, which is easy to use, but actually quite sophisticated. First, we set up a vector of numbers and then we create a histogram. learn more

R Is Not So Hard! A Tutorial
Part 13: Box Plots

In Part 13, let’s see how to create box plots in R. Let’s create a simple box plot using the boxplot() command, which is easy to use. First, we set up a vector of numbers and then we plot them. Box plots can be created for individual variables or for variables by group. learn more

R Is Not So Hard! A Tutorial
Part 14: Pie Charts

In Part 14, let’s see how to create pie charts in R. Let’s create a simple pie chart using the pie() command. As always, we set up a vector of numbers and then we plot them. learn more

R Is Not So Hard! A Tutorial
Part 15: Counting Elements in a Data Set

Combining the length() and which() commands gives a handy method of counting elements that meet particular criteria. Let’s count the 3s in the vector b. In fact, you can count the number of elements that satisfy almost any given condition. learn more
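
For instance, with a made-up vector b (the values here are an assumption, not the tutorial's data), the combination works like this:

```r
# Hypothetical vector
b <- c(7, 2, 4, 3, 5, 3, 9, 3, 6)

# which() returns the positions that satisfy the condition;
# length() counts those positions
n_threes  <- length(which(b == 3))   # number of 3s
n_above_4 <- length(which(b > 4))    # number of elements greater than 4
```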

R Is Not So Hard! A Tutorial
Part 16: Counting Values within Cases

SPSS has the Count Values within Cases option, but R does not have an equivalent function. Here are two functions that you might find helpful, each of which counts values within cases inside a rectangular array. For example, you might have a data set consisting of responses to a questionnaire involving multiple Likert items scored 1 to 5. learn more
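
The two functions from the article are not reproduced here. As a rough alternative sketch, base R's rowSums() applied to a logical comparison counts a given value within each row (case) of a rectangular array. The response matrix below is invented for illustration.

```r
# Hypothetical Likert responses: 3 respondents (rows) by 4 items (columns)
responses <- matrix(c(5, 3, 5, 1,
                      2, 2, 4, 5,
                      5, 5, 5, 5),
                    nrow = 3, byrow = TRUE)

# responses == 5 is a logical matrix; summing each row counts the TRUEs
fives_per_case <- rowSums(responses == 5)
```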

R Is Not So Hard! A Tutorial
Part 17: Testing for Existence of Particular Values

Sometimes you need to know if your data set contains elements that meet some criterion or a particular set of criteria. For example, a common data cleaning task is to check if you have missing data (NAs) lurking somewhere in a large data set. Or you may need to check if you have zeroes or negative numbers, or numbers outside a given range. learn more
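
A short sketch of such checks using any(), with a made-up vector (the tutorial may use other functions):

```r
# Hypothetical data containing a missing value, a negative number and a zero
x <- c(4, NA, -2, 0, 15, 8)

has_missing  <- any(is.na(x))                      # any NAs?
has_negative <- any(x < 0, na.rm = TRUE)           # any negative numbers?
has_zero     <- any(x == 0, na.rm = TRUE)          # any zeroes?
out_of_range <- any(x < 0 | x > 10, na.rm = TRUE)  # anything outside 0 to 10?

# Positions of the missing values
missing_at <- which(is.na(x))
```

The na.rm = TRUE argument keeps a missing value from turning the result of any() into NA when no element satisfies the condition.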

R Is Not So Hard! A Tutorial
Part 18: Re-Coding Values

One data manipulation task that you need to do in pretty much any data analysis is recoding data. It’s almost never the case that the data are set up exactly the way you need them for your analysis. In R, you can re-code an entire vector or array at once. To illustrate, let’s set up a vector that has missing values. learn more
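
As an illustration with invented values, logical indexing recodes every matching element of a vector in one step:

```r
# Hypothetical vector with missing values
scores <- c(3, NA, 5, 1, NA, 4)

# Recode all missing values to 0 in a copy of the vector
recoded <- scores
recoded[is.na(recoded)] <- 0

# Recode the numeric scores into two categories (NA stays NA)
group <- ifelse(scores >= 4, "high", "low")
```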

R Is Not So Hard! A Tutorial
Part 19: Multiple Graphs and par(mfrow=(A,B))

Today we see how to set up multiple graphs on the same page. We use the syntax par(mfrow=c(A,B)), where A is the number of rows and B the number of columns (each cell holds a single graph). This syntax sets up a plotting environment of A rows and B columns. First we create four vectors, all of the same length. learn more
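
A minimal sketch of a 2-by-2 layout follows. Note that the row and column counts are passed as a vector built with c(). The four vectors and the choice of plots are assumptions made for illustration.

```r
# Four hypothetical vectors of the same length
a      <- 1:10
b      <- a^2
c_vals <- sqrt(a)
d      <- rev(a)

# Set up 2 rows and 2 columns, saving the previous setting
old_par <- par(mfrow = c(2, 2))

plot(a, b, main = "a vs b")
plot(a, c_vals, main = "a vs sqrt(a)")
hist(b, main = "Histogram of b")
boxplot(d, main = "Box plot of d")

# Restore the previous layout
par(old_par)
```

Graphs fill the grid row by row. Saving and restoring the old setting means later plots go back to one graph per page.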

R is Not So Hard! A Tutorial
Part 20: Useful Commands for Exploring Data

Sometimes when you’re learning a new stat software package, the most frustrating part is not knowing how to do very basic things. This is especially frustrating if you already know how to do them in some other software. Let’s look at some basic but very useful commands that are available in R. learn more

R is Not So Hard! A Tutorial
Part 21: Pearson and Spearman Correlation

Let’s use R to explore bivariate relationships among variables. Part 7 of this series showed how to do a nice bivariate plot, but it’s also useful to have a correlation statistic. We use a new version of the tourist data set we used in Part 20. Here, we have a new variable – the amount of money they spend while on vacation. learn more
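
Both coefficients are available through the method argument of cor(). The sketch below uses made-up numbers, not the tourist data set, with one very large spending value to show how the two statistics can differ.

```r
# Hypothetical data: number of children and amount spent on vacation
children <- c(0, 1, 2, 2, 3, 4, 5)
spend    <- c(500, 650, 700, 720, 900, 1500, 4000)

# Pearson measures linear association; Spearman works on ranks
pearson  <- cor(children, spend, method = "pearson")
spearman <- cor(children, spend, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

Because Spearman's coefficient uses ranks, it is less affected by the single extreme spending value than Pearson's.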

R is Not So Hard! A Tutorial
Part 22: Creating and Customizing Scatter Plots

In our last post, we calculated Pearson and Spearman correlation coefficients in R and got a surprising result. So let’s investigate the data a little more with a scatter plot. We use the same version of the data set of tourists. We have data on tourists from different nations, their gender, number of children, and how much they spent on their trip. learn more

Graphing Non-Linear Mathematical Expressions in R

Here, we see how to create mathematical expressions for your graph in R. We’ll use an example of graphing a cosine curve, along with relevant Greek letters as the axis label, and printing the equation right on the graph. Mathematical expressions, like sine or exponential curves on graphs, are made possible through expression(paste()) and substitute(). learn more

Doing Scatterplots in R

In this lesson, we see how to use qplot to create a simple scatterplot. The qplot (quick plot) system is a subset of the ggplot2 (grammar of graphics) package which you can use to create nice graphs. It is great for creating graphs of categorical data, because you can map symbol colour, size and shape to the levels of your categorical variable. learn more

R Graphics: Plotting in Color with qplot

In this lesson, let’s see how to use qplot to map symbol colour to a categorical variable. Copy in the following data set (a medical data set relating to patients in a randomised controlled trial) and create a scatterplot of patient height against weight before treatment. We map both symbol size and shape to GENDER using factor(). learn more

R Graphics: Plotting in Color with qplot Part 2

In the last lesson, we saw how to use qplot to map symbol colour to a categorical variable. Now we see how to control symbol colours and create legend titles. Using the same data set as in the last lesson (a medical data set relating to patients in a randomised controlled trial), we map symbol size to GENDER and symbol colour to EXERCISE, but choose our own colours. learn more

Linear Models in R: Plotting Regression Lines

Today let’s re-create two variables and see how to plot them and include a regression line. We take height to be a variable that describes the heights (in cm) of ten people. Copy and paste the code to the R command line. Now let’s take bodymass to be a variable that describes the masses (in kg) of the same ten people. learn more

Linear Models in R: Diagnosing Our Regression Model

Last time we created two variables and added a best-fit regression line to our plot of the variables. Today we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. As before, we perform the regression. learn more

Linear Models in R: Improving Our Regression Model

Last time we created two variables, used the lm() command to perform a least squares regression on them, and diagnosed our regression using the plot() command. Just as we did last time, we perform the regression using lm(). This time we store it as an object M. learn more

Generalized Linear Models in R
Part 1: Calculating Predicted Probability in Binary Logistic Regression

Ordinary Least Squares regression provides linear models of continuous variables. However, much data of interest to statisticians and researchers are not continuous and so other methods must be used to create useful predictive models. learn more

Generalized Linear Models in R
Part 2: Understanding Model Fit in Logistic Regression

In the last article, we saw how to create a simple Generalized Linear Model on binary data using the glm() command. We continue with the same glm on the mtcars data set (modeling the vs variable on the weight and engine displacement). learn more
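
A sketch of the kind of model described, assuming a binomial family with the default logit link and the column names wt and disp for weight and engine displacement in the built-in mtcars data set:

```r
# Binary logistic regression of vs on weight and displacement
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Coefficients, null deviance and residual deviance
print(summary(model))

# Predicted probabilities (response scale) for the observed cars
predicted <- predict(model, type = "response")
```

Without type = "response", predict() returns values on the log-odds (link) scale rather than probabilities.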

Generalized Linear Models in R
Part 3: Plotting Predicted Probabilities

In our last article, we learned about model fit in Generalized Linear Models on binary data using the glm() command. We continue with the same glm on the mtcars data set (regressing the vs variable on the weight and engine displacement). Now we want to plot our model, along with the observed data. learn more

Generalized Linear Models in R
Part 4: Options, Link Functions, and Interpretation

Previously, I wrote several articles (GLM in R 1, GLM in R 2, GLM in R 3) that provided an introduction to Generalized Linear Models (GLMs) in R. As a reminder, Generalized Linear Models are an extension of linear regression models that allow the dependent variable to be non-normal. In our example here, we fit a GLM to a set of education-related data. learn more



R and stat software

What R Commander Can Do in R Without Coding–More Than You Would Think

I received a question recently about R Commander, a free R package. R Commander overlays a menu-based interface to R, so just like SPSS or JMP, you can run analyses using menus. Nice, huh? The question was whether R Commander does everything R does, or just a small subset. learn more

Ways to Customize a Scatter Plot in R Commander

I mentioned in my last post that R Commander can do a LOT of data manipulation, data analyses, and graphs in R without you ever having to program anything. Here I want to give you some examples, so you can see how truly useful this is. Let’s start with a simple scatter plot between Time and the number of Jobs (in thousands) in 67 counties. learn more

Random Sample from a Uniform Distribution in R Commander

To celebrate two milestones hit by The Analysis Factor, we decided to do a giveaway to 6 randomly-chosen newsletter subscribers. And since randomly generating numbers is something you often need to do in research, I thought I would let you know how we did it. learn more

Ten Ways Learning a Statistical Software Package is Like Learning a New Language

Someone recently asked me if they need to learn R. In responding, it struck me that this is another way that learning a stat package is like learning a new language. The metaphor is extremely helpful for deciding when and how to learn a new stat package, and to keep you going when the going gets rough. learn more

The Four Stages of Statistical Skill

At The Analysis Factor, we are on a mission to help researchers improve their statistical skills so they can do amazing research. We all tend to think of “Statistical Analysis” as one big skill, but it’s not. Over the years of training, coaching, and mentoring data analysts at all stages, I’ve realized there are four fundamental stages of statistical skill. learn more

SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two

In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use? The default is to use whatever software they used in your statistics class–at least you know the basics. And this might turn out pretty well, but chances are it will fail you at some point. learn more

Do I Really Need to Learn R?

Do I really need to learn R? Someone asked me this recently. Many R advocates would absolutely say yes to everyone who asks. I don’t. It depends on what kind of work you do and the context in which you’re working. learn more

R Programming Video: 15 Tips for The Beginner

One of our instructors, David Lillis, recently gave a talk in front of the Wellington R Users Group highlighting 15 Tips for using the R statistical programming language aimed at the beginner. Below is a video recording of his presentation. learn more

Analyzing Repeated Measures Data: ANOVA and Mixed Model Approaches

When it comes to analyzing repeated measures data, there are three main approaches: Multivariate GLM (aka, repeated measures ANOVA), Marginal Model, and Mixed Model. Each has its own advantages and disadvantages. Depending on what repeated measures designs you’re dealing with, some approaches work better than others. learn more