*by David Lillis, Ph.D.*

In Part 3 and Part 4 we used the lm() command to perform least squares regressions. We saw how to check for non-linearity in our data by fitting polynomial models and checking whether they fit the data better than a linear model. Now let’s see how to fit an exponential model in R.

As before, we will use a data set of counts (atomic disintegration events that take place within a radiation source), taken with a Geiger counter at a nuclear plant.

The counts were registered over a 30-second period for a short-lived, man-made radioactive compound. We read in the data and subtract the background count of 623.4 counts per second in order to obtain (more…)

*by David Lillis, Ph.D.*

In Part 1 we installed R and used it to create a variable and summarize it using a few simple commands. Today let’s re-create that variable and also create a second variable, and see what we can do with them.

As before, we take height to be a variable that describes the heights (in cm) of ten people. Type the following code at the R command line to create this variable.

height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)

Now let’s take bodymass to be a variable that describes the masses (in kg) of the same ten people. Copy and paste the following code at the R command line to create the bodymass variable.

bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

Both variables are now stored in the R workspace. To view them, enter:

height

bodymass

We can now create a simple plot of the two variables as follows:

plot(bodymass, height)

However, this is a rather plain plot and we can embellish it a little. Type the following code at the R command line:

plot(bodymass, height, pch = 16, cex = 1.3, col = "red", main = "MY FIRST PLOT USING R", xlab = "Body Mass (kg)", ylab = "HEIGHT (cm)")

[Note: R is very picky about the quotation marks you use. If the font displaying this post shows the opening and closing quotation marks facing in different directions, the code won’t work in R. Both must be plain straight quotes. You may have to retype them within R rather than copying and pasting.]

In the above code, the syntax `pch = 16` creates solid dots, while `cex = 1.3` creates dots that are 1.3 times bigger than the default (where `cex = 1`). More about these commands later.
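As a small sketch of how these two arguments behave, the following draws the same data twice, side by side, with different symbol settings. The comparison against `pch = 1` (open circles) and the panel titles are our own choices, not part of the original example.

```r
# Same data as above, redefined so this block runs on its own
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Two panels side by side; keep the old settings so they can be restored
old_par = par(mfrow = c(1, 2))

# Left panel: open circles at the default size
plot(bodymass, height, pch = 1, cex = 1, main = "pch = 1, cex = 1")

# Right panel: solid dots, 1.3 times the default size
plot(bodymass, height, pch = 16, cex = 1.3, main = "pch = 16, cex = 1.3")

# Go back to one plot per window
par(old_par)
```

Typing `?points` at the command line lists the other plotting symbols available through `pch`.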

Now let’s perform a linear regression of height on bodymass by entering the following command:

lm(height~bodymass)

We see that the intercept is 98.0054 and the slope is 0.9528. By the way, lm stands for “linear model”.
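To see where these two numbers come from, here is a short check on the same ten observations. The least squares slope equals the covariance of the two variables divided by the variance of the predictor, and the intercept then follows from the two means. The object names `fit`, `slope` and `intercept` are our own choices.

```r
# Same data as above, redefined so this block runs on its own
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Store the fitted model so that its parts can be reused
fit = lm(height ~ bodymass)

# Named vector holding the intercept and the slope
coef(fit)

# The same two values computed by hand
slope = cov(bodymass, height) / var(bodymass)
intercept = mean(height) - slope * mean(bodymass)

# Rounded to four decimals: 98.0054 and 0.9528
round(c(intercept, slope), 4)
```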

Finally, we can add a best fit line to our plot by entering the following command:

abline(98.0054, 0.9528)
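Typing the coefficients by hand works, but it is easy to mistype a digit. As an alternative sketch, abline() also accepts a fitted lm object directly, so the line is drawn from the stored coefficients. The name `fit` and the line styling (`col`, `lwd`) are our own choices.

```r
# Same data as above, redefined so this block runs on its own
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Fit and store the model
fit = lm(height ~ bodymass)

# Redraw the scatterplot, then add the fitted line from the model object
plot(bodymass, height, pch = 16, cex = 1.3, col = "red",
     xlab = "Body Mass (kg)", ylab = "Height (cm)")
abline(fit, col = "blue", lwd = 2)
```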

None of this was so difficult!

In Part 3 we will look again at regression and create more sophisticated plots.

**About the Author:** *David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.*

See our full R Tutorial Series and other blog posts regarding R programming.

I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the residuals or the response variable.

The asker had a situation where Y, the response, was not normally distributed, but the residuals were.

**Quick Answer**: It’s just the residuals.
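A small simulation can illustrate how this situation arises. The sketch below uses made-up data, not the asker's. A two-group predictor with a large difference in group means makes Y bimodal, while the errors around each group mean are drawn from a normal distribution. All names and numbers here are illustrative assumptions.

```r
# Arbitrary seed, used only so the sketch is reproducible
set.seed(1)

# Hypothetical data: two groups of 100 with distant means
n = 200
group = rep(c(0, 1), each = n / 2)
y = 10 + 8 * group + rnorm(n, sd = 1)

# Linear model with the group indicator as the only predictor
fit = lm(y ~ group)

# Compare the distribution of Y with the distribution of the residuals
old_par = par(mfrow = c(1, 2))
hist(y, main = "Response Y")
hist(resid(fit), main = "Residuals")
par(old_par)
```

In the left panel Y shows two separate humps, one per group, while the residuals in the right panel are centred on zero.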

In fact, if you look at any (good) statistics textbook on linear models, you’ll see below the model, stating the assumptions: (more…)

If you’ve compared two textbooks on linear models, chances are, you’ve seen two different lists of assumptions.

I’ve spent a lot of time trying to get to the bottom of this, and I think it comes down to a few things.

1. There are four assumptions that are explicitly stated along with the model, and some authors stop there.

2. Some authors are writing for introductory classes and, rightfully so, don’t want to confuse students with too many abstract, and sometimes untestable, (more…)