Linear Models in R: Diagnosing Our Regression Model

by David Lillis

by David Lillis, Ph.D.

Last time we created two variables and added a best-fit regression line to our plot of the variables. Here are the two variables again.

height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

Today we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. As before, we perform the regression.

lm(height ~ bodymass)
Call:
lm(formula = height ~ bodymass)

Coefficients:
(Intercept)     bodymass 
   98.0054       0.9528 

Now we can use several R diagnostic plots and influence statistics to diagnose how well our model is fitting the data.  In our next article we’ll eliminate an outlier to see how this changes the model fit.

These diagnostics include:

  1. Residuals vs. fitted values
  2. Q-Q plots
  3. Scale Location plots
  4. Cook’s distance plots.

To use R’s regression diagnostic plots, we set up the regression model as an object and create a plotting environment of two rows and two columns. Then we use the plot() command, treating the model as an argument.

model <- lm(height ~ bodymass)

     par(mfrow = c(2,2))
plot(model)

tn_image001

The first plot (residuals vs. fitted values) is a simple scatterplot between residuals and predicted values.  It should look more or less random.

This is more or less what what we see here, with the exception of a single outlier in the bottom right corner.

The second plot (normal Q-Q) is a normal probability plot.  It will give a straight line if the errors are distributed normally, but points 4, 5 and 6 deviate from the straight line.

The third plot (Scale-Location), like the the first, should look random.  No patterns.  Ours does not–we have a strange V-shaped pattern.

The last plot (Cook’s distance) tells us which points have the greatest influence on the regression (leverage points). We see that points 2, 4 and 6 have great influence on the model.

Next time we will see what happens to the model if we remove one of these outliers..

See our full R Tutorial Series and other blog posts regarding R programming.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.

Bookmark and Share

{ 2 comments… read them below or add one }

igoR87

I do not understand this sentence:
“Ours does not–we have a strange V-shaped pattern.”
I see a clear V-shaped pattern in the scla-location plot.

Reply

Karen Grace-Martin

Igor, “Ours does not” refers to the previous sentence. “Ours does not have no pattern and look random.”

Reply

Leave a Comment

Please note that, due to the large number of comments submitted, any comments on problems related to a personal study/project will not be answered. We suggest joining Statistically Speaking, where you have access to a private forum and more resources 24/7.

Previous post:

Next post: