R Graphics: Plotting in Color with qplot

In this lesson, let’s see how to use qplot to map symbol colour to a categorical variable.

Copy in the following data set (a medical data set relating to patients in a randomised controlled trial):

M <- structure(list(PATIENT = c("Mary","Dave","Simon","Steve","Sue","Frida","Magnus","Beth","Peter","Guy","Irina","Liz"),
GENDER = c("F","M","M","M","F","F","M","F","M","M","F","F"),
TREATMENT = c("A","B","C","A","A","B","A","C","A","C","B","C"),
AGE =c("Y","M","M","E","M","M","E","E","M","E","M","M"),
WEIGHT_1 = c(79.2,58.8,72.0,59.7,79.6,83.1,68.7,67.6,79.1,39.9,64.7,65.6),
WEIGHT_2 = c(76.6,59.3,70.1,57.3,79.8,82.3,66.8,67.4,76.8,41.4,65.3,63.2),
HEIGHT = c(169,161,175,149,179,177,175,170,177,138,170,165),
SMOKE = c("Y","Y","N","N","N","N","N","N","N","N","N","Y"),
RECOVER = c(1,0,1,1,1,0,1,1,1,1,0,1)),
class = "data.frame", row.names = 1:12)

Printing M displays the data set:

   PATIENT GENDER TREATMENT AGE WEIGHT_1 WEIGHT_2 HEIGHT SMOKE RECOVER
1     Mary      F         A   Y     79.2     76.6    169     Y       1
2     Dave      M         B   M     58.8     59.3    161     Y       0
3    Simon      M         C   M     72.0     70.1    175     N       1
4    Steve      M         A   E     59.7     57.3    149     N       1
5      Sue      F         A   M     79.6     79.8    179     N       1
6    Frida      F         B   M     83.1     82.3    177     N       0
7   Magnus      M         A   E     68.7     66.8    175     N       1
8     Beth      F         C   E     67.6     67.4    170     N       1
9    Peter      M         A   M     79.1     76.8    177     N       1
10     Guy      M         C   E     39.9     41.4    138     N       1
11   Irina      F         B   M     64.7     65.3    170     N       0
12     Liz      F         C   M     65.6     63.2    165     Y       1


Now we create a scatterplot of patient weight before treatment against height. We map both symbol size and colour to GENDER using factor(). qplot is part of the ggplot2 package, so load it first with library(ggplot2), then enter the following syntax:

qplot(HEIGHT, WEIGHT_1, data = M, xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)" , size = factor(GENDER), color = factor(GENDER)) + scale_size_manual(values = c(5, 7))

Note how we mapped symbol size and colour to GENDER using the syntax:

size = factor(GENDER) and color = factor(GENDER)
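For readers on a recent ggplot2 release: qplot() has been deprecated since ggplot2 3.4.0, so it still runs but prints a warning. The sketch below shows one way to build what should be the same plot with ggplot(), using the same mappings inside aes(). The data frame name M_small is made up for this illustration; it holds only the three columns the plot needs, copied from M.

```r
library(ggplot2)

# Hypothetical reduced copy of M: only the columns used in this plot.
M_small <- data.frame(
  GENDER   = c("F","M","M","M","F","F","M","F","M","M","F","F"),
  WEIGHT_1 = c(79.2,58.8,72.0,59.7,79.6,83.1,68.7,67.6,79.1,39.9,64.7,65.6),
  HEIGHT   = c(169,161,175,149,179,177,175,170,177,138,170,165)
)

# Size and colour are both mapped to GENDER inside aes(),
# just as size = factor(GENDER) and color = factor(GENDER) are in qplot.
p <- ggplot(M_small, aes(x = HEIGHT, y = WEIGHT_1,
                         size = factor(GENDER),
                         colour = factor(GENDER))) +
  geom_point() +
  labs(x = "HEIGHT (cm)", y = "WEIGHT BEFORE TREATMENT (kg)") +
  scale_size_manual(values = c(5, 7))

print(p)
```

The rest of this lesson continues with qplot; the two forms can be swapped freely.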

Also note how we controlled symbol size by adding the scale:

+ scale_size_manual(values = c(5, 7))
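It may help to see which size goes with which gender. factor() sorts its levels alphabetically, so the first value is used for "F" and the second for "M". The sketch below checks the level order and then passes a named vector to scale_size_manual(), which matches values to levels by name. The names M_small, gender_levels and gender_sizes are made up for this illustration; M_small holds only the three columns the plot needs, copied from M.

```r
library(ggplot2)

# Hypothetical reduced copy of M: only the columns used in this plot.
M_small <- data.frame(
  GENDER   = c("F","M","M","M","F","F","M","F","M","M","F","F"),
  WEIGHT_1 = c(79.2,58.8,72.0,59.7,79.6,83.1,68.7,67.6,79.1,39.9,64.7,65.6),
  HEIGHT   = c(169,161,175,149,179,177,175,170,177,138,170,165)
)

# factor() orders character levels alphabetically: "F" first, then "M".
gender_levels <- levels(factor(M_small$GENDER))
print(gender_levels)

# A named vector makes the pairing explicit: F gets size 5, M gets size 7.
gender_sizes <- c(F = 5, M = 7)

p <- qplot(HEIGHT, WEIGHT_1, data = M_small,
           size = factor(GENDER), color = factor(GENDER)) +
  scale_size_manual(values = gender_sizes)

print(p)
```

With an unnamed vector such as c(5, 7), the values are assigned in level order, which gives the same result here.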

In this example I have chosen symbol sizes of 5 and 7. You may prefer different sizes, and with a little experience you will quickly find the ones that suit your graphs best. Experiment with the above syntax yourself, changing the symbol size values each time. For example:

qplot(HEIGHT, WEIGHT_1, data = M, xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)" , size = factor(GENDER), color = factor(GENDER)) + scale_size_manual(values = c(2, 9))


The difference in point sizes is now rather extreme, but it shows how to control symbol size. Soon we will learn how to control symbol colour too.

That wasn’t so hard! In our next blog post we will learn the rest of what we need to colour scatterplots in qplot.

About the Author:
David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.


Reader Interactions


  1. Ife N says

    How would you change the assigned colours? Right now male is blue and female is pink. How could you change the colours to something else, like green and purple?
