R Is Not So Hard! A Tutorial, Part 17: Testing for Existence of Particular Values

Sometimes you need to know if your data set contains elements that meet some criterion or a particular set of criteria.

For example, a common data cleaning task is to check if you have missing data (NAs) lurking somewhere in a large data set.

Or you may need to check if you have zeroes or negative numbers, or numbers outside a given range.

In such cases, the any() and all() commands are very helpful. You can use them to interrogate R about the values in your data.

Test for the existence of particular values using the any() command

b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b
[1] 7 2 4 3 -1 -2 3 3 6 8 12 7 3

any(b == -4)
[1] FALSE

any(b < 5)
[1] TRUE
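The same pattern covers the other checks mentioned earlier: zeroes, negative numbers, and values outside a given range. Here is a minimal sketch using the vector b from above. The range limits of 0 and 10 are arbitrary choices for illustration, and the names has_zero, has_negative, and out_of_range are hypothetical.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Are there any zeroes?
has_zero <- any(b == 0)

# Are there any negative numbers?
has_negative <- any(b < 0)

# Are there any values outside the range 0 to 10?
# (these limits are illustrative only)
out_of_range <- any(b < 0 | b > 10)

has_zero       # FALSE
has_negative   # TRUE
out_of_range   # TRUE
```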

Both any() and all() operate on logical vectors, such as the result of a comparison like b < 5. Use any() to check for missing data in a vector or an array.

d <- c(3, 2, NA, 5, 6, NA)
d
[1] 3 2 NA 5 6 NA

any(is.na(d))
[1] TRUE

Of course, we can check for non-missing data too.

any(!is.na(d))
[1] TRUE
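The example above uses a vector, but the same call should work on a matrix or array, because is.na() returns a logical object of the same shape as its input. The sketch below uses a small made-up matrix m. Base R also provides anyNA(), which is a shortcut for any(is.na(x)).

```r
# A small illustrative matrix with one missing value (made-up data)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)

any(is.na(m))   # TRUE
anyNA(m)        # TRUE, base R shortcut for the line above

# Check each column separately for missing values
col_has_na <- apply(is.na(m), 2, any)
col_has_na      # FALSE TRUE FALSE
```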

The any() command is helpful when checking for particular values in large data sets.

You can use the all() command to check whether all elements in a given vector or array satisfy a particular condition. For example, let’s see whether all non-missing values in d are less than 5. Note that the command is.na() identifies missing data and that the syntax !is.na() identifies non-missing data.

all(d[!is.na(d)] < 5)
[1] FALSE

Now check whether all non-missing elements are less than 7.

all(d[!is.na(d)] < 7)
[1] TRUE
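As an alternative to subsetting, any() and all() accept an na.rm argument that drops missing values before testing. The short sketch below also shows what can happen when the NAs are left in.

```r
d <- c(3, 2, NA, 5, 6, NA)

# Same results as the subsetting approach
all(d < 5, na.rm = TRUE)   # FALSE
all(d < 7, na.rm = TRUE)   # TRUE

# Without removing NAs, the answer can itself be NA:
# every known value is below 7, but the missing ones are unknown
all(d < 7)                 # NA

# A single known FALSE is enough for all() to return FALSE
all(d < 5)                 # FALSE
```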

The syntax above looks formidable, but it breaks down into simple steps. First, is.na() identifies missing elements by creating a logical vector whose elements are either TRUE or FALSE.

is.na(d)
[1] FALSE FALSE TRUE FALSE FALSE TRUE

The syntax !is.na(d) gives the opposite logical vector, which is TRUE wherever the data are non-missing. Then, d[!is.na(d)] gives the elements of d that are non-missing. Finally, we apply the all() command to the condition that each of those elements is less than 7.
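The steps just described can be run one at a time to see each intermediate result. The names is_missing, not_missing, and observed are hypothetical, chosen here only for illustration.

```r
d <- c(3, 2, NA, 5, 6, NA)

is_missing  <- is.na(d)        # TRUE where d is NA
not_missing <- !is_missing     # TRUE where d has a value
observed    <- d[not_missing]  # keep only the non-missing elements

observed                       # 3 2 5 6
all(observed < 7)              # TRUE
```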

That wasn’t so hard! In our next blog post we’ll learn about re-coding values in R.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.

See our full R Tutorial Series and other blog posts regarding R programming.

 
