R Is Not So Hard! A Tutorial, Part 15: Counting Elements in a Data Set

Combining the length() and which() commands gives a handy method of counting elements that meet particular criteria.

b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b

Let’s count the 3s in the vector b.

count3 <- length(which(b == 3))
count3
[1] 4
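
To see why this works, it may help to look at what which() returns on its own. The sketch below (the name positions is just for illustration) shows that which() gives the positions of the matching elements, so length() of that result is the count.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# which() returns the positions (indices) where the condition is TRUE
positions <- which(b == 3)
positions
# [1]  4  7  8 13

# length() then counts how many positions were found
length(positions)
# [1] 4
```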

In fact, you can count the number of elements that satisfy almost any given condition.

length(which(b < 7))
[1] 9

Here is an alternative approach that also uses the length() command, but with square brackets for subsetting:

length(b[ b < 7 ])
[1] 9

The square brackets allow us to subset. For such operations using square brackets, I like to use the words “such that”. Here, we have the elements of b, such that the elements are less than 7.
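
As a small illustration of the “such that” reading, the sketch below stores the subset first (the name small is only for illustration) so you can see which elements are being counted.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# The elements of b such that the element is less than 7
small <- b[b < 7]
small
# [1]  2  4  3 -1 -2  3  3  6  3

# Counting the subset gives the same answer as length(which(b < 7))
length(small)
# [1] 9
```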

R provides another alternative that not everyone knows about:

sum(b < 7)
[1] 9

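This syntax gives a count rather than a sum. Be aware of the meaning of syntax like sum(b < 7). The expression b < 7 produces a logical vector whose elements are either TRUE or FALSE, and sum() adds them up with each TRUE counted as 1. Try entering b < 7 at the keyboard (take care not to type b <- 7, which would assign the value 7 to b).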

b < 7
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE

We see that sum(b < 7) counts the number of elements that are TRUE. There are nine such elements.
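
The reason sum() counts here is that R treats TRUE as 1 and FALSE as 0 in arithmetic. A small sketch that makes the conversion explicit with as.integer() (sum() does this for you, so the explicit step is only for illustration):

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# TRUE becomes 1 and FALSE becomes 0
as.integer(b < 7)
# [1] 0 1 1 1 1 1 1 1 1 0 0 0 1

# Adding up the ones gives the number of TRUE elements
sum(as.integer(b < 7))
# [1] 9
```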

Now try:

mean(b < 7)
[1] 0.6923077

That syntax found the proportion of elements meeting the criterion rather than the mean of b, because mean() averages the TRUE (1) and FALSE (0) values. Again, if you use the sum() and mean() functions on a condition, you must be very careful to ensure that your output is what you intended. Note that sum(), length() with square brackets, and length(which()) all provide mechanisms for counting elements.
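
One case where care is needed is missing values. The sketch below assumes a hypothetical vector b1, which is b with one NA appended; it is not part of the tutorial data. The three counting approaches agree on b but behave differently on b1.

```r
b  <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b1 <- c(b, NA)   # hypothetical copy of b with one missing value

# Without missing values, the three approaches agree
c(length(which(b < 7)), length(b[b < 7]), sum(b < 7))
# [1] 9 9 9

# With a missing value they differ
length(which(b1 < 7))       # which() drops the NA: 9
length(b1[b1 < 7])          # the NA is kept in the subset: 10
sum(b1 < 7)                 # NA
sum(b1 < 7, na.rm = TRUE)   # 9
mean(b1 < 7, na.rm = TRUE)  # 9/13, the proportion among non-missing values
```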

Now find the percentage of 7s in b.

P7 <- 100 * length(which(b == 7)) / length(b)
P7
[1] 15.38462
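
Since mean() of a condition gives a proportion, the same percentage can also be sketched more briefly (P7_alt is a name chosen here for illustration):

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Proportion of elements equal to 7, scaled to a percentage
P7_alt <- 100 * mean(b == 7)
P7_alt
# [1] 15.38462
```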

Extension Example

You can find counts and percentages using functions that involve length(which()). Here we create two functions: one for finding counts, and the other for calculating percentages.

count <- function(x, n){ length(which(x == n)) }
perc <- function(x, n){ 100 * length(which(x == n)) / length(x) }

Note the syntax involved in setting up a function in R. Now let’s use the count function to count the threes in the vector b.

count(b, 3)
[1] 4

Similarly, use the perc function to find the percentage of 4s in b.

perc(b, 4)
[1] 7.692308
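
If you need counts for every distinct value at once, base R's table() and prop.table() may be more convenient than calling a counting function once per value. This is an alternative to the functions above, not a replacement for them; the names counts and percs are chosen here for illustration.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Count every distinct value in one pass
counts <- table(b)

# Look up a single value by its name (names are character strings)
counts[["3"]]
# [1] 4

# Convert the counts to percentages of the whole vector
percs <- 100 * prop.table(counts)
round(percs[["4"]], 6)
# [1] 7.692308
```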

That wasn’t so hard! In our next blog post we’ll discuss counting values within cases.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.

See our full R Tutorial Series and other blog posts regarding R programming.


Comments

Rob Baer says

October 22, 2021 at 9:11 am

Missing Values
Just a note on using length() on a whole vector that includes NA. The missing values are counted in the whole vector length when using the length() function.

b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b1 <- c(b, NA)
length(b)
length(b1)
sd(b)
sd(b1, na.rm = TRUE)

# If you want an "n" to go with the sd for b1, don't use length().
(n = sum(!is.na(b))) #13
(n = sum(!is.na(b1))) # 13

Gastón says

August 21, 2020 at 8:43 am

Thank you!!! I spent a lot of time trying to get some instruction on this issue!

Sebas says

July 27, 2018 at 12:44 pm

Hi. How can I set the dimensions of a matrix in 2 different variables instead of a vector?

Nathalie says

April 16, 2018 at 9:07 am

I am stuck on a string-counting issue and could not find any helpful post so far. Maybe someone here can help me:

I have a string variable tours in my dataframe df that represents the different stops an individual made during a journey.

For example:
1. home_work_leisure_home
2. home_work_shopping_work_home
3. home_work_leisure_errand_home

In transport planning we group activities into primary activities (work and education) and secondary activities (everything else). I want to count the number of secondary activities before the first primary activity, in between two primary activities, and after the last primary activity for each tour.

This means I am looking for a function in R that:
a. identifies the first work in the string variable,
b. then counts the number of activities before this first work activity
c. then identifies the last work in the string if there is more than one
d. if there is then count the number of activities between the two work activities,
e. then count the number of activities after the last work activity

The result for the three example tours then would be:
1. number of activities before first primary: 1 (home)
   number of activities between first and last primary: 0
   number of activities after last primary: 2 (leisure & home)
   number of primary activities: 1 (work)
2. number of activities before first primary: 1 (home)
   number of activities between first and last primary: 1 (shopping)
   number of activities after last primary: 1 (home)
   number of primary activities: 2 (work)
3. number of activities before first primary: 1 (home)
   number of activities between first and last primary: 0
   number of activities after last primary: 3 (leisure, errand & home)
   number of primary activities: 1 (work)

I would be super thankful if someone could give me a hand with this issue – even if it is a link to a similar question.

Thank you. Kind regards, N

- Karen Grace-Martin says
  
  May 15, 2018 at 11:36 am
  
  Nathalie,
  
  I’m not the R expert, but I’ve done a lot of this kind of thing in other software. It sounds like this will be a multi-step process. The very first thing you need to do is split this into multiple variables.
  
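
A minimal sketch of the steps Nathalie lists, following the suggestion to split each tour into its stops first. It assumes the stops are separated by underscores and that only "work" and "education" count as primary; the function name count_tour_activities and the vector tours are hypothetical, and a tour with no primary stop simply returns NA for the three positional counts.

```r
# Hypothetical helper: counts activities around the primary stops of one tour
count_tour_activities <- function(tour, primary = c("work", "education")) {
  stops <- strsplit(tour, "_", fixed = TRUE)[[1]]  # split into single stops
  idx   <- which(stops %in% primary)               # positions of primary stops
  if (length(idx) == 0) {
    # No primary activity: before/between/after are undefined
    return(c(before = NA, between = NA, after = NA, primary = 0))
  }
  first <- min(idx)
  last  <- max(idx)
  c(before  = first - 1,                                # stops before first primary
    between = sum(!(stops[first:last] %in% primary)),   # secondary stops in between
    after   = length(stops) - last,                     # stops after last primary
    primary = length(idx))                              # number of primary stops
}

tours <- c("home_work_leisure_home",
           "home_work_shopping_work_home",
           "home_work_leisure_errand_home")

# One row per tour, one column per count
result <- t(sapply(tours, count_tour_activities))
result
```

For the three example tours this gives before/between/after/primary counts of 1, 0, 2, 1; then 1, 1, 1, 2; then 1, 0, 3, 1, which matches the expected results listed in the question.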
Pranjit Sarmah says

April 1, 2018 at 10:03 am

obj <- function(x, y, x_cat, y_val) {
  xx <- which(x == x_cat)
  yy <- which(y == y_val)
  # Returns the indices of the observations for which x_cat has the value y_val
  return(xx[xx %in% yy])
}

bhuvanesh says

September 18, 2016 at 10:48 am

How can I provide more than one number in the which() filter?

Karol says

April 8, 2016 at 5:59 pm

Hi,
I have data something like this:
X Y
A 1
A 2
B 1
B 2
B 3
C 1
…
I mean that X is a factor with k categories and Y is a continuous variable.
I’d like to compute a vector (let’s say Z) counting which observation of X (in each category) is Y… Something like an ID for each category of X. Can you please give me a tip?
Thank you in advance!
Karol

- Carla says
  
  December 1, 2017 at 3:27 pm
  
  Hi Karol, did you find a solution? I’m in the same situation :/
  Cheers/Carla
  
- Pranjit Sarmah says
  
  April 1, 2018 at 10:04 am
  
  obj <- function(x, y, x_cat, y_val) {
    xx <- which(x == x_cat)
    yy <- which(y == y_val)
    # Returns the indices of the observations for which x_cat has the value y_val
    return(xx[xx %in% yy])
  }
  