R Is Not So Hard! A Tutorial, Part 15: Counting Elements in a Data Set


by David Lillis, Ph.D.

Combining the length() and which() commands gives a handy method of counting elements that meet particular criteria.

b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b

Let’s count the 3s in the vector b.

count3 <- length(which(b == 3))
count3
[1] 4
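
To see why this works, it can help to look at what which() returns on its own. The sketch below repeats the definition of b so it runs by itself; the name idx is just an illustrative choice.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# which() returns the positions (indices) of the elements that satisfy the condition
idx <- which(b == 3)
idx
# [1]  4  7  8 13

# length() then counts how many positions were returned
length(idx)
# [1] 4
```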

In fact, you can count the number of elements that satisfy almost any given condition.

length(which(b < 7))
[1] 9

Here is an alternative approach that also uses the length() command, this time with square brackets for subsetting:

length(b[ b < 7 ])
[1] 9

The square brackets allow us to subset. For such operations using square brackets, I like to use the words “such that”. Here, we have the elements of b, such that the elements are less than 7.
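
The same "such that" reading extends to compound conditions joined with & (and) or | (or). A small sketch, with pos_small and n_extreme as illustrative names only:

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# elements of b such that they are positive AND less than 7
pos_small <- b[b > 0 & b < 7]
length(pos_small)
# [1] 7

# elements of b such that they are negative OR greater than 7
n_extreme <- length(b[b < 0 | b > 7])
n_extreme
# [1] 4
```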

 

R PROVIDES ANOTHER ALTERNATIVE THAT NOT EVERYONE KNOWS ABOUT

sum(b < 7)
[1] 9

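This syntax gives a count rather than a sum. Be aware of the meaning of syntax like sum(b < 7): the comparison b < 7 produces a logical vector whose elements are either TRUE or FALSE, and sum() counts each TRUE as 1. Try entering b < 7 at the keyboard (note the spacing: b <- 7 would instead overwrite b with the value 7).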

b < 7
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE

We see that sum(b < 7) counts the number of elements that are TRUE. There are nine such elements.
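
The reason sum() counts here is that R treats TRUE as 1 and FALSE as 0 in arithmetic. A quick illustration that makes the conversion explicit with as.integer() (the name ones is only for this example):

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# convert the logical vector to 0s and 1s
ones <- as.integer(b < 7)
ones
# [1] 0 1 1 1 1 1 1 1 1 0 0 0 1

# adding up the 1s is the same as counting the TRUE values
sum(ones)
# [1] 9
```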

Now try:

mean(b < 7)
[1] 0.6923077

That syntax found the proportion of elements meeting the criterion rather than the mean of b. Again, if you use the sum() and mean() functions in this way, you must be very careful to ensure that your output is what you intended. Note that sum(b < 7), length(b[b < 7]) and length(which(b < 7)) all provide mechanisms for counting elements.
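
One situation where these counting idioms can disagree is missing data. The sketch below uses a hypothetical vector b_na (not part of the original example) that contains an NA, to show how each approach behaves:

```r
# hypothetical vector with one missing value
b_na <- c(7, 2, NA, 3, 8)

# which() silently drops the NA comparison
n_which <- length(which(b_na < 7))   # 2

# sum() returns NA unless told to remove missing values
n_sum <- sum(b_na < 7)               # NA
n_sum_rm <- sum(b_na < 7, na.rm = TRUE)  # 2

# logical subsetting keeps an NA element, so the length is inflated
n_subset <- length(b_na[b_na < 7])   # 3
```

If your data may contain missing values, it is worth checking which of these behaviours you actually want.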

Now find the percentage of 7s in b.

P7 <- 100 * length(which(b == 7)) / length(b)
P7
[1] 15.38462
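
Since mean() of a logical vector gives a proportion, the same percentage can also be sketched more compactly. The name P7_alt is just an illustrative choice:

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# proportion of 7s, scaled to a percentage
P7_alt <- 100 * mean(b == 7)
P7_alt
# [1] 15.38462
```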

 

EXTENSION EXAMPLE

You can find counts and percentages using functions that involve length(which()). Here we create two functions: one for finding counts, and the other for calculating percentages.

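# count(): number of elements of x that are equal to n
count <- function(x, n){ length(which(x == n)) }
# perc(): percentage of elements of x that are equal to n
perc <- function(x, n){ 100 * length(which(x == n)) / length(x) }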

Note the syntax involved in setting up a function in R. Now let’s use the count function to count the threes in the vector b.

count(b, 3)
[1] 4

Now let’s use the perc function to find the percentage of 4s in b.

perc(b, 4)
[1] 7.692308
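
If you want counts or percentages for every distinct value at once, rather than one value per call, base R's table() and prop.table() offer another route. This is a sketch of an alternative, not part of the functions defined above; the names counts and percs are illustrative.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# frequency of each distinct value in b
counts <- table(b)

# look up a single value by its name (note the quotes)
counts[["3"]]   # 4
counts[["7"]]   # 2

# percentages for every distinct value, rounded to 2 decimal places
percs <- round(100 * prop.table(counts), 2)
percs[["4"]]    # 7.69
```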

To see the rest of the R is Not So Hard! tutorial series, visit our R Resource page.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.


8 comments

Sebas

Hi. How can I set the dimensions of a matrix from two separate variables instead of a vector?


Nathalie

I am stuck on a string-counting problem and have not found any helpful post so far. Maybe someone here can help me:

I have a string variable tours in my dataframe df that represents the different stops an individual made during a journey.

For example:
1. home_work_leisure_home
2. home_work_shopping_work_home
3. home_work_leisure_errand_home

In transport planning we group activities into primary activities (work and education) and secondary activities (everything else). For each tour, I want to count the number of secondary activities before the first primary activity, in between two primary activities, and after the last primary activity.

This means I am looking for a function in R that:
a. identifies the first work in the string variable,
b. then counts the number of activities before this first work activity,
c. then identifies the last work in the string, if there is more than one,
d. if there is, then counts the number of activities between the two work activities,
e. then counts the number of activities after the last work activity.

The result for the three example tours then would be:
1. number of activities before first primary: 1 (home)
   number of activities between first and last primary: 0
   number of activities after last primary: 2 (leisure & home)
   number of primary activities: 1 (work)
2. number of activities before first primary: 1 (home)
   number of activities between first and last primary: 1 (shopping)
   number of activities after last primary: 1 (home)
   number of primary activities: 2 (work)
3. number of activities before first primary: 1 (home)
   number of activities between first and last primary: 0
   number of activities after last primary: 3 (leisure, errand & home)
   number of primary activities: 1 (work)

I would be super thankful if someone could give me a hand with this issue, even if it is just a link to a similar question.

Thank you. Kind regards N


Karen Grace-Martin

Nathalie,

I’m not the R expert, but I’ve done a lot of this kind of thing in other software. It sounds like this will be a multi-step process. The very first thing you need to do is split this into multiple variables.


Pranjit Sarmah

obj <- function(x, y, x_cat, y_val) {
  xx <- which(x == x_cat)  # indices where x equals x_cat
  yy <- which(y == y_val)  # indices where y equals y_val
  # return the indices of the observations where x is x_cat and y is y_val
  return(xx[xx %in% yy])
}


bhuvanesh

How can I provide more than one number in a which() filter?


Karol

Hi,
I have data something like this:
X Y
A 1
A 2
B 1
B 2
B 3
C 1

I mean, X is a factor with k categories and Y is a continuous variable.
I’d like to compute a vector (let’s say Z) counting which observation of X (within each category) each row is, something like an ID within each category of X. Can you please give me a tip?
Thank you in advance!
Karol


Carla

Hi Karol, did you find a solution? I’m in the same situation :/
Cheers/Carla


Pranjit Sarmah

obj <- function(x, y, x_cat, y_val) {
  xx <- which(x == x_cat)  # indices where x equals x_cat
  yy <- which(y == y_val)  # indices where y equals y_val
  # return the indices of the observations where x is x_cat and y is y_val
  return(xx[xx %in% yy])
}

