The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers


R Is Not So Hard! A Tutorial, Part 15: Counting Elements in a Data Set


by David Lillis, Ph.D.

Combining the length() and which() commands gives a handy method of counting elements that meet particular criteria.

b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b

Let’s count the 3s in the vector b.

count3 <- length(which(b == 3))
count3
[1] 4

In fact, you can count the number of elements that satisfy almost any given condition.

length(which(b < 7))
[1] 9
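The condition can also combine several tests. As a small sketch using the same vector b, the base R operators &, | and %in% let you count elements in a range or elements matching any of several values. The names n_between, n_3_or_7 and n_extreme are introduced here for illustration only.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Elements strictly between 0 and 7
n_between <- length(which(b > 0 & b < 7))
n_between   # 7

# Elements equal to 3 or to 7
n_3_or_7 <- length(which(b %in% c(3, 7)))
n_3_or_7    # 6

# Elements that are negative or greater than 10
n_extreme <- length(which(b < 0 | b > 10))
n_extreme   # 3
```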

Here is an alternative approach that still uses the length() command, but with square brackets for sub-setting:

length(b[ b < 7 ])
[1] 9

The square brackets allow us to subset. For such operations using square brackets, I like to use the words “such that”. Here, we have the elements of b, such that the elements are less than 7.
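To see why the two approaches give the same count, it can help to look at what each one returns before length() is applied. In this short sketch (idx and vals are names chosen for illustration), which() returns the positions of the matching elements, while the square brackets return the elements themselves.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

idx  <- which(b < 7)   # positions: 2 3 4 5 6 7 8 9 13
vals <- b[b < 7]       # values:    2 4 3 -1 -2 3 3 6 3

length(idx)               # 9
length(vals)              # 9
identical(b[idx], vals)   # TRUE: indexing by position recovers the same values
```

Use which() when you need to know where the matches are, and the square brackets when you only need the matching values.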

 

R PROVIDES ANOTHER ALTERNATIVE THAT NOT EVERYONE KNOWS ABOUT

sum(b < 7)
[1] 9

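Here sum() returns a count rather than the arithmetic sum of the values in b, so be aware of what syntax like sum(b < 7) means. The expression b < 7 produces a logical vector whose elements are either TRUE or FALSE, and sum() treats each TRUE as 1 and each FALSE as 0. Try entering b < 7 at the keyboard (not b <- 7, which would assign the value 7 to b).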

b < 7
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE

We see that sum(b < 7) counts the number of elements that are TRUE. There are nine such elements.
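The counting works because R converts TRUE to 1 and FALSE to 0 whenever a logical vector is used in arithmetic. A minimal sketch that makes the conversion visible (is_small is a name introduced here for illustration):

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

is_small <- b < 7       # logical vector of TRUE/FALSE values
as.integer(is_small)    # 0 1 1 1 1 1 1 1 1 0 0 0 1
sum(is_small)           # 9: number of TRUE values
sum(!is_small)          # 4: number of FALSE values
```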

Now try:

mean(b < 7)
[1] 0.6923077

That syntax found the proportion of elements meeting the criterion rather than the mean of the values in b. Again, if you apply the sum() and mean() functions to a condition, you must be very careful to ensure that your output is what you intended. Note that sum(), length() with square brackets, and length(which()) all provide mechanisms for counting elements.
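One place where these counting mechanisms can disagree is missing data. The sketch below appends an NA to b (the vector b_na is hypothetical and is not part of the original example) and applies each approach in turn.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b_na <- c(b, NA)                 # hypothetical vector with one missing value

sum(b_na < 7)                    # NA: comparing NA with 7 gives NA
sum(b_na < 7, na.rm = TRUE)      # 9
length(which(b_na < 7))          # 9: which() ignores NA
length(b_na[b_na < 7])           # 10: the subset keeps an NA, which is counted
mean(b_na < 7, na.rm = TRUE)     # 0.6923077: proportion of non-missing values
```

If your data may contain missing values, check which of these behaviours you actually want before relying on the count.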

Now find the percentage of 7s in b.

P7 <- 100 * length(which(b == 7)) / length(b)
P7
[1] 15.38462
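Because mean() applied to a condition gives a proportion, the same percentage can also be written more compactly. This is an equivalent sketch, not a replacement for the version above; P7_alt is a name introduced here for illustration.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

P7_alt <- 100 * mean(b == 7)   # proportion of 7s, scaled to a percentage
round(P7_alt, 1)               # 15.4
```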

 

EXTENSION EXAMPLE

You can find counts and percentages using functions that involve length(which()). Here we create two functions: one for finding counts, and the other for calculating percentages.

count <- function(x, n) { length(which(x == n)) }
perc <- function(x, n) { 100 * length(which(x == n)) / length(x) }

Note the syntax involved in setting up a function in R. Now let’s use the count function to count the threes in the vector b, and the perc function to find the percentage of fours.

count(b, 3)
[1] 4

perc(b, 4)
[1] 7.692308
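If you want the counts or percentages for every distinct value at once, rather than one value per call, base R's table() and prop.table() offer another route. The sketch below is an alternative to the two functions above, not part of them; counts and percs are names introduced here for illustration.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

counts <- table(b)                  # count of every distinct value in b
counts[["3"]]                       # 4, matching count(b, 3)

percs <- 100 * prop.table(counts)   # percentage of every distinct value
percs[["4"]]                        # 7.692308, matching perc(b, 4)
```

Note that the table is indexed by the value written as a character string, such as "3", not by the number itself.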

To see the rest of the R is Not So Hard! tutorial series, visit our R Resource page.

About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.


Tagged With: count, data manipulation, R


Comments

  1. Rob Baer says

    October 22, 2021 at 9:11 am

    Missing Values
    Just a note on using length() on a whole vector that includes NA. The missing values are counted in the whole vector length when using the length() function.

    b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
    b1 <- c(b, NA)
    length(b)   # 13
    length(b1)  # 14, the NA is counted
    sd(b)
    sd(b1, na.rm = TRUE)

    # If you want an "n" to go with the sd for b1, don't use length().
    (n = sum(!is.na(b))) # 13
    (n = sum(!is.na(b1))) # 13

  2. Gastón says

    August 21, 2020 at 8:43 am

    Thank you!!! I spent a lot of time trying to find some instruction on this issue!

  3. Sebas says

    July 27, 2018 at 12:44 pm

    Hi. How can I set the dimensions of a matrix in 2 different variables instead of a vector?

  4. Nathalie says

    April 16, 2018 at 9:07 am

    I am stuck on a string-counting issue and could not find any helpful post so far, so maybe someone here can help me:

    I have a string variable tours in my data frame df that represents the different stops an individual made during a journey.

    For example:
    1. home_work_leisure_home
    2. home_work_shopping_work_home
    3. home_work_leisure_errand_home

    In transport planning we group activities into primary (work and education) and secondary activities (everything else). For each tour, I want to count the number of secondary activities before the first primary activity, in between two primary activities, and after the last primary activity.

    This means I am looking for a function in R that:
    a. identifies the first work in the string variable,
    b. then counts the number of activities before this first work activity
    c. then identifies the last work in the string if there is more than one
    d. if there is then count the number of activities between the two work activities,
    e. then count the number of activities after the last work activity

    The result for the three example tours then would be:
    1. number of activities before first primary: 1 (home)
       number of activities between first and last primary: 0
       number of activities after last primary: 2 (leisure & home)
       number of primary activities: 1 (work)
    2. number of activities before first primary: 1 (home)
       number of activities between first and last primary: 1 (shopping)
       number of activities after last primary: 1 (home)
       number of primary activities: 2 (work)
    3. number of activities before first primary: 1 (home)
       number of activities between first and last primary: 0
       number of activities after last primary: 3 (leisure, errand & home)
       number of primary activities: 1 (work)

    I would be super thankful if someone could give me a hand with this issue – even if it is a link to a similar question.

    Thank you. Kind regards, N

    • Karen Grace-Martin says

      May 15, 2018 at 11:36 am

      Nathalie,

      I’m not the R expert, but I’ve done a lot of this kind of thing in other software. It sounds like this will be a multi-step process. The very first thing you need to do is split this into multiple variables.

  5. Pranjit Sarmah says

    April 1, 2018 at 10:03 am

    obj <- function(x, y, x_cat, y_val) {
      xx <- which(x == x_cat)
      yy <- which(y == y_val)
      # Returns the indices of the observations where x equals x_cat and y equals y_val
      return(xx[xx %in% yy])
    }

  6. bhuvanesh says

    September 18, 2016 at 10:48 am

    How do I provide more than one number in the which() filter?

  7. Karol says

    April 8, 2016 at 5:59 pm

    Hi,
    I have data something like this:
    X Y
    A 1
    A 2
    B 1
    B 2
    B 3
    C 1
    …
    I mean, X is a factor with k categories and Y is a continuous variable.
    I’d like to compute a vector (let’s say Z) counting which observation of X (in each category) is Y… Something like an ID within each category of X. Can you please give me a tip?
    Thank you in advance!
    Karol

    • Carla says

      December 1, 2017 at 3:27 pm

      Hi Karol, did you find a solution? I’m in the same situation :/
      Cheers/Carla

    • Pranjit Sarmah says

      April 1, 2018 at 10:04 am

      obj <- function(x, y, x_cat, y_val) {
        xx <- which(x == x_cat)
        yy <- which(y == y_val)
        # Returns the indices of the observations where x equals x_cat and y equals y_val
        return(xx[xx %in% yy])
      }


Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.