The Analysis Factor

Stata Loops and Macros for Large Data Sets: Quickly Finding Needles in the Hay Stack

by Jeff Meyer

I recently opened a very large data set titled “1998 California Work and Health Survey” compiled by the Institute for Health Policy Studies at the University of California, San Francisco. There are 1,771 observations and 345 variables.

I know Californians are supposed to be “laid back” (I’m a native Californian). But can you imagine agreeing to take a survey and then being asked 345 questions? Dude!

I looked at the original questionnaire and noticed that all “yes/no” questions were coded 1 for yes and 2 for no. Unfortunately, indicator (dummy) variables have to be coded 0/1. Typically no is coded 0 and yes is coded 1.

The question of the day is, how can I quickly locate all of the dichotomous variables in a data set with 345 variables so that I can recode the values?

Using macros and loops makes it quite easy.

The first step is to create a macro with no entries. I created a global macro named “dichot”. Next I started my loop with the foreach command, telling Stata to look one by one at all of the variables in the data set.
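As a minimal sketch of how an empty global macro fills up inside a foreach loop, the snippet below appends three placeholder words rather than variable names. The macro name demo and the words alpha, beta and gamma are made up for illustration and are not part of the survey data.

```stata
* Start with an empty global macro (hypothetical name: demo)
global demo
* Each pass redefines the macro as its current contents plus one new word
foreach word in alpha beta gamma {
    global demo $demo `word'
}
* Show what has accumulated in the macro
display "$demo"
```

The display command should show the three words in the order they were added, which is the same pattern used to collect variable names.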

I tell Stata to summarize the first variable in the list. If you recall from my previous blogs on stored results, Stata temporarily stores results when it performs a calculation. Two of the results that the summarize command stores are a variable’s minimum and maximum values.
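To see these stored results for yourself, here is a short sketch using the auto data set that ships with Stata (an assumption for illustration, since the survey data may not be at hand). The variable price belongs to that example data set, not to the survey.

```stata
* Load the example data set that ships with Stata (not the survey data)
sysuse auto, clear
* meanonly suppresses the output table but still stores results in r()
summarize price, meanonly
* List everything summarize left behind
return list
* Use two of the stored results directly
display "min = " r(min) "   max = " r(max)
```

The stored results are overwritten by the next r-class command, which is why the loop reads r(min) and r(max) immediately after summarize.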

Next I tell Stata to add the variable to my global macro if the minimum value is equal to 1 and the maximum value is equal to 2. I do this by placing an if condition inside the loop.

Stata then repeats these steps for the remaining variables in the list.

From start to finish my code looks like this:
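* Start with an empty global macro
global dichot
* Loop over every variable in the data set
foreach v of varlist * {
    * meanonly skips the output table but still stores r(min) and r(max)
    summarize `v', meanonly
    * Keep the variable if its values run from 1 to 2
    if r(min) == 1 & r(max) == 2 {
        global dichot $dichot `v'
    }
}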

To look at the variables in my global macro and make sure they all have minimum values of 1, maximum values of 2 and only 2 distinct numbers I use the following code:
codebook $dichot, compact
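If reading the codebook output for many variables feels error prone, the same check can be sketched with assert. This is an alternative I am suggesting, not the check used above. The toy data and the names q1, q2 and q3 are hypothetical, and the global macro is filled by hand here so the example runs on its own.

```stata
* Toy data: q2 contains 1.5, so its range is 1 to 2 but it is not dichotomous
clear
input q1 q2 q3
1 1   1
2 1.5 2
1 2   .
end
* Filled by hand for this sketch; in practice the loop builds it
global dichot q1 q2 q3
foreach v of varlist $dichot {
    * assert fails if any observation is not 1, 2, or missing
    capture assert inlist(`v', 1, 2) | missing(`v')
    * _rc is nonzero when the assertion failed
    if _rc {
        display "`v' contains values other than 1 and 2"
    }
}
```

In this toy example only q2 should be flagged. Missing values are allowed through because summarize ignores them when it computes the minimum and maximum.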

I used eight lines of code to discover that there are 96 dichotomous variables in the data set.

Because they are listed in my global macro, I can quickly recode all 96 of them with one line of code:
recode $dichot (2=0)

I could have put the recode command in my loop but I wanted to review my variables before recoding them.
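For comparison, here is a sketch of what the in-loop version might look like, run on a small made-up data set. The variables id, smoker, insured and age are hypothetical.

```stata
* Toy data: smoker and insured are coded 1 = yes, 2 = no
clear
input id smoker insured age
1 1 2 34
2 2 2 51
3 1 1 27
end
foreach v of varlist * {
    summarize `v', meanonly
    * Recode immediately instead of collecting names in a macro
    if r(min) == 1 & r(max) == 2 {
        recode `v' (2=0)
    }
}
list
```

This saves a step, but any variable whose values happen to run from 1 to 2, such as a count or a two-category code that is not yes/no, would be recoded without warning. That is the reason for reviewing the list first.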


Jeff Meyer is a statistical consultant with The Analysis Factor, a stats mentor for Statistically Speaking membership, and a workshop instructor. Read more about Jeff here.

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.