The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Stata Loops and Macros for Large Data Sets: Quickly Finding Needles in the Hay Stack

by Jeff Meyer

I recently opened a very large data set titled “1998 California Work and Health Survey,” compiled by the Institute for Health Policy Studies at the University of California, San Francisco. There are 1,771 observations and 345 variables.

I know Californians are supposed to be “laid back” (I’m a native Californian). But can you imagine agreeing to take a survey and then be asked 345 questions? Dude!

I looked at the original questionnaire and noticed that all “yes/no” questions were coded 1 for yes and 2 for no. Unfortunately, indicator (dummy) variables have to be coded 0 and 1. Typically no is coded 0 and yes is coded 1.

The question of the day is, how can I quickly locate all of the dichotomous variables in a data set with 345 variables so that I can recode the values?

Using macros and loops makes it quite easy.

The first step is to create a macro with no entries. I created a global macro named “dichot”. Next I started my loop with the foreach command, telling Stata to look one by one at all of the variables in the data set.
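As a minimal sketch of these two pieces on their own, the lines below define an empty global macro and let foreach append every variable name to it. The auto data set that ships with Stata and the macro name seen are stand-ins chosen for illustration, not part of the survey analysis.

```stata
* Assumption: the auto data set shipped with Stata stands in for the survey file
sysuse auto, clear

* Define a global macro with no contents
global seen

* Visit every variable in the data set, one at a time
foreach v of varlist * {
    * Append the current variable name to whatever the macro already holds
    global seen $seen `v'
}

* Show the accumulated list of variable names
display "$seen"
```

Because the macro is rebuilt from its own contents on each pass, it has to exist (even if empty) before the loop starts, and defining it as empty also clears anything left over from an earlier run.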

I tell Stata to summarize the first variable in the list. If you recall from my previous blogs on stored results, Stata temporarily stores results when it performs a calculation. Two of the results that the summarize command stores are a variable’s minimum and maximum values.
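To see the stored results for yourself, a short sketch along these lines may help. It assumes the auto data set that ships with Stata rather than the survey data, and uses its 0/1 variable foreign purely as an example.

```stata
* Assumption: the auto data set shipped with Stata stands in for the survey file
sysuse auto, clear

* meanonly suppresses the output table and skips the variance calculation
summarize foreign, meanonly

* List everything summarize left behind in r()
return list

* The minimum and maximum can be used like any other number
display "min = " r(min) ", max = " r(max)
```

The results in r() are replaced the next time an r-class command runs, which is why the loop tests r(min) and r(max) immediately after each summarize.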

Next I tell Stata to add the variable to my global macro if the minimum value is equal to 1 and the maximum value is equal to 2. I do this by placing an if condition inside the loop.

Stata then repeats these steps for the remaining variables in the list.

From start to finish my code looks like this:
global dichot
foreach v of varlist * {
    summarize `v', meanonly
    if r(min) == 1 & r(max) == 2 {
        global dichot $dichot `v'
    }
}
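If you would rather not leave a global macro behind, the same loop can be sketched with a local macro. This is a variant under stated assumptions, not the code I ran on the survey: the toy data set and its variable names (name, q1, age) are made up for illustration, and the confirm step simply makes the handling of string variables explicit.

```stata
* Hypothetical toy data: one string variable, one 1/2 variable, one continuous variable
clear
input str8 name q1 age
"ann" 1 34
"bob" 2 51
"cy"  2 47
end

* A local macro disappears when the do-file finishes
local dichot
foreach v of varlist * {
    * Skip anything that is not numeric
    capture confirm numeric variable `v'
    if _rc != 0 {
        continue
    }
    summarize `v', meanonly
    if r(min) == 1 & r(max) == 2 {
        local dichot `dichot' `v'
    }
}

display "`dichot'"
```

A local macro exists only while the do-file or program that defined it is running, so run these lines together rather than one at a time.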

To look at the variables in my global macro and make sure they all have minimum values of 1, maximum values of 2 and only 2 distinct numbers I use the following code:
codebook $dichot, compact
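The review matters because a minimum of 1 and a maximum of 2 does not guarantee that a variable is dichotomous. The sketch below shows one way to make that check in code, assuming a made-up toy data set in which score runs from 1 to 2 but also takes the value 1.5. The macro name checked is hypothetical.

```stata
* Hypothetical toy data: score has min 1 and max 2 but is not dichotomous
clear
input id smoker insured score
1 1 2 1
2 2 2 1.5
3 1 1 2
4 2 . 2
end

* Same selection rule as in the loop above
global dichot
foreach v of varlist * {
    summarize `v', meanonly
    if r(min) == 1 & r(max) == 2 {
        global dichot $dichot `v'
    }
}

* Keep only the variables whose nonmissing values are all 1 or 2
global checked
foreach v of global dichot {
    capture assert inlist(`v', 1, 2) | missing(`v')
    if _rc == 0 {
        global checked $checked `v'
    }
    else {
        display as text "`v' has values other than 1 and 2"
    }
}

display "$checked"
```

Here score passes the min/max rule but fails the second check, so it is left out of checked.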

I used eight lines of code to discover that there are 96 dichotomous variables in the data set.

Because they are listed in my global macro, I can quickly recode all 96 of them with one line of code:
recode $dichot (2=0)
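The recode above overwrites the original variables. If you want to keep the originals, one option is recode's prefix() option, which writes the recoded values to new variables. The sketch below assumes a two-variable toy data set and the prefix d_, both made up for illustration.

```stata
* Hypothetical toy data with two yes/no items coded 1 = yes, 2 = no
clear
input q1 q2
1 2
2 2
1 1
end

* Stand-in for the macro built by the loop
global dichot q1 q2

* Write the 0/1 versions to d_q1 and d_q2, leaving q1 and q2 untouched
recode $dichot (2 = 0), prefix(d_)

list
```

Values not mentioned in the rule, including the 1s and any missing values, are copied to the new variables unchanged.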

I could have put the recode command in my loop but I wanted to review my variables before recoding them.
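For comparison, here is a sketch of what recoding inside the loop might look like. It assumes a small made-up data set (q1, q2, age) and skips the review step entirely, so any variable that happens to have a minimum of 1 and a maximum of 2 gets recoded.

```stata
* Hypothetical toy data: two yes/no items and one continuous variable
clear
input q1 q2 age
1 2 34
2 2 51
1 1 47
end

foreach v of varlist * {
    summarize `v', meanonly
    * Recode on the spot instead of collecting names in a macro
    if r(min) == 1 & r(max) == 2 {
        recode `v' (2 = 0)
    }
}

list
```

This is shorter, but it gives up the chance to inspect the list with codebook before any values change.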


Jeff Meyer is a statistical consultant with The Analysis Factor, a stats mentor for Statistically Speaking membership, and a workshop instructor. Read more about Jeff here.


Comments

  1. Erick Axxe says

    February 7, 2019 at 11:13 am

    Thanks so much for all the work you do on this blog and the Stata help forum! It’s been very helpful.

Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.