Macros in Stata, Why and How to Use Them

We finished the last article about Stata with this confusing bit of code:

local continuous educat exper wage age

foreach var in `continuous' {
    graph box `var', saving(`var', replace)
}

I admit it looks like a foreign language.  Let me explain how simple it is to understand. Let’s look at the first line from above:  local continuous educat exper wage age

Stata allows you to use a single word, such as “continuous”, to represent many other words.  In Stata a word used this way is known as a macro.

Please note that a macro in Stata is not the same as a macro in Microsoft Excel.

Writing macros in Excel can be long and involved.

Writing a macro in Stata is very easy.

How to write a simple macro in Stata

A macro in Stata begins with the word “global” or “local”.  The command global tells Stata to store everything in the command line in its memory until you exit Stata.  If you open another data set before exiting, the global macro will still be in memory.

The command local tells Stata to keep everything in the command line in memory only until the program or do-file ends.

If you plan on analyzing only one data set then the global command shouldn’t cause you any problems.

If you will be analyzing more than one data set then you might want to consider using the local command.
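Here is a minimal sketch of the difference, using the same variable list. The name continuous_g is one I made up for illustration. A global macro is referenced with a $ in front of its name, while a local macro is wrapped in a left single quote and an apostrophe.

```stata
* Global: stays in memory until you exit Stata
global continuous_g educat exper wage age

* Local: gone as soon as the do-file (or the block you ran) finishes
local continuous educat exper wage age

* Show what each macro holds
display "$continuous_g"
display "`continuous'"
```

Because a local macro disappears when the run ends, run the line that defines it together with the lines that use it.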

If you remember from the previous article, the wages data set had four continuous variables.

In the first line of my code above, local continuous educat exper wage age, I am using the word “continuous” to represent the four variables educat, exper, wage, and age.

Some commands in Stata allow you to analyze more than one variable at a time. For example, I might want to run the following commands in order to see what my continuous variables look like.

tab1 educat exper wage age
codebook educat exper wage age
summarize educat exper wage age

But why bother with a macro?

Using a macro allows me to simplify my work, which will reduce the potential for errors and keep it organized.

The code below will perform the exact same actions as the code above.

local continuous educat exper wage age
tab1 `continuous'
codebook `continuous'
summarize `continuous'

The command local tells Stata to use the word continuous to represent the variables educat, exper, wage, and age.  I then substitute the word continuous for the variable names in the last three lines.

Note that each time you use the word continuous in a command line you must begin with the left single quote ` (the backtick key, to the left of the 1 key on your keyboard) and finish with the right single quote ' (the apostrophe key, to the right of the ; key on your keyboard).
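If you don't have the wages data handy, you can try the quoting rule on the auto data set that ships with Stata. This is only a sketch: the macro name carvars and the choice of variables are mine, not part of the wages example.

```stata
* Load the example data set that ships with Stata
sysuse auto, clear

* Define the macro, then use it between the backtick and the apostrophe
local carvars price mpg weight
display "`carvars'"
summarize `carvars'
```

If you type summarize carvars without the quotes, Stata looks for a variable named carvars, which does not exist in this data set.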

Using a macro to represent several variables may not seem like a big deal, and you may wonder why you should bother with it.  But wait, there is more.

For example, you might have to run numerous linear regressions, using several of the same predictor variables in each regression you run. After reviewing your results you might decide to eliminate one of the variables.

If you haven’t used a macro you will have to go back to every line of code and remove the variable. If you have used a macro you will only have to go back to the line of code that defines the macro and remove the variable from the group.

Remove it once and it is gone. How easy!
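As a sketch of the idea, here are two regressions that share one list of predictors. I am assuming the auto data set that ships with Stata rather than the wages data, and the macro name predictors is just for illustration.

```stata
* Load the example data set that ships with Stata
sysuse auto, clear

* One macro holds the predictors used by every regression
local predictors weight length foreign

regress price `predictors'
regress mpg `predictors'
```

To drop length from both models, delete it from the local line and run the do-file again. Neither regress line needs to change.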

Macros for formatting tables and graphs

Another great use for macros is for creating tables and graphs.  Unless you are Raymond (of Rain Man fame), there is no way you will remember your favorite options when creating a graph.

What are options for a graph?  To name just a few: the major and minor tick labels for the X and Y axes, the X and Y axis scale properties, whether to have a legend, placement of the legend, placement of the title, and so on.

This is an example of the coding for a graph that I once created:

ms(O) mc(gs0) msize(small)) (line populatn_hat d, sort lcolor(gs0)) (line populatn_hat_p d, sort lcolor(gs0) lpattern(dash)) (line populatn_hat_m d, sort lcolor(gs0) lpattern(dash)), xline(0, lcolor(gs0))  title("Quadratic fit", color(gs0))  ytitle("Population in district") legend(label(1 "Vote shares") label(2 "Quadratic fit"))

It would sure be easier to use a one-word macro to represent all of the above.

I suggest you create a do-file template for tables and graphs (see my previous article on creating do-file templates).

In the template you create various macros that contain the options for your tables and graphs.

Be sure to document (by using // or “*” as discussed in the previous article) in your do-file what the table or graph will look like.
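A template along these lines might look like the sketch below. It assumes the auto data set that ships with Stata, and the macro names marker_opts and graph_opts, along with the particular options, are examples I picked rather than a recommended set.

```stata
* Template sketch: graph options kept in macros
sysuse auto, clear

// Marker style: small black circles
local marker_opts msymbol(O) mcolor(gs0) msize(small)

// Overall look: black title, labeled y axis, no legend
local graph_opts title("Price versus weight", color(gs0)) ytitle("Price") legend(off)

twoway (scatter price weight, `marker_opts') ///
       (lfit price weight, lcolor(gs0)), `graph_opts'
```

Every graph that uses `marker_opts' and `graph_opts' picks up the same look, so a change to either local line changes them all.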

Perhaps the best reason for using macros for your table and graph formatting is that it ensures consistent formatting.

If you decide you want to tweak how the tables in your research paper look, you will only need to make a change to the macro.  This saves you from having to go back to every table and change the coding.

In my next article I will show you another way to save time and effort in Stata through looping.



Reader Interactions


  1. Meghan Shirley Bezerra says

    Hi Jeff. Thanks for this helpful post.

    I’m working with a dataset with >100 continuous variables. I tried out just assigning 3 of these variables to a macro as you illustrated, so:

    local continuous age fevpp fasting_glucose

    But then when I tried:

    codebook `continuous', compact
    summarize `continuous'

    it gave me all continuous variables in the dataset, not just the 3 specified in the macro.

    Any idea what I might have done wrong?


    • Jeff Meyer says

      There are 2 possible causes that I can think of. Do you have a global macro named continuous as well that contains all of the continuous variables in your data set? This shouldn’t be the cause because a global macro would require a “$” in front of continuous. The only other reason might be that you have to highlight all of the lines of code from the line local continuous age fevpp fasting_glucose to the line summarize `continuous' in order to enact the local macro.

  2. Caitlin Klaassen says

    Hey all, I am trying to update a global directory for a replication project, but I am unsure how to do this. Could someone help?

  3. Miranda says

    For macros and loops, is it possible to define large blocks of variables that are not contiguous in the dataset? As a simple example, let’s say I have a dataset that examines the look, feel, and taste of apples, bananas, and oranges with a varlist as follows:

    v1= apple look q1
    v2= apple look q2
    v3= apple look q3
    v4= apple feel q1
    v5= apple feel q2
    v6= apple feel q3
    v7= apple taste q1
    v8= apple taste q2
    v9= apple taste q3
    v10-18 = bananas
    v19-27 = oranges

    Let’s say I’m only interested in taste, and I want to create a macro or a loop to root out miscoded cases among those variables. I want to create a single macro that includes the following variable groups: v7-v9, v16-v18, and v25-v27. Is there a straightforward way to do this without having to type out each variable?

  4. irina says

    thank you for such a simple explanation with great examples! Saved in my personal notes. Thank you sooooo much!
