Getting Started with Stata Tutorial #11: Editing Variables Using recode and recast

From our last posts in this series, you should be comfortable with how Stata handles data editing, as well as with making your own variables. In this post, we’ll talk about commands that edit the content or storage type of your variables in Stata: recode and recast. Let’s start off with the recode command.

We’ll be using Stata’s auto dataset. You can load it by typing the following into your Command window:

     sysuse auto, clear

recode (changes values of a variable subject to some rule)

Recode allows us to change the values of variables systematically and all at once.


Imagine we learned that the rep78 variable is miscoded, such that all missing values need to be 1’s and all other values need to be 1 unit higher.

We can make all of these changes in one line with recode:

     recode rep78 (.=1) (1=2) (2=3) (3=4) (4=5) (5=6)

We start by naming the variable or variables we want to change. Then, in each set of parentheses, we put the old value, an equals sign, and the new value that replaces it.

If you remember our attempt to do this in the last post with generate and replace, that took 4 lines!
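To check that the recode did what we wanted, it helps to keep a copy of the original values and compare the two. Here is a minimal sketch of that check; rep78_old is a name we made up for the copy, not a variable in the auto dataset.

```stata
sysuse auto, clear

* Keep a copy of the original values (rep78_old is a hypothetical name)
generate rep78_old = rep78

* Missing becomes 1; every other value goes up by 1
recode rep78 (.=1) (1=2) (2=3) (3=4) (4=5) (5=6)

* Cross-tabulate old against new values, including missing values
tabulate rep78_old rep78, missing
```

Each rule is matched against the original value, so an observation that moves from 1 to 2 is not picked up again by the (2=3) rule.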

Let’s try a recode command that’s a bit more complicated.

Say we want a new variable that equals the updated value of rep78 plus 1 for foreign cars and 3 for all other cars. We can do this in two lines of code:

     recode rep78 (1=2) (2=3) (3=4) (4=5) (5=6) (6=7) if (foreign == 1), generate(newrep78)

     recode newrep78 (.=3)

Notice that in the first recode command, we use the original name of the variable, then add the generate option with a new name.

In the second recode command, we are making edits to the new variable, so we use its name.

Although foreign looks like a string, it is coded as a factor, meaning a numeric variable with value labels (0 is labeled Domestic and 1 is labeled Foreign), which is why we compared it to a number. Check this blog post if you need a refresher on factors in Stata.

Since we used the generate option, no changes were made to the original rep78 variable. All the listed recodes apply to the variable “newrep78” that we generate.
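The two-step approach works because of how recode treats observations that fail the if condition. When generate() is used, those observations are set to missing in the new variable unless you add the copyrest option. The sketch below runs the whole sequence from a fresh copy of the data, assuming the first correction to rep78 has been applied, and then compares the results for domestic and foreign cars.

```stata
sysuse auto, clear

* The earlier correction: missing becomes 1, everything else goes up by 1
recode rep78 (.=1) (1=2) (2=3) (3=4) (4=5) (5=6)

* Foreign cars get the updated value plus 1. Cars that fail the if condition
* are left missing in newrep78 because we did not specify copyrest.
recode rep78 (1=2) (2=3) (3=4) (4=5) (5=6) (6=7) if (foreign == 1), generate(newrep78)

* Everything still missing (the domestic cars) becomes 3
recode newrep78 (.=3)

* Compare old and new values separately for domestic and foreign cars
bysort foreign: tabulate rep78 newrep78, missing
```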

We’ll show off one last use-case for recode: it allows us to change two or more variables simultaneously.

Let’s say we want to make variables based on price and weight that round the values to the nearest 1000, but set values below 500 or above 5499 to missing. We want these new variables to have the names “roundprice” and “roundweight”:

     recode price weight (500/1499 = 1000) (1500/2499 = 2000) (2500/3499 = 3000) (3500/4499 = 4000) (4500/5499 = 5000) (else = .), prefix(round)

This command tells Stata to take the price and weight variables and round them to the nearest thousand unless they are over 5499 or under 500, in which case the new value is set to missing. No observations are removed from the dataset.

The option “prefix(round)” means that instead of replacing the old variables, we make new ones that have the same name, but with the word “round” in front.
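A quick way to see what the prefix option produced is to count the missing values and list the old and new variables side by side. This is only a sketch of one way to check the result.

```stata
sysuse auto, clear

recode price weight (500/1499 = 1000) (1500/2499 = 2000) (2500/3499 = 3000) (3500/4499 = 4000) (4500/5499 = 5000) (else = .), prefix(round)

* The number of observations is unchanged: nothing was dropped
count

* Values outside 500 to 5499 show up as missing in the new variables
count if missing(roundprice)
tabulate roundweight, missing

* Original and rounded values side by side
list price roundprice weight roundweight in 1/5
```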

recast (switches between types)

The recast command changes the storage type of one or more variables.


While recast works with both strings and numeric data, it only changes numerics to other numerics, or strings to other types of strings.

If you tried to use recast to change make into an integer or price into a string, Stata would refuse to make the change.

Even if you use the “force” option, recast won’t allow these types of changes; those need tostring and destring.
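If you want to see this for yourself, the sketch below tries both conversions with force. We prefix each attempt with capture noisily so that Stata shows its response but keeps running whether or not it treats the attempt as an error, and then we check that neither variable changed between string and numeric.

```stata
sysuse auto, clear

* capture noisily shows Stata's response but lets a do-file keep running
capture noisily recast int make, force
capture noisily recast str10 price, force

* make should still be a string and price should still be numeric
describe make price
```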

Recasting Numbers

Let’s do a task that recast is more suited for. Take the variable gear_ratio and attempt to turn it into an integer by running recast int gear_ratio without any options.

That didn’t work.

Because of the decimal points in gear_ratio, Stata knows we will lose precision by switching it to an integer. Let’s use the force option to override this.

     recast int gear_ratio, force

This switched it to an integer, truncating all values after the decimal point.
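Here is the whole sequence in one place, as a rough sketch. The copy named gear_before is our own addition so that we can compare values before and after; it is not part of the auto dataset.

```stata
sysuse auto, clear

* Keep a copy to compare against (gear_before is a hypothetical name)
generate gear_before = gear_ratio

* Without force, the variable should be left as it was
capture noisily recast int gear_ratio
display "`: type gear_ratio'"

* With force, the storage type changes and the decimals are lost
recast int gear_ratio, force
display "`: type gear_ratio'"

* Original and recast values side by side
list gear_before gear_ratio in 1/5
```

Because the change cannot be undone, it is worth keeping a copy like gear_before (or working on a copy of the dataset) before forcing a recast.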

Recasting Strings

One of Stata’s unusual features is that the software uses different data types to store strings of different lengths.

If a variable is stored as str18, each of its observations must be 18 characters or fewer.

When you store a value in a string variable that is longer than its current storage type allows, Stata automatically promotes the variable to a longer string type.
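You can watch the promotion happen by replacing one value of make with a longer string. The replacement text below is made up purely for illustration.

```stata
sysuse auto, clear

* Storage type before the change
display "`: type make'"

* Store a longer value in the first observation (the text is made up)
replace make = "A make name longer than eighteen characters" in 1

* Storage type after the change
display "`: type make'"
```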

If we want to change the storage type of a string variable ourselves, we can use the recast command.

Let’s change make from str18 to be str10.

Values longer than 10 characters will be truncated. Since this would lose information, we must use force to make it go through:

     recast str10 make, force

Stata tells us that 53 values were changed.
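Before forcing a string recast, it can help to count how many values would be cut and to keep a copy for comparison. The sketch below does both; make_before is a name we chose for the copy.

```stata
sysuse auto, clear

* How many values of make are longer than 10 characters?
count if strlen(make) > 10

* Keep a copy of the original strings (make_before is a hypothetical name)
generate str18 make_before = make

* Shorten the storage type; longer values are cut to 10 characters
recast str10 make, force

* Compare the affected values among the first 10 observations
list make_before make if strlen(make_before) > 10 in 1/10
```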

That finishes up our lesson on changing variables in Stata. You should feel confident using Stata syntax to change the values of your variables and how they are stored.

In the next post we’ll talk about commands that change variables to and from strings.

by James Harrod


About the Author:
James Harrod interned at The Analysis Factor in the summer of 2023. He plans to continue into a career as an actuary, and hopes to continue finding interesting ways of educating people about statistics. James is well-versed in R and Stata programming and enjoys teaching the intuition behind common statistical methods.  James is a 2023 graduate of the University of Rochester with bachelor’s degrees in Statistics and Economics.
