Getting Started with Stata Tutorial #13: Changing variable labels using label, encode, and decode 

From the last posts in this series, you should feel comfortable using Stata’s data editor, changing values and types, and creating new variables.  

We’ll now teach you to make your variables more approachable by adding labels. 

The image below shows label information for the foreign variable.  

Note the different sections for “Label” and “Value label”.  

A label just provides a description of the variable, while a value label matches numbers to words.  

Foreign is a categorical variable, which this post calls a factor variable. This means that it is stored as an integer within Stata, and then assigned a value label that matches words to each of its values. In the image below, we can see that the value “0” is labeled with the word “Domestic”. 

How Value Labels Work in Stata

Value labels in Stata work differently from what I expected, so it’s worth taking a moment to explain their structure.  

I assumed value labels would work like this: we tell Stata that for the variable foreign, it should consider 0’s to count as the word “Domestic,” and 1’s to count as the word “Foreign”. I would expect this change to only affect the variable foreign and be directly tied to that variable. 

Instead, value labels work like this: I create some object that gives a series of numbers and their paired words. This object is given a name, then is assigned to one or more variables.  

Built into the auto dataset is a value label called “origin” that assigns “Domestic” to 0 and “Foreign” to 1. 
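
To see this structure for yourself, here is a minimal sketch. It assumes you load the auto dataset that ships with Stata using sysuse, and note that the clear option discards any data currently in memory.

     * Load the example dataset (assumption: the shipped copy of auto)
     sysuse auto, clear
     * The "value label" column shows that foreign uses origin
     describe foreign
     * Show the number-to-word pairs stored in origin
     label list origin
     * Tabulate with the labels, then with the underlying integers
     tabulate foreign
     tabulate foreign, nolabel

The nolabel option is a quick way to check which numbers sit behind the words.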

label (Changes label of a variable or dataset)

The label command is used to add and manipulate labels on variables, datasets, and values.  

Let’s use the label command to assign “origin,” the value label used for foreign, to the rep78 variable 

     label values rep78 origin 

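If you browse the variable now, you will see the two 1’s in the rep78 variable displayed as the word “Foreign”. The stored values are still 1; only their display changes. The values 2 through 5 have no entry in origin, so they still appear as numbers. 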
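
Since “origin” makes little sense for repair records, you will probably want to undo this. A short sketch, assuming a fresh copy of the auto dataset, shows one way to attach the label, inspect it, and detach it again by passing a period in place of a label name.

     * Start from a fresh copy of the data (clear discards data in memory)
     sysuse auto, clear
     * Attach origin to rep78 and look at the result
     label values rep78 origin
     tabulate rep78
     * Detach the value label from rep78; origin itself is not deleted
     label values rep78 .
     tabulate rep78

Detaching leaves the value label defined, so foreign keeps its labels.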

Making New Value Labels

Now let’s define and assign a new value label! I’m going to make a value label for the headroom variable, attempting to assign labels to each value it takes. 

     label define head 1.5 "none" 2 "minimal" 2.5 "a bit" 3 "some" 3.5 "enough" 4 "plenty" 4.5 "lots" 5 "excessive" 

Stata tells us that we can’t label 1.5; only integers (and extended missing values such as .a) can have value labels. 

To fix this issue I’ll use a few of the data editing tricks from this series together. I’ll double all values so that each one is a whole number, then change the storage type to byte. 

     replace headroom = headroom*2 

     recast byte headroom 
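
Before recasting, it can be reassuring to confirm that every doubled value really is a whole number. The following is one possible check, again assuming a fresh copy of the auto dataset; assert stops with an error if the condition fails for any observation.

     * Start from a fresh copy of the data (clear discards data in memory)
     sysuse auto, clear
     replace headroom = headroom*2
     * Verify that no fractional values remain
     assert headroom == floor(headroom)
     recast byte headroom
     * Confirm the new storage type
     describe headroom
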

Now let’s create and assign a new label 

     label define head 3 "none" 4 "minimal" 5 "a bit" 6 "some" 7 "enough" 8 "plenty" 9 "lots" 10 "excessive" 

     label values headroom head 

Look at the variable now that it has labels! 

To see all the value labels we’ve defined for this dataset, we can type 

     label list  

And view the output in the results window.  
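
If you only want one value label, or want to change an entry later, a sketch like the following may help. It repeats the earlier steps on a fresh copy of the auto dataset so that it runs on its own, and the replacement text "too much" is just an illustrative choice.

     * Rebuild the labeled headroom variable from a fresh copy of the data
     sysuse auto, clear
     replace headroom = headroom*2
     recast byte headroom
     label define head 3 "none" 4 "minimal" 5 "a bit" 6 "some" 7 "enough" 8 "plenty" 9 "lots" 10 "excessive"
     label values headroom head
     * List only the names of the value labels in memory
     label dir
     * List the contents of a single value label
     label list head
     * Change one existing entry; modify is required to overwrite it
     label define head 10 "too much", modify
     label list head
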

Dataset and Variable Labels

The label command also sets the descriptive labels of variables and of the dataset itself. 

To change the label of the whole dataset we can type 

     label data "car data" 

Then to change the label of just the variable turn, we can type 

     label variable turn "spinning" 

And we can view these changes in the properties window.
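
If you prefer the results window to the properties window, the sketch below shows one way to read the same labels back. It assumes a fresh copy of the auto dataset, and the macro names dlabel and vlabel are arbitrary.

     * Start from a fresh copy of the data (clear discards data in memory)
     sysuse auto, clear
     label data "car data"
     label variable turn "spinning"
     * The variable label appears in the last column of the output
     describe turn
     * Store each label in a local macro and print it
     local dlabel : data label
     local vlabel : variable label turn
     display "`dlabel'"
     display "`vlabel'"
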

encode and decode (switches between factor and string)

When we used the tostring and destring commands in the last post, they didn’t allow us to extract the text information from our factor variable, foreign. 

Encode and decode are useful commands to switch storage type between string and factor without losing the text data. 

We’ll start with decode. This command turns a numeric factor variable into a string, with its value labels as data. 

Decode doesn’t allow the replace option, so we’ll make a new variable called decodedF that takes the labels from foreign 

     decode foreign, generate(decodedF) 

We now have a variable called decodedF that looks the same as foreign, except it is a string. 
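
To confirm that the two variables line up, you could compare them directly. This is a minimal sketch on a fresh copy of the auto dataset.

     * Start from a fresh copy of the data (clear discards data in memory)
     sysuse auto, clear
     decode foreign, generate(decodedF)
     * foreign is numeric with a value label; decodedF is a string
     describe foreign decodedF
     * Cross-tabulate the string against the original variable
     tabulate decodedF foreign
     * Each string should match the label of the original value
     assert decodedF == "Domestic" if foreign == 0
     assert decodedF == "Foreign" if foreign == 1
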

Now let’s use encode to change it back! This command takes a string variable and assigns it numbers and a value label so that it becomes a factor. 

     encode decodedF, generate(encodedF) 

While the original foreign variable has 0 and 1 as the values, encodedF has 1 and 2. This is because encode numbers the distinct strings in alphabetical order, starting from 1. 

We see that by default, the value label for this variable has the same name as the variable itself. 

You could switch them to be 1 and 0 again by using recode to change 1 to 0 and 2 to 1, and using label to assign it the same value label that foreign uses.  

     recode encodedF (1=0) (2=1) 

     label values encodedF origin 
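
As an alternative to recoding afterwards, encode has a label() option that names the value label to use. If that value label already exists, encode uses its existing number-to-word pairs. The sketch below assumes a fresh copy of the auto dataset, and the variable name encodedF2 is made up for illustration.

     * Start from a fresh copy of the data (clear discards data in memory)
     sysuse auto, clear
     decode foreign, generate(decodedF)
     * Reuse origin, so "Domestic" maps to 0 and "Foreign" maps to 1
     encode decodedF, generate(encodedF2) label(origin)
     * The new variable should match the original exactly
     assert encodedF2 == foreign
     tabulate encodedF2, nolabel

This way no extra value label is created and no recode step is needed.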

With labels and names wrapped up, you should feel confident in taking a messy dataset and making it clean! Remember to look at the help files if you get stuck, and check out our series of blog posts, trainings, and webinars on Stata if you want to deepen your understanding. 

 

by James Harrod


About the Author:
James Harrod interned at The Analysis Factor in the summer of 2023. He plans to continue into a career as an actuary, and hopes to continue finding interesting ways of educating people about statistics. James is well-versed in R and Stata programming and enjoys teaching the intuition behind common statistical methods.  James is a 2023 graduate of the University of Rochester with bachelor’s degrees in Statistics and Economics.

 
