From the last post in this series, you should know how to change between numeric types and easily change numeric data. We’ll now expand your type-changing skills to include changing string variables with two new commands.
tostring and destring (switch between string and numeric types)
tostring takes one or more numeric variables and turns them into strings.
destring takes one or more string variables and attempts to convert them to numeric. I say attempts, because if a variable contains nonnumeric characters, it won’t be converted.
For either command, you must specify replace or generate, depending on whether you want to overwrite the old variable or make a new one.
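As a minimal sketch of the difference between the two options, assuming the auto dataset that ships with Stata (the new variable name rep78String is just an illustrative choice):

```stata
sysuse auto, clear

* generate() leaves rep78 untouched and adds a new string variable
tostring rep78, generate(rep78String)

* replace overwrites mpg in place, so the numeric version is gone
tostring mpg, replace

* rep78 should still be numeric; rep78String and mpg are now strings
describe rep78 rep78String mpg
```

Using generate is the safer habit while you are experimenting, since the original variable stays available for comparison.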
An Example of Changing Numeric Variables to String
To show off these commands, let’s get a fresh version of the auto data:
sysuse auto, clear
Let’s now take the foreign variable and gear_ratio variable and see what happens if we try to turn them into strings.
The foreign variable looks like a string already because it carries value labels: “Domestic” for 0 and “Foreign” for 1. But the values stored underneath are numbers in Stata, so we can use tostring. Type
tostring foreign gear_ratio, gen (foreignString gearString)
Stata didn’t give us an error, but it also didn’t do what we hoped for.
Stata did convert foreign to a string, but not in the way we wanted. It used the underlying numeric value as the string, rather than the value label (observations that used to say “Domestic” now say “0” in the new variable).
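If the goal was a string holding the label text rather than the underlying number, Stata's decode command is one way to get it. A short sketch, assuming the auto data is loaded and foreignString was created as above; foreignLabel is a hypothetical name chosen for illustration:

```stata
* decode builds a string variable from the value labels of foreign
decode foreign, generate(foreignLabel)

* compare the numeric variable, the tostring result, and the decoded labels
list foreign foreignString foreignLabel in 1/5
```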
For gear_ratio, nothing was generated. Stata tells us that gear_ratio cannot be converted reversibly: turning it into a string would lose some precision, so we couldn’t change it exactly back by using destring.
If Stata tells us that we’d lose information from converting a type, we can override this by using the force option like so:
tostring gear_ratio, gen (gearString) force
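Another option worth knowing about is format(), which controls how the number is written into the string. The sketch below assumes two decimal places are enough for gear_ratio. The name gearString2 is hypothetical, and force is kept on the assumption that the rounded text will not convert back to the exact float values:

```stata
* write gear_ratio with two decimal places; force accepts the loss of precision
tostring gear_ratio, generate(gearString2) format(%9.2f) force

* look at the original next to both string versions
list gear_ratio gearString gearString2 in 1/5
```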
In the Data Browser, our new variables now sit next to our old variables. We can tell they’re strings because string values are shown in red.
And Back Again
Now let’s change them back with destring!
This time we can choose the replace option:
destring gearString foreignString, replace
Look at how the variables don’t appear identical after we used tostring and destring together.
Be careful when switching types in Stata; these types of data loss will often arise.
gearString is now stored as a double rather than a float like the original gear_ratio, so it shows some rounding differences. foreignString has the right numbers but completely lost its value labels.
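One way to see these differences for yourself is sketched below. It assumes gearString and foreignString have just been through the tostring and destring round trip described above:

```stata
* storage types: gear_ratio is a float, gearString is whatever destring chose
describe gear_ratio gearString

* observations where the round trip does not match the original exactly
count if gearString != gear_ratio

* the same comparison after rounding gearString to float precision
count if float(gearString) != gear_ratio

* foreignString holds the numbers but has no value label attached
describe foreign foreignString
```

If the second count comes back as zero, the mismatches in the first count are purely a matter of float versus double precision rather than changed values.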
To fix the issues with gearString, we can use the recast command. To fix the issues with foreignString, we can use the label command.
The recast command was discussed in the previous article, and we’ll go over the label command in the next one.
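As a rough preview of both fixes, here is a sketch that assumes the variables are in the state left by the destring command above. The local macro name lbl is arbitrary:

```stata
* store gearString as a float again; force permits the loss of precision
recast float gearString, force

* look up the name of the value label attached to foreign ...
local lbl : value label foreign

* ... and attach the same label to foreignString
label values foreignString `lbl'

describe gear_ratio gearString foreign foreignString
```

Looking up the label name with a macro avoids hard-coding it, which is handy when you are not sure what the label is called.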
Finally, let’s try to destring a variable that doesn’t have numbers. Type
destring make, replace
Stata tells us that make was not changed because it has nonnumeric characters. If we tried to force it, we’d just get 74 missing values.
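To try the forced conversion without destroying make, a sketch like the following should do. It uses generate instead of replace, and makeNumber is a hypothetical name:

```stata
* force converts anyway; values with nonnumeric characters become missing
destring make, generate(makeNumber) force

* count how many observations ended up missing
count if missing(makeNumber)
```

Because the original make is left in place, you can drop makeNumber afterwards and nothing is lost.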
That should be all you need to turn variables to and from strings! In our next post, we will go over Stata labels for variables, datasets, and values.