From our last article, you should feel comfortable with the idea of editing and saving data sets in Stata. In this article, we’ll explain how to create new variables in Stata using replace, generate, egen, and clonevar.
Stata makes it a breeze to edit or clean your data. If you’re unfamiliar with using data sets in Stata, check out these blog posts to get a good grasp on importing and browsing data in Stata.
For this tutorial we will be using Stata’s “auto” data set. If you haven’t loaded it in yet, type
There’s no mincing words here. Missing values can cause problems for every statistician. That’s true for a lot of reasons, but the trouble can start with something as simple as how missing values are coded in a data set. Here are a few examples.
Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)
Survey questions are often structured without regard for ease of use within a statistical model.
Take, for example, a survey done by the Centers for Disease Control and Prevention (CDC) regarding childbirths in the U.S. One of the variables in the data set is “interval since last pregnancy”. Here is a histogram of the results.
I recently opened a very large data set titled “1998 California Work and Health Survey” compiled by the Institute for Health Policy Studies at the University of California, San Francisco. There are 1,771 observations and 345 variables. (more…)
One data manipulation task that you need to do in pretty much any data analysis is recoding data. It’s almost never the case that the data are set up exactly the way you need them for your analysis.
In R, you can re-code an entire vector or array at once. To illustrate, let’s set up a vector that has missing values.
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
A
[1] 3 2 NA 5 3 7 NA NA 5 2 6
We can re-code all missing values by another number (such as zero) as follows: (more…)
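A minimal sketch of one common approach, assuming the vector A defined above: index the vector with is.na() and assign the replacement value. The name A_recoded is only an illustrative choice, used here so the original vector is left intact.

```r
# Vector with missing values, as defined above
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)

# Work on a copy so the original is preserved
# (A_recoded is an illustrative name, not from the article)
A_recoded <- A

# is.na() returns a logical vector; assign 0 wherever it is TRUE
A_recoded[is.na(A_recoded)] <- 0

A_recoded
```

The same logical-indexing pattern can be used to recode values that meet other conditions, as long as the condition evaluates to TRUE or FALSE for each element.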