*by Jeff Meyer*

Fortunately there are some really, really smart people who use Stata. Yes I know, there are really, really smart people that use SAS and SPSS as well.

But unlike SAS and SPSS users, Stata users benefit from *the contributions* made by really, really smart people. How so? Is Stata an “open source” software package?

Technically, a proprietary commercial package (software you have to pay for, with source code the vendor controls) is not open source. Based on that definition, Stata, SPSS and SAS are not open source. R is open source.

*But*, because I have a Stata license (once you have it, it never expires) *I think of* Stata as being open source. This is because Stata allows members of the Stata community to share their expertise.

There are countless commands written by very, very smart non-Stata employees that are available to all Stata users.

Practically all of these commands, which are free, can be downloaded from the SSC (Statistical Software Components) archive. The SSC archive is maintained by the Boston College Department of Economics. The website is: https://ideas.repec.org/s/boc/bocode.html

There are over three thousand commands available for downloading. Below I have highlighted three of the 185 that I have downloaded.

1. **coefplot** is a command written by Ben Jann of the Institute of Sociology, University of Bern, Bern, Switzerland. This command allows you to plot results from estimation commands.

In a recent post on diagnosing missing data, I ran two models comparing the observations that reported income versus the observations that did not report income, models 3d and 3e.

Using the coefplot command I can graphically compare the coefficients and confidence intervals for each independent variable used in the models.

The code and graph are:

*coefplot model_3d model_3e, drop(_cons) xline(0)*

Including the option **xline(0)** creates a vertical line at zero, which quickly shows whether a confidence interval spans both positive and negative territory.

I can also separate the predictor variables into individual graphs:

*coefplot model_3d || model_3e, yline(0) bycoefs vertical byopts(yrescale) ylabel(, labsize(vsmall))*
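The stored results model_3d and model_3e come from the earlier post and are not reproduced here. As a minimal, self-contained sketch of the same workflow, the code below fits two regressions on Stata's built-in auto data set and plots them together. The data set, the variables and the names m_domestic and m_foreign are stand-ins chosen for illustration, not the models from that post.

```stata
* Illustrative only: auto.dta and the model names are assumptions,
* not the models from the missing-data post.
sysuse auto, clear

* Fit the same specification on two subsamples and store each result
regress price mpg weight if foreign == 0
estimates store m_domestic

regress price mpg weight if foreign == 1
estimates store m_foreign

* Plot both sets of coefficients; drop the constant, add a reference line at zero
coefplot m_domestic m_foreign, drop(_cons) xline(0)
```

Any estimation results saved with estimates store can be passed to coefplot in the same way.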

2. Nicholas Cox of Durham University and Gary Longton of the Fred Hutchinson Cancer Research Center created the command **distinct**. This command generates a table with the count of distinct observations for each variable in the data set.

When getting to know a data set, it can be helpful to search for potential indicator, categorical and continuous variables. The **distinct** command, along with its *min(#)* and *max(#)* options, allows an easy search for variables that fit into these categories.

For example, to create a table of all variables with three to seven distinct observations I use the following code:

*distinct, min(3) max(7)*

In addition, the command generates the scalar r(ndistinct). In the workshop *Managing Data and Optimizing Output in Stata*, we used this scalar within a loop to create macros for continuous, categorical and indicator variables.
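The workshop code is not shown here, but a loop of that kind might look like the following sketch. The cutoffs (exactly two distinct values for an indicator, three to seven for a categorical variable, more than seven for a continuous one), the macro names and the use of the built-in auto data set are all assumptions made for illustration.

```stata
* Sketch: sort numeric variables into macros by their number of distinct values.
* Cutoffs and macro names are illustrative assumptions.
sysuse auto, clear

* Restrict to numeric variables; ds leaves the list in r(varlist)
quietly ds, has(type numeric)
local numvars `r(varlist)'

* Start with empty macros
local indicator
local categorical
local continuous

foreach v of local numvars {
    * distinct leaves the count of distinct values in r(ndistinct)
    quietly distinct `v'
    local n = r(ndistinct)
    if `n' == 2 {
        local indicator `indicator' `v'
    }
    else if inrange(`n', 3, 7) {
        local categorical `categorical' `v'
    }
    else if `n' > 7 {
        local continuous `continuous' `v'
    }
}

display "Indicator:   `indicator'"
display "Categorical: `categorical'"
display "Continuous:  `continuous'"
```

The resulting macros can then be used wherever a variable list is expected, for example summarize followed by the continuous macro. A count alone is only a rough guide, so it is worth checking the lists by eye.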

3. In a data set it is not uncommon to have outliers. There are primarily three options for dealing with outliers. We can keep them as they are, winsorize the observations (change their values), or delete them. Note, winsorizing and deleting observations can introduce statistical bias.

If you choose to winsorize your data I suggest you check out the command **winsor2**. This was created by Lian Yujun of Sun Yat-Sen University, China. This command incorporates coding from the command *winsor* created by Nicholas Cox and Judson Caskey.

The command creates a new variable, adding a suffix “_w” to the original variable’s name. By default, values less than the 1st percentile are changed to equal the 1st percentile, and values greater than the 99th percentile are changed to equal the 99th percentile. Example:

**winsor2 salary** (makes changes at the 1st and 99th percentiles for the variable “salary”)

The user has the option to change the values to the percentile of their choice.

**winsor2 salary, cuts(0.5 99.5)** (makes changes at the 0.5th and 99.5th percentiles)
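To see what winsorizing does to the data, the default 1st/99th percentile rule can be reproduced by hand. This sketch uses the built-in auto data set and Stata's _pctile command. The names p_lo, p_hi and price_byhand are illustrative choices. Because percentiles can be computed in slightly different ways, the hand-made variable is meant to show the logic rather than to match winsor2 digit for digit.

```stata
* Sketch of winsorizing by hand; names and data set are illustrative assumptions.
sysuse auto, clear

* _pctile stores the requested percentiles in r(r1) and r(r2)
_pctile price, percentiles(1 99)
scalar p_lo = r(r1)
scalar p_hi = r(r2)

* Pull values below the 1st percentile up and values above the 99th down;
* the if qualifier keeps missing values missing
generate price_byhand = min(max(price, p_lo), p_hi) if !missing(price)

* Compare with the variable winsor2 creates (price_w)
winsor2 price
summarize price price_byhand price_w
```

In the summarize output, the minimum and maximum of the winsorized variables should sit at the cut points rather than at the original extremes.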

To add these three commands to your Stata software execute the following code and click on the links to download the commands:

*findit coefplot*

*findit distinct*

*findit winsor2*

As shown in the December 2015 free webinar “Stata’s Bountiful Help Resources”, you can also explore all the add-on commands via Stata’s “Help” menu. Go to “Help” => “SJ and User Written Commands” to explore.
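Because these three commands are hosted on the SSC archive, they can also be installed directly from the command line with Stata's ssc command, which skips the search-and-click step. A short sketch, assuming an internet connection:

```stata
* Read the package description before installing
ssc describe winsor2

* Install each package from the SSC archive
ssc install coefplot
ssc install distinct
ssc install winsor2

* Confirm that Stata can find a newly installed command
which coefplot
```

Adding the replace option (for example, ssc install coefplot, replace) overwrites an older installed copy with the current version.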

*Jeff Meyer is a statistical consultant with The Analysis Factor, a stats mentor for Statistically Speaking membership, and a workshop instructor.*

