# scatterplot

### Getting Started with Stata Tutorial #3: the Graphics Menu

September 11th, 2023 by

In part 2 of this series, we got started on the various menus in Stata. This post covers an important menu that you’ll probably use often: the graphics menu.

### What’s in the Graphics menu

The graphics menu provides an impressive variety of options for creating just about any graph you might need.

Take a look at the menu. It includes everything from univariate graphs like bar charts and pie charts to more complex, multivariate plots. Go ahead and explore some of the graphs available in the menu.

The most comprehensive resource on the graphs you can create in Stata is the Stata Graphics Reference Manual, a free PDF download from the Stata website. At nearly 800 pages it’s excellent, but not a quick read.

A much quicker read is the Stata Data Visualization Cheat Sheet (pages 5 – 6).

Browsing this two-page resource will tell you a lot about what you can do in Stata graphics. This includes not only which kinds of graphs you can create, but how to customize a graph’s appearance, apply themes, and save plots.

But first let’s explore how easy it is to create a simple, but customized plot using only the menus.

### An Example of creating a Scatter plot using menus

To show an example, we’ll use the auto data. If you haven’t loaded the data in your current session, type the following into your command line:

`sysuse auto`

Note that you could also open this data set using the File menu, but this is a command that is so simple, it’s faster to just type it into the command line.

As you’ll see, every time you use the menus, Stata writes out the associated command for you.

Now say we want to make a scatter plot with price on the y axis and mpg on the x axis, but only for observations where the gear ratio is less than 3. We want this graph to have red triangles representing points, and we want it to have informative titles.

We click on Graphics -> Twoway graph. In the plots window click Create and select Basic plots -> Scatter.

Choose price as the y variable and mpg as the x variable; don’t press accept yet.

Under Marker properties choose Triangle as the symbol and Red as the color. Notice that you can also change the size or opacity of the points, or mark particular observations.

Click accept, then click accept on the next page.

Under the if/in tab, type “`gear_ratio<3`” so only observations with a gear ratio less than 3 are plotted; click on Y axis.

Under the Y axis tab type the title “Price”, and under the X axis tab type the title “MPG”. Note how you can also change the properties of each axis.

In the Titles tab type an appropriate title for the graph – I chose “Price and Mileage”.

We don’t want to see a legend so in the Legend tab choose “hide legend”.

You can now press ok and should see the following graph:

You’ll also see that Stata printed this code to the console:

``twoway (scatter price mpg, mcolor(red) msymbol(triangle)) if gear_ratio<3, ytitle(`"Price"') xtitle(`"MPG"') title(`"Price and Mileage"') legend(off)``

Now you know how to make this graph and similar ones with syntax as well! If you’re ever having trouble creating a certain chart using code, the graph menu can provide an easier way to select the options you want.

Note: when you make plots in Stata menus, make sure to always make a new plot rather than layering on an old one. If you press create when you already have a plot selected, your new scatterplot will layer on top of the old one.

by James Harrod


James Harrod interned at The Analysis Factor in the summer of 2023. He plans to continue into a career as an actuary, and hopes to continue finding interesting ways of educating people about statistics. James is well-versed in R and Stata programming and enjoys teaching the intuition behind common statistical methods. James is a 2023 graduate of the University of Rochester with bachelor’s degrees in Statistics and Economics.

### Member Training: Preparing to Use (and Interpret) a Linear Regression Model

November 1st, 2020 by

You think a linear regression might be an appropriate statistical analysis for your data, but you’re not entirely sure. What should you check before running your model to find out?

### Member Training: Practical Suggestions for Improving Your Scatterplots

April 1st, 2020 by

The scatterplot is a simple display of the relationship between two, or sometimes three, variables. You have a wide range of options for displaying a scatterplot. In particular, you can control the location, size, shape, and color of the points in your scatterplot.

### Ways to Customize a Scatter Plot in R Commander

October 21st, 2015 by

I mentioned in my last post that R Commander can do a LOT of data manipulation, data analyses, and graphs in R without you ever having to program anything.

Here I want to give you some examples, so you can see how truly useful this is.

Let’s start with a simple scatter plot between Time and the number of Jobs (in thousands) in 67 counties.  Time is measured in decades since 1960.

The green line is the best fit linear regression line.

This wasn’t the default in R Commander (I actually had to remove a few things to get to this), but it’s a useful way to start out.

A few ways we can easily customize this graph:

### Jittering

We see here a common issue in scatter plots–because the X values are discrete, the points are all on top of each other.

It’s difficult to tell just how many points there are at the bottom of the graph–it’s just a mass of black.

One great way to solve this is by jittering the points.

All this means is that instead of putting identical points right on top of each other, we move each one slightly, at random, in one or both directions.  In this example, I jittered only horizontally:

So while the points aren’t graphed exactly where they are, we can see the trends and we can now see how many points there are in each decade.

How hard is this to do in R Commander? One click.
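For readers who want to see the idea in code, horizontal jittering can be sketched in base R with `jitter()`. The data below are simulated stand-ins (the variable names `time` and `jobs` and all values are assumptions, not the county data set from the post), and this is not necessarily the code R Commander generates.

```r
# Hypothetical data standing in for the county jobs data; not the author's data set.
set.seed(42)                                  # make the random jitter reproducible
time <- rep(0:4, each = 30)                   # decades since 1960, so X is discrete
jobs <- rexp(length(time), rate = 1 / 40) + 5 * time   # skewed, in thousands

# jitter() adds uniform noise in [-amount, amount]; only the X values are moved.
time_jittered <- jitter(time, amount = 0.15)

plot(time_jittered, jobs,
     xlab = "Time (decades since 1960)", ylab = "Jobs (thousands)")
abline(lm(jobs ~ time), col = "green")        # fit the line to the unjittered data
```

Note that the regression line is fitted to the original values, so the jitter changes only how the points are drawn.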

### Regression Lines by Group

Another useful change to a scatter plot is to add a separate regression line to the graph based on some sort of factor in the data set.

In this example, the observations are measured for counties and each county is classified as being either Rural or Metropolitan.

If we’d like to see if the growth in jobs over time is different in Rural and Metropolitan counties, we need a separate line for each group.

In R Commander we can do this quite easily.  Not only do we get two regression lines, but each point is clearly designated as being from either a Rural or Metropolitan county through its color and shape.

It’s quite clear that not only was there more growth in the number of jobs in Metro counties, there was almost no change at all in the Rural counties.

And once again, how difficult is this?  This time, two clicks.
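As a rough illustration of what a grouped scatter plot involves, here is a base R sketch that colors points by group and draws one fitted line per group. The data frame `counties`, its column names, and the simulated slopes are all hypothetical, and R Commander may produce different code for the same plot.

```r
# Hypothetical county data: variable names and values are made up for illustration.
set.seed(7)
counties <- data.frame(
  time = rep(0:4, times = 20),
  type = factor(rep(c("Metropolitan", "Rural"), each = 50))
)
true_slope <- ifelse(counties$type == "Metropolitan", 30, 1)
counties$jobs <- 20 + true_slope * counties$time + rnorm(nrow(counties), sd = 5)

group_cols <- c("blue", "red")   # one color per factor level
group_pch  <- c(16, 17)          # filled circle, filled triangle
idx <- as.integer(counties$type) # level codes: 1 = Metropolitan, 2 = Rural

plot(counties$time, counties$jobs, col = group_cols[idx], pch = group_pch[idx],
     xlab = "Time (decades since 1960)", ylab = "Jobs (thousands)")

# Fit one simple regression per group and draw each line in its group's color.
fits <- lapply(split(counties, counties$type),
               function(d) lm(jobs ~ time, data = d))
for (i in seq_along(fits)) abline(fits[[i]], col = group_cols[i])
legend("topleft", legend = levels(counties$type),
       col = group_cols, pch = group_pch)
```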

There are quite a few modifications you can make just using the buttons, but of course, R Commander doesn’t do everything.

For example, I could not figure out how to change those red triangles to green rectangles through the menus.

But that’s the best part about R Commander.  It works very much like the Paste button in SPSS.

Meaning, it creates the code for you.   So I can take the code it created, then edit it to get my graph looking the way I want.

I don’t have to memorize which command creates a scatter plot.

I don’t have to memorize how to pull my SPSS data into R or tell R that Rural is a factor.  I can do all that through R Commander, then just look up the option to change the color and shape of the red triangles.
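In base R graphics, point shape and color are set with the `pch` and `col` arguments, so an edit like this is usually small. A minimal sketch with made-up data follows. The code R Commander generates may call a different plotting function, in which case the argument names should be checked on that function's help page.

```r
# Hypothetical values; in practice these would come from the imported data set.
time <- c(0, 1, 2, 3, 4)
jobs <- c(12, 15, 19, 26, 30)

marker_shape <- 15        # pch 15 is a filled square; pch 17 is a filled triangle
marker_color <- "green"   # any name listed by colors() works here

plot(time, jobs, pch = marker_shape, col = marker_color,
     xlab = "Time", ylab = "Jobs (thousands)")
```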

### What R Commander Can do in R Without Coding–More Than You Would Think

October 19th, 2015 by

I received a question recently about R Commander, a free R package.

R Commander overlays a menu-based interface to R, so just like SPSS or JMP, you can run analyses using menus.  Nice, huh?

The question was whether R Commander does everything R does, or just a small subset.

Unfortunately, R Commander can’t do everything R does. Not even close.

But it does a lot. More than just the basics.

So I thought I would show you some of the things R Commander can do entirely through menus, with no programming required, so you can see just how unbelievably useful it is.

Since R Commander is a free R package, it can be installed easily through R. Just type `install.packages("Rcmdr")` at the R console the first time you use it, then type `library("Rcmdr")` each time you want to launch the menus.

### Data Sets and Variables

Import data sets from other software:

• SPSS
• Stata
• Excel
• Minitab
• Text
• SAS Xport

Define Numerical Variables as categorical and label the values

Open the data sets that come with R packages

Merge Data Sets

Edit and show the data in a data spreadsheet

Personally, I think that if this was all R Commander did, it would be incredibly useful. These are the types of things I just cannot remember all the commands for, since I just don’t use R often enough.
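For comparison, here is roughly what a few of these data steps look like when typed by hand in base R. This is a sketch only: the small data frames `jobs_data` and `type_data` are hypothetical, and the code R Commander writes for the same menu choices may differ.

```r
# Open a data set that ships with R.
data(mtcars)

# Define a numeric variable as categorical and label its values
# (in mtcars, am is coded 0 = automatic, 1 = manual).
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Merge two data sets on a shared key (hypothetical example data).
jobs_data <- data.frame(county_id = 1:3, jobs = c(10, 20, 30))
type_data <- data.frame(county_id = c(1, 2, 4), rural = c(TRUE, FALSE, TRUE))
merged <- merge(jobs_data, type_data, by = "county_id")  # keeps matching rows only

print(table(mtcars$am_f))
print(merged)
```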

### Data Analysis

Yes, R Commander does many of the simple statistical tests you’d expect:

• Chi-square tests
• Paired and Independent Samples t-tests
• Tests of Proportions
• Common nonparametrics, like Friedman, Wilcoxon, and Kruskal-Wallis tests
• One-way ANOVA and simple linear regression
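Each of these tests has a counterpart among R's built-in functions. The sketch below runs several of them on the `mtcars` and `sleep` data sets that ship with R. The choice of variables is purely illustrative, and the calls are not claimed to match what R Commander generates.

```r
data(mtcars)   # built-in example data
data(sleep)

chi_res  <- chisq.test(table(mtcars$am, mtcars$vs))      # chi-square test
t_indep  <- t.test(mpg ~ am, data = mtcars)              # independent samples
t_paired <- with(sleep, t.test(extra[group == "1"],      # paired samples
                               extra[group == "2"], paired = TRUE))
prop_res <- prop.test(x = sum(mtcars$am == 1),           # one-sample proportion
                      n = nrow(mtcars))
wil_res  <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)  # Wilcoxon rank-sum
kw_res   <- kruskal.test(mpg ~ factor(cyl), data = mtcars)       # Kruskal-Wallis

anova_fit <- aov(mpg ~ factor(cyl), data = mtcars)       # one-way ANOVA
lm_fit    <- lm(mpg ~ wt, data = mtcars)                 # simple linear regression

print(t_paired)
summary(lm_fit)
```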

What is surprising, though, is how many higher-level statistics and models it runs:

• Hierarchical and K-Means Cluster analysis (with 7 linkage methods and 4 options of distance measures)
• Principal Components and Factor Analysis
• Linear Regression (with model selection, influence statistics, and multicollinearity diagnostic options, among others)
• Logistic regression for binary, ordinal, and multinomial responses
• Generalized linear models, including Gamma and Poisson models

In other words–you can use R Commander to run in R most of the analyses that most researchers need.
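To give a sense of what sits behind menus like these, here is a sketch of similar analyses using functions from base R's `stats` package on the built-in `mtcars` data. The variable choices are arbitrary, and R Commander may rely on other functions or packages for some of these models.

```r
data(mtcars)
x <- scale(mtcars[, c("mpg", "wt", "hp", "disp")])  # standardize before clustering

hc <- hclust(dist(x), method = "average")   # hierarchical clustering, average linkage
set.seed(1)                                 # k-means starting points are random
km <- kmeans(x, centers = 3, nstart = 10)   # k-means with 3 clusters
pca <- prcomp(x)                            # principal components

# Generalized linear models: binary logistic and Poisson regression.
logit_fit   <- glm(am ~ wt, family = binomial, data = mtcars)
poisson_fit <- glm(carb ~ hp, family = poisson, data = mtcars)

print(table(km$cluster))
summary(logit_fit)
```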

### Graphs

A sample of the types of graphs R Commander creates in R without you having to write any code:

• QQ Plots
• Scatter plots
• Histograms
• Box Plots
• Bar Charts

The nice part is that it does not only do simple versions of these plots.  You can, for example, add regression lines to a scatter plot or run histograms by a grouping factor.
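The same kinds of plots can be sketched in a few lines of base R graphics. The example below uses the built-in `mtcars` data, and the variables are chosen only for illustration. It is a hand-written approximation, not output copied from R Commander.

```r
data(mtcars)

qqnorm(mtcars$mpg); qqline(mtcars$mpg)            # QQ plot with reference line
h <- hist(mtcars$mpg, main = "MPG", xlab = "mpg") # histogram
boxplot(mpg ~ cyl, data = mtcars)                 # box plots by cylinder count
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders")           # bar chart of counts

# Scatter plot with a regression line added.
plot(mpg ~ wt, data = mtcars)
abline(lm(mpg ~ wt, data = mtcars))

# Histograms by a grouping factor, drawn side by side.
op <- par(mfrow = c(1, 2))
groups <- split(mtcars$mpg, mtcars$am)
for (g in names(groups)) hist(groups[[g]], main = paste("am =", g), xlab = "mpg")
par(op)                                           # restore the plotting layout
```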