Karen Grace-Martin

What is an ROC Curve?

October 14th, 2016 by

An incredibly useful tool in evaluating and comparing predictive models is the ROC curve.

Its name is indeed strange. ROC stands for Receiver Operating Characteristic. Its origin is from sonar back in the 1940s. ROCs were used to measure how well a sonar signal (e.g., from an enemy submarine) could be detected from noise (a school of fish).

ROC curves are a nice way to see how any predictive model can distinguish between the true positives and negatives. (more…)
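
To make the idea concrete, here is a minimal sketch in base R of how the points on an ROC curve can be computed by hand. The outcomes and predicted scores are made up for illustration, and the variable names (`truth`, `score`, `roc`) are assumptions, not taken from any particular package.

```r
# Hypothetical outcomes: 1 = signal present, 0 = noise only
truth <- c(1, 1, 1, 1, 0, 1, 0, 0, 1, 0)
# Hypothetical predicted scores from some model (higher = more likely a signal)
score <- c(0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10)

# Use every observed score as a cutoff, from strictest to most lenient
cutoffs <- sort(unique(score), decreasing = TRUE)

# True positive rate: share of actual positives scored at or above the cutoff
tpr <- sapply(cutoffs, function(k) mean(score[truth == 1] >= k))
# False positive rate: share of actual negatives scored at or above the cutoff
fpr <- sapply(cutoffs, function(k) mean(score[truth == 0] >= k))

# Add the (0, 0) corner so the curve starts at the origin
roc <- data.frame(fpr = c(0, fpr), tpr = c(0, tpr))

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # reference line for a model that only guesses
```

The further the curve bows above the dashed diagonal, the better the scores separate the two groups.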


Linear Mixed Models for Missing Data in Pre-Post Studies

August 30th, 2016 by

In the past few months, I’ve gotten the same question from a few clients about using linear mixed models for repeated measures data.  They want to take advantage of the models’ ability to give unbiased results in the presence of missing data.  In each case, the study has two groups complete a pre-test and a post-test measure, and both measures have a lot of missing data.

The research question is whether the groups have different improvements in the dependent variable from pre to post test.

As a typical example, say you have a study with 160 participants.

90 of them completed both the pre and the post test.

Another 48 completed only the pretest and 22 completed only the post-test.

Repeated Measures ANOVA will deal with the missing data through listwise deletion. That means keeping only the 90 people with complete data.  This causes problems with both power and bias, but bias is the bigger issue.

An alternative is to use a Linear Mixed Model, which will use the full data set.  This is an advantage, but it’s not as big an advantage in this design as in other studies.

The mixed model will retain the 70 people who have data for only one time point.  It will use the 48 people with pretest-only data along with the 90 people with full data to estimate the pretest mean.

Likewise, it will use the 22 people with posttest-only data along with the 90 people with full data to estimate the post-test mean.
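
The bookkeeping in this example can be checked with a few lines of base R. This is only a sketch of the counts described above; the variable names are made up for illustration.

```r
# Counts from the example: complete cases, pretest-only, posttest-only
n_both <- 90
n_pre_only <- 48
n_post_only <- 22

# One TRUE/FALSE flag per participant for each time point
has_pre  <- c(rep(TRUE, n_both), rep(TRUE, n_pre_only), rep(FALSE, n_post_only))
has_post <- c(rep(TRUE, n_both), rep(FALSE, n_pre_only), rep(TRUE, n_post_only))

n_total    <- length(has_pre)             # everyone in the study
n_listwise <- sum(has_pre & has_post)     # kept by listwise deletion
n_pre      <- sum(has_pre)                # contribute to the pretest mean
n_post     <- sum(has_post)               # contribute to the posttest mean
n_one_time <- sum(xor(has_pre, has_post)) # have data at only one time point

c(total = n_total, listwise = n_listwise,
  pre = n_pre, post = n_post, one_time = n_one_time)
```

So the pretest mean rests on 138 people and the posttest mean on 112, while listwise deletion would keep only 90.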

If the data are missing at random, this will give you unbiased estimates of each of these means.

But most of the time in Pre-Post studies, the interest is in the change from pre to post across groups.

The difference in means from pre to post will be calculated based on the estimates at each time point.  But the degrees of freedom for the difference will be based only on the number of subjects who have data at both time points.
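
For readers who want to try this, here is a rough sketch of fitting such a model with `lme` from the nlme package. The data are simulated, the effect sizes are arbitrary, and the missing values are dropped completely at random, which is a stronger assumption than missing at random. The variable names (`long`, `score`, `group`, `time`) are assumptions for illustration only.

```r
library(nlme)  # a recommended package that ships with standard R installations

set.seed(1)
n <- 160
id <- factor(seq_len(n))
group <- factor(rep(c("control", "treatment"), each = n / 2))
subject_effect <- rnorm(n, sd = 5)  # stable differences between people

# Simulated scores: in this made-up setup the treatment group improves more
pre  <- 50 + subject_effect + rnorm(n, sd = 3)
post <- 52 + 4 * (group == "treatment") + subject_effect + rnorm(n, sd = 3)

# Impose the pattern from the example: 48 pretest-only, 22 posttest-only
shuffled <- sample(n)
post[shuffled[1:48]] <- NA
pre[shuffled[49:70]] <- NA

# Long format: one row per person per time point
long <- data.frame(
  id = rep(id, 2),
  group = rep(group, 2),
  time = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  score = c(pre, post)
)

# Random intercept per person; rows with a missing score are dropped,
# but the person's other row stays in the model
fit <- lme(score ~ group * time, random = ~ 1 | id,
           data = long, na.action = na.omit)
summary(fit)$tTable
```

The group-by-time interaction row in the table is the test of whether the two groups changed by different amounts.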

So with only two time points, if the people with one time point are no different from those with full data (creating no bias), you’re not gaining anything by keeping those 70 people in the analysis.

Compare this to a study I also saw in consulting with 5 time points.  Nearly all the participants had 4 out of the 5 observations.  The missing data was pretty random–some participants missed time 1, others, time 4, etc.  Only 6 people out of 150 had full data.  Listwise deletion created a nightmare, leaving only 6 people in the data set.

Each person contributed data to 4 means, so each mean had a pretty reasonable sample size.  Since the missingness was random, each mean was unbiased.  Each subject fully contributed data and df to many of the mean comparisons.
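
A small simulation shows why the five-time-point design holds up so much better. This sketch assumes, for simplicity, that every one of the 144 incomplete participants missed exactly one randomly chosen time point; the real study was only described as "nearly all" having 4 of 5 observations.

```r
set.seed(2)
n <- 150
n_times <- 5
n_complete_target <- 6

# 1 = observed, 0 = missing; start with everyone fully observed
observed <- matrix(1, nrow = n, ncol = n_times,
                   dimnames = list(NULL, paste0("time", 1:n_times)))

# Assumption: each incomplete person misses one time point, chosen at random
n_incomplete <- n - n_complete_target
missed <- sample(n_times, n_incomplete, replace = TRUE)
observed[cbind(seq_len(n_incomplete), missed)] <- 0

n_complete <- sum(rowSums(observed) == n_times)  # all listwise deletion keeps
per_time <- colSums(observed)                    # people behind each mean

# Entry [i, j] counts the people observed at both time i and time j,
# that is, those who can contribute to that pairwise change
pair_counts <- crossprod(observed)

n_complete
per_time
pair_counts
```

Listwise deletion keeps 6 people, while every mean and every pairwise comparison still draws on a large share of the sample.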

With more than 2 time points and data that are missing at random, each subject can contribute to some change measurements.  Keep that in mind the next time you design a study.

 


Mixed Models: Can you specify a predictor as both fixed and random?

February 16th, 2016 by

One of the most confusing things about mixed models arises from the way they’re coded in most statistical software.  Of the packages I’ve used, only HLM sets it up differently, so this doesn’t apply there.

But for the rest of them (SPSS, SAS, R’s lme and lmer, and Stata), the basic syntax requires the same pieces of information.

1.       The dependent variable

2.       The predictor variables for which to calculate fixed effects and whether those (more…)
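
As an illustration of what this looks like in one of these packages, here is a small sketch using `lme` from R's nlme package and the `Orthodont` data set that comes with it. It is just one example of a predictor appearing in both parts of the model, not a summary of the full post.

```r
library(nlme)  # a recommended package that ships with standard R installations

# age appears twice: once in the fixed part (the average slope across children)
# and once in the random part (how each child's slope deviates from that average)
fit <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

fixef(fit)        # fixed effects: overall intercept and age slope
head(ranef(fit))  # per-child deviations in intercept and age slope
VarCorr(fit)      # variances of those random deviations
```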


Member Training: Analysis of Ordinal Variables–Options Beyond Nonparametrics

January 5th, 2016 by

There are many types and examples of ordinal variables: percentiles, ranks, and Likert scale items, to name a few.

These are especially hard to know how to analyze–some people treat them as numerical, others emphatically say not to.  Everyone agrees nonparametric tests work, but these are limited to testing only simple hypotheses and designs.  So what do you do if you want to test something more elaborate?

In this webinar we’re going to lay out all the options and when each is (more…)


Ways to Customize a Scatter Plot in R Commander

October 21st, 2015 by

I mentioned in my last post that R Commander can do a LOT of data manipulation, data analyses, and graphs in R without you ever having to program anything.

Here I want to give you some examples, so you can see how truly useful this is.

Let’s start with a simple scatter plot between Time and the number of Jobs (in thousands) in 67 counties.  Time is measured in decades since 1960.

scatter_basic

The green line is the best fit linear regression line.

This wasn’t the default in R Commander (I actually had to remove a few things to get to this), but it’s a useful way to start out.
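
If you'd like to reproduce something similar directly in R, here is a minimal base R sketch. It is not the code R Commander generates. The data are simulated, and the assumption of five decades (0 through 4) and the names `jobs_data`, `Time`, and `Jobs` are mine.

```r
set.seed(3)
n_counties <- 67
decades <- 0:4  # assumption: five measurements per county

jobs_data <- data.frame(
  County = rep(seq_len(n_counties), each = length(decades)),
  Time = rep(decades, times = n_counties)
)
# Made-up, right-skewed job counts (in thousands) that grow over time
jobs_data$Jobs <- rexp(nrow(jobs_data), rate = 1 / 20) * (1 + 0.2 * jobs_data$Time)

# Fit the linear regression, then draw the points and the fitted line
fit <- lm(Jobs ~ Time, data = jobs_data)
plot(Jobs ~ Time, data = jobs_data, pch = 16,
     xlab = "Decades since 1960", ylab = "Jobs (thousands)")
abline(fit, col = "green", lwd = 2)
```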

A few ways we can easily customize this graph:

Jittering

We see here a common issue in scatter plots–because the X values are discrete, the points are all on top of each other.

It’s difficult to tell just how many points there are at the bottom of the graph–it’s just a mass of black.

One great way to solve this is by jittering the points.

All this means is that instead of putting identical points right on top of each other, we move them slightly, randomly, in one or both directions.  In this example, I jittered only horizontally:

scatter_jitter

So while the points aren’t graphed exactly where they are, we can see the trends and we can now see how many points there are in each decade.

How hard is this to do in R Commander? One click:

Rcmdr_Jitter
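
Outside the menus, base R has a `jitter()` function that does the same kind of thing. Here is a small sketch with simulated data; the amount of jitter (0.15) is an arbitrary choice, and this is not the code R Commander itself writes.

```r
set.seed(4)
Time <- rep(0:4, each = 67)                # discrete X values, many repeats
Jobs <- rexp(length(Time), rate = 1 / 20)  # made-up job counts (thousands)

# Shift each X value by a random amount between -0.15 and +0.15
Time_jittered <- jitter(Time, amount = 0.15)

plot(Time_jittered, Jobs, pch = 16,
     xlab = "Decades since 1960 (jittered)", ylab = "Jobs (thousands)")

# The regression line is still fit to the original, unjittered values
abline(lm(Jobs ~ Time), col = "green", lwd = 2)
```

Only the plotted positions change; the data used for the regression stay exactly as measured.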

Regression Lines by Group

Another useful change to a scatter plot is to add a separate regression line to the graph based on some sort of factor in the data set.

In this example, the observations are measured for counties and each county is classified as being either Rural or Metropolitan.

If we’d like to see if the growth in jobs over time is different in Rural and Metropolitan counties, we need a separate line for each group.

In R Commander we can do this quite easily.  Not only do we get two regression lines, but each point is clearly designated as being from either a Rural or Metropolitan county through its color and shape.

It’s quite clear that not only was there more growth in the number of jobs in Metro counties, there was almost no change at all in the Rural counties.

scatter_bygroup

And once again, how difficult is this?  This time, two clicks.

Rcmdr_groups
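
For comparison, here is roughly how the same kind of graph could be built by hand in base R. This is a sketch with simulated data, not R Commander's generated code. The split of 40 Rural and 27 Metropolitan counties and all variable names are assumptions.

```r
set.seed(5)
county_type <- rep(c("Rural", "Metropolitan"), times = c(40, 27))  # assumed split
jobs_data <- data.frame(
  Time = rep(0:4, times = 67),
  Type = factor(rep(county_type, each = 5))
)

# Made-up pattern: Metro counties grow, Rural counties stay flat
is_metro <- jobs_data$Type == "Metropolitan"
jobs_data$Jobs <- ifelse(is_metro, 30 + 15 * jobs_data$Time, 10) +
  rnorm(nrow(jobs_data), sd = 5)

group_colors <- c(Metropolitan = "black", Rural = "red")
group_shapes <- c(Metropolitan = 1, Rural = 17)  # open circle, filled triangle
type_chr <- as.character(jobs_data$Type)

plot(jitter(jobs_data$Time, amount = 0.15), jobs_data$Jobs,
     col = group_colors[type_chr], pch = group_shapes[type_chr],
     xlab = "Decades since 1960", ylab = "Jobs (thousands)")

# One regression per group, each drawn in that group's color
fits <- lapply(split(jobs_data, jobs_data$Type),
               function(d) lm(Jobs ~ Time, data = d))
for (g in names(fits)) abline(fits[[g]], col = group_colors[[g]], lwd = 2)
legend("topleft", legend = names(fits),
       col = group_colors[names(fits)], pch = group_shapes[names(fits)])

slopes <- sapply(fits, function(m) coef(m)[["Time"]])
slopes
```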

There are quite a few modifications you can make just using the buttons, but of course, R Commander doesn’t do everything.

For example, I could not figure out how to change those red triangles to green rectangles through the menus.

But that’s the best part about R Commander.  It works very much like the Paste button in SPSS.

Meaning, it creates the code for you.   So I can take the code it created, then edit it to get my graph looking the way I want.

I don’t have to memorize which command creates a scatter plot.

I don’t have to memorize how to pull my SPSS data into R or tell R that Rural is a factor.  I can do all that through R Commander, then just look up the option to change the color and shape of the red triangles.
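
As a small illustration of the kind of edit involved, here is a base R sketch (not R Commander's own code) that declares a 0/1 variable as a factor and plots one group as green squares. In base R graphics, `pch = 15` is a filled square and `pch = 17` is a filled triangle. The data values are made up.

```r
# Hypothetical 0/1 code, as a numeric variable might arrive from another package
rural_code <- c(0, 1, 1, 0, 1, 0)
Rural <- factor(rural_code, levels = c(0, 1),
                labels = c("Metropolitan", "Rural"))

Time <- c(0, 1, 2, 3, 4, 4)
Jobs <- c(35, 9, 11, 80, 10, 95)  # made-up values, in thousands

# Rural counties as green filled squares, Metropolitan as black open circles
point_shape <- ifelse(Rural == "Rural", 15, 1)
point_color <- ifelse(Rural == "Rural", "green", "black")

plot(Time, Jobs, pch = point_shape, col = point_color,
     xlab = "Decades since 1960", ylab = "Jobs (thousands)")
legend("topleft", legend = levels(Rural),
       pch = c(1, 15), col = c("black", "green"))
```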

 


What R Commander Can do in R Without Coding–More Than You Would Think

October 19th, 2015 by

I received a question recently about R Commander, a free R package.

R Commander overlays a menu-based interface to R, so just like SPSS or JMP, you can run analyses using menus.  Nice, huh?

The question was whether R Commander does everything R does, or just a small subset.

Unfortunately, R Commander can’t do everything R does. Not even close.

But it does a lot. More than just the basics.

So I thought I would show you some of the things R Commander can do entirely through menus–no programming required–so you can see just how useful it is.

Since R Commander is a free R package, it can be installed easily through R! Just type install.packages("Rcmdr") at the R console the first time you use it, then type library("Rcmdr") each time you want to launch the menus.

Data Sets and Variables

Import data sets from other software:

  • SPSS
  • Stata
  • Excel
  • Minitab
  • Text
  • SAS Xport

Define Numerical Variables as categorical and label the values

Open the data sets that come with R packages

Merge Data Sets

Edit and show the data in a data spreadsheet
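
To give a sense of what the menus are saving you from remembering, here is a base R sketch of three of these tasks: opening a built-in data set, labeling a numeric variable as categorical, and merging two data sets. The small data frames and their column names are invented for illustration, and this is not the code R Commander itself generates.

```r
# Open a data set that comes with R
data("mtcars")
head(mtcars, 3)

# Define a numeric variable as categorical and label its values
counties <- data.frame(county_id = 1:4, rural = c(1, 0, 1, 0))
counties$rural <- factor(counties$rural, levels = c(0, 1),
                         labels = c("Metropolitan", "Rural"))

# A second data set that shares the county_id key
jobs <- data.frame(county_id = c(1, 2, 3, 5),
                   jobs_thousands = c(12.5, 140.2, 8.1, 55.0))

# Merge: by default only counties present in both data sets are kept
merged <- merge(counties, jobs, by = "county_id")
# all = TRUE keeps every county and fills the gaps with NA
merged_all <- merge(counties, jobs, by = "county_id", all = TRUE)

merged
merged_all
```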

Personally, I think that if this was all R Commander did, it would be incredibly useful. These are the types of things I just cannot remember all the commands for, since I just don’t use R often enough.

Data Analysis

Yes, R Commander does many of the simple statistical tests you’d expect:

  • Chi-square tests
  • Paired and Independent Samples t-tests
  • Tests of Proportions
  • Common nonparametrics, like Friedman, Wilcoxon, and Kruskal-Wallis tests
  • One-way ANOVA and simple linear regression
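
For reference, each of these tests is a single function call in base R. The sketch below runs them on data sets that ship with R (`mtcars`, `sleep`, `PlantGrowth`); the particular variables are chosen only for illustration, and this is not R Commander's generated code.

```r
# Chi-square test of independence: transmission type by engine shape
chi <- chisq.test(table(mtcars$am, mtcars$vs))

# Independent samples t-test: mpg by transmission type
t_indep <- t.test(mpg ~ am, data = mtcars)

# Paired t-test: the same 10 people under two drugs
t_paired <- t.test(sleep$extra[sleep$group == 1],
                   sleep$extra[sleep$group == 2], paired = TRUE)

# Test of a single proportion against 0.5
prop <- prop.test(x = sum(mtcars$am == 1), n = nrow(mtcars), p = 0.5)

# Kruskal-Wallis test: mpg across cylinder groups
kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

# One-way ANOVA and simple linear regression
anova_fit <- aov(weight ~ group, data = PlantGrowth)
reg <- lm(mpg ~ wt, data = mtcars)

summary(anova_fit)
coef(reg)
```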

What is surprising though, is how many higher-level statistics and models it runs:

  • Hierarchical and K-Means Cluster analysis (with 7 linkage methods and 4 options of distance measures)
  • Principal Components and Factor Analysis
  • Linear Regression (with model selection, influence statistics, and multicollinearity diagnostic options, among others)
  • Logistic regression for binary, ordinal, and multinomial responses
  • Generalized linear models, including Gamma and Poisson models
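
Here is a rough base R sketch of several of these models, again using data sets that ship with R. The variables, the number of clusters, and the linkage method are arbitrary choices for illustration. Factor analysis and ordinal or multinomial logistic regression are left out to keep the sketch short.

```r
vars <- scale(mtcars[, c("mpg", "hp", "wt")])  # standardize before clustering

# Hierarchical clustering: Euclidean distance, average linkage, cut at 3 groups
hc <- hclust(dist(vars), method = "average")
h_clusters <- cutree(hc, k = 3)

# K-means with 3 clusters and several random starts
set.seed(6)
km <- kmeans(vars, centers = 3, nstart = 10)

# Principal components on the same variables
pca <- prcomp(mtcars[, c("mpg", "hp", "wt")], scale. = TRUE)

# Logistic regression for a binary response
logit <- glm(vs ~ mpg, family = binomial, data = mtcars)

# Generalized linear models: Poisson for counts, Gamma for positive skewed data
pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
gam_fit <- glm(mpg ~ wt, family = Gamma(link = "log"), data = mtcars)

table(h_clusters)
summary(pca)
coef(logit)
```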

In other words–you can use R Commander to run in R most of the analyses that most researchers need.

Graphs

A sample of the types of graphs R Commander creates in R without you having to write any code:

  • QQ Plots
  • Scatter plots
  • Histograms
  • Box Plots
  • Bar Charts

The nice part is that it does not only do simple versions of these plots.  You can, for example, add regression lines to a scatter plot or run histograms by a grouping factor.
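
For anyone curious what the equivalent hand-written code looks like, here is a short base R sketch of these graph types using the built-in `mtcars` data. The variable choices are arbitrary, and this is not the code R Commander produces.

```r
# QQ plot with a reference line
qq <- qqnorm(mtcars$mpg)
qqline(mtcars$mpg)

# Histogram, box plot by group, and bar chart of counts
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
b <- boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")
bars <- barplot(table(mtcars$cyl), xlab = "Cylinders", ylab = "Count")

# Histograms by a grouping factor, side by side
op <- par(mfrow = c(1, 2))
for (a in c(0, 1)) {
  hist(mtcars$mpg[mtcars$am == a], main = paste("am =", a), xlab = "mpg")
}
par(op)  # restore the previous plot layout
```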

If you’re ready to get started practicing, click here to learn about making scatterplots in R Commander, or click here to learn how to use R Commander to sample from a uniform distribution.