by David Lillis, Ph.D.

One data manipulation task that you need to do in almost any data analysis is recoding data. The data are almost never set up exactly the way you need them for your analysis.

In R, you can re-code an entire vector or array at once. To illustrate, let’s set up a vector that has missing values.

A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
A

[1] 3 2 NA 5 3 7 NA NA 5 2 6

We can replace all missing values with another number (such as zero) as follows: [click to continue…]
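The excerpt stops before the article's own code. As a minimal sketch of one common approach (not necessarily the one the full article uses), logical indexing with is.na() replaces every missing value in one step. The copy named B is an assumption introduced here so that the original vector stays unchanged.

```r
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)

# Work on a copy so the original vector keeps its NAs
B <- A

# is.na(B) is TRUE wherever B is missing; assign 0 to exactly those positions
B[is.na(B)] <- 0
B
```

The same pattern works with any replacement value, and with a condition other than is.na(), for example B[B > 5] <- 5 to cap values at 5.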


R Is Not So Hard! A Tutorial, Part 17: Testing for Existence of Particular Values

Sometimes you need to know if your data set contains elements that meet some criterion or a particular set of criteria. For example, you may need to know if you have missing data (NAs) lurking somewhere in a large data set…
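As a rough illustration of this kind of check, the base R functions anyNA(), any(), all() and %in% each return a single TRUE or FALSE. The vector x and its values are hypothetical, and the full article may use different functions.

```r
x <- c(4, 8, NA, 15, 16, NA, 23)   # hypothetical data with two missing values

# Is there at least one NA anywhere in x?
has_missing <- anyNA(x)

# Does any non-missing element exceed 20?
has_large <- any(x > 20, na.rm = TRUE)

# Does x contain at least one of the values 15 or 42?
has_15_or_42 <- any(c(15, 42) %in% x)

# Are all non-missing elements positive?
all_positive <- all(x > 0, na.rm = TRUE)
```

Without na.rm = TRUE, any() and all() can return NA when the answer depends on a missing element, so it is worth deciding explicitly how missing values should be treated.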

Read the full article →

R Is Not So Hard! A Tutorial, Part 16: Counting Values within Cases

SPSS has the Count Values within Cases option, but R does not have an equivalent function. Here are two functions that you might find helpful, each of which counts values within cases inside a rectangular array…
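The two functions from the full article are not shown in this excerpt. As a stand-in sketch using only base R, rowSums() and apply() can count how often given values occur in each row (case) of a rectangular data set. The data frame scores and its column names are hypothetical.

```r
# Hypothetical questionnaire data: one row per case, one column per item
scores <- data.frame(q1 = c(1, 2, 1),
                     q2 = c(1, NA, 3),
                     q3 = c(2, 1, 1))

# Count how many items equal 1 within each case, ignoring missing values
count_ones <- rowSums(scores == 1, na.rm = TRUE)

# Count how many items take any of several target values within each case;
# %in% treats NA as "no match", so no na.rm argument is needed
count_1_or_2 <- apply(scores, 1, function(row) sum(row %in% c(1, 2)))
```

For these data, count_ones is 2, 1, 2 and count_1_or_2 is 3, 2, 2 for the three cases.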

Read the full article →

R Is Not So Hard! A Tutorial, Part 15: Counting Elements in a Data Set

Combining the length() and which() commands gives a handy method of counting elements that meet particular criteria…
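A short sketch of the idea, using a made-up vector: which() returns the positions where a condition is TRUE (dropping NAs), and length() counts those positions.

```r
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)   # example vector with missing values

# Number of elements greater than 3
n_above_3 <- length(which(A > 3))

# Number of missing values
n_missing <- length(which(is.na(A)))

# Number of elements equal to 2 or 5
n_2_or_5 <- length(which(A == 2 | A == 5))

# Equivalent count without which(): sum the logical vector, skipping NAs
n_above_3_alt <- sum(A > 3, na.rm = TRUE)
```

Because which() silently discards positions where the condition evaluates to NA, the count reflects only elements that are known to meet the criterion.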

Read the full article →

Random Sample from a Uniform Distribution in R Commander

Why We Needed a Random Sample of 6 numbers between 1 and 10000

As you may have read in one of our recent newsletters, this month The Analysis Factor hit two milestones: 10,000 subscribers to our mailing list and 6 years in business. We’re quite happy about both, and seriously grateful to all members of our […]
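The article itself works through the R Commander menus. For readers at the R console, a comparable draw might look like the sketch below. The seed value and the variable names are arbitrary choices made for this illustration, not taken from the article.

```r
set.seed(2014)   # arbitrary seed, only so the draw can be reproduced

# Six distinct whole numbers between 1 and 10000, each equally likely
winners <- sample.int(10000, size = 6)

# Alternative: draw from a continuous uniform distribution and round up,
# which can occasionally produce the same whole number twice
draws <- ceiling(runif(6, min = 0, max = 10000))
```

The first form samples without replacement, which suits picking six different subscribers. The second mirrors sampling from a uniform distribution more directly.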

Read the full article →

Spotlight Analysis for Interpreting Interactions

Not too long ago, a client asked for help with using Spotlight Analysis to interpret an interaction in a regression model.

Spotlight Analysis? I had never heard of it.

As it turns out, it’s a (snazzy) new name for an old way of interpreting an interaction between a continuous and a categorical grouping variable in a regression model…
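To make the idea concrete, here is a hedged sketch in R of comparing groups at a few chosen values of the continuous predictor. The choice of the mtcars data, the variables mpg, wt and am, and the spotlight values (the mean and one standard deviation either side) are all assumptions made for illustration. The client's model is not described in this excerpt.

```r
# Treat transmission type as a categorical grouping variable
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Regression with an interaction between a continuous and a categorical predictor
fit <- lm(mpg ~ wt * am, data = cars)

# Values of the continuous predictor at which to compare the groups
spots <- mean(cars$wt) + c(-1, 0, 1) * sd(cars$wt)

# Predicted outcome for each group at each chosen value
grid <- expand.grid(wt = spots, am = levels(cars$am))
grid$pred <- predict(fit, newdata = grid)

# Group difference (manual minus automatic) at each chosen value
diffs <- grid$pred[grid$am == "manual"] - grid$pred[grid$am == "automatic"]
```

When the interaction term is non-zero, the three differences in diffs are not equal, which is exactly what the interaction means: the size of the group effect depends on where along the continuous predictor you look.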

Read the full article →

When a Variable’s Level of Measurement Isn’t Obvious

A central concept in statistics is level of measurement of variables. It’s so important to everything you do with data that it’s usually taught within the first week in every intro stats class. But even something so fundamental can be tricky once you start working with real data…

Read the full article →

Classification and Regression Trees Webinar

Cluster analysis classifies individuals into two or more unknown groups based on a set of numerical variables.

It is related to, but distinct from, a few other multivariate techniques, including Discriminant Function Analysis…
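As a small illustration of grouping cases on numerical variables when the groups are not known in advance, the sketch below runs k-means, one of several clustering methods, in base R. The use of the built-in iris measurements, three clusters, and the seed are assumptions for this example only.

```r
set.seed(42)   # arbitrary seed; k-means starts from random centers

# Use only the four numerical measurements and standardize them,
# so that no variable dominates the distance calculation
measurements <- scale(iris[, 1:4])

# Ask for three clusters; nstart = 10 tries ten random starts and keeps the best
km <- kmeans(measurements, centers = 3, nstart = 10)

# Cluster membership for each case, and the size of each cluster
membership <- km$cluster
cluster_sizes <- table(membership)
```

Unlike Discriminant Function Analysis, nothing here uses a known group label: the Species column is left out, and the groups are formed from the measurements alone.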

Read the full article →

Generalized Linear Models in R, Part 3: Plotting Predicted Probabilities

We continue with the same glm on the mtcars data set (regressing the vs variable on the weight and engine displacement).

Now we want to plot our model, along with the observed data.

Although we ran a model with multiple predictors, it can help interpretation to plot the predicted probability that vs=1 against each predictor separately. So first we fit
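The excerpt ends before the article's code. As a hedged sketch of the general approach, one option is to fit a logistic regression with a single predictor and draw its predicted probability curve over the observed data. Using weight alone here, and the names fit_wt and wt_grid, are assumptions for this illustration.

```r
# Assumption: a single-predictor logistic model for weight, fitted for plotting
fit_wt <- glm(vs ~ wt, data = mtcars, family = binomial)

# Evenly spaced weights across the observed range
wt_grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))

# type = "response" returns predicted probabilities rather than log-odds
wt_grid$prob <- predict(fit_wt, newdata = wt_grid, type = "response")

# Observed 0/1 outcomes as points, predicted probability of vs = 1 as a curve
plot(mtcars$wt, mtcars$vs, pch = 16,
     xlab = "Weight (1000 lbs)", ylab = "vs")
lines(wt_grid$wt, wt_grid$prob)
```

Repeating the same steps with disp in place of wt gives the corresponding plot for engine displacement.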

Read the full article →

Generalized Linear Models in R, Part 2: Understanding Model Fit in Logistic Regression Output

by David Lillis, Ph.D. In the last article, we saw how to create a simple Generalized Linear Model on binary data using the glm() command. We continue with the same glm on the mtcars data set […]
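For orientation, the sketch below fits the model described in this series (vs regressed on weight and engine displacement) and pulls out the deviance figures that summary() reports. The comparison of null and residual deviance shown here is one standard way to read overall fit. Whether the full article uses exactly this test is an assumption.

```r
# Logistic regression of vs on weight and engine displacement
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# Coefficients, null deviance and residual deviance
summary(model)

# Drop in deviance from the intercept-only model to the fitted model
dev_drop <- model$null.deviance - model$deviance
df_drop <- model$df.null - model$df.residual

# Chi-square test of whether the predictors improve on the null model
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
```

A small p_value indicates that the model with both predictors fits better than a model containing only an intercept.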

Read the full article →