Interpreting regression coefficients can be tricky, especially when the model has interactions or categorical predictors (or worse – both).
But there is a secret weapon that can help you make sense of your regression results: marginal means.
They’re not the same as descriptive statistics. They usually aren’t included in our output by default. And they sometimes go by the name LS means, or least-squares means.
And they’re your new best friend.
So what are these mysterious, helpful creatures?
What do they tell us, really? And how can we use them?
(more…)
One question that seems to come up pretty often is:
What is the difference between logistic and probit regression?
Well, let’s start with how they’re the same:
Both are types of generalized linear models. This means they have this form:
(more…)
An incredibly useful tool in evaluating and comparing predictive models is the ROC curve.
Its name is admittedly strange: ROC stands for Receiver Operating Characteristic. It originated in sonar work back in the 1940s, where ROCs were used to measure how well a sonar signal (e.g., from an enemy submarine) could be distinguished from noise (say, a school of fish).
ROC curves are a nice way to see how any predictive model can distinguish between the true positives and negatives. (more…)
When you have data measuring the time to an event, you can examine the relationship between various predictor variables and the time to the event using a Cox proportional hazards model.
In this webinar, you will see what a hazard function is and how to interpret increasing, decreasing, and constant hazards. Then you will examine the log rank test, a simple test closely tied to the Kaplan-Meier curve, and the Cox proportional hazards model.
Note: This training is an exclusive benefit to members of the Statistically Speaking Membership Program and part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
(more…)
I received a question recently about R Commander, a free R package.
R Commander overlays a menu-based interface to R, so just like SPSS or JMP, you can run analyses using menus. Nice, huh?
The question was whether R Commander does everything R does, or just a small subset.
Unfortunately, R Commander can’t do everything R does. Not even close.
But it does a lot. More than just the basics.
So I thought I would show you some of the things R Commander can do entirely through menus, with no programming required, so you can see just how useful it is.
Since R Commander is a free R package, it can be installed easily through R. Just type
install.packages("Rcmdr")
in the command line the first time you use it, then type
library("Rcmdr")
each time you want to launch the menus.
Data Sets and Variables
Import data sets from other software:
- SPSS
- Stata
- Excel
- Minitab
- Text
- SAS Xport
Define numeric variables as categorical and label the values
Open the data sets that come with R packages
Merge Data Sets
Edit and show the data in a data spreadsheet
Personally, I think that if this were all R Commander did, it would be incredibly useful. These are the types of things I simply cannot remember all the commands for, since I don’t use R often enough.
Data Analysis
Yes, R Commander does many of the simple statistical tests you’d expect:
- Chi-square tests
- Paired and Independent Samples t-tests
- Tests of Proportions
- Common nonparametrics, like Friedman, Wilcoxon, and Kruskal-Wallis tests
- One-way ANOVA and simple linear regression
What is surprising though, is how many higher-level statistics and models it runs:
- Hierarchical and K-Means Cluster analysis (with 7 linkage methods and 4 options of distance measures)
- Principal Components and Factor Analysis
- Linear Regression (with model selection, influence statistics, and multicollinearity diagnostic options, among others)
- Logistic regression for binary, ordinal, and multinomial responses
- Generalized linear models, including Gamma and Poisson models
In other words, you can use R Commander to run, in R, most of the analyses that most researchers need.
Graphs
A sample of the types of graphs R Commander creates in R without you having to write any code:
- QQ Plots
- Scatter plots
- Histograms
- Box Plots
- Bar Charts
The nice part is that it doesn’t do only simple versions of these plots. You can, for example, add regression lines to a scatter plot or split histograms by a grouping factor.
If you’re ready to get started practicing, click here to learn about making scatterplots in R Commander, or click here to learn how to use R Commander to sample from a uniform distribution.
In my last post I used the glm() command in R to fit a logistic model with binomial errors, investigating the relationship between students’ numeracy and anxiety scores and their eventual admission success.
Now we will create a plot for each predictor. This can be very helpful for understanding the effect of each predictor on the probability of a 1 response on our dependent variable.
We wish to plot each predictor separately, so we first fit a separate model for each predictor. This isn’t the only way to do it, but it’s one I find especially helpful for deciding which variables should be entered as predictors.
model_numeracy <- glm(success ~ numeracy, family = binomial)
summary(model_numeracy)
Call:
glm(formula = success ~ numeracy, family = binomial)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -1.5814  -0.9060   0.3207   0.6652   1.8266
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -6.1414     1.8873  -3.254 0.001138 **
numeracy      0.6243     0.1855   3.366 0.000763 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 68.029 on 49 degrees of freedom
Residual deviance: 50.291 on 48 degrees of freedom
AIC: 54.291
Number of Fisher Scoring iterations: 5
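One helpful way to read the numeracy coefficient is as an odds ratio. As a minimal sketch, using the estimates reported in the summary above (rather than the model object itself):

```r
# Exponentiating a logistic regression slope gives an odds ratio:
# the multiplicative change in the odds of success per 1-point
# increase in numeracy.
b_numeracy <- 0.6243        # slope from the summary above
or <- exp(b_numeracy)
or                          # roughly 1.87
```

So each additional point of numeracy multiplies the odds of success by about 1.87. With the fitted model in hand you could get the same thing from exp(coef(model_numeracy)).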
We do the same for anxiety.
model_anxiety <- glm(success ~ anxiety, family = binomial)
summary(model_anxiety)
Call:
glm(formula = success ~ anxiety, family = binomial)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -1.8680  -0.3582   0.1159   0.6309   1.5698
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  19.5819     5.6754   3.450 0.000560 ***
anxiety      -1.3556     0.3973  -3.412 0.000646 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 68.029 on 49 degrees of freedom
Residual deviance: 36.374 on 48 degrees of freedom
AIC: 40.374
Number of Fisher Scoring iterations: 6
Now we create our plots. First we set up a sequence of x values over which to plot the fitted model. Let’s find the range of each variable.
range(numeracy)
[1] 6.6 15.7
range(anxiety)
[1] 10.1 17.7
Given the ranges of numeracy and anxiety, a sequence from 0 to 15 is about right for plotting numeracy, while a sequence from 10 to 20 is good for plotting anxiety.
xnumeracy <- seq(0, 15, 0.01)
ynumeracy <- predict(model_numeracy, list(numeracy = xnumeracy), type = "response")
Here the predict() function computes the fitted values. The argument type = "response" back-transforms from the scale of the linear predictor (the logit) to the scale of the response, so the fitted values are probabilities rather than log odds.
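To see what type = "response" is doing, here is a minimal sketch that applies the inverse logit by hand, using the numeracy coefficients reported above and an example score of 10 (an assumed illustrative value):

```r
# Inverse logit by hand, using the coefficients from the summary above
b0 <- -6.1414               # intercept
b1 <-  0.6243               # numeracy slope
x  <- 10                    # an example numeracy score (illustrative)

eta <- b0 + b1 * x          # linear predictor, on the logit scale
p   <- 1 / (1 + exp(-eta))  # back-transform to a probability
p                           # same value that plogis(eta) returns
```

plogis() is base R’s logistic distribution function, so plogis(eta) gives the identical result; predict() with type = "response" simply does this back-transform for every x value you supply.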
plot(numeracy, success, pch = 16, xlab = "NUMERACY SCORE", ylab = "ADMISSION")
lines(xnumeracy, ynumeracy, col = "red", lwd = 2)
The model has produced a curve that gives the probability that success = 1 as a function of the numeracy score. Clearly, the higher the score, the more likely it is that the student will be admitted.
Now we plot for anxiety.
xanxiety <- seq(10, 20, 0.1)
yanxiety <- predict(model_anxiety, list(anxiety = xanxiety), type = "response")
plot(anxiety, success, pch = 16, xlab = "ANXIETY SCORE", ylab = "SUCCESS")
lines(xanxiety, yanxiety, col = "blue", lwd = 2)
Clearly, those who score high on anxiety are unlikely to be admitted, possibly because their admissions test results are affected by their high level of anxiety.
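To put numbers on how steep that drop is, here is a small sketch using the anxiety coefficients reported above, evaluated at two illustrative anxiety scores (13 and 16 are assumed example values within the observed range):

```r
# Fitted admission probabilities at two example anxiety scores,
# using the coefficients from the anxiety summary above
b0 <- 19.5819
b1 <- -1.3556

plogis(b0 + b1 * 13)   # moderate anxiety: probability near 0.88
plogis(b0 + b1 * 16)   # high anxiety: probability near 0.11
```

A three-point increase in anxiety takes the predicted probability of admission from roughly 0.88 down to roughly 0.11, which is what the sharp downward curve in the plot is showing.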
****
See our full R Tutorial Series and other blog posts regarding R programming.
About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.