SPSS

How to do a Chi-square test when you only have proportions and denominators

March 18th, 2011 by

by Annette Gerritsen, Ph.D.

In an earlier article I discussed how to do a cross-tabulation in SPSS. But what if you do not have a data set with the values of the two variables of interest?

For example, if you do a critical appraisal of a published study and only have proportions and denominators.

In this article it will be demonstrated how SPSS can come up with a cross table and do a Chi-square test in both situations. And you will see that the results are exactly the same.

‘Normal’ dataset

If you want to test if there is an association between two nominal variables, you do a Chi-square test.

In SPSS you just indicate that one variable (the independent one) should come in the row, (more…)


Recoding Variables in SPSS Menus and Syntax

March 11th, 2011 by

SPSS offers two choices under the recode command: Into Same Variable and Into Different Variables.

The command Into Same Variable replaces existing data with new values, but the command Into Different Variables adds a new variable to the data set.

In almost every situation, you want to use Into Different Variables. Recoding Into Same Variables replaces the values in the existing variable.

So if you notice a mistake after you’ve recoded, you can’t fix it.

But you may not even notice the mistake, because you can’t even test it.

And that’s just dangerous. (more…)


Mixed Models for Logistic Regression in SPSS

February 25th, 2011 by

Can I use SPSS MIXED models for (a) ordinal logistic regression, and (b) multi-nomial logistic regression?

Every once in a while I get emailed a question that I think others will find helpful.  This is definitely one of them.

My answer:

No.

(And by the way, this is all true in SAS as well.  I’ll include the SAS versions in parentheses). (more…)


Confusing Statistical Terms #5: Covariate

November 8th, 2010 by

Stage 2Covariate is a tricky term in a different way than hierarchical or beta, which have completely different meanings in different contexts.

Covariate really has only one meaning, but it gets tricky because the meaning has different implications in different situations, and people use it in slightly different ways.  And these different ways of using the term have BIG implications for what your model means.

The most precise definition is its use in Analysis of Covariance, a type of General Linear Model in which the independent variables of interest are categorical, but you also need to adjust for the effect of an observed, continuous variable–the covariate.

In this context, the covariate is always continuous, never the key independent variable, (more…)


Variable Formats in SPSS Syntax

October 21st, 2010 by

One of the places that SPSS syntax excels at efficiency is when you’re creating new variables.  This is especially true when you’re creating a LOT of new variables, but even one or two can be quicker if you write the syntax code instead of menus.

And just as importantly, you’ll have documentation for exactly how you created them. (You think you’ll remember now, but 75 new variables later, you’ll thank me).

So once you create a new variable, you should of course immediately assign a Variable Label, and if appropriate, Value Labels and Missing Data Codes using Syntax.

Another thing that helps keep your new variable clean and interpretable is to assign the format.  The default format is F8.2, which indicates a numerical value

You could go into the Variable View screen and manually change the Width and Decimals columns, which indicate how many characters go before and after (for numeric variables) the decimal point.

But why do all that when you can just use a single command to define multiple variables?

The syntax command is FORMATS.  Here is the command for some common formats:

FORMATS NumVar1 NumVar2 (F5.0)
/NumVar3 (F6.1)
/StringVar1 (A15).

You can see the FORMATS command is followed by the variable names, then the format in parentheses.

Numeric variables NumVar1 and Numvar2 will both get the same format: with 5 digits, and nothing after the decimal.

Numeric variable NumVar3 will have 6 digits total, with one after the decimal.

And string variable (i.e. its value contain letters) StringVar1 is 15 characters wide.

This will get you started, but you can get all the specifics in the FORMATS section of the  Command Syntax Reference, which is included in the SPSS help.

[Note: Edited explanation of F6.1 to be 6 digits total, not 6 digits before the decimal).

 


Using Adjusted Means to Interpret Moderators in Analysis of Covariance

September 24th, 2010 by

Stage 2If you’re like most researchers, your statistical training focused on Regression or ANOVA, but not both. It all depends on whether your field focuses more on experimental data (Biology, Psychology) or observed data (Sociology, Economics). Maybe one class covered a bit of the other, but most people are comfortable in one, but not the other.

This, in my opinion, is a shame. (Okay, I was going to say tragedy, but let’s be real.  Tsunami that kills thousands=tragedy.  Different scale here).

First of all, the distinction between ANOVA and linear regression is arbitrary. They’re really the same model with different outfits on.

Second, regardless of which one you normally use, you’re going to occasionally have to use the other kind of predictor variables–categorical or continuous. And we can come up with nice names for these models–a regression with dummy variables or an Analysis of Covariance.

But real understanding of the relationships among variables comes only when you dispense of the names and can focus on analyzing and interpreting the model using the kinds of variables you have.

There are other examples, but today I’m going to focus on an ANOVA model with a continuous covariate.

A common model is one in which one predictor is categorical (we’ll use 4 categories) and the other is continuous. Here is an example of a scatterplot of just such a model:

Scatterplot of Ancova
Scatterplot of Ancova

There are four groups, each of which received a different training.  The continuous moderator is Age, and the outcome is OverallPost, which is the post-training test score to see how well they learned the material in each training program.

As you can see, the effect of the training program is moderated by age.  Another way to say that is there is a significant interaction between Age and Training Group.  The effect of the training is depending on the trainee’s age.

One way to interpret this significant interaction is to compare the slopes of the four lines, which is easily done with any regression coefficient table.  (Okay, not always easily done, but easily found in…)

But this doesn’t make very much sense when Age is really a moderator–a predictor we want to control for, and see how it affects the relationship between the independent (IV) and dependent variables (DV), but not really the IV we’re interested in.

A better way to do it in this situation is to compare the means among groups at a low value of Age, say 20, and again at a high value of Age, say 50.  You can get p-values, adjusted for multiple comparisons, using either SAS or SPSS GLM.

SAS Proc GLM uses the LSMeans statement and SPSS GLM uses EMMeans.  They do the same thing–calculate the mean of Y for each group, at a specific value of the covariate.

If you use the menus in SPSS, you can only get those EMMeans at the Covariate’s mean, which in this example is about 25, where the vertical black line is.  This isn’t very useful for our purposes.  But we can change the value of the covariate at which to compare the means using syntax.

So it would tell us that at a young age of say 20, the three treatment groups (green, tan, and purple lines) all have means higher than the control (blue).  Young people learned more in all three treatment groups.

But at an older age, say 50, the means of the purple and tan groups were not significantly different from the control group’s (blue), and the green  (EIQ group) did worse!

In SPSS GLM, the syntax would be:

UNIANOVA OverallPost BY group WITH NEWAGE
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/EMMEANS=TABLES(group) WITH(NEWAGE=MEAN) COMPARE ADJ(SIDAK)
/EMMEANS=TABLES(group) WITH(NEWAGE=45) COMPARE ADJ(SIDAK)
/EMMEANS=TABLES(group) WITH(NEWAGE=20) COMPARE ADJ(SIDAK)
/PRINT=PARAMETER
/CRITERIA=ALPHA(.05)
/DESIGN=NEWAGE group NEWAGE*group.