Statistical Software

How to Pick an R Package

April 24th, 2023 by

One big advantage of R is its breadth. If anything has been done in statistics, there is an R package that will do it.

The problem is that sometimes there are four packages that will do it. This is a big problem with R (and with Python, for that matter).


Member Training: Using Macros, Loops, and Functions in Stata to Manage Your Data Software Tutorial

March 31st, 2023 by

Many data sets are challenging and time consuming to work with because the data are seldom in an optimal format.



Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups

March 22nd, 2023 by

If you have a categorical predictor variable that you plan to use in a regression analysis in SPSS, there are a couple of ways to do it.

You can use the SPSS Regression procedure. Or you can use SPSS General Linear Model > Univariate, which I discuss here. If you use syntax, it’s the UNIANOVA command.

The big question in SPSS GLM is what goes where. As I’ve detailed in another post, any continuous independent variable goes into Covariates. And don’t use Random Factors at all unless you really know what you’re doing.

So the question is what to do with your categorical variables.  You have two choices, and each has advantages and disadvantages.

The easiest is to put categorical variables in Fixed Factors.  SPSS GLM will dummy code those variables for you, which is quite convenient if your categorical variable has more than two categories.

However, there are some defaults you need to be aware of that may or may not make this a good choice.

The dummy coding reference group default

SPSS GLM always makes the reference group the one that comes last alphabetically.

So if the values you input are strings, the reference group will be the one that comes last in alphabetical order. If those values are numbers, it will be the one with the highest value.
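To make that rule concrete, here is a minimal sketch in Python (not SPSS) that mimics it: sort the distinct values, treat the last one as the reference group, and build a 0/1 indicator for every other group. The function name `dummy_code` and the treatment labels are made up for illustration, and Python's own sort order is only a stand-in for whatever ordering SPSS applies internally.

```python
def dummy_code(values):
    """Dummy code a categorical variable, using the last sorted level as reference.

    Illustration only: Python's sort order stands in for the ordering SPSS uses.
    """
    levels = sorted(set(values))
    reference = levels[-1]          # last alphabetically, or the highest number
    indicator_levels = levels[:-1]  # one 0/1 column for every other level
    rows = [[1 if v == level else 0 for level in indicator_levels] for v in values]
    return reference, indicator_levels, rows


# String values: the reference group is the one that sorts last.
reference, columns, rows = dummy_code(["Control", "DrugA", "DrugB", "Control"])
print(reference)  # DrugB
print(columns)    # ['Control', 'DrugA']
print(rows)       # [[1, 0], [0, 1], [0, 0], [1, 0]]

# Numeric values: the reference group is the highest number.
num_reference, num_columns, _ = dummy_code([1, 2, 3, 2])
print(num_reference)  # 3
```

Notice that the reference group gets a row of all zeros, which is why every coefficient ends up being a comparison to that group.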

Not all procedures in SPSS use this default, so double-check the default if you’re using something else. Some procedures in SPSS let you change the default, but GLM doesn’t.

In some studies it really doesn’t matter which is the reference group.

But in others, interpreting regression coefficients will be a whole lot easier if you choose a group that makes a good comparison such as a control group or the most common group in the data.

If you want that to be the reference group in SPSS GLM, make it come last alphabetically. I’ve been known to do things like change my data so that the control group becomes something like ZControl. (But create a new variable. Never overwrite original data.)
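The renaming trick can be sketched the same way. This is Python rather than SPSS, and the variable names and labels are hypothetical. Note that Python sorts uppercase letters before lowercase ones, so the example sticks to capitalized labels; check how your own labels sort before relying on this.

```python
# Hypothetical treatment variable with a control group.
treatment = ["Control", "DrugA", "DrugB", "Control"]

# Build a NEW variable; the original list is left untouched.
treatment_recoded = ["ZControl" if v == "Control" else v for v in treatment]

# The level that sorts last is the one that would become the reference group.
print(sorted(set(treatment))[-1])          # DrugB
print(sorted(set(treatment_recoded))[-1])  # ZControl
```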

It really can get confusing, though, if the variable was already dummy coded, that is, if it already had values of 0 and 1. Because 1 is the higher value, SPSS GLM will make that group the reference group and internally code it as 0.

This can really lead to confusion when interpreting coefficients. It’s not impossible if you’re paying attention, but you do have to pay attention. It’s generally better to recode the variable so that you don’t confuse yourself. And while you may believe you’re up for overcoming the confusion, why make things harder for yourself or for any colleague you’re sharing results with?
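Here is a small Python sketch of why the flip matters, using invented numbers. With a single two-group predictor, the dummy variable's coefficient is just the difference between the mean of the indicated group and the mean of the reference group, so swapping the reference group swaps the sign.

```python
from statistics import mean

# Invented data: a 0/1 variable and an outcome.
treated = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0]


def group_mean(group):
    """Mean outcome for the cases in one group."""
    return mean(y for g, y in zip(treated, outcome) if g == group)


mean_0 = group_mean(0)
mean_1 = group_mean(1)

# What the 0/1 coding suggests: group 0 is the reference group.
coef_reference_0 = mean_1 - mean_0
# What you get when the highest value (1) is the reference group instead.
coef_reference_1 = mean_0 - mean_1

print(coef_reference_0)  # 5.0
print(coef_reference_1)  # -5.0
```

The size of the effect is identical. Only the direction of the comparison changes, which is exactly the kind of thing that is easy to misreport.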

Interactions among fixed factors default

There is another key default to keep in mind. GLM will automatically create interactions between any and all variables you specify as Fixed Factors.

If you put 5 variables in Fixed Factors, you’ll get a lot of interactions. SPSS will automatically create all 2-way, 3-way, 4-way, and even a 5-way interaction among those 5 variables.

That’s a lot of interactions.
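To put a number on it, this short Python sketch lists every interaction among five factors. The factor names A through E are placeholders, and the A*B style of writing a term is used here just as a label.

```python
from itertools import combinations

# Placeholder names for five fixed factors.
factors = ["A", "B", "C", "D", "E"]

counts = {}
for order in range(2, len(factors) + 1):
    # Every way of choosing `order` factors gives one interaction term.
    terms = ["*".join(combo) for combo in combinations(factors, order)]
    counts[order] = len(terms)
    print(order, len(terms))

total_interactions = sum(counts.values())
print(total_interactions)  # 26
```

That is 10 two-way, 10 three-way, 5 four-way, and 1 five-way interaction, for 26 interaction terms on top of the 5 main effects.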

In contrast, GLM doesn’t create any interactions by default between Covariates or between Covariates and Fixed Factors.

So you may find you have more interactions than you wanted among your categorical predictors. And fewer interactions than you wanted among numerical predictors.

You don’t have to accept the default. You can override it quite easily.

Just click on the Model button. Then choose “Custom Model.”  You can then choose which interactions you do, or don’t, want in the model.

If you’re using SPSS syntax, simply add the interactions you want to the /DESIGN subcommand.

So think about which interactions you want in the model. And take a look at whether your variables are already dummy coded.


Getting Started with SPSS Syntax

December 22nd, 2022 by

You may have heard that using SPSS syntax is more efficient, gives you more control, and ultimately saves you time and frustration. It’s all true.

And yet you probably use SPSS because you don’t want to code. You like the menus.

I get it.

I like the menus, too, and I use them all the time.

But I use syntax just as often.

At some point, if you want to do serious data analysis, you have to start using syntax.


Three SPSS Shortcuts that Make Life Easier

October 24th, 2022 by

Okay, maybe these SPSS shortcuts won’t make your whole life easier, but they will help your work life, at least the SPSS part of it.

When I consult with researchers, we often go through their analysis together. Sometimes I notice that they’re using some shortcut in SPSS that I had not known about.

Or sometimes they could be saving themselves some headaches.

So I thought I’d share three buttons you may not have noticed before that will make your data analysis more efficient.



Member Training: Introduction to Stata Software Tutorial

September 30th, 2022 by

In this 8-part tutorial, you will learn how to get started using Stata for data preparation, analysis, and graphing. This tutorial will give you the skills to start using Stata on your own. You will need a Stata license, and Stata must be installed before you begin.
