
Confusing Statistical Terms #2: Alpha and Beta

December 11th, 2009

Oh so many years ago I had my first insight into just how ridiculously confusing all the statistical terminology can be for novices.

I was TAing a two-semester applied statistics class for graduate students in biology.  It started with basic hypothesis testing and went on through to multiple regression.

It was a cross-listed class, meaning there were a handful of courageous (or masochistic) undergrads in the class, and they were having trouble keeping (more…)


Series on Confusing Statistical Terms

December 3rd, 2009

One of the biggest challenges in learning statistics and data analysis is learning the lingo.  It doesn’t help that half of the notation is in Greek (literally).

The terminology in statistics is particularly confusing because often the same word or symbol is used to mean completely different concepts.

I know it feels that way, but it really isn’t a master plot by statisticians to keep researchers feeling ignorant.

Really.

It’s just that a lot of the methods in statistics were created by statisticians working in different fields–economics, psychology, medicine, and yes, straight statistics.  Certain fields often have specific types of data that come up a lot and that require specific statistical methodologies to analyze.

Economics needs time series, psychology needs factor analysis.  Et cetera, et cetera.

But separate fields developing statistics in isolation has some ugly effects.

Sometimes different fields develop the same technique, but use different names or notation.

Other times, different fields use the same name or notation for different techniques they developed.

And of course, there are those terms with slightly different names, often used in similar contexts, but with different meanings. These are never used interchangeably, but they’re easy to confuse if you don’t use this stuff every day.

And sometimes, there are different terms for subtly different concepts, but people use them interchangeably.  (I am guilty of this myself.)  It’s not a big deal if you understand those subtle differences.  But if you don’t, it’s a mess.

And it’s not just fields–it’s software, too.

SPSS uses different names for the exact same thing in different procedures.  In GLM, a continuous independent variable is called a Covariate.  In Regression, it’s called an Independent Variable.

Likewise, SAS has a Repeated statement in its GLM, Genmod, and Mixed procedures.  They all get at the same concept there (repeated measures), but they deal with it in drastically different ways.

By the time the fields come together and realize they’re all doing the same thing, people in different fields, or using different software procedures, are already used to their own terminology.  So we’re stuck with different versions of the same word or method.

So anyway, I am beginning a series of blog posts to help clear this up.  Hopefully it will be a good reference you can come back to when you get stuck.

We’ve expanded on this list with a member training, if you’re interested.

If you have good examples, please post them in the comments.  I’ll do my best to clear things up.

 

Why Statistics Terminology is Especially Confusing

Confusing Statistical Term #1: Independent Variable

Confusing Statistical Terms #2: Alpha and Beta

Confusing Statistical Term #3: Levels

Confusing Statistical Terms #4: Hierarchical Regression vs. Hierarchical Model

Confusing Statistical Term #5: Covariate

Confusing Statistical Term #6: Factor

Same Statistical Models, Different (and Confusing) Output Terms

Confusing Statistical Term #7: GLM

Confusing Statistical Term #8: Odds

Confusing Statistical Term #9: Multiple Regression Model and Multivariate Regression Model

Confusing Statistical Term #10: Mixed and Multilevel Models

Confusing Statistical Terms #11: Confounder

Six terms that mean something different statistically and colloquially

Confusing Statistical Term #13: MAR and MCAR Missing Data

 


Sharing SPSS Output across Versions

November 18th, 2009

If you’ve ever tried sharing SPSS output with your collaborators, advisor, or statistical consultant, you have surely noticed that the output is often not compatible across different versions of SPSS.

If you work at a company where everyone is on the same site license, it’s not a problem.  But if you’re collaborating with colleagues at different universities on different upgrade schedules, you might run into some problems.

It’s true that most software programs aren’t forward-compatible.  You can’t open documents created in a newer version with an older version of the software.

But SPSS’s sharing capabilities are more, um, interesting.

The syntax and data files are back and forward-compatible across many versions, at least since v9 or so.  (I don’t (more…)


Chi-square test vs. Logistic Regression: Is a fancier test better?

November 9th, 2009

I recently received this email, which I thought was a great question, and one of wider interest…

Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of associations in applied statistical analysis.  Why is using regression, or logistic regression “better” than doing bivariate analysis such as Chi-square?

I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted for-controlled by- model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi-Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.

My professors don’t seem to be able to give me a simple justified answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.

Thank you!



Have you Wondered how using SPSS Burns Calories?

October 30th, 2009

Maybe I’ve noticed it more because I’m getting ready for next week’s GLM in SPSS workshop. Just this week, I’ve run into several people struggling with SPSS, and with GLM in particular.

Number 1: I read this in a technical report by Patrick Burns comparing SPSS to R:

“SPSS is notorious for its attitude of ‘You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.’”

And while I still prefer SPSS, I had to laugh because the anonymous person Burns (more…)


3 Pieces of SPSS Syntax to Keep Handy

October 23rd, 2009

I hope you’re getting started using SPSS Syntax by hitting that Paste button when you use the menus.

But there are a few parts of SPSS you can’t do that with. Specifically, there are syntax commands for doing all the variable definitions that you usually fill out in the “Variable View” window. But there are no Paste buttons there, so you have to know how to write the syntax from scratch.

I find the three variable definitions that I use the most are defining Variable Labels, Value Labels and Missing Data codes. The syntax is simple and logical for all three, so I’m going to just give you the basic code, which you can keep on hand and edit as you need.

For a data set with the variables Gender, Smoke, and Exercise, with the following definitions:

Gender: 0=Male, 1=Female
Smoke: 1=Never 2=Sometimes 3=Daily
Exercise: 1=Never 2=Sometimes 3=Daily

For all three variables, 999 = a user-defined missing value

We could use the following code to give descriptive variable labels, encode the value labels, and define the missing data:

VARIABLE LABELS
GENDER 'Participant Gender'
SMOKE 'Does Participant ever Smoke Cigarettes?'
EXERCISE 'How Often Does Participant Exercise for a 30-Minute Period?'.

Notice two things:
1. I could put all three variable labels in the same VARIABLE LABELS statement.
2. There is a period at the end of the statement. This is required.

VALUE LABELS
GENDER 0 'Male' 1 'Female'
/SMOKE EXERCISE
1 'Never'
2 'Sometimes'
3 'Daily'.

MISSING VALUES
GENDER SMOKE EXERCISE (999).

Since all three variables have the same missing data code, I could include them all in the same statement.
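If the missing data codes were not the same for every variable, a sketch like the one below should work. It is only an illustration: the extra code 998 is an assumption I made up for this example, not part of the data set described above. Groups of variables with different codes are separated by a slash, and DISPLAY DICTIONARY is one way to check afterwards that the labels and missing values were stored as intended.

```spss
* Hypothetical variation: SMOKE and EXERCISE also use 998 (an assumed code, not in the example above).
MISSING VALUES
GENDER (999)
/SMOKE EXERCISE (998, 999).

* Show the stored labels and missing-value codes to confirm the definitions took effect.
DISPLAY DICTIONARY
/VARIABLES=GENDER SMOKE EXERCISE.
```

Note that a new MISSING VALUES statement replaces any missing-value codes previously defined for the variables it names, so list every code you want to keep.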

There are, of course, syntax rules for all of these commands, but you can easily look them up in the Command Syntax Manual.

Want to learn more? If you’re just getting started with data analysis in SPSS, or would like a thorough refresher, please join us in our online workshop Introduction to Data Analysis in SPSS.