The Analysis Factor Statwise Newsletter
Volume 4, Issue 5
December 2011
In this Issue

A Note from Karen

Featured Article: The Meanings of Interaction and Association

Resource of the Month

What's New

About Us

 

A Note From Karen

Happy December!

I hope things are slowing down a bit as your semester or year ends, and you're able to take some time off during the holiday season.

Our entire office will be closed the week of December 26–30 so that we can all come back refreshed and ready to serve you in the new year.

In January we'll be offering one of our most popular workshops once again: Interpreting (Even Tricky) Regression Coefficients. It was the first workshop we ever offered, because this topic has one of the biggest jumps from the mechanics and concepts you learn in classes to what it all really means when you're analyzing real data. The focus is on understanding, on an intuitive level, what the different terms in your model mean.

It's one of my favorite workshops, because I consider this the fun stuff. If you're interested, all the details are on the registration page.

This month's article is on the difference between interactions and associations among variables. This very question came up recently both in consulting and in the Analyzing Repeated Measures Data Workshop. It's exactly the kind of confusing but important topic that is worth understanding on an intuitive level, so I hope this article helps illuminate it.

Thanks again for all your support this year. All of us at The Analysis Factor wish you a wonderful holiday season and a happy new year.

Happy analyzing!
Karen

Feature Article

The Meanings of Interaction and Association

It's really easy to mix up the concepts of association (a.k.a. correlation) and interaction, or to assume that if two variables interact, they must be associated. But that's not actually true.

In statistics, they have different implications for the relationships among your variables, especially when the variables you’re talking about are predictors in a regression or ANOVA model.

Association

Association between two variables means the values of one variable relate in some way to the values of the other.  Association is usually measured by correlation for two continuous variables and by cross tabulation and a Chi-square test for two categorical variables.

Unfortunately, there is no nice, descriptive measure for association between one categorical and one continuous variable, but either one-way analysis of variance or logistic regression can test an association (depending upon whether you think of the categorical variable as the independent or the dependent variable).

Essentially, association means the values of one variable generally co-occur with certain values of the other. 
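As an illustrative sketch (not from the original article), here is what those three tests look like in Python using SciPy. All data here are simulated and all numbers are hypothetical; the point is just to show one test per pairing of variable types.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Two continuous variables: Pearson correlation
    x = rng.normal(size=100)
    y = 0.5 * x + rng.normal(size=100)
    r, p = stats.pearsonr(x, y)
    print(f"Pearson r = {r:.2f}, p = {p:.3f}")

    # Two categorical variables: a cross tabulation and a chi-square test
    a = rng.integers(0, 2, size=100)
    flip = rng.random(100) < 0.2      # flip a's category 20% of the time
    b = np.where(flip, 1 - a, a)      # b usually matches a, so they're associated
    table = np.array([[np.sum((a == 0) & (b == 0)), np.sum((a == 0) & (b == 1))],
                      [np.sum((a == 1) & (b == 0)), np.sum((a == 1) & (b == 1))]])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"Chi-square = {chi2:.2f}, p = {p:.3f}")

    # One categorical and one continuous variable: one-way ANOVA tests the association
    f, p = stats.f_oneway(x[a == 0], x[a == 1])
    print(f"F = {f:.2f}, p = {p:.3f}")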

Interaction

Interaction is different.  Whether two variables are associated says nothing about whether they interact in their effect on a third variable.  Likewise, if two variables interact, they may or may not be associated.

An interaction between two variables means the effect of one of those variables on a third variable is not constant—the effect differs at different values of the other. 
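To make that definition concrete, here is a hedged sketch in Python with simulated, hypothetical data (the statsmodels library is my choice here, not something from the article). The x1:C(x2) coefficient estimates how much the slope of x1 on y changes across the levels of x2.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(size=n)            # continuous predictor
    x2 = rng.integers(0, 2, size=n)    # categorical predictor
    # The slope of x1 is 0.2 when x2 == 0 and 1.0 when x2 == 1
    y = 0.2 * x1 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

    df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
    # The x1:C(x2)[T.1] coefficient recovers the 0.8 change in slope
    fit = smf.ols("y ~ x1 * C(x2)", data=df).fit()
    print(fit.params)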

What Association and Interaction Describe in a Model

The following examples show three situations for three variables: X1, X2, and Y. X1 is a continuous independent variable, X2 is a categorical independent variable, and Y is the dependent variable.  I chose these types of variables to make the plots easy to read, but any of these variables could be either categorical or continuous.

In the first scenario, X1 and X2 are associated. If you ignore Y, you can see the mean of X1 is lower when X2=0 than when X2=1. But they do not interact in how they affect Y: the regression lines are parallel. X1 has the same effect on Y (the slope) for both X2=1 and X2=0.

A simple example is the relationship between height (X1) and weight (Y) in male (X2=0) and female (X2=1) teenagers.  There is a relationship between height (X1) and gender (X2), but for both genders, the relationship between height and weight is the same. 

This is the situation you're trying to take care of by including control variables. If you didn't include gender as a control, a regression would fit a single line to all these points and attribute all variation in weights to differences in heights. That single line would also be steeper, since it would have to accommodate both groups, and it would overestimate the size of the unique effect of height on weight.
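A quick simulated illustration of this scenario (all numbers hypothetical, statsmodels assumed): the slopes are parallel and X1 is associated with X2, so omitting X2 inflates the estimated slope of X1.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 300
    x2 = rng.integers(0, 2, size=n)                   # the categorical control
    x1 = rng.normal(loc=5 * x2, scale=1.0, size=n)    # X1 and X2 are associated
    y = 2.0 * x1 + 10.0 * x2 + rng.normal(size=n)     # parallel slopes: no interaction

    controlled = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    naive = sm.OLS(y, sm.add_constant(x1)).fit()
    print("slope of x1, controlling for x2:", round(controlled.params[1], 2))  # about 2
    print("slope of x1, ignoring x2:", round(naive.params[1], 2))              # inflated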

In the second scenario, X1 and X2 are not associated: the mean of X1 is the same for both categories of X2. But how X1 affects Y differs for the two values of X2, which is the definition of an interaction. The slope of X1 on Y is greater for X2=1 than for X2=0, for which it is nearly flat.

An example of this would be an experiment in which X1 was a pretest score and Y a posttest score.  Imagine participants were randomly assigned to a control (X2=1) or a training (X2=0) condition. 

If randomization is done well, the assigned condition (X2) should be unrelated to the pretest score (X1).  But they do interact—the relationship between pretest and posttest differs in the two conditions. 

In the control condition, without training, the pretest and posttest scores would be highly correlated, but in the training condition, if the training worked well, pretest scores would have less effect on posttest scores. 
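Here is a sketch of this scenario with simulated data (all numbers hypothetical, and the condition is coded 1 = training here for convenience): a t-test shows no association between condition and pretest, while the interaction coefficient is clearly nonzero.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 200
    pretest = rng.normal(loc=50, scale=10, size=n)
    training = rng.integers(0, 2, size=n)    # random assignment: 1 = trained
    # Training weakens how much the posttest depends on the pretest
    slope = np.where(training == 1, 0.2, 0.9)
    posttest = slope * pretest + 40 * training + rng.normal(scale=5, size=n)

    # No association: pretest means are similar in the two conditions
    print(stats.ttest_ind(pretest[training == 0], pretest[training == 1]))

    # But a clear interaction: the pre:train coefficient is far from zero
    df = pd.DataFrame({"post": posttest, "pre": pretest, "train": training})
    print(smf.ols("post ~ pre * train", data=df).fit().params)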

In the third scenario, we've got both an association and an interaction. X1 and X2 are associated: once again the mean of X1 is lower when X2=0 than when X2=1. They also interact in their effect on Y: the slopes of the relationship between X1 and Y are different when X2=0 and X2=1. So X2 affects the relationship between X1 and Y.

A good example here would be if Y is the number of jobs in a county, X1 is the percentage of the workforce that holds a college degree, and X2 is whether the county is rural (X2=0) or metropolitan (X2=1).

It’s clear rural counties have, on average, lower percentages of college-educated citizens than metropolitan counties.  They also have fewer jobs.

It’s also clear that the workforce’s education level in metropolitan counties is related to how many jobs there are.  But in rural counties, it doesn’t matter at all. 

This situation is also what you would see if the randomization in the last example did not go well or if randomization was not possible.
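And a final sketch of the third scenario with simulated, hypothetical data: the mean of the education variable and its slope on jobs both depend on the rural/metropolitan indicator, so both an association and an interaction show up.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    n = 300
    metro = rng.integers(0, 2, size=n)                              # 0 = rural, 1 = metro
    degree_pct = rng.normal(loc=15 + 15 * metro, scale=5, size=n)   # associated with metro
    # Education only pays off in jobs in metropolitan counties (the interaction)
    jobs = 100 + 3.0 * degree_pct * metro + 50 * metro + rng.normal(scale=20, size=n)

    df = pd.DataFrame({"jobs": jobs, "degree_pct": degree_pct, "metro": metro})
    print(df.groupby("metro")["degree_pct"].mean())                 # the association
    print(smf.ols("jobs ~ degree_pct * metro", data=df).fit().params)  # the interaction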

The differences between interaction and association will become clearer as you analyze more data. It's always a good idea to stop and explore your data, through graphs or by trying different terms in your model, to figure out exactly what's going on with your variables.

Resource of the Month

Steps to Take When Your Regression (or Other Statistical) Results Just Look…Wrong

Interpreting Interactions Between Two Effect-Coded Categorical Predictors

Clarifications on Interpreting Interactions in Regression

Interpreting Interactions: When the F test and the Simple Effects disagree

What's New

Upcoming Workshops:

Interpreting (Even Tricky) Regression Coefficients

When you understand the meaning of linear regression coefficients for all types of predictor variables (dummy variables, interactions, quadratic terms, centered and standardized variables), you can build more sophisticated and more accurate models and easily translate your findings into meaningful results.

Begins January 12th

Early registration opens December 5. Get more information and register here.

Running Regressions and ANCOVAs in SPSS GLM

Become the SPSS GLM ninja you always wanted to be. It's often much easier to run regression models in GLM than in the Regression procedure. This workshop walks through the options in both procedures, so you learn what advantages each one has, what the details mean, and when to use each.

Begins March 1st

Early registration opens February 6th. Get more information and register here.

About Us

What is The Analysis Factor? The Analysis Factor is the difference between knowing about statistics and knowing how to use statistics in data analysis. It acknowledges that statistical analysis is an applied skill, one that requires learning how to use statistical tools within the context of a researcher's own data, and it supports that learning.

The Analysis Factor, the organization, offers statistical consulting, resources, and learning programs that empower researchers to become confident, able, and skilled statistical practitioners. Our aim is to make your journey acquiring the applied skills of statistical analysis easier and more pleasant.

Karen Grace-Martin, the founder, spent seven years as a statistical consultant at Cornell University. While there, she learned that being a great statistical advisor is not only about having excellent statistical skills, but about understanding the pressures and issues researchers face, about fabulous customer service, and about communicating technical ideas at a level each client understands. 

You can learn more about Karen Grace-Martin and The Analysis Factor at theanalysisfactor.com.

Please forward this newsletter to colleagues who you think would find it useful. Your recommendation is how we grow.

If you received this email from a friend or colleague, click here to subscribe to this newsletter.
