The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers


Data Analysis Practice

Order affects Regression Parameter Estimates in SPSS GLM

by Karen Grace-Martin

I just discovered something in SPSS GLM that I never knew.

When you have an interaction in the model, the order you put terms into the Model statement affects which parameters SPSS gives you.

The default in SPSS is to automatically create interaction terms among all the categorical predictors.  But if you want fewer than all those interactions, or if you want to put in an interaction involving a continuous variable, you need to choose Model–>Custom Model.

In the specific example of an interaction between a categorical and continuous variable, to interpret this interaction you need to output Regression Coefficients. Do this by choosing  Options–>Regression Parameter Estimates.

If you put the main effects into the model first, followed by interactions, you will find the usual output: the regression coefficient (column B) for the continuous variable is the slope for the reference group.  The coefficients for the interaction terms in the other categories tell you the difference between the slope for that category and the slope for the reference group.  The coefficient for the reference group’s interaction term is 0.

What I was surprised to find is that if the interactions are put into the model first, you don’t get that.

Instead, the coefficient for the interaction in each category is the actual slope for that group, NOT the difference.

This is actually quite useful–it can save a bit of calculating and now you have a p-value for whether each slope is different from 0.  However, it also means you have to be cautious and make sure you realize what each parameter estimate is actually estimating.
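
To see why both sets of estimates describe the same model, here is a minimal sketch in Python with NumPy rather than SPSS. The data are simulated and every variable name is hypothetical. It assumes the last category is the reference group (the SPSS GLM default) and imitates the "interactions first" ordering by simply leaving out the separate column for the continuous variable, which is a stand-in for what SPSS does, not a reproduction of its internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three groups, each with its own intercept and slope.
n_per_group = 50
true_intercepts = np.array([0.0, 1.0, 2.0])
true_slopes = np.array([1.0, 2.5, -0.5])
group = np.repeat(np.arange(3), n_per_group)
x = rng.normal(size=group.size)
y = (true_intercepts[group] + true_slopes[group] * x
     + rng.normal(scale=0.1, size=group.size))

# One 0/1 indicator column per group.
ind = (group[:, None] == np.arange(3)).astype(float)
ref = 2          # assumption: last category is the reference group
others = [0, 1]  # the non-reference categories

# Parameterization A (main effects first): intercept, group dummies,
# x, and x-by-group terms for the non-reference groups.
X_a = np.column_stack([np.ones_like(x), ind[:, others], x,
                       ind[:, others] * x[:, None]])
b_a, *_ = np.linalg.lstsq(X_a, y, rcond=None)
ref_slope = b_a[3]       # slope for the reference group
slope_diffs = b_a[4:6]   # differences from the reference slope

# Parameterization B (interaction first): intercept, group dummies,
# and one x-by-group term per group, with no separate x column.
X_b = np.column_stack([np.ones_like(x), ind[:, others],
                       ind * x[:, None]])
b_b, *_ = np.linalg.lstsq(X_b, y, rcond=None)
group_slopes = b_b[3:6]  # the actual slope for each group

print("A: reference slope", ref_slope, "differences", slope_diffs)
print("B: group slopes   ", group_slopes)
```

The two designs span the same column space, so the fitted values are identical. Each group slope in B equals the reference slope plus that group's difference in A; only the meaning of the individual coefficients changes.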


The Great Likert Data Debate

by Karen Grace-Martin 1 Comment

I first encountered the Great Likert Data Debate in 1992 in my first statistics class in my psychology graduate program.

My stats professor was a brilliant mathematical psychologist and taught the class unlike any psychology grad class I’ve ever seen since.  Rather than learn ANOVA in SPSS, we derived the Method of Moments using Matlab.  While I didn’t understand half of what was going on, this class roused my curiosity and led me to take more theoretical statistics classes.  The rest is history.

A large section of the class was dedicated to the fact that Likert data was not interval and therefore not appropriate for  statistics that assume normality such as ANOVA and regression.  This was news to me.  Meanwhile, most of the rest of the field either ignored or debated this assertion.

16 years later, the debate continues.  A nice discussion of the debate is found on the Research Methodology blog by Hisham bin Md-Basir.  It’s a nice blog with thoughtful entries that summarize methodological articles in the social and design sciences.

To be fair, though, this blog entry summarizes an article on the “Likert scales are not interval” side of the debate.  For a balanced listing of references, see Can Likert Scale Data Ever Be Continuous?

Tagged With: Likert Scale, Research Methodology

A Reason to Not Drop Outliers

by Karen Grace-Martin 4 Comments

I recently had this question in consulting:

I’ve got 12 out of 645 cases with Mahalanobis’s Distances above the critical value, so I removed them and reran the analysis, only to find that another 10 cases were now outside the value. I removed these, and another 10 appeared, and so on until I have removed over 100 cases from my analysis! Surely this can’t be right!?! Do you know any way around this? It is really slowing down my analysis and I have no idea how to sort this out!!

And this was my response:

I wrote an article about dropping outliers.  As you’ll see, you can’t just drop outliers without a REALLY good reason.  Being influential is not in itself a good enough reason to drop data.
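
The cycle described in the question is easy to reproduce. The sketch below uses simulated heavy-tailed data, not the questioner's data. It assumes two variables and a chi-square cutoff at alpha = .001, neither of which is stated in the question, and the function name is hypothetical. Each round recomputes the distances on the cases that remain, so removing the most extreme cases shrinks the covariance matrix and can push new cases over the cutoff.

```python
import math
import numpy as np

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the sample mean."""
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

rng = np.random.default_rng(1)
# Hypothetical data: 645 cases on 2 heavy-tailed variables.
data = rng.standard_t(df=3, size=(645, 2))

# Chi-square critical value with 2 df at alpha = .001 (an assumed cutoff).
# With 2 df the upper-tail quantile has the closed form -2 * ln(alpha).
critical = -2.0 * math.log(0.001)

removed_per_round = []
remaining = data
for _ in range(50):  # cap the number of rounds for safety
    flagged = mahalanobis_sq(remaining) > critical
    removed_per_round.append(int(flagged.sum()))
    if not flagged.any():
        break
    remaining = remaining[~flagged]

print("cases removed in each round:", removed_per_round)
print("cases left:", len(remaining))
```

If the counts keep coming round after round, that is a sign the data are simply not multivariate normal, not that there are a hundred bad cases.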

Tagged With: dropping outliers, influential outliers, Mahalanobis Distance, outliers

Outliers: To Drop or Not to Drop

by Karen Grace-Martin 24 Comments

Should you drop outliers? Outliers are one of those statistical issues that everyone knows about, but most people aren’t sure how to deal with.  Most parametric statistics, like means, standard deviations, and correlations, and every statistic based on these, are highly sensitive to outliers.

And since the assumptions of common statistical procedures, like linear regression and ANOVA, are also based on these statistics, outliers can really mess up your analysis.

Despite all this, as much as you’d like to, it is NOT acceptable to drop an observation just because it is an outlier.  Outliers can be legitimate observations and are sometimes the most interesting ones.  It’s important to investigate the nature of the outlier before deciding.

  1. If it is obvious that the outlier is due to incorrectly entered or measured data, you should drop the outlier:

    For example, I once analyzed a data set in which a woman’s weight was recorded as 19 lbs.  I knew that was physically impossible.  Her true weight was probably 91, 119, or 190 lbs, but since I didn’t know which one, I dropped the outlier.

    This also applies to a situation in which you know the datum did not accurately measure what you intended.  For example, if you are testing people’s reaction times to an event, but you saw that a participant was not paying attention and was randomly hitting the response key, you know it is not an accurate measurement.

  2. If the outlier does not change the results but does affect assumptions, you may drop the outlier.  But note that in a footnote of your paper.

    Neither the presence nor absence of the outlier in the graph below would change the regression line:

    graph-1

  3. More commonly, the outlier affects both results and assumptions.  In this situation, it is not legitimate to simply drop the outlier.  You may run the analysis both with and without it, but you should state in at least a footnote the dropping of any such data points and how the results changed.

    graph-2

  4. If the outlier creates a strong association, you should drop the outlier and should not report any association from your analysis.

    In the following graph, the relationship between X and Y is clearly created by the outlier.  Without it, there is no relationship between X and Y, so the regression coefficient does not truly describe the effect of X on Y.

    graph-3
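
Situations 3 and 4 both come down to running the analysis with and without the point and comparing. Here is a minimal sketch in Python with NumPy, using made-up numbers chosen so that X and Y are uncorrelated until one extreme point is added. The data and variable names are hypothetical and are not the data behind the graphs above.

```python
import numpy as np

# Hypothetical data: eight points with no linear relationship at all.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([5.0, 3.0, 6.0, 4.0, 5.0, 3.0, 6.0, 4.0])

# The same data with a single extreme point added.
x_out = np.append(x, 30.0)
y_out = np.append(y, 30.0)

# Slope and correlation without the outlier.
slope_without = np.polyfit(x, y, 1)[0]
r_without = np.corrcoef(x, y)[0, 1]

# Slope and correlation with the outlier.
slope_with = np.polyfit(x_out, y_out, 1)[0]
r_with = np.corrcoef(x_out, y_out)[0, 1]

print("without outlier: slope", round(slope_without, 3), "r", round(r_without, 3))
print("with outlier:    slope", round(slope_with, 3), "r", round(r_with, 3))
```

Reporting both sets of numbers, as recommended in situation 3, lets the reader see how much of the result rests on that one point.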

So in those cases where you shouldn’t drop the outlier, what do you do?

One option is to try a transformation.  Square root and log transformations both pull in high numbers.  This can make assumptions work better if the outlier is a value of the dependent variable and can reduce the impact of a single point if the outlier is a value of an independent variable.
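
As a quick illustration of how these transformations pull in high numbers, here is a small sketch in Python with NumPy. The values are invented for the example, and the ratio of the largest value to the median is just one convenient way to show the effect, not a standard diagnostic.

```python
import numpy as np

# Hypothetical positive values; the last one is far above the rest.
values = np.array([4.0, 9.0, 16.0, 25.0, 400.0])

sqrt_values = np.sqrt(values)  # square root transformation
log_values = np.log(values)    # natural log transformation

def max_to_median(v):
    """How many times larger the biggest value is than the median."""
    return v.max() / np.median(v)

print("raw: ", max_to_median(values))
print("sqrt:", max_to_median(sqrt_values))
print("log: ", max_to_median(log_values))
```

The log pulls the extreme value in harder than the square root does. Both require care with the data's range: the log needs strictly positive values and the square root needs non-negative ones.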

Another option is to try a different model.  This should be done with caution, but it may be that a non-linear model fits better.  For example, in example 3, perhaps an exponential curve fits the data with the outlier intact.

Whichever approach you take, you need to know your data and your research area well.  Try different approaches, and see which make theoretical sense.

Tagged With: dropping outliers, outliers, regression assumptions, transformation

The Statistics Myth: Why Statistics Seems so Hard to Learn

by Karen Grace-Martin 25 Comments

There are probably many myths about statistics, but there is one that I believe leads to the most frustration in researchers (and students) as they attempt to learn and apply statistics.

The Carpentry Class: A Fable

There was once a man who needed to build a house. He had a big pile of lumber, and he needed a place to live, so building one himself seemed like a good idea.

He realized that he did not have the knowledge and the many skills needed to build a house.

So he did what any intelligent, well-educated person would do. He took a course: House Building 101.

There was a lot of new jargon: trusses, plumb walls, 16” on center, cripple studs. It was hard to keep it all straight. It didn’t make sense. Why would anyone ever need a header anyway?

But he made it through with a B+. He learned the basics. The doghouse he built in the lab was pretty straight. He even took another course to make sure he knew enough: Advanced Carpentry.

It was time for the man to build his house. He had his land, his plan, his tools, his sacks of concrete, windows, lumber, and nails.

The first day he started with enthusiasm. He swung his hammer with gusto and nailed his first wall into place. It felt good.

But wait. His house was being built on a hill. The textbook only had flat land. How should he deal with hills?

And this house had a bay window. His doghouse had only double hung windows. Doesn’t a bay window stick out?

And he was not sure which technique to use to make that 145 degree angle in the hall. The courses never mentioned anything but 90 degree angles.

In class, they used circular saws. In order to install the trim he ordered, he needed to use a chop saw and a table saw.

He didn’t realize he was supposed to put in the plumbing before the electric, so he ended up doing a LOT of rewiring when the plumbing wouldn’t fit around his wires.

Even with the plans in front of him, there were so many decisions to make, so many new skills to learn.

And he was supposed to move into the house in 4 months when his lease ran out. He’d never get it done in time. Not on his own.

He sounds like a fool, doesn’t he? No one could build a house after taking even a few courses. Especially not with a deadline.

Building a house requires the knowledge of how walls are constructed, sure. But it also requires the ability to use the tools, and the practical skills to implement the techniques.

We can see that this project was a silly one to tackle, yet all the time we think it’s our fault that we have trouble with statistical analysis after taking a few classes.

The Statistics Myth:

Having knowledge about statistics is the only thing necessary to practice statistics.

This isn’t true.

And it’s not helpful.

Yes, the knowledge is necessary, but it is not sufficient.

Statistics doesn’t make sense to students because it is taught out of context. Most people don’t really learn statistics until they start analyzing data in their own research. Yes, it makes those classes tough. You need to acquire the knowledge before you can truly understand it.

The only way to learn how to build a house is to build one. The only way to learn how to analyze data is to analyze some.

Here’s the thing. Data analysts (and house builders) need practical support as they learn. Yes, both could slug it out on their own, but it takes longer, is more frustrating, and leads to many more mistakes.

None of this is necessary. There can be a happy ending.

Carpenters work alongside a master to learn their craft. I have never heard of a statistician or a thesis advisor who sits next to a novice analyzing data. (Anyone who had an advisor like that should consider themselves lucky). Unlike a novice carpenter, a novice data analyst is not helpful. They can’t even hold the ladder.

More common are advisors who tell their students which statistics classes to take (again, if they’re lucky) then send them off to analyze data. The student can ask questions as they go along if they are not too afraid to admit what they don’t know.  And if their advisor is available. And knows the answer.

Really good advisors are not too busy to answer in a timely manner and are willing to admit it if they don’t know the answer.

But most data analysts feel a bit lost. Not just new ones—many experienced researchers never really learned statistical practice very well in the first place. Nearly all researchers face new statistical challenges as their research progresses, and it’s often difficult to find someone knowledgeable enough who is willing and able to explain it.

They are not lost because they are stupid.

They are not lost because statistics is beyond their capabilities.

They are not lost because they didn’t do well in their statistics classes.

They are lost because like carpentry, statistical analysis is an applied skill, a craft.

Acquiring the background knowledge is only one essential part of mastering a craft.

Others include:

  • a belief you can do it
  • a commitment to best practices
  • experience in applying the skills in different situations
  • proficiency in using the tools
  • a resource library
  • ongoing training to learn new skills
  • (ideally) a mentor to guide you as you practice.

Think about it.  How many skills (dancing, sailing, teaching) have you acquired in your life by only taking a class that gave you background knowledge, but no real experience and no real mentor to coach you?

So if you’re stuck on something in statistics, give yourself a break.  You can do this with the right support.

Everything we do at The Analysis Factor is to help you get unstuck.  If you’re frustrated, tired, or even scared…there is another way.

 

If you need help right now, we’ve got your back. Please check out our Statistical Consulting services and our Statistically Speaking membership.

Tagged With: learning statistics, statistics

Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.