The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

Data Analysis Practice

Member Training: An Introduction into the Grammar of Graphics

by TAF Support 1 Comment

As the saying goes, a picture is worth a thousand words, and the same holds for graphs. A well-constructed graph can summarize information collected from tens, hundreds, or even thousands of data points. But not every graph has the same power to convey complex information clearly.

Tagged With: communicate results, formatting graphs, graphics, graphs, statistical results

Related Posts

  • Member Training: How to Avoid Common Graphical Mistakes
  • Member Training: Communicating Statistical Results to Non-Statisticians
  • Member Training: Communicating Statistical Results: When to Use Tables vs Graphs to Tell the Data’s Story
  • Member Training: Analyzing Pre-Post Data

Four Weeds of Data Analysis That are Easy to Get Lost In

by Karen Grace-Martin 1 Comment

Every time you analyze data, you start with a research question and end with communicating an answer. But in between those start and end points are twelve other steps. I call this the Data Analysis Pathway. It’s a framework I put together years ago, inspired by a client who kept getting stuck in Weed #1, and I’ve honed it over years of assisting thousands of researchers with their analysis.

Tagged With: Data Analysis, data analysis plan, data issues

Related Posts

  • Eight Data Analysis Skills Every Analyst Needs
  • The Difference Between Model Assumptions, Inference Assumptions, and Data Issues
  • Best Practices for Organizing your Data Analysis
  • Three Habits in Data Analysis That Feel Efficient, Yet are Not

The Difference Between Model Assumptions, Inference Assumptions, and Data Issues

by Karen Grace-Martin 4 Comments

Have you ever compared the list of model assumptions for linear regression across two sources? Whether they’re textbooks, lecture notes, or web pages, chances are the assumptions don’t quite line up.

Why? Sometimes the authors use different terminology. So it just looks different.

And sometimes they’re including not only model assumptions, but inference assumptions and data issues. All are important, but understanding the role of each can help you understand what applies in your situation.

Tagged With: Assumptions, data issues, inference

Related Posts

  • Four Weeds of Data Analysis That are Easy to Get Lost In
  • Member Training: Assumptions of Linear Models
  • Best Practices for Organizing your Data Analysis
  • Three Habits in Data Analysis That Feel Efficient, Yet are Not

What It Really Means to Remove an Interaction From a Model

by Karen Grace-Martin 3 Comments

When you’re model building, a key decision is which interaction terms to include. And which interactions to remove.

As a general rule, the default in regression is to leave them out. Add interactions only with a solid reason. It would seem like data fishing to simply add in all possible interactions.

And yet, that’s a common practice in most ANOVA models: put in all possible interactions and only take them out if there’s a solid reason. Even many software procedures default to creating interactions among categorical predictors.

Tagged With: categorical predictor, interaction, Model Building

Related Posts

  • Simplifying a Categorical Predictor in Regression Models
  • Differences in Model Building Between Explanatory and Predictive Models
  • Should I Specify a Model Predictor as Categorical or Continuous?
  • The Impact of Removing the Constant from a Regression Model: The Categorical Case

Three Rules of Statistical Analysis from Your Statistics Class to Unlearn

by Karen Grace-Martin Leave a Comment

There are important ‘rules’ of statistical analysis. Like

  • Always run descriptive statistics and graphs before running tests
  • Use the simplest test that answers the research question and meets assumptions
  • Always check assumptions

But there are others you may have learned in statistics classes that don’t serve you or your analysis well once you’re working with real data.

When you are taking statistics classes, there is a lot going on. You’re learning concepts, vocabulary, and some really crazy notation. And probably a software package on top of that.

In other words, you’re learning a lot of hard stuff all at once. 

Good statistics professors and textbook authors know that learning comes in stages. Trying to teach the nuances of good applied statistical analysis to students who are struggling to understand basic concepts results in no learning at all.

And yet students need to practice what they’re learning so it sticks. So instructors teach simple rules of application. Those simple rules work just fine for students in a stats class working on sparkling clean textbook data.

But they are over-simplified for you, the data analyst, working with real, messy data. 

Here are three rules of data analysis practice that you may have learned in classes that you need to unlearn.  They are not always wrong. They simply don’t allow for the nuance involved in real statistical analysis.

The Rules of Statistical Analysis to Unlearn:

1. To check statistical assumptions, run a test. Decide whether the assumption is met by the significance of that test. 

Every statistical test and model has assumptions. They’re very important. And they’re not always easy to verify.

For many assumptions, there are tests whose sole job is to test whether the assumption of another test is being met. Examples include Levene’s test for constant variance and the Kolmogorov-Smirnov test, often used for normality. These tests are tools to help you decide if your model assumptions are being met.

But they’re not definitive.

When you’re checking assumptions, there are a lot of contextual issues you need to consider: the sample size, the robustness of the test you’re running, the consequences of not meeting assumptions, and more.

What to do instead:

Use these test results as one of many pieces of information that you’ll use together to decide whether an assumption is violated.
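
To see why significance alone can mislead, here is a minimal Python sketch using only the standard library. It is an illustration, not a recommended procedure. The helper names (ks_statistic, ks_critical_value, mildly_skewed_sample), the gamma-based data, the seed, and the sample sizes are all assumptions made up for this demo, and the critical value is the usual large-sample 5% approximation for a fully specified reference distribution. The same mild skew is tested at two sample sizes.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_critical_value(n, c_alpha=1.358):
    """Large-sample approximation to the 5% Kolmogorov-Smirnov critical value."""
    return c_alpha / math.sqrt(n)


def mildly_skewed_sample(n, rng, shape=50.0):
    """Gamma draws rescaled to mean 0 and SD 1 (skewness of about 0.28)."""
    return [(rng.gammavariate(shape, 1.0) - shape) / math.sqrt(shape)
            for _ in range(n)]


rng = random.Random(2024)  # fixed seed so the demo is repeatable
reference = NormalDist(0.0, 1.0)

results = {}
for n in (50, 100_000):
    d = ks_statistic(mildly_skewed_sample(n, rng), reference)
    critical = ks_critical_value(n)
    results[n] = {"D": d, "critical": critical, "reject": d > critical}
    print(f"n = {n:>7}: D = {d:.4f}, critical = {critical:.4f}, "
          f"reject normality: {d > critical}")
```

The departure from normality is identical in both samples; only the amount of data changes. With the very large sample, the test flags a skew that is too small to matter for most analyses, while with 50 observations the same skew will usually go undetected. The significance of the test tells you whether you have enough data to detect a departure, not whether the departure matters.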

2. Delete outliers that are 3 or more standard deviations from the mean.

This is an egregious one. Really. It’s bad.

Yes, it makes the data look pretty. Yes, there are some situations in which it’s appropriate to delete an outlier (like when you have evidence that it’s an error). And yes, outliers can wreak havoc on your parameter estimates.

But don’t make it a habit. Don’t follow a rule blindly.

Deleting outliers because they’re outliers (or using techniques like Winsorizing) is a great way to introduce bias into your results or to miss the most interesting part of your data set.

What to do instead:

When you find an outlier, investigate it. Try to figure out if it’s an error. See if you can figure out where it came from.
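
As a rough sketch of what this can look like in practice, the Python below flags points for review instead of deleting them, then shows what blind deletion would have done. Everything here is hypothetical: the flag_outliers helper, the variable names, and the simulated right-skewed (lognormal) data, in which every value is a legitimate observation and none is an error.

```python
import random
import statistics


def flag_outliers(values, n_sd=3.0):
    """Return indices of points n_sd or more standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) >= n_sd * sd]


rng = random.Random(7)  # fixed seed so the demo is repeatable

# Simulated skewed measurements: large values are real, not data-entry errors.
incomes = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

flagged = flag_outliers(incomes)
flagged_set = set(flagged)
kept = [v for i, v in enumerate(incomes) if i not in flagged_set]

original_mean, original_sd = statistics.fmean(incomes), statistics.stdev(incomes)
trimmed_mean, trimmed_sd = statistics.fmean(kept), statistics.stdev(kept)

print(f"Flagged for review: {len(flagged)} of {len(incomes)} points")
print(f"Mean: {original_mean:.3f} with all data, {trimmed_mean:.3f} after deletion")
print(f"SD:   {original_sd:.3f} with all data, {trimmed_sd:.3f} after deletion")

# Instead of deleting, list the largest flagged values so each can be checked.
for i in sorted(flagged, key=lambda i: incomes[i], reverse=True)[:5]:
    print(f"  row {i}: value = {incomes[i]:.2f}")
```

Because every flagged point in this simulation is a genuine draw from the same skewed distribution, deleting them pulls both the mean and the standard deviation downward. The list at the end is the useful part: it tells you which rows to go look at.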

3. Check normality of dependent variables before running a linear model.

In a t-test, yes, there is an assumption that Y, the dependent variable, is normally distributed within each group. In other words, given the group as defined by X, Y follows a normal distribution.

ANOVA has a similar assumption: given the group as defined by X, Y follows a normal distribution.

In linear regression (and ANCOVA), where we have continuous predictors, this same assumption holds. But it’s a little more nuanced since X is not necessarily categorical. At any specific value of X, Y has a normal distribution. (And yes, this is equivalent to saying the errors have a normal distribution.)

But here’s the thing: the distribution of Y as a whole doesn’t have to be normal.

In fact, if X has a big effect, the distribution of Y, across all values of X, will often be skewed or bimodal or just a big old mess. This happens even if the distribution of Y, at each value of X, is perfectly normal.

What to do instead:

Because normality depends on which Xs are in a model, check assumptions after you’ve chosen predictors. 
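
A small simulation makes the point concrete. This is a minimal sketch with made-up numbers: two hypothetical groups whose means differ by 10 standard deviations, a fixed seed, and excess kurtosis used as a crude stand-in for looking at a histogram or Q-Q plot (near 0 for normal data, strongly negative for two separated peaks).

```python
import random
import statistics


def excess_kurtosis(values):
    """Moment-based excess kurtosis: near 0 for normal data."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(42)  # fixed seed so the demo is repeatable

# Hypothetical two-group study: X has a big effect on Y, and Y is
# exactly normal with SD 1 within each group.
group_means = {"control": 0.0, "treatment": 10.0}
data = [(group, rng.gauss(mu, 1.0))
        for group, mu in group_means.items()
        for _ in range(500)]

# Y as a whole, ignoring X: what you would see before fitting a model.
y_all = [y for _, y in data]

# Residuals after fitting the model (the fitted value is the group mean).
fitted = {g: statistics.fmean([y for grp, y in data if grp == g])
          for g in group_means}
residuals = [y - fitted[group] for group, y in data]

marginal_kurtosis = excess_kurtosis(y_all)
residual_kurtosis = excess_kurtosis(residuals)
print(f"Y overall: excess kurtosis = {marginal_kurtosis:.2f}")
print(f"Residuals: excess kurtosis = {residual_kurtosis:.2f}")
```

Y overall is clearly bimodal, so it fails any check of normality, yet the residuals from the model look normal, which is what the assumption is actually about.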

Conclusion:

The best rule in statistical analysis: always stop and think about your particular data analysis situation.

If you don’t understand or don’t have the experience to evaluate your situation, discuss it with someone who does. Investigate it. This is how you’ll learn.

 

Tagged With: checking assumptions, data analysis practice, dropping outliers, winsorizing

Related Posts

  • A Reason to Not Drop Outliers
  • Best Practices for Organizing your Data Analysis
  • Three Habits in Data Analysis That Feel Efficient, Yet are Not
  • Best Practices for Data Preparation

Statistical Software Access From Home

by Karen Grace-Martin 1 Comment

Of all the stressors you’ve got right now, accessing your statistical software from home shouldn’t be one of them. (You know, the one on your office computer).

We’ve gotten some updates from some statistical software companies on how they’re making it easier to access the software you have a license to or to extend a free trial while you’re working from home.

Tagged With: MPlus, R, SAS, SPSS, Stata, Statistical Software

Related Posts

  • Member Training: What’s the Best Statistical Package for You?
  • SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two
  • Tricks for Using Word to Make Statistical Syntax Easier
  • Ten Ways Learning a Statistical Software Package is Like Learning a New Language


Copyright © 2008–2022 The Analysis Factor, LLC. All rights reserved.
877-272-8096   Contact Us
