The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers

syntax

Best Practices for Organizing your Data Analysis

by Audrey Schnell  Leave a Comment

There is a lot of skill needed to perform good data analyses. It is not just about statistical knowledge (though more statistical knowledge is always helpful). Organizing your data analysis, and knowing how to do that, is a key skill.  [Read more…] about Best Practices for Organizing your Data Analysis

Tagged With: best practices, Data Analysis, organization, syntax

Related Posts

  • Three Habits in Data Analysis That Feel Efficient, Yet are Not
  • Best Practices for Data Preparation
  • On Puzzles, Statistics, Algorithms, and Understanding
  • Four Weeds of Data Analysis That are Easy to Get Lost In

Three Habits in Data Analysis That Feel Efficient, Yet are Not

by Karen Grace-Martin  1 Comment

It’s easy to develop bad habits in data analysis. When you’re new to it, you just don’t have enough experience to realize that what feels like efficiency will actually come back to make things take longer, introduce problems, and lead to more frustration. [Read more…] about Three Habits in Data Analysis That Feel Efficient, Yet are Not

Tagged With: bad habits, Data Analysis, organization, syntax

Related Posts

  • Best Practices for Organizing your Data Analysis
  • Best Practices for Data Preparation
  • On Puzzles, Statistics, Algorithms, and Understanding
  • Four Weeds of Data Analysis That are Easy to Get Lost In

Best Practices for Data Preparation

by Audrey Schnell  1 Comment

If you’ve been doing data analysis for long, you’ve probably had the ‘AHA’ moment where you realized statistical practice is a craft and not just a science. As with any craft, there are best practices that will save you a lot of pain and suffering and elevate the quality of your work. And yet, it’s likely that no one may have taught you these. I know I never had a class on this. [Read more…] about Best Practices for Data Preparation

Tagged With: best practices, data cleaning, data preparation, Missing Data, syntax

Related Posts

  • Best Practices for Organizing your Data Analysis
  • Preparing Data for Analysis is (more than) Half the Battle
  • Three Habits in Data Analysis That Feel Efficient, Yet are Not
  • Member Training: Data Cleaning

The Data Analysis Work Flow: 9 Strategies for Keeping Track of your Analyses and Output

by Karen Grace-Martin  7 Comments

Knowing the right statistical analysis to use in any data situation, knowing how to run it, and being able to understand the output are all really important skills for statistical analysis.  Really important.

But they’re not the only ones.

Another is having a system in place to keep track of the analyses.  This is especially important if you have any collaborators (or a statistical consultant!) you’ll be sharing your results with.  You may already have an effective work flow, but if you don’t, here are some strategies I use.  I hope they’re helpful to you.

1. Always use Syntax Code

All the statistical software packages have come up with some sort of easy-to-use, menu-based approach.  And as long as you know what you’re doing, there is nothing wrong with using the menus.  While I’m familiar enough with SAS code to just write it, I use menus all the time in SPSS.

But even if you use the menus, paste the syntax for everything you do.  There are many reasons for using syntax, but the main one is documentation.  Whether you need to communicate to someone else or just remember what you did, syntax is the only way to keep track.  (And even though, in the midst of analyses, you believe you’ll remember how you did something, a week and 40 models later, I promise you won’t.  I’ve been there too many times.  And it really hurts when you can’t replicate something).

In SPSS, there are two things you can do to make this seamlessly easy.  First, instead of hitting OK, hit Paste.  Second, make sure syntax shows up on the output.  This is the default in later versions, but you can turn it on in Edit–>Options–>Viewer.  Make sure “Display Commands in Log” and “Log” are both checked.  (Note: the menus may differ slightly across versions).

2.  If your data set is large, create smaller data sets that are relevant to each set of analyses.

There are two reasons.  First, statistical software generally needs to read the entire data set to do many analyses and data manipulations.  Since that same software is often a memory hog, running anything on a large data set will s-l-o-w down processing. A lot.

Second, it’s just clutter.  It’s harder to find the variables you need if you have an extra 400 variables in the data set.
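
As a rough illustration of the idea, here is a minimal sketch in Python (not SPSS or SAS) that keeps only the columns needed for one set of analyses.  The column names and the tiny in-memory data set are hypothetical stand-ins for a much wider file, and `subset_columns` is a helper made up for this example.

```python
import csv
import io


def subset_columns(source, keep):
    """Yield rows containing only the columns named in `keep`."""
    reader = csv.DictReader(source)
    missing = [name for name in keep if name not in reader.fieldnames]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    for row in reader:
        yield {name: row[name] for name in keep}


# Hypothetical data standing in for a file with hundreds of variables.
full_data = io.StringIO(
    "id,age,mar4cat,income,q1,q2\n"
    "1,34,0,52000,3,4\n"
    "2,51,2,61000,5,1\n"
)

# Keep only what this set of analyses needs.
analysis_rows = list(subset_columns(full_data, ["id", "age", "mar4cat"]))
print(analysis_rows)
```

In practice you would save the smaller data set under its own name, leaving the original file untouched.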

3. Instead of just opening a data set manually, use commands in your syntax code to open data sets.

Why?  Unless you are committing the cardinal sin of overwriting your original data as you create new variables, you have multiple versions of your data set.  Having the data set listed right at the top of the analysis commands makes it crystal clear which version of the data you analyzed.
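
The same habit carries over to any scripted analysis.  Below is a minimal Python sketch, assuming the data lives in a CSV file; the file name `survey_v3_recoded.csv` is hypothetical, and a temporary folder is used only so the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def load_data(path):
    """Read a CSV file and report which version of the data was loaded."""
    path = Path(path)
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    print(f"Loaded {len(rows)} rows from {path.name}")
    return rows


# Demo: a temporary file stands in for a real data set.
with tempfile.TemporaryDirectory() as folder:
    # The data set is named at the top, before any analysis commands.
    DATA_FILE = Path(folder) / "survey_v3_recoded.csv"  # hypothetical name
    DATA_FILE.write_text("id,age\n1,34\n2,51\n")
    rows = load_data(DATA_FILE)
```

Because the file name sits in the code and is echoed to the output, anyone reading either one can tell which version of the data was analyzed.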

4. Use Variable and Value labels religiously

I know you remember today that your variable named Mar4cat means marital status in 4 categories and that 0 indicates ‘never married.’  It’s so logical, right?  Well, it’s not obvious to your collaborators and it won’t be obvious to you in two years, when you try to re-analyze the data after a reviewer doesn’t like your approach.

Even if you have a separate code book, why not put it right in the data?  It makes the output so much easier to read, and you don’t have to worry about losing the code book.  It may feel like more work upfront, but it will save time in the long run.
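
SPSS and SAS store labels inside the data set itself.  In a language without built-in labels, a rough equivalent is to keep them in the code next to the data, as in this Python sketch.  Only the meaning of 0 comes from the example above; the other three categories are assumptions made up for illustration.

```python
# Variable labels: what each variable means.
VARIABLE_LABELS = {"mar4cat": "Marital status in 4 categories"}

# Value labels: what each code means.
VALUE_LABELS = {
    "mar4cat": {
        0: "Never married",          # from the example in the text
        1: "Married",                # assumed for illustration
        2: "Divorced or separated",  # assumed for illustration
        3: "Widowed",                # assumed for illustration
    }
}


def describe(variable, value):
    """Return a readable 'variable label: value label' string.

    Falls back to the raw name or value when no label is defined.
    """
    var_label = VARIABLE_LABELS.get(variable, variable)
    val_label = VALUE_LABELS.get(variable, {}).get(value, value)
    return f"{var_label}: {val_label}"


print(describe("mar4cat", 0))
```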

5. Put data manipulation, descriptive analyses, and models in separate syntax files

When I do data analysis, I follow my Steps approach, which means first I create all the relevant variables, then run univariate and bivariate statistics, then initial models, and finally hone the models.

And I’ve found that if I keep each of these steps in separate program files, it makes it much easier to keep track of everything.  If you’re creating new variables in the middle of analyses, it’s going to be harder to find the code so you can remember exactly how you created that variable.

6. As you run different versions of models, label them with model numbers

When you’re building models, you’ll often have a progression of different versions.  Especially when I have to communicate with a collaborator, I’ve found it invaluable to number these models in my code and print that model number on the output.  It makes a huge difference in keeping track of nine different models.

7. As you go along with different analyses, keep your syntax clean, even if the output is a mess.

Data analysis is a bit of an iterative process.  You try something, discover errors, realize that variable didn’t work, and try something else.  Yes, base it on theory and have a clear analysis plan, but even so, the first analyses you run won’t be your last.

Especially if you make mistakes as you go along (as I inevitably do), your output gets pretty littered with results you don’t want to keep.  You could clean it up as you go along, but I find that’s inefficient.  Instead, I try to keep my code clean, with only the error-free analyses that I ultimately want to use.  It lets me try whatever I need to without worry.  Then at the end, I delete the entire output and just rerun all the code.

One caveat here:  You may not want to take this approach if you have VERY computationally intensive analyses, like a generalized linear mixed model with crossed random effects on a large data set.  If your code takes more than 20 minutes to run, this won’t be more efficient.

8. Use titles and comments liberally

I’m sure you’ve heard before that you should use lots of comments in your syntax code.  But use titles too.  Both SAS and SPSS have title commands that allow titles to be printed right on the output.  This is especially helpful for naming and numbering all those models in #6.
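
If your software has no title command, you can get a similar effect by printing a numbered title before each model.  Here is a minimal Python sketch; the model descriptions are made up, and `model_title` and `print_title` are hypothetical helpers, not part of any statistics package.

```python
def model_title(number, description):
    """Build a title string that carries the model number."""
    return f"Model {number}: {description}"


def print_title(title, width=60):
    """Print a title banner so it stands out in the output."""
    print("=" * width)
    print(title)
    print("=" * width)


# Hypothetical progression of models.
models = [
    (1, "income ~ age"),
    (2, "income ~ age + mar4cat"),
]

titles = [model_title(number, description) for number, description in models]
for title in titles:
    print_title(title)
    # ... fit the model and print its results here ...
```

With the number in both the code and the output, "Model 2" means the same thing to you and to your collaborator.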

9. Name output, log, and programs the same

Since you’ve split your programs into separate files for data manipulations, descriptives, initial models, etc., you’re going to end up with a lot of files.  What I do is give each output the same name as the program file.  (And if I’m in SAS, the log too.  Yes, save the log.)

Yes, that means making sure you have a separate output for each section.  While it may seem like extra work, it can make looking at each output less overwhelming for anyone you’re sharing it with.
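
One way to make the matching names automatic is to derive them from the program file's name.  This is a minimal Python sketch; the file name and the `.out` and `.log` extensions are assumptions, so substitute whatever your software produces.

```python
from pathlib import Path


def companion_files(program):
    """Return output and log paths that share the program file's name."""
    program = Path(program)
    return {
        "program": program,
        "output": program.with_suffix(".out"),  # assumed extension
        "log": program.with_suffix(".log"),     # assumed extension
    }


files = companion_files("02_descriptives.py")  # hypothetical file name
for role, path in files.items():
    print(f"{role}: {path}")
```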

Tagged With: Data analysis work flow, Statistical analysis, syntax

Related Posts

  • Best Practices for Organizing your Data Analysis
  • Three Habits in Data Analysis That Feel Efficient, Yet are Not
  • Best Practices for Data Preparation
  • Why report estimated marginal means?

Copyright © 2008–2023 The Analysis Factor, LLC.
All rights reserved.
