Three Habits in Data Analysis That Feel Efficient, Yet Are Not

It’s easy to develop bad habits in data analysis. When you’re new to it, you don’t yet have the experience to recognize that what feels efficient now will come back later to slow you down, introduce problems, and cause more frustration.

I’ve outlined 14 steps for running any data analysis, grouped into four phases, that help keep your analysis on track. But even if you follow those steps, a few bad habits can make the work harder than it needs to be.

Bad Habit #1. Not allowing enough time to implement and learn

One of the great things about data analysis is that each project is an opportunity to improve your skills. Every messy data set and every analysis has its own nuances, so you learn something new each time.

The other side of this, of course, is that few statistical analyses are routine or quick.

That means it’s easy to underestimate not just the time it will take to run the analysis, but also the time it takes to troubleshoot issues and to learn new methods you hadn’t realized you needed. Chances are those new methods will be challenging.

Even if you already have good statistical skills, a method that is new to you can take weeks or months to learn and implement. Not days.

This is especially true if it turns out you need to use a new statistical software program to implement it.

Suggested Strategy: Plan your data analysis to take months, not days. If there are no surprises, you’ll finish early.

Bad Habit #2. Not using a system for keeping track of files and steps

No matter which statistical software you use, every analysis has lots of files. Data files, program files, output files, log files. Then there are the supporting files, like the data codebook file and the statistical analysis plan. Oh yes, and the report you’re writing from the results.

And there are many steps to any analysis, within each of the four phases of data analysis: design & planning; data preparation; data analysis; and communicating results.

Your system doesn’t have to be complicated or technical. But throwing dozens of files with generic names into too few (or too many) folders is a recipe for frustration.
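One way to make such a system concrete is to script the folder skeleton once and reuse it for every project, so data, programs, output, logs, supporting documents, and the report always land in the same predictable places. The sketch below is a minimal Python example; the folder names and the numbered-prefix rule for program files are hypothetical choices, not a standard, so substitute whatever scheme you will actually follow.

```python
import re
from pathlib import Path

# Hypothetical layout: one folder per kind of file named above.
# Raw data is kept separate so it is never overwritten by hand.
FOLDERS = ["data/raw", "data/derived", "programs", "output", "logs", "docs", "report"]


def create_project(root):
    """Create the folder skeleton under `root` and return every directory, sorted."""
    base = Path(root)
    for folder in FOLDERS:
        # parents=True builds "data" before "data/raw"; exist_ok makes reruns safe
        (base / folder).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_dir())


def program_name(step, description, ext="py"):
    """Build a descriptive program file name with a zero-padded step number.

    The number prefix (a hypothetical convention) keeps program files
    listed in the order they should be run.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{step:02d}_{slug}.{ext}"
```

For example, `program_name(3, "Recode income categories")` returns `03_recode_income_categories.py`, which says both what the program does and where it falls in the run order.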

The same is true for variable names. Come up with a naming convention for variables too (and make sure it’s documented somewhere).
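A naming convention is easiest to follow when it can be checked automatically against the codebook. This minimal Python sketch assumes a hypothetical convention (lowercase letters, digits, and underscores, starting with a letter, at most 32 characters, which matches the variable-name limit in Stata and SAS) and a codebook stored as a simple name-to-description dictionary; both are illustrative assumptions.

```python
import re

# Hypothetical convention: starts with a lowercase letter, then up to
# 31 more lowercase letters, digits, or underscores (32 characters max).
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,31}")


def check_variable_names(names, codebook):
    """Return a list of problems: names that break the convention
    or that have no entry in the codebook (a dict of name -> description)."""
    problems = []
    for name in names:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name}: does not follow naming convention")
        if name not in codebook:
            problems.append(f"{name}: not documented in codebook")
    return problems
```

Running it on a data set's column names before analysis flags both a badly formed name like `Income Cat` and a cryptic but valid name like `q7` that nobody has documented yet.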

This is especially helpful if you’re collaborating with someone else. But even if you’re on your own, be kind to your future self and make everything clear.

Suggested Strategy: Take the time to organize files and institute a naming convention for files and variables. And follow them!

Bad Habit #3. Not making it easy to replicate what you’ve done on each step

As I already mentioned, there are many steps in each of the four phases of data analysis. One thing to keep in mind: while there is a clear order to these steps, it’s also common to have to backtrack.

For example, writing a statistical analysis plan is an early step and checking model assumptions is a later one.

But data don’t always behave the way we expect. Checking an assumption can derail the plan and force you to come up with a new one. This is fine, and it’s common. Planning is still worthwhile, but the final analysis has to reflect the realities and limitations of the actual data set.

So the easier you make it to replicate your early steps, the easier it is to rerun them when you have to backtrack. There are simply too many steps to remember exactly what you did on each one.

Suggested Strategy: Always use (or record) syntax for every data change and analysis you do. Use the tools available in your software to help you with this. Comment liberally so you remember exactly what each piece of code does.
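The same habit carries over to any language: every data change lives in a named, commented function, the steps run in a fixed order starting from the raw data, and each step writes a line to a log. This is a minimal Python sketch; the -99 missing-value code, the variable names, and the age-18 eligibility rule are hypothetical. In SPSS, SAS, Stata, or R the equivalent is a syntax file or script you rerun from the top.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("data_prep")

MISSING_CODE = -99  # hypothetical: the codebook says -99 means "no answer"


def recode_missing(rows):
    """Step 1: replace the missing-value code with None in every column."""
    n = sum(v == MISSING_CODE for row in rows for v in row.values())
    # Build new dicts so the raw rows are never modified in place
    out = [{k: (None if v == MISSING_CODE else v) for k, v in row.items()} for row in rows]
    log.info("recode_missing: %d values set to missing", n)
    return out


def drop_ineligible(rows, min_age=18):
    """Step 2: keep respondents with a known age of at least min_age (hypothetical rule)."""
    out = [row for row in rows if row["age_yrs"] is not None and row["age_yrs"] >= min_age]
    log.info("drop_ineligible: removed %d of %d rows", len(rows) - len(out), len(rows))
    return out


# The whole preparation, in order. Rerunning this list reproduces every step.
STEPS = [recode_missing, drop_ineligible]


def run_pipeline(raw_rows):
    """Apply every step in STEPS, in order, starting from the raw data."""
    rows = raw_rows
    for step in STEPS:
        rows = step(rows)
    return rows
```

If checking an assumption later sends you back to change how missing values are handled, you edit one function and rerun `run_pipeline()`. Every later step is then redone from the raw data, not from memory.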

