Data Preparation

Seven Steps for Data Cleaning

June 20th, 2024 by

Ever consider skipping the important step of cleaning your data? It’s tempting, but not a good idea. Why? It’s a bit like baking.

I like to bake. There’s nothing nicer than a rainy Sunday with no plans, and a pantry full of supplies. I have done my shopping, and now it’s time to make the cake. Ah, but the kitchen is a mess. I don’t have things in order. This is no way to start.

First, I need to clear the counter, wash the breakfast dishes, and set out my tools. I need to take stock, read the recipe, and measure out my ingredients. Then it’s time for the fun part. I’ll admit, in my rush to get started I have at times skipped this step.

(more…)


Issues in Coding Missing Values

October 11th, 2023 by

There’s no mincing words here. Missing values can cause problems for every statistician. That’s true for a lot of reasons, but it can start with simple issues of choices made when coding missing values in a data set. Here are a few examples.

Example 1: The Null License Plate

Researcher Joseph Tartaro thought it would be funny to get the following California vanity license plate: (more…)


Best Practices for Formatting Date Variables

March 9th, 2023 by

Formatting date variables seems like it should be straightforward, but sadly, it’s not.

If you are given data that includes dates, expect confusion. Dates can be represented in many different ways. (more…)
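As a small illustration of that confusion, here is a minimal Python sketch. The raw value and both format strings are assumptions made up for this example, not taken from any particular data set. It shows one text value parsing into two different dates depending on which convention you assume.

```python
from datetime import datetime

# Hypothetical raw value: is this March 4 or April 3?
raw = "03/04/2023"

# Assume month/day/year order
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()

# Assume day/month/year order
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

# ISO 8601 output (YYYY-MM-DD) removes the ambiguity once the order is known
print(as_month_first.isoformat())  # 2023-03-04
print(as_day_first.isoformat())    # 2023-04-03
```

Both parses succeed without any warning, so it is worth confirming the intended order with whoever supplied the data before converting.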


Best Practices for Data Preparation

October 4th, 2021 by

If you’ve been doing data analysis for long, you’ve probably had the ‘AHA’ moment where you realized statistical practice is a craft and not just a science. As with any craft, there are best practices that will save you a lot of pain and suffering and elevate the quality of your work. And yet, it’s likely that no one ever taught you these. I know I never had a class on this. (more…)


Four Weeds of Data Analysis That are Easy to Get Lost In

January 18th, 2021 by

Every time you analyze data, you start with a research question and end with communicating an answer. But in between those start and end points are twelve other steps. I call this the Data Analysis Pathway. It’s a framework I put together years ago, inspired by a client who kept getting stuck in Weed #1. But I’ve honed it over the years of assisting thousands of researchers with their analysis.

(more…)


Member Training: Data Cleaning

June 1st, 2020 by

Data Cleaning is a critically important part of any data analysis. Without properly prepared data, the analysis will yield inaccurate results. Correcting errors later in the analysis adds to the time, effort, and cost of the project.

(more…)