Making Dummy Codes Easy to Keep Track of

Here’s a little tip.

When you construct dummy variables, make it easy on yourself to remember which code is which. Heck, if you want to be really nice, make it easy for anyone else who will analyze the data or read the results.

Make the coding inherent in the dummy variable's name.

So instead of a variable named Gender with values of 1=Female and 0=Male, call the variable Female.

Instead of a set of dummy variables named MaritalStatus1 (1=Married, 0=not married) and MaritalStatus2 (1=Divorced, 0=not divorced), with Single as the reference category, name the same variables Married and Divorced.
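To make the marital status example concrete, here is a minimal sketch in plain Python. The sample data and the lowercase list names are made up for illustration; only the category labels and the Married/Divorced naming come from the example above.

```python
# Hypothetical sample data; the category labels follow the example in the text.
marital_status = ["Married", "Single", "Divorced", "Single", "Married"]

# Single is the reference category, so it gets no variable of its own.
# Each dummy is named after the category it flags with a 1.
married = [1 if status == "Married" else 0 for status in marital_status]
divorced = [1 if status == "Divorced" else 0 for status in marital_status]

# Print each case next to its two dummy codes.
for status, m, d in zip(marital_status, married, divorced):
    print(f"{status:<9} Married={m} Divorced={d}")
```

A case with 0 on both Married and Divorced can only be Single, and nobody has to look up what "MaritalStatus2 = 1" was supposed to mean.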

And if you’re new to dummy coding, this has the extra bonus of making it intuitive. A set of dummy variables is just a set of yes/no variables, one for each of your categories except the reference category.
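The "yes/no variable for all but one category" idea can be sketched as a small helper. The function name make_dummies and its reference argument are hypothetical, chosen for this illustration rather than taken from any particular statistics package.

```python
def make_dummies(values, reference):
    """Return {category_name: [0/1, ...]} for every category except the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} does not appear in the data")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != reference
    }


# Hypothetical data: three categories, so two dummies once Single is the reference.
status = ["Married", "Single", "Divorced", "Single"]
dummies = make_dummies(status, reference="Single")

for name, codes in dummies.items():
    print(name, codes)
```

Because each dummy is keyed by the category it flags, the variable names document the coding on their own, and the reference category is the one that never appears as a name.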



Reader Interactions


  1. ben says

    i would LOVE to do this, but how on earth for interactions???
    (especially with full interactions between two class variables AND a continuous covariate?)

    karen, i really enjoy just coming over here to poke around and randomly learn something new about some of these high level stats vagaries. thank you for writing about them.

    in this case, i’ve been trying to understand how to code “estimate” statements in SAS for my 2(two-‘level’)classes+1covariate situation, and i’ve been making a mell of a hess out of it. from this post, it sounds like i should code the class interaction as “class1 non-reference level \ class 2 non-reference level”, but of course, this is confusing now since the estimate codes for more than just that one group of the four possible.

    (to make it even worse, when i check estimate statement outputs, the doubly referential group (class1 reference \ class2 reference) ends up being added to the doubly non-references group while the other two are subtracted out when estimating the interaction.
    and adding the covariate is just pounding my head against the screen…)

    thanks for any useful tips on this situation, and please let me know if i haven’t been clear enough.
