# dummy variable

November 5th, 2018 by

Last week I had the pleasure of teaching a webinar on Interpreting Regression Coefficients. We walked through the output of a somewhat tricky regression model—it included two dummy-coded categorical variables, a covariate, and a few interactions.

As always seems to happen, our audience asked an amazing number of great questions. (Seriously, I’ve had multiple guest instructors compliment me on our audience and their thoughtful questions.)

### Multiple Imputation for Missing Data: Indicator Variables versus Categorical Variables

February 25th, 2016 by

A data set can contain indicator (dummy) variables, categorical variables, or both. At the outset, whether a given variable is an indicator or a categorical variable depends entirely on how the data are coded.

For example, a categorical variable like marital status could be coded in the data set as a single variable with 5 values: …
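To make the two codings concrete, here is a minimal Python sketch that stores the same information either as one categorical variable or as a set of 0/1 indicator variables. The integer codes 1 to 5 and the helper names `to_indicators` and `from_indicators` are hypothetical, since the post's actual marital status categories are not shown here.

```python
# Hypothetical integer codes for a 5-category marital status variable;
# the actual categories in the original post are not shown here.
CATEGORIES = [1, 2, 3, 4, 5]


def to_indicators(value, categories=CATEGORIES):
    """Turn one categorical value into a list of 0/1 indicator variables."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def from_indicators(indicators, categories=CATEGORIES):
    """Recover the categorical value; valid only if exactly one indicator is 1."""
    if sum(indicators) != 1 or any(x not in (0, 1) for x in indicators):
        raise ValueError(f"not a valid indicator set: {indicators}")
    return categories[indicators.index(1)]


marital = [1, 3, 5, 2]                          # one categorical variable
dummies = [to_indicators(v) for v in marital]   # five indicator variables per case
print(dummies)
print([from_indicators(d) for d in dummies])
```

The categorical form guarantees exactly one category per case by construction; in the indicator form, that constraint (exactly one 1 per row) has to be checked explicitly, which is what `from_indicators` does.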

### Missing Data Diagnosis in Stata: Investigating Missing Data in Regression Models

January 4th, 2016 by

In the last post, we examined how to use the same sample when running a set of regression models with different predictors.

Adding a predictor with missing data causes cases that had been included in previous models to be dropped from the new model.
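Listwise deletion can be checked directly by computing each model's complete-case sample. The sketch below is a minimal Python illustration with hypothetical toy data: the variable names `attend`, `age`, and `income` and the helper `complete_cases` are made up, and only the idea of a "model 3" and a "model 4" that adds income comes from the post.

```python
# Hypothetical toy data set; None marks a missing value.
rows = [
    {"y": 10, "attend": 1, "age": 34, "income": 52},
    {"y": 12, "attend": 0, "age": 41, "income": None},
    {"y": 9,  "attend": 1, "age": 29, "income": 47},
    {"y": 15, "attend": 0, "age": 55, "income": None},
    {"y": 11, "attend": 1, "age": None, "income": 60},
]


def complete_cases(data, variables):
    """Indices of rows with no missing value on any of `variables`,
    i.e. the rows listwise deletion keeps for a model using them."""
    return [i for i, row in enumerate(data)
            if all(row[v] is not None for v in variables)]


model_3 = ["y", "attend", "age"]
model_4 = model_3 + ["income"]          # adding a predictor that has missing data

kept_3 = complete_cases(rows, model_3)
kept_4 = complete_cases(rows, model_4)
dropped = sorted(set(kept_3) - set(kept_4))  # in model 3 but not in model 4

print(len(kept_3), len(kept_4), dropped)
```

Fitting both models on `kept_4` puts them on the same sample. In Stata, the estimation sample of the most recently fitted model is available through `e(sample)`.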

Using different samples in different models can lead to very different conclusions when interpreting results.

Let’s look at how to investigate the effect of the missing data on the regression models in Stata.
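One common diagnostic (not necessarily the exact steps in the original post) is to create a missing-data indicator for the newly added predictor and compare the cases that will be dropped with the cases that will be kept. If the two groups differ on the outcome, listwise deletion is changing who is in the sample, not just how many. This Python sketch uses hypothetical toy data and a hypothetical helper, `mean_by_group`.

```python
from statistics import mean

# Hypothetical toy data: outcome y and income (None = missing).
rows = [
    {"y": 10, "income": 52},
    {"y": 12, "income": None},
    {"y": 9,  "income": 47},
    {"y": 15, "income": None},
    {"y": 11, "income": 60},
]

# Missingness indicator: 1 if income is missing, else 0.
for row in rows:
    row["income_missing"] = int(row["income"] is None)


def mean_by_group(data, outcome, group):
    """Mean of `outcome` within each level of the 0/1 variable `group`."""
    levels = sorted({row[group] for row in data})
    return {g: mean(row[outcome] for row in data if row[group] == g)
            for g in levels}


print(mean_by_group(rows, "y", "income_missing"))
```

In this made-up example, the cases missing income have a higher mean outcome than the cases that stay in the model, which is the kind of pattern that can shift coefficients between models.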

The coefficient for the variable “frequent religious attendance” was negative 58 in model 3 and then rose to a positive 6 in model 4 when income was included. Results …

### About Dummy Variables in SPSS Analysis

September 7th, 2010 by

Whenever I get email questions whose answers I think would benefit others, I like to answer them here.  I leave out the asker’s name for privacy, but this is a great question about dummy coding:

> First of all, thanks for all the helpful information you provide! I sincerely appreciate all your efforts!
>
> I have a technical question. I have 6 locations (let’s say A, B, C, D, E, and F), and I want to estimate the effect of location on the outcome using OLS models.
>
> I know that if I include 5 location dummy variables (6 locations in total, with A as the reference group) in 1 block of the regression analysis, the results will be comparisons with the reference location.
>
> What if I put all 6 dummies in 1 block instead (for example, the 1st dummy would be “1” for Location A and “0” otherwise)? Would that be a bug? If not, how would I interpret the result?
>
> Thanks a lot!

Great question!

If you add a 6th dummy variable for Location A, your reference group, the model will actually blow up. (Yes, that’s a technical term.)

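This is a case of perfect multicollinearity: because the model also includes an intercept, the six dummies always add up to the intercept’s column of 1s, so the coefficients can’t be estimated uniquely.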

It’s the same situation you learned about back in algebra, where you have one equation and two unknowns. The problem isn’t that it can’t be solved; the problem is that there are infinitely many equally good solutions.
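To see where the infinitely many solutions come from, write the model with an intercept b_0 and all six location dummies D_A through D_F (these symbols are introduced here for illustration). Every observation is in exactly one location, so the six dummies always sum to 1. Adding any constant c to the intercept and subtracting it from every dummy coefficient therefore leaves every predicted value unchanged:

```latex
\begin{aligned}
\hat{y}_i &= b_0 + \sum_{k \in \{A,\dots,F\}} b_k D_{ki},
  \qquad \sum_{k \in \{A,\dots,F\}} D_{ki} = 1 \text{ for every } i, \\
(b_0 + c) + \sum_{k} (b_k - c)\, D_{ki}
  &= b_0 + \sum_{k} b_k D_{ki} + c\Bigl(1 - \sum_{k} D_{ki}\Bigr) = \hat{y}_i
  \quad \text{for any } c.
\end{aligned}
```

Least squares has no basis for preferring one value of c over another. Dropping D_A amounts to fixing b_A = 0, which pins down c and leaves a single solution.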

If an observation falls in Location A, the reference group, we already have that information from the other 5 dummy variables: the observation has a 0 on all of them, so we know its location is A. We don’t need another dummy variable to tell the model that. It’s redundant information, and so perfectly redundant that the model chokes.
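A quick way to see the redundancy is to build the design matrix for a few observations and compute its rank. With an intercept and all six dummies, the intercept column equals the sum of the dummy columns, so the matrix has 7 columns but only rank 6, and OLS cannot estimate 7 coefficients uniquely. Dropping Location A’s dummy leaves 6 columns with rank 6. The sample and the helpers `design_row` and `matrix_rank` are hypothetical; in practice a library routine such as NumPy’s `numpy.linalg.matrix_rank` would do the rank computation.

```python
from fractions import Fraction

LOCATIONS = ["A", "B", "C", "D", "E", "F"]


def design_row(location, levels):
    """Intercept column followed by one 0/1 dummy per level in `levels`."""
    return [1] + [1 if location == lvl else 0 for lvl in levels]


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact fractions (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


sample = ["A", "B", "C", "D", "E", "F", "A", "C"]  # hypothetical observations

six = [design_row(loc, LOCATIONS) for loc in sample]       # intercept + 6 dummies
five = [design_row(loc, LOCATIONS[1:]) for loc in sample]  # intercept + 5 dummies, A = reference

print(len(six[0]), matrix_rank(six))    # 7 columns, rank 6: not estimable
print(len(five[0]), matrix_rank(five))  # 6 columns, rank 6: full rank
```

Note that the A rows in `five` are all zeros after the intercept, which is exactly how the model identifies the reference location.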

Dummy coding is one of the topics I get the most questions about.  It can get especially tricky to interpret when the dummy variables are also used in interactions, so I’ve created some resources that really dig in deeply.

### Interpreting (Even Tricky) Regression Coefficients – A Quiz

January 15th, 2010 by

Here’s a little quiz:

### True or False?

1. When you add an interaction to a regression model, you can still evaluate the main effects of the terms that make up the interaction, just like in ANOVA.

2. The intercept is usually meaningless in a regression model.

### Logistic Regression Models for Multinomial and Ordinal Variables

January 14th, 2009 by

### Multinomial Logistic Regression

The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model. It is used when the dependent variable has more than two nominal (unordered) categories.

Dummy coding of independent variables is quite common. In multinomial logistic regression, the dependent variable is dummy coded into multiple 1/0 variables: with M categories, there are M-1 dummy variables, one for every category but one. Each category’s dummy variable is 1 for that category and 0 for all others. The remaining category, the reference category, doesn’t need its own dummy variable because it is uniquely identified by all the other dummy variables being 0.

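The multinomial logistic regression then estimates a separate set of binary logistic regression coefficients for each of those dummy variables, each comparing that category to the reference category; the equations are estimated simultaneously. The result is …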
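As a rough illustration of how the M-1 equations fit together, here is a minimal Python sketch that turns one linear predictor per non-reference category into predicted probabilities for all M categories. Each linear predictor is the log odds of that category versus the reference, and the reference category’s predictor is fixed at 0. The helper `multinomial_probs` and the predictor values are hypothetical; in a fitted model each predictor would come from that category’s own intercept and coefficients.

```python
import math


def multinomial_probs(etas, reference):
    """Category probabilities from M-1 linear predictors.

    `etas` maps each non-reference category j to eta_j = log(P(Y=j) / P(Y=reference)).
    The reference category's predictor is fixed at 0.
    """
    scores = {reference: 0.0, **etas}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}


# Hypothetical linear predictors for a 3-category outcome with reference "C".
probs = multinomial_probs({"A": 0.5, "B": -1.0}, reference="C")
print(probs)
print(math.log(probs["A"] / probs["C"]))  # recovers eta_A = 0.5, up to rounding
```

With only two categories there is a single linear predictor, and this reduces to the ordinary binary logistic model.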