One great thing about logistic regression, at least for those of us who are trying to learn how to use it, is that the predictor variables work exactly the same way as they do in linear regression.

Dummy coding, interactions, quadratic terms: they all work the same way.

### Dummy Coding

In pretty much every regression procedure in every statistical software package, the default way to code categorical variables is dummy coding.

All dummy coding means is recoding the original categorical variable into a set of binary variables that have values of one and zero. You may find it helpful to think of these as yes/no variables for each category that indicate whether or not the original variable has that particular category value.

You need one dummy variable for each category except one. The last category, known as the reference category, has a value of zero (no) on all of the dummy variables, so a separate dummy variable coded 1 (yes) for it would be redundant.
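To make the recoding concrete, here is a minimal Python sketch. The function name `dummy_code` and the color data are hypothetical; your statistical software normally does this step for you, and many packages let you choose which category serves as the reference.

```python
def dummy_code(values, reference):
    """Recode a categorical variable into 0/1 dummy columns.

    Returns (categories, rows): one column per category except `reference`,
    in order of first appearance. Rows in the reference category are all 0.
    """
    if reference not in values:
        raise ValueError(f"reference category {reference!r} not found")
    # dict.fromkeys keeps first-appearance order and drops duplicates.
    categories = [c for c in dict.fromkeys(values) if c != reference]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green", "red"]
names, rows = dummy_code(colors, reference="blue")
print(names)  # ['red', 'green']
for value, row in zip(colors, rows):
    print(value, row)  # e.g. "red [1, 0]" and "blue [0, 0]"
```

Three categories produce two dummy columns, and every "blue" row is all zeros.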

(Incidentally, if you have ever wondered what true, absolute multicollinearity looks like, go ahead and run the model, intercept included, with a dummy variable for that last category as well.)
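Here is a quick way to see why. Each observation belongs to exactly one category, so a full set of dummies (one per category) always sums to the column of 1s the model uses for its intercept. That column is then an exact linear combination of the dummies, so the coefficients cannot be uniquely estimated. Below is a small pure-Python check with hypothetical data:

```python
# Hypothetical three-category variable for six observations.
values = ["red", "green", "blue", "green", "red", "blue"]
levels = ["red", "green", "blue"]

# Deliberately build a dummy for EVERY level, including the reference category.
all_dummies = [[1 if v == level else 0 for level in levels] for v in values]
intercept = [1] * len(values)

# Each observation falls in exactly one category, so every row of dummies sums
# to 1, reproducing the intercept column: an exact linear dependency.
row_sums = [sum(row) for row in all_dummies]
print(row_sums == intercept)  # True
```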

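So in any regression model, the unstandardized coefficient for a dummy variable represents the difference in predicted values between that variable's category and the reference category, holding any other predictors constant. (In logistic regression, those predicted values are on the log-odds scale.)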
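In a linear regression whose only predictors are the dummy variables, least squares reproduces each category's mean exactly. The intercept is therefore the reference category's mean, and each dummy coefficient is its category's mean minus the reference mean. The sketch below computes those coefficients directly from group means rather than calling a regression routine, using hypothetical data with "control" as the reference. Once other predictors are added, the coefficients become adjusted differences and no longer equal raw mean differences.

```python
from statistics import mean

# Hypothetical outcome scores by group; "control" is the reference category.
scores = {
    "control": [10.0, 12.0, 11.0],
    "drug_a": [14.0, 15.0, 16.0],
    "drug_b": [9.0, 10.0, 11.0],
}
reference = "control"

# With only dummy predictors, OLS fitted values equal the group means:
# intercept = mean of the reference group,
# coefficient for each dummy = (its group's mean) - (reference mean).
intercept = mean(scores[reference])
coefficients = {
    group: mean(values) - intercept
    for group, values in scores.items()
    if group != reference
}
print(intercept)     # 11.0
print(coefficients)  # {'drug_a': 4.0, 'drug_b': -1.0}
```

The predicted value for drug_a is `intercept + coefficients["drug_a"]`, or 15.0. The coefficient alone is the difference from the reference group.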

### How Dummy Codes Affect Interpretation in Logistic Regression

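In logistic regression, the odds ratio for a dummy variable is the ratio of the odds that Y=1 within that category of X to the odds that Y=1 within the reference category. It tells you the factor by which the odds in that category are larger or smaller than the odds in the reference category.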
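The link between the coefficient and the odds ratio is simple arithmetic. A dummy's coefficient is the difference in predicted log-odds between its category and the reference category, holding other predictors fixed. Exponentiating that difference turns it into a ratio of odds. The helper names and the two probabilities below are hypothetical, chosen only for illustration:

```python
import math


def odds(p):
    """Odds that Y=1 when P(Y=1) = p."""
    return p / (1 - p)


def log_odds(p):
    """Log-odds (logit) of p, the scale logistic regression works on."""
    return math.log(odds(p))


# Hypothetical predicted probabilities of Y=1 in one category and the reference.
p_category, p_reference = 0.6, 0.4

# The dummy's coefficient: difference in log-odds between the two categories...
coefficient = log_odds(p_category) - log_odds(p_reference)
# ...and exponentiating it gives the odds ratio.
odds_ratio = math.exp(coefficient)

print(round(odds(p_category), 3), round(odds(p_reference), 3))  # 1.5 0.667
print(round(odds_ratio, 3))  # 2.25
```

Here the odds of Y=1 in the category are 2.25 times the odds in the reference category. That is the same number you get by dividing 1.5 by 0.667.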

For example, let’s say you have an experiment with six conditions and a binary outcome: did the subject answer correctly or not. You need to control for a number of covariates, so you can’t just use a chi-square test.

Because there are six conditions, you'll need five dummy variables. The first indicates whether or not the subject was in condition 1, the second whether or not the subject was in condition 2, and so on. We don't need a dummy variable for condition 6, since everyone in condition 6 has a 0 (no) on all five dummy variables for conditions 1 through 5.

So the odds ratio for condition 1 is the ratio of the odds of answering correctly in condition 1 to the odds of answering correctly in condition 6, holding the covariates constant. Likewise, the odds ratio for condition 2 is the ratio of the odds of answering correctly in condition 2 to the odds in condition 6.
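A minimal sketch of these comparisons, computed from hypothetical counts of correct and incorrect answers with condition 6 as the reference. One caveat applies. These crude ratios equal the exponentiated coefficients only for a logistic regression that contains nothing but the five condition dummies, because that model's fitted probabilities match the observed proportions. Once covariates are added, as in the example above, the model's odds ratios are adjusted for them and will generally differ.

```python
# Hypothetical counts of (correct, incorrect) answers in each of six conditions.
# Condition 6 is the reference category (e.g., the control condition).
counts = {
    1: (30, 20),
    2: (25, 25),
    3: (40, 10),
    4: (20, 30),
    5: (35, 15),
    6: (20, 30),
}
reference = 6


def odds_correct(correct, incorrect):
    """Odds of a correct answer: number correct divided by number incorrect."""
    return correct / incorrect


ref_odds = odds_correct(*counts[reference])
# One odds ratio per non-reference condition, each compared to condition 6.
odds_ratios = {
    cond: odds_correct(*counts[cond]) / ref_odds
    for cond in counts
    if cond != reference
}
for cond, ratio in odds_ratios.items():
    print(f"condition {cond} vs {reference}: OR = {ratio:.2f}")
```

With these made-up counts, condition 3 has six times the odds of a correct answer as condition 6 (4.0 versus about 0.667). Condition 4, with the same counts as condition 6, has an odds ratio of 1.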

This way of coding works especially well if condition 6 is the control condition, because each odds ratio then compares one experimental condition directly to the control.
