
The Exposure Variable in Poisson Regression Models

January 23rd, 2009 by Karen Grace-Martin

Poisson Regression Models and their extensions (Zero-Inflated Poisson, Negative Binomial Regression, etc.) are used to model counts and rates. A few examples of count variables include:

– Number of words an eighteen-month-old can say

– Number of aggressive incidents performed by patients in an inpatient rehab center

Most count variables follow one of these distributions in the Poisson family. Poisson regression models allow researchers to examine the relationship between predictors and count outcome variables.
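As a rough sketch of how such a model ties predictors to a count or a rate: Poisson regression typically uses a log link, so the expected count is the exponential of the linear predictor, and an exposure (such as observation time) enters as a multiplier. The function name, the coefficient values, and the length-of-stay scenario below are assumptions made up for illustration, not estimates from real data.

```python
import math

def expected_count(exposure, x, b0, b1):
    """Expected count under a log-link Poisson model with an exposure term.

    log(mu) = log(exposure) + b0 + b1 * x, so mu = exposure * exp(b0 + b1 * x).
    """
    return exposure * math.exp(b0 + b1 * x)

# Hypothetical coefficients (assumptions for illustration only).
B0 = -2.0  # log of the baseline rate per day
B1 = 0.5   # change in the log rate per one-unit increase in the predictor

# Same predictor value, different lengths of stay (the exposure, in days).
short_stay = expected_count(exposure=10, x=1.0, b0=B0, b1=B1)
long_stay = expected_count(exposure=30, x=1.0, b0=B0, b1=B1)

# Expected counts differ with exposure, but the rate per day is the same.
print(round(short_stay, 3), round(long_stay, 3))
print(round(short_stay / 10, 4), round(long_stay / 30, 4))
```

Because the exposure only rescales the expected count, the coefficients describe the rate per unit of exposure rather than the raw count.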

Using these regression models gives much more accurate parameter (more…)

43 comments

Interpreting Interactions in Regression

January 19th, 2009 by Karen Grace-Martin

Adding interaction terms to a regression model has real benefits. It greatly expands your understanding of the relationships among the variables in the model, and it lets you test more specific hypotheses. But interpreting interactions in regression takes an understanding of what each coefficient is telling you.

The example from Interpreting Regression Coefficients was a model of the height of a shrub (Height) based on the amount of bacteria in the soil (Bacteria) and whether the shrub is located in partial or full sun (Sun). Height is measured in cm, Bacteria is measured in thousands per ml of soil, and Sun = 0 if the plant is in partial sun, and Sun = 1 if the plant is in full sun.
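To make the setup concrete, here is a minimal sketch of a regression equation with a Bacteria-by-Sun interaction term. The coefficient values and function names are invented for illustration; they are not the estimates from the shrub example.

```python
def predicted_height(bacteria, sun, b0, b1, b2, b3):
    """Height = b0 + b1*Bacteria + b2*Sun + b3*Bacteria*Sun."""
    return b0 + b1 * bacteria + b2 * sun + b3 * bacteria * sun

def bacteria_slope(sun, b1, b3):
    """Change in predicted Height per one-unit increase in Bacteria at a given Sun value."""
    return b1 + b3 * sun

# Hypothetical coefficients (assumptions for illustration only).
COEF = {"b0": 35.0, "b1": 4.0, "b2": 9.0, "b3": -2.0}

slope_partial = bacteria_slope(0, COEF["b1"], COEF["b3"])  # Sun = 0, partial sun
slope_full = bacteria_slope(1, COEF["b1"], COEF["b3"])     # Sun = 1, full sun

print(slope_partial, slope_full)
```

With an interaction in the model, b1 alone is the Bacteria slope only when Sun = 0; in full sun the slope is b1 + b3.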

(more…)

31 comments

Logistic Regression Models for Multinomial and Ordinal Variables

January 14th, 2009 by Karen Grace-Martin

Multinomial Logistic Regression

The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model. It is used when the dependent variable has more than two nominal (unordered) categories.

Dummy coding of independent variables is quite common. In multinomial logistic regression, the dependent variable is dummy coded into multiple 1/0 variables. Every category but one gets its own dummy variable, so if there are M categories, there will be M-1 dummy variables. Each category’s dummy variable has a value of 1 for its category and 0 for all others. One category, the reference category, doesn’t need its own dummy variable because it is uniquely identified by all the other variables being 0.
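The dummy coding described above can be sketched in a few lines. The `dummy_code` helper and the travel-mode outcome are hypothetical, chosen only to show M categories turning into M-1 dummy variables.

```python
def dummy_code(values, reference):
    """Return {category: list of 0/1} for every category except the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in values")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical nominal outcome with M = 3 categories.
outcome = ["bus", "car", "bike", "car", "bus"]
dummies = dummy_code(outcome, reference="bike")

# 3 categories give 2 dummy variables; a row where every dummy is 0
# belongs to the reference category.
for name, column in dummies.items():
    print(name, column)
```

Which category serves as the reference does not change the fitted model, only which comparisons the coefficients express directly.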

The multinomial logistic regression then estimates a separate binary logistic regression model for each of those dummy variables. The result is (more…)

59 comments

The Great Likert Data Debate

January 9th, 2009 by Karen Grace-Martin

I first encountered the Great Likert Data Debate in 1992 in my first statistics class in my psychology graduate program.

My stats professor was a brilliant mathematical psychologist and taught the class unlike any psychology grad class I’ve ever seen since. Rather than learn ANOVA in SPSS, we derived the Method of Moments using Matlab. While I didn’t understand half of what was going on, this class roused my curiosity and led me to take more theoretical statistics classes. The rest is history.

A large section of the class was dedicated to the fact that Likert data was not interval and therefore not appropriate for statistics that assume normality such as ANOVA and regression. This was news to me. Meanwhile, most of the rest of the field either ignored or debated this assertion.

16 years later, the debate continues. A nice discussion of the debate is found on the Research Methodology blog by Hisham bin Md-Basir. It’s a nice blog with thoughtful entries that summarize methodological articles in the social and design sciences.

To be fair, though, this blog entry summarizes an article on the “Likert scales are not interval” side of the debate. For a balanced listing of references, see Can Likert Scale Data Ever Be Continuous?

1 comment

Variable Labels and Value Labels in SPSS

January 2nd, 2009 by Karen Grace-Martin

Variable Labels and Value Labels are two great SPSS features that let you build a code book right into the data set. Using them every time is good data analysis practice.

SPSS doesn’t limit variable names to 8 characters like it used to, but you still can’t use spaces, and it will make coding easier if you keep the variable names short. You then use Variable Labels to give a nice, long description of each variable. On questionnaires, I often use the actual question.

There are good reasons for using Variable Labels right in the data set. I know you want to get right to your data analysis, but using Variable Labels will save so much time later.

1. If your paper code sheet ever gets lost, you still have a description of every variable.

2. Anyone else who uses your data–lab assistants, graduate students, statisticians–will immediately know what each variable means.

3. As entrenched as you are with your data right now, you will forget what those variable names refer to within months. When a committee member or reviewer wants you to redo an analysis, it will save tons of time to have those variable labels right there.

4. It’s just more efficient–you don’t have to look up what those variable names mean when you read your output.

Variable Labels

The really nice part is SPSS makes Variable Labels easy to use:

1. Mouse over the variable name in the Data View spreadsheet to see the Variable Label.

2. In dialog boxes, lists of variables can be shown with either Variable Names or Variable Labels. Just go to Edit–>Options. In the General tab, choose Display Labels.

3. On the output, SPSS allows you to print out Variable Names or Variable Labels or both. I usually like to have both. Just go to Edit–>Options. In the ‘Output Labels’ tab, choose ‘Names and Labels’ in the first and third boxes.

Value Labels

Value Labels are similar, but they describe the values a variable can take. Labeling values right in SPSS means you don’t have to remember if 1=Strongly Agree and 5=Strongly Disagree or vice-versa. It also makes data entry much more efficient: you can type in 0 and 1 for Male and Female much faster than you can type out those whole words, or even M and F. And with Value Labels, your data and output still give you the meaningful values.

Once again, SPSS makes it easy for you.

1. If you’d rather see Male and Female in the data set than 0 and 1, go to View–>Value Labels.

2. Like Variable Labels, you can get Value Labels on output, along with the actual values. Just go to Edit–>Options. In the ‘Output Labels’ tab, choose ‘Values and Labels’ in the second and fourth boxes.

100 comments

Poisson Regression Analysis for Count Data

December 31st, 2008 by Karen Grace-Martin

Many dependent variables cannot be made normally distributed, no matter how many transformations you try. The most common culprits are count variables: variables that measure the count or rate of some event in a sample. Some examples I’ve seen from a variety of disciplines are:

– Number of eggs in a clutch that hatch
– Number of domestic violence incidents in a month
– Number of times juveniles needed to be restrained during tenure at a correctional facility
– Number of infected plants per transect

A common quality of these variables is that 0 is the mode–the most common value. 1 is the next most common, 2 the next, and so on. In variables with low expected counts (number of cars in a household, number of degrees earned), (more…)

4 comments