Karen Grace-Martin

EM Imputation and Missing Data: Is Mean Imputation Really so Terrible?

April 15th, 2009 by

I’m sure I don’t need to explain to you all the problems that occur as a result of missing data.  Anyone who has dealt with missing data—that means everyone who has ever worked with real data—knows about the loss of power and sample size, and the potential bias in your data that comes with listwise deletion.


Listwise deletion is the default method for dealing with missing data in most statistical software packages.  It simply means excluding from the analysis any cases with data missing on any variables involved in the analysis.
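Here is a minimal sketch of listwise deletion. It assumes a small hypothetical data set stored in a NumPy array, with `NaN` marking missing values. The column layout is for illustration only and does not reflect any particular package's internals.

```python
import numpy as np

# Hypothetical data: columns are x1, x2, y; NaN marks a missing value.
data = np.array([
    [1.0,    2.1,    1.5],
    [2.0,    np.nan, 2.5],
    [np.nan, 3.3,    3.1],
    [4.0,    4.2,    np.nan],
    [5.0,    5.1,    5.2],
    [6.0,    6.3,    6.1],
])

# Listwise deletion: keep only the rows (cases) with no missing value
# on any of the variables used in the analysis.
complete_rows = ~np.isnan(data).any(axis=1)
complete = data[complete_rows]

print(len(data), "cases before,", len(complete), "after listwise deletion")
```

Each dropped case is missing just one value. Even so, half the sample disappears.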

A very simple, and in many ways appealing, method devised to (more…)
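The post's title names mean imputation. Below is a minimal sketch of what that usually means: each missing value is replaced by the mean of the observed values for that variable. The data are hypothetical. The sketch also shows one well-known side effect. Every imputed value sits exactly at the mean, so the filled-in variable has a smaller variance than the observed values.

```python
import numpy as np

# Hypothetical variable with two missing values (NaN).
x = np.array([2.0, 4.0, np.nan, 6.0, 8.0, np.nan])

observed = x[~np.isnan(x)]
# Replace each NaN with the mean of the observed values (5.0 here).
x_imputed = np.where(np.isnan(x), observed.mean(), x)

print(x_imputed)
print(observed.var(ddof=1))   # sample variance of the observed values
print(x_imputed.var(ddof=1))  # smaller: imputed values add no spread
```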


Checking Assumptions in ANOVA and Linear Regression Models: The Distribution of Dependent Variables

April 10th, 2009 by

Here’s a little reminder for those of you checking assumptions in regression and ANOVA:

The assumptions of normality and homogeneity of variance for linear models are not about Y, the dependent variable.  (If you think I’m either stupid, crazy, or just plain nit-picking, read on.  This distinction really is important.) (more…)
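For reference, the standard formulation puts these assumptions on the model's errors, and the residuals serve as their estimates. The sketch below uses a hypothetical two-group design to show why this distinction matters. Y is clearly bimodal because the group means are far apart. The residuals from the fitted model, however, are centered at 0 and have the small, roughly normal spread of the errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group design: group means differ by 10,
# and the errors are normal with SD 1.
group = np.repeat([0, 1], 500)
y = 10 * group + rng.normal(0, 1, size=group.size)

# Fit the linear model y = b0 + b1*group by ordinary least squares.
X = np.column_stack([np.ones(group.size), group])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# y forms two clusters (near 0 and near 10), so its SD is large.
# The residuals have the SD of the errors, close to 1.
print("SD of y:        ", y.std())
print("SD of residuals:", residuals.std())
```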


The Distribution of Independent Variables in Regression Models

April 9th, 2009 by

I often hear concern about the non-normal distributions of independent variables in regression models, and I am here to ease your mind.

There are NO assumptions in any linear model about the distribution of the independent variables.  Yes, you only get meaningful parameter estimates from nominal (unordered categories) or numerical (continuous or discrete) independent variables.  But no, the model makes no assumptions about them.  They do not need to be normally distributed or continuous.
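The following sketch illustrates the point. It generates a hypothetical, strongly right-skewed (exponential) predictor, builds an outcome from a linear model with normal errors, and fits ordinary least squares. The skewness of X does not prevent the fit from recovering the true intercept and slope.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictor that is strongly right-skewed (exponential).
x = rng.exponential(scale=2.0, size=2000)
# Outcome from a linear model with true intercept 2, slope 3, normal errors.
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=x.size)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope:", beta)  # close to the true values 2 and 3
```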

It is useful, however, to understand the distribution of predictor variables to find influential outliers or concentrated values.  A highly skewed independent variable may be made more symmetric with a transformation.
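As one example of such a transformation, the sketch below takes the log of a hypothetical lognormal predictor. It compares sample skewness before and after, using a plain moment-based skewness formula written out by hand. The log transform only works for strictly positive values. Other transformations, such as the square root, may suit other data.

```python
import numpy as np

def skewness(v):
    """Moment-based sample skewness: mean cubed deviation / SD cubed."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

rng = np.random.default_rng(2)
# Hypothetical right-skewed, strictly positive predictor (lognormal).
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

print("skewness of x:     ", skewness(x))          # large and positive
print("skewness of log(x):", skewness(np.log(x)))  # near 0
```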

 


Is Multicollinearity the Bogeyman?

April 8th, 2009 by

Multicollinearity occurs when two or more predictor variables in a regression model are redundant.  It is a real problem, and it can do terrible things to your results.  However, the dangers of multicollinearity seem to have been so drummed into students’ minds that they have created a panic.

True multicollinearity (the kind that messes things up) is pretty uncommon.  High correlations among predictor variables may indicate multicollinearity, but they are NOT a reliable indicator that it exists, and they do not necessarily indicate a problem.  How high is too high depends on (more…)
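A common diagnostic that goes beyond pairwise correlations is the variance inflation factor. The original text does not name this method, so treat it as one standard option and not as the author's own approach. For predictor j, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing that predictor on all the others. The sketch uses hypothetical data in which x1 and x2 correlate at about 0.6. No cutoff is implied here.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).

    VIF_j = 1 / (1 - R^2_j), where R^2_j is from regressing column j
    on all other columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)  # correlation with x1 about 0.6
x3 = rng.normal(size=n)                   # unrelated to the others
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # x1, x2 near 1 / (1 - 0.36), about 1.56; x3 near 1
```

A correlation of 0.6 only inflates the variance of those two coefficients by a factor of roughly 1.5.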


Regression Models: How do you know you need a polynomial?

April 3rd, 2009 by

A polynomial term, such as a quadratic (squared) or cubic (cubed) term, turns a linear regression model into a curve.  But because it is X that is squared or cubed, not the Beta coefficient, it still qualifies as a linear model.  This makes it a nice, straightforward way to model curves without having to fit complicated non-linear models.
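The sketch below shows why a quadratic model still counts as a linear model. The design matrix simply gains an x² column, and ordinary least squares estimates the betas exactly as before. The data-generating curve is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 200)
# Hypothetical curved relationship: y = 1 + 0.5x + 2x^2 + noise.
y = 1 + 0.5 * x + 2 * x**2 + rng.normal(0, 1, size=x.size)

# Linear in the betas: columns are 1, x, and x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2:", beta)  # close to 1, 0.5, 2
```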

But how do you know when you need one, that is, when a linear model isn’t the best model? (more…)


Dummy Coding in SPSS GLM–More on Fixed Factors, Covariates, and Reference Groups, Part 2

March 31st, 2009 by

Part 1 outlined one issue in deciding whether to put a categorical predictor variable into Fixed Factors or Covariates in SPSS GLM.  That issue dealt with how SPSS automatically creates dummy variables from any variable in Fixed Factors.
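For readers who want to see what "creating dummy variables" means mechanically, here is a minimal Python sketch. It is not SPSS code. The `dummy_code` helper is hypothetical. Following SPSS GLM's usual convention, it treats the last (highest-valued) category as the reference group by default, and it builds one 0/1 indicator column for every other category.

```python
import numpy as np

def dummy_code(values, reference=None):
    """Hypothetical helper: 0/1 indicator columns for every category
    except the reference, which defaults to the last category in
    sorted order."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[-1]
    kept = [lev for lev in levels if lev != reference]
    columns = np.array([[1 if v == lev else 0 for lev in kept] for v in values])
    return kept, columns

groups = [1, 2, 3, 2, 1, 3]
kept, D = dummy_code(groups)
print(kept)  # [1, 2]: category 3 is the reference group
print(D)     # rows for category 3 are all zeros
```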

There is another key default to keep in mind. SPSS GLM will automatically create interactions between any and all variables you specify as Fixed Factors.

If you put 5 variables in Fixed Factors, you’ll get a lot of interactions: 26 of them. SPSS will automatically create all ten 2-way, ten 3-way, and five 4-way interactions, and even a 5-way interaction among those 5 variables. (more…)
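The count follows from combinatorics. Every subset of two or more factors is one interaction term in a full factorial model, so 5 factors give C(5,2) + C(5,3) + C(5,4) + C(5,5) = 10 + 10 + 5 + 1 = 26. The short sketch below enumerates them. The factor names are placeholders.

```python
from itertools import combinations

factors = ["A", "B", "C", "D", "E"]  # placeholder factor names

# Every subset of 2 or more factors is one interaction term
# in a full factorial model.
interactions = {
    k: [" * ".join(c) for c in combinations(factors, k)]
    for k in range(2, len(factors) + 1)
}
for k, terms in interactions.items():
    print(f"{k}-way: {len(terms)}")
print("total:", sum(len(t) for t in interactions.values()))
```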