A number of years ago, when I was still working in the consulting office at Cornell, someone came in asking for help interpreting their ordinal logistic regression results.
The client was surprised because all the coefficients were backwards from what they expected, and they wanted to make sure they were interpreting them correctly.
It looked like the researcher had done everything correctly, but the results were definitely bizarre. They were using SPSS, and the manual wasn't clarifying anything for me, so I did the logical thing: I ran it in another software program. I wanted to make sure the problem was with interpretation, and not with some strange default or (more…)
	One great thing about logistic regression, at least for those of us who are trying to learn how to use it, is that the predictor variables work exactly the same way as they do in linear regression.
Dummy coding, interactions, quadratic terms: they all work the same way.
Dummy Coding
In pretty much every regression procedure in every statistical software package, the default way to code categorical variables is with dummy coding.
All dummy coding means is recoding the original categorical variable into a set of binary variables that take values of one and zero. You may find it helpful to (more…)
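As a quick sketch of what that recoding looks like in practice, here is a small R example with a hypothetical three-level factor. R applies dummy (treatment) coding to factors by default, and `model.matrix()` shows the resulting binary variables:

```r
# A hypothetical three-level categorical predictor
group <- factor(c("a", "b", "c", "a", "b"))

# model.matrix() shows the dummy (treatment) coding R uses by default:
# "a" is the reference level; groupb and groupc are 0/1 indicators
X <- model.matrix(~ group)
print(X)
```

The reference level ("a" here, chosen alphabetically) gets no column of its own; its rows are the ones with zeros in both indicator columns.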
	Principal Component Analysis (PCA) is a handy statistical tool to always have available in your data analysis tool belt.
It’s a data reduction technique, which means it’s a way of capturing the variance in many variables in a smaller, easier-to-work-with set of variables.
There are many, many details involved, though, so here are a few things to remember as you run your PCA.
1. The goal of PCA is to summarize the correlations among a set of observed variables with a smaller set of linear  (more…)
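As a minimal sketch of this idea (using simulated data, not any particular study's variables), `prcomp()` in R summarizes correlated variables with a smaller set of components:

```r
# Simulated data: x1 and x2 are strongly correlated, x3 is independent
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.3)
x3 <- rnorm(100)
dat <- data.frame(x1, x2, x3)

# scale. = TRUE runs PCA on the correlation matrix, so variables
# measured on different scales contribute equally
pca <- prcomp(dat, scale. = TRUE)
summary(pca)  # proportion of variance captured by each component
```

Because x1 and x2 move together, the first component alone captures most of the shared variance, which is exactly the data-reduction behavior described above.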
In Part 3 and Part 4 we used the lm() command to perform least squares regressions. We saw how to check for non-linearity in our data by fitting polynomial models and checking whether they fit the data better than a linear model. Now let’s see how to fit an exponential model in R.
As before, we will use a data set of counts (atomic disintegration events that take place within a radiation source), taken with a Geiger counter at a nuclear plant.
The counts were registered over a 30-second period for a short-lived, man-made radioactive compound. We read in the data and subtract the background count of 623.4 counts per second in order to obtain (more…)
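The full worked example continues in the post; as a sketch of the general approach (on simulated decay data, not the actual Geiger counter readings), an exponential model can be fit with `lm()` after a log transform:

```r
# Simulated stand-in for decay data (hypothetical values)
time   <- seq(0, 30, by = 1)
counts <- 1000 * exp(-0.1 * time) * exp(rnorm(length(time), sd = 0.05))

# An exponential model C = C0 * exp(b * t) becomes linear after
# taking logs: log(C) = log(C0) + b * t, so lm() can fit it
fit <- lm(log(counts) ~ time)
coef(fit)  # intercept near log(1000), slope near -0.1
```

This is the same trick as the polynomial fits in Parts 3 and 4: transform the model until it is linear in its coefficients, then hand it to `lm()`.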
	One important consideration in choosing a missing data approach is the missing data mechanism—different approaches have different assumptions about the mechanism.
Each of the three mechanisms describes one possible relationship between the propensity of data to be missing and values of the data, both missing and observed.
The Missing Data Mechanisms
Missing Completely at Random, MCAR, means there is no relationship between  (more…)
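As a quick illustration of the distinction (with simulated data), MCAR missingness is generated independently of every variable, whereas MAR missingness depends only on observed values:

```r
set.seed(2)
x <- rnorm(200)           # fully observed predictor
y <- 2 * x + rnorm(200)   # outcome that will have missing values

# MCAR: every y has the same 20% chance of being missing,
# regardless of x or y
y_mcar <- ifelse(runif(200) < 0.2, NA, y)

# MAR: the probability of missingness depends on the *observed* x
# (higher x makes y more likely to be missing), but not on y itself
y_mar <- ifelse(runif(200) < plogis(x), NA, y)

mean(is.na(y_mcar))  # close to 0.2
```

Under MCAR the observed cases are a random subsample, so even complete-case analysis is unbiased; under MAR that is no longer true, which is why the mechanism matters for choosing an approach.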
In all linear regression models, the intercept has the same definition: the mean of the response, Y, when every predictor X equals 0.
But "when all X = 0" has different implications, depending on the scale on which each X is measured and on which terms are included in the model.
So let’s specifically discuss the meaning of the intercept in some common models: (more…)
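One scale issue can be sketched in a few lines of R (simulated data): when X = 0 lies far outside the observed range, the intercept is an extrapolation, but centering X makes the intercept the mean of Y at the average X.

```r
# Simulated data where X is far from zero (e.g., age in years)
set.seed(3)
x <- rnorm(100, mean = 50, sd = 10)
y <- 3 + 0.5 * x + rnorm(100)

coef(lm(y ~ x))[1]              # mean of Y when raw X = 0 (an extrapolation)
coef(lm(y ~ I(x - mean(x))))[1] # mean of Y at the average X
```

With the centered predictor, the intercept equals the sample mean of Y exactly, which is usually a far more interpretable quantity.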