Interpreting Linear Regression Coefficients: A Walk through Output

In this webinar we’re doing something a little different: rather than give you an overview of a topic, we’ll work through the regression coefficients table from a real data set together. This data set is from the dissertation of a client I worked with a few years ago. She has graciously allowed us to [...]

How Simple Should a Model Be? The Case of Insignificant Controls, Interactions, and Covariance Structures

“Everything should be made as simple as possible, but no simpler.” – Albert Einstein*

For some reason, I’ve heard this quotation 3 times in the past 3 days. Maybe I hear it every day and only noticed it because I’ve been working with a few clients on model selection and on deciding how much to simplify a model. [...]

Covariance Matrices, Covariance Structures, and Bears, Oh My!

But you, a researcher and data analyst, don’t need to be able to do all those complicated processes to your matrices. You do need to understand what a matrix is, be able to follow the notation, and understand a few simple matrix processes, like multiplication of a matrix by a constant.

The thing to keep in mind when it all gets overwhelming is that a matrix is just a table. That’s it.
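
Here’s what that looks like in practice, as a minimal Python sketch. The article itself uses no code, so the language and the made-up values are our own choices. The matrix is stored as a table (a list of rows), and multiplying it by a constant simply multiplies every cell by that constant. The function name `scale` is just an illustrative choice.

```python
# A 2x2 matrix is just a table: a list of rows, each row a list of numbers.
# These values are invented for illustration (think of a small covariance matrix).
cov = [
    [4.0, 1.5],
    [1.5, 9.0],
]

def scale(matrix, c):
    """Multiply a matrix by a constant: every cell is multiplied by c."""
    return [[c * value for value in row] for row in matrix]

halved = scale(cov, 0.5)
print(halved)  # [[2.0, 0.75], [0.75, 4.5]]
```

Nothing about the table’s shape changes; only the numbers in the cells do.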

5 Reasons to Run Sample Size Calculations Before Collecting Data

Reason 5: The biggest benefit of doing these calculations is that they keep you from wasting years and thousands of dollars in grants or tuition on an impossible analysis.

If sample size calculations indicate you need a thousand subjects to have a reasonable chance of finding significant results, but time, money, or ethical constraints limit you to 50, don’t do that study.
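
To see where a number like that comes from, here’s a minimal Python sketch of one common calculation: the normal-approximation sample size per group for comparing two means with a two-sided test. The function `n_per_group` is hypothetical, and the defaults (alpha = 0.05, power = 0.80) are conventional choices, not requirements. An exact t-test calculation in dedicated power software will give a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means.

    Normal-approximation formula for a two-sided, two-sample test:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (difference in means / SD).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group(0.5))  # 63 per group for a medium effect
print(n_per_group(0.2))  # 393 per group for a small effect
```

If the effect you realistically expect is small and you can only recruit 50 people in total, a calculation like this tells you so before you’ve spent anything.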

How to Combine Complicated Models with Tricky Effects

You’re dealing with both a complicated modeling technique (survival analysis, logistic regression, multilevel modeling) and tricky effects in the model (dummy coding, interactions, and quadratic terms).

The only way to figure it all out in a situation like that is to break it down into parts. Trying to understand all those complicated parts together is a recipe for disaster.

But if you can do linear regression, each part is just one step up in complexity. Take one step at a time.

Dummy Code Software Defaults Mess With All of Us

The takeaway for you, the researcher and data analyst:

1. Give yourself a break if you hit a snag. Even very experienced data analysts, statisticians who understand what they’re doing, get stumped sometimes. Don’t ever think that performing data analysis is an IQ test. You’re bringing together many skills and complex tools.

When Dummy Codes are Backwards, Your Stat Software may be Messing With You

In SAS proc glm, when you specify a predictor as categorical in the CLASS statement, GLM automatically dummy codes it for you in the parameter estimates table (the regression coefficients). The default reference category, the one GLM will code as 0, is the highest value. This works just fine if your values are coded 1, 2, and 3. But if you’ve already dummy coded a variable as 0 and 1, GLM makes 1 the reference category, so it’s switching them on you: the estimate you see compares the 0 group to the 1 group, and its sign is the reverse of what you expect.
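
Here’s a minimal Python sketch of why the sign flips. It doesn’t call SAS; it just mimics the “highest value is the reference” convention on a made-up 0/1 predictor. With a single two-level predictor, the dummy-coded coefficient is simply the difference between the two group means, so swapping which level is the reference swaps the sign. The data and the function `dummy_coefficient` are hypothetical.

```python
from statistics import mean

# Made-up data: a predictor already dummy coded 0/1 and a response.
x = [0, 0, 0, 1, 1, 1]
y = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0]

def dummy_coefficient(x, y, reference):
    """Coefficient for a single two-level categorical predictor.

    With one two-level predictor, the regression coefficient on the
    indicator equals mean(y | non-reference level) - mean(y | reference level).
    """
    levels = sorted(set(x))
    (other,) = [lvl for lvl in levels if lvl != reference]
    mean_other = mean(yi for xi, yi in zip(x, y) if xi == other)
    mean_ref = mean(yi for xi, yi in zip(x, y) if xi == reference)
    return mean_other - mean_ref

# What you intended: 0 is the reference, so the coefficient is for group 1.
print(dummy_coefficient(x, y, reference=0))       # 3.0
# Mimicking a "highest value is the reference" default: 1 becomes the reference.
print(dummy_coefficient(x, y, reference=max(x)))  # -3.0
```

Same data, same model, opposite sign: when a coefficient looks backwards, check which level the software treated as the reference.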

Assumptions of Linear Models are about Residuals, not the Response Variable

I recently received a great question in a comment about whether the assumptions of normality, constant variance, and independence in linear models are about the residuals or the response variable.

The asker had a situation where Y, the response, was not normally distributed, but the residuals were.
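
A small simulation shows how that can happen. This is a minimal Python sketch with made-up data: a two-group predictor with a large effect makes Y two-humped, while the errors around each group mean are normal. It uses excess kurtosis as a rough one-number summary (about 0 for normal data, clearly negative for two-humped data) in place of the histograms or Q-Q plots you would normally look at. The helper `excess_kurtosis` is our own, not from any library.

```python
import random
from statistics import fmean

random.seed(1)

# Simulated (made-up) data: a binary predictor with a large effect on Y,
# and normally distributed errors around each group's mean.
group = [0] * 1000 + [1] * 1000
y = [random.gauss(10 * g, 1) for g in group]  # group 0 centered at 0, group 1 at 10

def excess_kurtosis(values):
    """Sample excess kurtosis: about 0 for normal data, negative for two-humped data."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m4 = fmean((v - m) ** 4 for v in values)
    return m4 / m2 ** 2 - 3

# Fit the model y ~ group: the fitted value for each case is its group mean.
group_means = {g: fmean(yi for gi, yi in zip(group, y) if gi == g) for g in (0, 1)}
residuals = [yi - group_means[gi] for gi, yi in zip(group, y)]

print(f"Y:         {excess_kurtosis(y):.2f}")          # well below 0: Y is bimodal
print(f"Residuals: {excess_kurtosis(residuals):.2f}")  # near 0: consistent with normality
```

The response fails a normality check, but the residuals look fine, which is exactly the situation the question described.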

SAS Users Group International (SUGI) Proceedings

One of my favorite resources when I get stuck on a statistical detail is SUGI Proceedings papers. These are PDF papers written by and for SAS users, often with solutions to very specific analysis issues.

7 Practical Guidelines for Accurate Statistical Model Building

But if the point is to answer a research question that describes relationships, you’re going to have to get your hands dirty.

It’s easy to say “use theory” or “test your research question,” but that ignores a lot of practical issues, like the fact that you may have 10 different variables that all measure the same theoretical construct, and it’s not clear which one to use.
