I recently received this question:
I have a scale which I want to run Cronbach’s alpha on. One response category for all items is ‘not applicable’. I want to run Cronbach’s alpha requiring that at least 50% of the items must be answered for the scale to be defined. Where this is the case, I want all missing values on that scale replaced by the average of the non-missing items on that scale. Is this reasonable? How would I do this in SPSS?
My Answer:
In RELIABILITY, the SPSS command for running Cronbach’s alpha, the only options for Missing Data (more…)
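For readers who want to see the mechanics of what the question describes, here is a minimal sketch in Python rather than SPSS. It shows the mechanics only and does not settle whether the approach is reasonable. The function names, the `min_answered` parameter, and the response data are all hypothetical, and ‘not applicable’ is assumed to be coded as `None`.

```python
from statistics import mean, variance

def impute_person_mean(rows, min_answered=0.5):
    """Fill each respondent's missing items with their own item mean.

    Respondents who answered fewer than `min_answered` of the items are
    dropped, i.e. the scale is left undefined for them.
    """
    completed = []
    for row in rows:
        answered = [x for x in row if x is not None]
        if len(answered) / len(row) < min_answered:
            continue  # too few answers: scale undefined for this respondent
        person_mean = mean(answered)
        completed.append([person_mean if x is None else x for x in row])
    return completed

def cronbach_alpha(rows):
    """Cronbach's alpha for complete data (one row per respondent)."""
    k = len(rows[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*rows)]  # variance of each item
    total_var = variance([sum(row) for row in rows])   # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents, 4 items, None = 'not applicable'.
responses = [
    [4, 5, None, 4],
    [2, None, None, None],  # only 25% answered, so this respondent is dropped
    [3, 3, 2, None],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]

completed = impute_person_mean(responses)
alpha = cronbach_alpha(completed)
print(len(completed), round(alpha, 3))
```

Keeping the imputation step separate from the alpha calculation makes it easy to compare the result against alpha computed on complete cases only.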
Two designs commonly used in epidemiology are the cohort and case-control studies. Both study causal relationships between a risk factor and a disease. What is the difference between these two designs? And when should you choose one over the other?
Cohort studies
Cohort studies begin with a group of people (a cohort) free of disease. The people in the cohort are grouped by whether or not they are exposed to a potential cause of disease. The whole cohort is followed over time to see if (more…)
In a previous post, Interpreting Interactions in Regression, I said the following:
In our example, once we add the interaction term, our model looks like:
Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun
Adding the interaction term changed the values of B1 and B2. The effect of Bacteria on Height is now 4.2 + 3.2*Sun. For plants in partial sun, Sun = 0, so the effect of Bacteria is 4.2 + 3.2*0 = 4.2. So for two plants in partial sun, a plant with 1000 more bacteria/ml in the soil would be expected to be 4.2 cm taller than a (more…)
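The arithmetic above can be checked with a short sketch. The coefficients come from the model as written; the function names are hypothetical, Bacteria is taken to be in units of 1000 bacteria/ml, and the coding Sun = 1 for the other sun condition is an assumption (the excerpt only states that Sun = 0 means partial sun).

```python
def predicted_height(bacteria, sun):
    """Height = 35 + 4.2*Bacteria + 9*Sun + 3.2*Bacteria*Sun."""
    return 35 + 4.2 * bacteria + 9 * sun + 3.2 * bacteria * sun

def bacteria_effect(sun):
    """Change in predicted Height per one-unit increase in Bacteria."""
    return 4.2 + 3.2 * sun

# Partial sun (Sun = 0): the effect of Bacteria is 4.2 + 3.2*0.
print(bacteria_effect(0))

# The same number, obtained as a difference between two predictions
# for plants that differ by one unit of Bacteria in partial sun.
print(predicted_height(2, 0) - predicted_height(1, 0))

# Assumed coding Sun = 1: the effect becomes 4.2 + 3.2*1.
print(bacteria_effect(1))
```

Computing the effect both ways shows that the coefficient on Bacteria alone is the effect only when Sun = 0.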
There are many types of outcome variables that look like they should work in linear models (specifically, OLS regression and ANOVA models) but don’t.
They include discrete counts; truncated or censored variables, where part of the distribution is cut off or measured only up to a certain point; and bounded variables, like proportions and percentages.
This article outlines a particular type of outcome variable: one that measures whether or when an event occurs. Such variables are typically called (more…)
Do you find quizzes irresistible? I do.
Here’s a little quiz about working with missing data:
True or False?
1. Imputation is really just making up data to artificially inflate results. It’s better to just drop cases with missing data than to impute.
2. I can just impute the mean for any missing data. It won’t affect results, and improves power.
3. Multiple Imputation is fine for the predictor variables in a statistical model, but not for the response variable.
4. Multiple Imputation is always the best way to deal with missing data.
5. When imputing, it’s important that the imputations be plausible data points.
6. Missing data isn’t really a problem if I’m just doing simple statistics, like chi-squares and t-tests.
7. The worst thing that missing data does is lower sample size and reduce power.
Answers: (more…)
In my last post, I gave a little quiz about missing data. This post has the answers.
If you want to try it yourself before you see the answers, go here. (It’s a short quiz, but if you’re like me, you find testing yourself irresistible.)
True or False?
1. Imputation is really just making up data to artificially inflate results. It’s better to just drop cases with missing data than to impute. (more…)