One of the hardest steps in any project is learning to ask the right research question!
One activity in data analysis that can seem impossible is the quest to find the right analysis. I applaud the conscientiousness and integrity that underlie this quest.
The problem: in many data situations there isn’t one right analysis.
It’s easy to think that if you just knew statistics better, data analysis wouldn’t be so hard.
It’s true that more statistical knowledge is always helpful. But I’ve found that statistical knowledge is only part of the story.
Another key part is developing data analysis skills. These skills apply to all analyses. It doesn’t matter which statistical method or software you’re using. So even if you never need any statistical analysis harder than a t-test, developing these skills will make your job easier.
There is a bit of art and experience to model building. You need to build a model to answer your research question, but how do you build a statistical model when there are no instructions in the box?
Recently I gave a webinar, The Steps to Running Any Statistical Model. A few hundred people were live on the webinar. We held a Q&A session at the end, but as you can imagine, we didn’t have time to get through all the questions.
This is the first in a series of written answers to some of those questions. I’ve tried to sort them by the step each is about.
A written list of the steps is available here.
If you missed the webinar, you can view the video here. It’s free.
Yes. There’s no point in asking research questions that the data you have available can’t answer.
So the order of the steps would have to change—you may have to start with a vague idea of the type of research question you want to ask, but only refine it after doing some descriptive statistics, or even running an initial model.
You want to at least start thinking about them as you’re doing the lit review and formulating your research questions.
Think about how you could measure variables, which ones are likely to be collinear or have a lot of missing data. Think about the kind of model you’d have to do for each research question.
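If you have pilot data or an earlier data set on hand, a quick screen can make this kind of thinking concrete. Here is a minimal sketch in plain Python, assuming a small made-up pilot sample; the variable names, the values, and the helper functions `missing_fraction` and `pearson_r` are all hypothetical and only illustrate the idea.

```python
import math

def missing_fraction(values):
    """Share of observations recorded as None (missing)."""
    return sum(v is None for v in values) / len(values)

def pearson_r(x, y):
    """Pearson correlation using only pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Made-up pilot data, for illustration only.
pilot = {
    "years_education": [12, 16, 14, 18, 12, 20],
    "job_grade": [2, 4, 3, 5, 3, 6],
    "income": [40_000, None, None, 85_000, None, 120_000],
}

# Which variables lose a lot of cases to missing data?
for name, values in pilot.items():
    print(name, "missing:", round(missing_fraction(values), 2))

# Are two candidate predictors close to redundant?
r = pearson_r(pilot["years_education"], pilot["job_grade"])
print("education vs. job grade r =", round(r, 2))
```

A high correlation between two candidate predictors, or a variable that is missing for half the sample, is exactly the kind of thing that should shape the research question before you commit to a model.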
Think of a scenario where the same research question could be operationalized such that the dependent variable is measured either as a continuous variable or as ordered categories. An easy example is income, measured either as an actual dollar amount or in income categories.
By all means, if people can answer the question with a real and accurate number, your analysis will be much, much easier. In many situations, they can’t. They won’t know, remember, or tell you their exact income. If so, you may have to use categories to prevent missing data. But these are things to think about early.
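To see what that choice looks like in the data, here is a small sketch that maps exact incomes onto ordered categories. The cut points, labels, and responses are invented for illustration and don't come from any particular survey.

```python
from bisect import bisect_right

# Hypothetical category boundaries in dollars (illustrative only).
CUTPOINTS = [25_000, 50_000, 100_000]
LABELS = ["under 25k", "25k-50k", "50k-100k", "100k or more"]

def income_category(income):
    """Map an exact income to an ordered category; missing stays missing."""
    if income is None:
        return None
    # bisect_right puts a value equal to a cut point into the higher category.
    return LABELS[bisect_right(CUTPOINTS, income)]

# Made-up responses: None marks people who could not or would not answer.
reported = [18_000, None, 62_500, 100_000, None, 25_000]

categories = [income_category(x) for x in reported]
missing_rate = sum(x is None for x in reported) / len(reported)

print(categories)
print("missing rate for exact income:", round(missing_rate, 2))
```

Note the direction of the trade-off: you can always collapse exact dollars into categories later, but you can't recover exact dollars from categories. Asking for categories only pays off if it actually lowers the missing rate.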
I would start by putting the literature review before Step 1. You’ll use that to decide on a theoretical research question, as well as ways to operationalize it.
But it will help you in other places as well. For example, it helps the sample size calculations to have variance estimates from other studies. Other studies may give you an idea of which variables are likely to have missing data or too little variation to include as predictors. They may change your exploratory factor analysis in Step 7 to a confirmatory one.
In fact, just about every step can benefit from a good literature review.
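As one example of how a variance estimate from the literature feeds a sample size calculation, here is a rough sketch for comparing two group means. It assumes equal group sizes, a two-sided test, and the normal approximation, so treat the result as a ballpark figure rather than a substitute for proper power software. The function name `n_per_group` and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, min_difference, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    sd: standard deviation estimate, e.g. taken from a published study
    min_difference: smallest difference in means worth detecting
    Uses the normal approximation: n = 2 * ((z_alpha + z_power) * sd / diff)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / min_difference) ** 2
    return math.ceil(n)

# Suppose an earlier study reported a standard deviation of about 10
# and a 5-point difference is the smallest one that matters.
print(n_per_group(sd=10, min_difference=5))

# Halving the standard deviation cuts the required sample to about a quarter.
print(n_per_group(sd=5, min_difference=5))
```

The required sample size scales with the square of the standard deviation, which is why a decent variance estimate from earlier studies matters so much at the planning stage.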
The first real data set I ever analyzed was from my senior honors thesis as an undergraduate psychology major. I had taken both intro stats and an ANOVA class, and I applied all my new skills with gusto, analyzing every which way.
It wasn’t too many years into graduate school that I realized that these data analyses were a bit haphazard and not at all well thought out. Twenty years of data analysis experience later, I realize that’s just a symptom of being an inexperienced data analyst.
But even experienced data analysts can get off track, especially with large data sets with many variables. It’s just so easy to try one thing, then another, and pretty soon you’ve spent weeks getting nowhere.