One of the most common situations in which researchers get stuck with statistics is choosing which statistical methodology is appropriate to analyze their data. If you start by asking the following four questions, you will be able to narrow things down considerably.
Even if you don’t know the implications of your answers, answering these questions will clarify issues for you. It will help you decide what information to seek, and it will make any conversations you have with statistical advisors more efficient and useful.
1. What is your research question?
This informs everything. You need to know whether you’re interested in a relationship between variables, in whether one variable mediates a relationship, in which variables drew similar responses, and so on. What is it you need to know about your data?
The answer to this question will tell you which family of statistical methods you need: regression, factor analysis, cluster analysis, and so on.
2. What variables will you use to test the question? On what scales are they measured?
Clearly define the scale of each variable. Categorical? Ordered Categories? Count? Continuous? Then do some descriptive and bivariate analyses to understand their distributions.
In some data sets, you have choices about which variable to use or which form of a variable to use. Sometimes a relationship is clear using one variable, but not the other, so you need to explore a bit. Likewise, sometimes you get the exact same result in both, but one analysis is much harder to implement.
The answer to this question informs which specific method to use within the family, e.g., linear vs. logistic regression, or factor analysis vs. latent class analysis.
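As a rough illustration of how the outcome's scale narrows the choice within the regression family, here is a minimal Python sketch. The scale labels, the function name `suggest_regression`, and the mapping itself are assumptions made for this example; they are common starting points, not a complete decision rule, and design and data issues can still change the answer.

```python
# Rule-of-thumb lookup: scale of the outcome variable -> a common
# regression-family starting point. Illustrative only, not exhaustive.
REGRESSION_BY_OUTCOME_SCALE = {
    "continuous": "linear regression",
    "binary": "logistic regression",
    "ordered categories": "ordinal logistic regression",
    "unordered categories": "multinomial logistic regression",
    "count": "Poisson or negative binomial regression",
}


def suggest_regression(outcome_scale: str) -> str:
    """Return a common starting model for the given outcome scale."""
    key = outcome_scale.strip().lower()
    if key not in REGRESSION_BY_OUTCOME_SCALE:
        raise ValueError(f"Unknown outcome scale: {outcome_scale!r}")
    return REGRESSION_BY_OUTCOME_SCALE[key]


print(suggest_regression("Binary"))
print(suggest_regression("count"))
```

Writing the scale down explicitly for each variable, as this lookup forces you to do, is the useful part; the suggested model is only where the conversation starts.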
3. What is the design of the study?
Like the scale of variables, the study design will inform which specific method is appropriate.
One major issue is independence of observations. Are there repeated measures to consider? Or stratified surveys? Did each participant respond in all conditions, just one, or some? It is vital that your analysis take non-independence into account. If it doesn’t, your standard errors, and therefore your p-values, will usually be too small.
But there are other design issues to take into account: How many categories does your categorical variable have? Are your predictor variables crossed or nested? At how many time points did you measure a longitudinal response? Do independent variables change over time along with the dependent ones, or do they remain the same over time?
Having a thorough understanding of your design is necessary to figure out its implications. Even if you don’t understand the statistical jargon, make sure you know what you did.
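To see why non-independence matters, consider the standard design effect for clustered observations, 1 + (m - 1) × ICC, where m is the number of observations per cluster and ICC is the intraclass correlation. The Python sketch below applies it to a made-up study; the numbers and function names are assumptions for illustration, and the formula assumes equal-sized clusters and a simple quantity such as a mean.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_obs: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered data are worth."""
    return n_obs / design_effect(cluster_size, icc)


# Hypothetical study: 50 participants measured 10 times each, ICC = 0.3.
n_obs = 50 * 10
deff = design_effect(cluster_size=10, icc=0.3)
n_eff = effective_sample_size(n_obs, cluster_size=10, icc=0.3)

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f} of {n_obs} observations")
# Standard errors that ignore the clustering are too small by about this factor.
print(f"standard error inflation: {math.sqrt(deff):.2f}")
```

In this made-up case, 500 correlated measurements carry roughly the information of 135 independent ones, which is why an analysis that treats them as independent overstates its certainty.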
4. Are there any data issues to consider?
Issues like missing data, censored or truncated distributions, unequal sample sizes, and response rates are not part of the design or the research question. But they can still require corrective methods (such as multiple imputation for missing data) or an entirely different analysis (such as tobit regression for censored data).
Often these data issues don’t become apparent until you have already started your analysis. This is one very good reason for conducting exploratory descriptives of your data before deciding on your final analysis. It is also why it is vital to check your assumptions.
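A first screening pass for two of these issues can be very simple. The Python sketch below counts missing values and values piled up at a known measurement limit, which can be a sign of censoring. The function name `screen_variable`, the use of `None` to mark missing values, and the example scores are all assumptions for illustration.

```python
def screen_variable(values, lower=None, upper=None):
    """Count missing values and values sitting at known measurement limits.

    Missing values are assumed to be coded as None.
    """
    observed = [v for v in values if v is not None]
    at_lower = sum(1 for v in observed if lower is not None and v <= lower)
    at_upper = sum(1 for v in observed if upper is not None and v >= upper)
    return {
        "n": len(values),
        "missing": len(values) - len(observed),
        "at_lower_limit": at_lower,
        "at_upper_limit": at_upper,
    }


# Hypothetical scores from an instrument that cannot record above 100.
scores = [72, 100, None, 88, 100, 100, 65, None, 100, 91]
report = screen_variable(scores, upper=100)
print(report)
```

Here two of ten values are missing and four sit at the ceiling, so both missing-data handling and a model for censored outcomes would be worth raising before settling on a final analysis.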