Like any applied skill, mastering statistical analysis requires:
1. building a body of knowledge
2. becoming adept with the tools of the trade (that is, a statistical software package)
3. practice applying the knowledge and using the tools in a realistic, meaningful context.
If you think of other high-level skills you’ve mastered in your life–teaching, survey design, programming, sailing, landscaping, anything–you’ll realize the same three requirements apply.
These three requirements need to be developed over time, and attaining mastery takes many years. They also need to be developed together. Having more background knowledge improves understanding of how the tools work, and helps the practice go better. Likewise, practice in a real context (not perfect textbook examples) helps the knowledge make more sense, and improves skills with the tools.
I don’t know if this is true of other applied skills, but from what I’ve seen over many years of working with researchers as they master statistical analysis, the journey seems to have 3 stages. Within each stage, developing all 3 requirements–knowledge, tools, and experience–to a level of mastery sets you up well for the next stage.
Knowing what stage you’re in can help you figure out where to put your energy, time, and resources to progress forward.
Stage 1: Mastering the Basic Approach
At this stage, knowledge-building focuses on the basic concepts and vocabulary–hypothesis tests and sampling–up through basic multiple regression with continuous predictors and simple factorial ANOVAs. It usually requires 2-3 statistics classes to master the knowledge in this stage.
Mastery of software usually includes a good working ability to enter and manipulate data and run descriptive and inferential statistics, as listed above. At this stage, most researchers use a menu-based software program, like SPSS, Minitab, or JMP, but some use software with steeper learning curves, like SAS, Stata, or R.
To master this basic level, a researcher needs experience running the data analysis for a few research projects. An honors or master’s thesis is usually the first, and many dissertations give a really solid foundation at this level.
Stage 2: Mastering Linear Models
Exactly what this stage entails will depend on your field and the specific type of research you do. But usually the focus is on statistical modeling.
The beauty of statistical models (they are beautiful, no?) is they all have the same core structure. There is always a response variable, a set of predictors, an estimate of the nature of their relationship, and a residual. The details vary, but if you can master one basic type of modeling, any other is a step or two away.
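As a rough illustration of that shared structure, here is the familiar linear regression equation. The notation is my own assumption for this sketch (y for the response, x for the predictors, beta for the estimated relationship, epsilon for the residual); other model types change the details but keep the same pieces.

```latex
% Response = estimated relationship with the predictors + residual
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i
```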
So whereas the first stage took you up through basic regression and ANOVA, this stage is about mastering the entirety of linear modeling.
Topics will include dummy variables, interactions, polynomial effects, random effects, model building, model fit, etc. Truly mastering this stage means thoroughly understanding how ANOVA and regression fit together into the General Linear Model, and being able to move fluently from one to the other.
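One way to see how ANOVA and regression fit together is to run the same comparison both ways. The sketch below is only an illustration, assuming Python with NumPy and a small made-up data set (the group names and values are hypothetical). It computes a one-way ANOVA F statistic from sums of squares, then fits a least-squares regression on dummy variables and computes the overall F from that fit.

```python
import numpy as np

# Hypothetical data: one response measured in three groups (values made up).
groups = {
    "a": np.array([4.1, 5.0, 4.6, 5.3]),
    "b": np.array([6.2, 5.8, 6.9, 6.4]),
    "c": np.array([5.1, 4.7, 5.6, 5.0]),
}

def anova_f(groups):
    """One-way ANOVA F statistic from between- and within-group sums of squares."""
    y = np.concatenate(list(groups.values()))
    grand_mean = y.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
    df_between = len(groups) - 1
    df_within = len(y) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def regression_f(groups):
    """Overall F statistic from a least-squares regression on dummy variables."""
    names = list(groups)
    y = np.concatenate(list(groups.values()))
    labels = np.repeat(names, [len(g) for g in groups.values()])
    # Intercept plus one 0/1 dummy per group, leaving out the first (reference) group.
    dummies = [(labels == name).astype(float) for name in names[1:]]
    X = np.column_stack([np.ones(len(y))] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_model = ((fitted - y.mean()) ** 2).sum()
    ss_resid = ((y - fitted) ** 2).sum()
    df_model = X.shape[1] - 1
    df_resid = len(y) - X.shape[1]
    return (ss_model / df_model) / (ss_resid / df_resid)

f_anova = anova_f(groups)
f_reg = regression_f(groups)
print(f_anova, f_reg)  # the two F statistics agree up to floating-point error
```

The two numbers match because the regression's fitted values are just the group means, so the model sum of squares is the between-group sum of squares. Most statistical packages will do all of this for you; writing it out by hand is only meant to show that it is one model with two names.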
It will also include other methods that are used in your field. These could include structural equation modeling, multivariate techniques, survival analysis, or complex survey techniques, among others.
In software, the same programs I mentioned in stage one work well here. But they need to be approached with a higher level of skill–SPSS users should use syntax as well as menus. The methods used in the software will, of course, be more sophisticated, and you should have not just a working knowledge, but real understanding of the program’s defaults, vocabulary, and what each bit of output means.
In this stage, for a number of reasons I’ve written about, I often recommend that you master one statistical software package and become conversant in a second. You want another option there in your back pocket when you need it (and you will need it).
Most researchers move well into this stage with their dissertation. While they learn much, most don’t master it with that single project. To really master linear modeling requires experience with different data sets, models, and research questions, and it can take years to gain experience with a variety of models.
It’s not uncommon for even seasoned researchers with strong quantitative skills to have knowledge gaps in this area. It’s hard not to unless you’ve worked on many dozens of models.
Even 10 years ago, most researchers could stop here. But enormous growth in computing power has made increasingly sophisticated statistical techniques widely available. These techniques account for issues that we previously had to gloss over with the general linear model. Because they are now so widely available, journal editors and grant issuers no longer let you get away with glossing anything over.
Stage 3: Beyond Linear Models
The knowledge base in stage 3 includes truly sophisticated statistical methodology, such as generalized linear models for categorical and discrete response variables, multilevel models, generalized linear mixed models, modern techniques for missing data, robust regression models, and nonlinear models, among many, many others.
I’ve said it before and I’ll say it again–do everything you can to master linear models before moving on to these techniques. Many are extensions of the general linear model, so if you’re still struggling with interpreting interactions in a linear model, it will be doubly hard to interpret interactions that involve odds ratios.
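To make the odds ratio point a little more concrete, here is a minimal sketch in Python. The coefficients are hypothetical, not from any real analysis, and I am assuming a logistic regression with a continuous predictor x, a two-level group, and their interaction. In a linear model the interaction coefficient is a difference between slopes. On the odds ratio scale it becomes a ratio of odds ratios, which is a harder thing to hold in your head.

```python
import math

# Hypothetical logistic regression coefficients on the log-odds scale.
b_x = 0.40           # effect of x in the reference group
b_x_by_group = 0.30  # interaction: how the effect of x differs in the other group

# Odds ratio for a one-unit increase in x, within each group.
or_x_reference = math.exp(b_x)
or_x_other = math.exp(b_x + b_x_by_group)

# The exponentiated interaction is a ratio of those two odds ratios,
# not an odds ratio for any single comparison.
ratio_of_odds_ratios = math.exp(b_x_by_group)

print(or_x_reference, or_x_other, ratio_of_odds_ratios)
```

If the difference between slopes in a linear model already feels shaky, the multiplicative version here will feel shakier still.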
At this point your old faithful software package may fail you. No statistical software package can do everything, and this is why you want an extra one in your repertoire.
Stata, R (or S-PLUS), and SAS are all quite comprehensive, and SPSS is one step behind. SPSS has made impressive inroads into high-level techniques in recent years, but still cannot do all that the others do. JMP and Minitab just aren’t contenders at this level.
The other thing to remember as you emerge into this stage is you can’t master all of it. No one can. Things branch out widely at this point, and you just can’t learn all of it.
But you don’t need to.
You may need to master two or three of these techniques, but hopefully not all at once. And if you can confidently implement linear models, you are in an excellent position to take on any of their extensions.