At The Analysis Factor, we are on a mission to help researchers improve their statistical skills so they can do amazing research.
We all tend to think of “Statistical Analysis” as one big skill, but it’s not.
Over the years of training, coaching, and mentoring data analysts at all stages, I’ve realized there are four fundamental stages of statistical skill:
Stage 1: The Fundamentals
Stage 2: Linear Models
Stage 3: Extensions of Linear Models
Stage 4: Advanced Models
There is also a stage beyond these where the mathematical statisticians dwell. But that stage is required for such a tiny fraction of data analysis projects that we’re going to ignore it for now.
If you try to master the skill of “statistical analysis” as a whole, it’s going to be overwhelming.
And honestly, you’ll never finish. It’s too big of a field.
But if you can work through these stages, you’ll find you can learn and do just about any statistical analysis you need to.
(An aside: you will not be able to do this in the week you have left to finish your dissertation or submit your conference abstract. It takes years.)
The Three Components at Each Stage
If you were only interested in learning statistics as an intellectual exercise, you could work through these stages simply by increasing your statistical knowledge.
This is essentially what classes do.
But we data analysts have two more things to master at each stage: data analysis skills and software skills.
Both are vitally important. It’s very common for someone’s knowledge to be a bit ahead of their data analysis and software skills, simply because they took a lot of stats classes.
The ideal way to progress through the stages
In an ideal world, your first data analysis project or two would be simple–at the most fundamental stage. It would involve a one-way ANOVA or perhaps some non-parametric tests.
You would have already taken a couple of statistics classes and have some solid background knowledge. And you would have an accessible mentor who is knowledgeable, patient, and available to answer your questions when they come up.
You would learn a lot of data analysis skills on that project: setting up the data set, running tests, checking assumptions. You would also learn how to do all of this in the software of your choice. You’d gain experience in interpreting confidence intervals and calculating sample sizes.
You are now ready to move on to Stage 2. In your next project you could tackle some complicated linear models, like one with polynomial effects or multicollinearity.
Only after that project is done (and ideally a few more of this type) do you tackle a logistic regression model at Stage 3.
Here’s the thing.
I’ve never seen this play out in reality.
Most of the time, your very first study requires logistic regression. Oh, yeah, and a principal component analysis on the predictors to deal with the multicollinearity.
If you’re really piling it on, there are repeated measures.
So you may jump right up to Stage 3 or 4 and while you’re there trying to figure out odds ratios, you’re also struggling with getting the data set up correctly in your software (Stage 1) and figuring out the best way to build the model (Stage 2).
The other reality we’re working with here is that unless data analysis is your full-time job, you can go months or years in between projects. So even if you’ve gotten pretty skilled on one project, there’s a bit of forgetting going on before the next.
The real way to progress through the stages
Since that direct, uninterrupted path is not going to happen in this life, you’re going to have to hop around a bit.
Yes, the more you can do it by starting at the bottom and working your way up the stages, the easier it will be.
But realistically, you may have to jump ahead then back down again to fill in some holes in your skills and knowledge.
It’s like dancing a statistical-skill Time Warp.
The limitation is that it is nearly impossible to jump two stages. You might be able to jump from Stage 2 to 3 with a bit of work, but jumping up from 2 to 4 will require a LOT of time, guidance, and work.
We set up our workshops to give you the soup-to-nuts knowledge, practice, and guidance to progress toward the next stage through in-depth learning on one statistical method. Most of these are at stages 2-4.
And we set up our Statistically Speaking membership to help you in a few specific ways that no one else is doing:
- help you fill in some holes in your knowledge at lower stages
- introduce you to statistical methods higher up that could be useful to you that you may not even realize exist
- give you ongoing access to professional statistical consultants so you get guidance as you’re learning and gaining experience
- discuss “how to approach data analysis” issues that cross stages
Okay, so the basic strategy is to:
- Work your way up the stages in order as much as you are able
- When you find you’re straddling stages, go back and learn (or re-learn) prior-stage concepts or attain skills you may be missing
- Get help wherever you are
- Try not to jump too many stages at once
Okay, so what are these stages and how do we navigate them?
Stage 1: The Fundamentals
Even though this is the beginning, the fundamentals aren’t easy. In fact, there are some weirdly abstract concepts here that stymie really smart people.
The Stage 1 Statistical Component: The Fundamentals
Knowledge focuses on the concepts and vocabulary of probability, statistics, and data analysis.
- hypothesis tests
- p-values and statistical significance
- effect sizes
- population, sampling, and bias
- confidence intervals
- variation and distributions
- types of variables
- descriptive statistics
- common bivariate statistical tests: t-tests, chi-square, non-parametrics, tests of proportion, and correlations
- basics of multiple regression with continuous predictors
- simple factorial ANOVAs and post-hoc tests
- fundamentals of power and sample size
It usually requires 1-2 statistics classes to master the statistical knowledge in this stage.
The Stage 1 Data Analysis Skills Component
At this stage, you need to get started with the applied skills of data analysis. These include planning an analysis, doing the steps of analysis in the most efficient order, setting up and coding data, and presenting results through graphs, tables, and a clear and thorough report.
The Stage 1 Software Skills Component
Mastery of software usually includes a good working ability to enter and manipulate data; to define and work with variables in order to run the analysis; and run descriptive and inferential statistics.
This is harder than it sounds, but a good introductory software tutorial will be invaluable.
Stage 1 Wrap-up
To master Stage 1, a researcher needs experience with running the data analysis for a few research projects–an honors or master’s thesis is usually the first.
This is when the Stage 1 statistical knowledge really starts to make sense, and you can make real progress on using software and learning how to conduct a data analysis.
Stage 2: Linear Modeling
This one is pretty big.
When you transition to Stage 2, there is a qualitative shift in your skills that will be the foundation for everything else.
First, we move beyond statistical tests and begin statistical modeling. It’s a subtle shift, but there are skills and ways of approaching an analysis that differ between tests and models.
Second, pretty much everything you learn at this point forward requires modeling. So you’re not just improving your skills by making this transition to modeling, you’re building a solid foundation.
The Stage 2 Statistical Component: Linear Models
The good news is there is only one type of statistical model you need to learn at this stage: Linear Models.
The bad news is there is a lot to it. This stage is about mastering the entirety of linear modeling.
Linear Models will include all the tricky parts of analysis of variance, analysis of covariance, and linear regression, including:
- dummy variables
- polynomial effects
- model building
- checking assumptions and knowing what to do if they’re not met
- centering, rescaling, and standardizing variables
- creating meaningful graphs and tables from results
- calculating and reporting effect sizes
- interpreting means, graphs, and coefficients
- how to deal with data issues like missing, censored, or truncated data
- model fit, etc.
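To make a couple of the items in this list concrete, here is a minimal sketch of why centering matters when you add a polynomial effect. The variable name age and its values are made up for illustration, and the code uses only Python's standard library rather than any particular statistics package. It compares the correlation between a predictor and its square before and after centering.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical predictor values (illustrative only).
age = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0]

# Raw predictor and its square: these two are almost perfectly correlated.
age_sq = [a ** 2 for a in age]
r_raw = pearson_r(age, age_sq)

# Center first (subtract the mean), then square.
age_c = [a - mean(age) for a in age]
age_c_sq = [a ** 2 for a in age_c]
r_centered = pearson_r(age_c, age_c_sq)

print(f"raw: r = {r_raw:.3f}, centered: r = {r_centered:.3f}")
```

With these evenly spaced values the centered correlation is zero. With real data it is usually just much smaller than the raw one, which is often enough to make the linear and quadratic coefficients easier to estimate and interpret.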
To truly master this stage means a thorough understanding of how ANOVA and regression fit together into the General Linear Model, and an ability to fluently move from one to the other.
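Here is one small way to see ANOVA and regression as the same model, sketched under simplifying assumptions: two made-up groups of equal size, and plain Python instead of statistical software. It computes the one-way ANOVA F statistic from sums of squares, then fits a regression on a 0/1 dummy variable and computes the regression F.

```python
from statistics import mean

# Hypothetical scores for two groups (illustrative data, not from a real study).
control = [4.0, 5.0, 6.0, 5.0]
treated = [7.0, 8.0, 9.0, 8.0]

y = control + treated
dummy = [0.0] * len(control) + [1.0] * len(treated)  # 0 = control, 1 = treated
n = len(y)
grand_mean = mean(y)

# --- One-way ANOVA: between-group and within-group sums of squares ---
ss_between = (len(control) * (mean(control) - grand_mean) ** 2
              + len(treated) * (mean(treated) - grand_mean) ** 2)
ss_within = (sum((v - mean(control)) ** 2 for v in control)
             + sum((v - mean(treated)) ** 2 for v in treated))
f_anova = (ss_between / 1) / (ss_within / (n - 2))

# --- Regression of y on the dummy variable (ordinary least squares) ---
mean_x = mean(dummy)
slope = (sum((x - mean_x) * (v - grand_mean) for x, v in zip(dummy, y))
         / sum((x - mean_x) ** 2 for x in dummy))
intercept = grand_mean - slope * mean_x
fitted = [intercept + slope * x for x in dummy]
ss_model = sum((f - grand_mean) ** 2 for f in fitted)
ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
f_regression = (ss_model / 1) / (ss_resid / (n - 2))

print("intercept:", intercept, "slope:", slope)
print("F (ANOVA):", round(f_anova, 6), "F (regression):", round(f_regression, 6))
```

The intercept is the control group mean (5), the slope is the difference between the group means (3), and both F statistics come out to 27. The same logic extends to more groups with more dummy variables, which is where statistical software takes over.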
The other piece that is hugely important is a set of skills I call data analysis skills.
These are the skills that demand experience.
The Stage 2 Data Analysis Skills Component
These aren’t very different than the data analysis skills in Stage 1, but here they get more complicated.
1. Planning the Data Analysis
2. Working with Data
3. Running the data analysis
4. Presenting and Communicating Results
The Stage 2 Software Skills Component
Every skill and step in the statistical and data analysis components needs to be implemented in software. At this stage, I recommend learning one software package.
Aim to be a whiz in that software.
Many people will tell you that a particular software is better than others, but I disagree. There is a lot that goes into the choice of statistical software. It’s most important to just commit to one and become great at it.
At this stage, you must be using syntax, not menus, so that your analysis is reproducible.
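As a rough illustration of what working in syntax buys you, here is a minimal sketch in Python. Python is only a stand-in for whichever statistical software you commit to, and the data, column names, and the run_analysis function are all hypothetical. The point is that every decision, including dropping the row with a missing score, is written down and can be rerun exactly.

```python
import csv
import io
from statistics import mean, stdev

# Hypothetical raw data; in practice this would be read from a file.
RAW = """id,group,score
1,control,4
2,control,5
3,treated,7
4,treated,9
5,treated,
"""


def run_analysis(raw_text):
    """Run the whole analysis from raw text to a summary table."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    # Step 1: exclude cases with a missing score (the decision is recorded here).
    complete = [r for r in rows if r["score"].strip() != ""]

    # Step 2: collect the scores by group.
    groups = {}
    for r in complete:
        groups.setdefault(r["group"], []).append(float(r["score"]))

    # Step 3: descriptive statistics per group.
    return {g: {"n": len(v), "mean": mean(v), "sd": stdev(v)}
            for g, v in groups.items()}


summary = run_analysis(RAW)
for group, stats in summary.items():
    print(group, stats)
```

Running the script again on the same data gives the same summary, and anyone reading it can see which cases were excluded and why. A sequence of menu clicks leaves no such record.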
Stage 2 Wrap-up
Phew, that’s a big stage. Many people never need anything else.
If you’re not quite here, mastering this stage will get you very far in statistical analysis.
If you’re moving beyond this stage, however, realize that this is a broad set of skills. It’s very, very common to have gaps or holes at Stage 2. It’s hard not to unless you’ve worked on many dozens of models.
So if you’ve found you are working on some analysis in Stage 3 or 4 and you’re getting stuck on something here, just jump back and strengthen that foundation.
Twenty or thirty years ago, most researchers could stop here. Not anymore.
With the enormous capacity of computing power has come the availability of increasingly sophisticated statistical methods. These methods account for issues that we previously had to gloss over with linear models.
Because these methods are now readily available in software, journal editors and grant issuers no longer let you get away with glossing anything over.
Nor does your conscientiousness about doing great data analysis.
Which moves us on to Stage 3.
Stage 3: One Step Beyond Linear Models
There are a whole host of statistical methods that are either extensions of linear models or simply based on regression at their core.
So Stage 3 is primarily about going deep into learning statistical models in addition to linear models.
A few common examples include:
- Logistic Regression
- Count Models, including Poisson and Negative Binomial Regression
- Multiple Imputation for Missing Data
- Principal Component and Factor Analysis
- Linear Mixed Models
- Reliability and Validity Measures
- Time Series
- Cluster Analysis
- Classification and Regression Trees
- Survival Analysis: Kaplan Meier Curves and Cox Regression
- Basic Tests and Linear Models on Complex Survey Data
With the methods at this stage, the only prerequisite is Linear Models. You don’t need to know any of these to learn any of the others.
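To give a flavor of how one of these builds on linear models, take logistic regression with a single yes/no predictor. In that special case the fitted coefficients can be worked out by hand from a 2x2 table: the intercept is the log odds in the reference group and the slope is the log of the odds ratio. The sketch below uses made-up counts and does that arithmetic directly, without calling any model-fitting routine.

```python
import math

# Hypothetical 2x2 table (illustrative counts only).
exposed_events, exposed_nonevents = 30, 70
unexposed_events, unexposed_nonevents = 10, 90

# Odds of the event in each group, and their ratio.
odds_exposed = exposed_events / exposed_nonevents
odds_unexposed = unexposed_events / unexposed_nonevents
odds_ratio = odds_exposed / odds_unexposed

# Coefficients of the logistic model with one binary predictor x
# (x = 1 for exposed, x = 0 for unexposed).
b0 = math.log(odds_unexposed)  # log odds in the reference group
b1 = math.log(odds_ratio)      # log odds ratio


def predicted_prob(x):
    """Convert the linear predictor b0 + b1*x back to a probability."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


print(f"odds ratio = {odds_ratio:.3f}")
print(f"P(event | unexposed) = {predicted_prob(0):.2f}")
print(f"P(event | exposed) = {predicted_prob(1):.2f}")
```

The right-hand side, b0 + b1*x, is the same linear predictor you know from Stage 2. What changes is that it models the log odds rather than the mean, which is why the results get reported as odds ratios.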
Unless your job requires statistical analysis every day, you probably need half a dozen of these at most. Two or three is probably more reasonable.
The other thing to remember as you emerge into this stage is that once you’ve learned a few of these, you’ll find that learning another isn’t all that hard.
Stage 3 Data Analysis and Software Components
All the data analysis and software skills you developed in Stage 2 are needed here too. The only new thing you learn is how to apply those skills to the specific Stage 3 model you’re working on and how those differ from Linear Models.
So these are relatively minor, compared to what you needed to learn at Stage 2.
The one new software skill I recommend here is to pick up a second statistical package. You can still think of it as a secondary package that you only use in a pinch, but it’s really, really helpful to have the skills to use two different packages.
Stage 4: The Advanced Stuff
And now we get to the really high-stage stuff.
These may or may not be hard in a high-level-math sort of way. Some definitely are. But they usually require understanding two or more of the methods and concepts at Stage 3. Sometimes that’s because they actually combine two Stage 3 methods, and sometimes it’s because they’re highly specialized variants of a Stage 3 method that require general statistical theory.
Again, here are a few that I see being commonly useful:
- Structural Equation Modeling
- Generalized Linear Mixed Models
- Zero Inflated Models
- Latent Class Analysis
- Propensity Score Matching
- Growth Mixture Models
- Latent Growth Curve Analysis
- Mediation with non-normal variables
Again, probably no one needs to know how to do all of these.
I do not, though we’ve got all of these covered across our team of statistics mentors.
So first of all, if you find you need one of these, realize that you’re doing hard stuff. It’s not you. But you can do it if you’ve got solid experience and knowledge at stages 2 and 3. If you don’t, go back and fill in those holes.