Oh so many years ago I had my first insight into just how ridiculously confusing all the statistical terminology can be for novices.
I was TAing a two-semester applied statistics class for graduate students in biology. It started with basic hypothesis testing and went on through to multiple regression.
It was a cross-listed class, meaning there were a handful of courageous (or masochistic) undergrads in the class, and they were having trouble keeping up with the ambitious graduate-level pace.
I remember one day in particular. I was leading a discussion section, and one of the poor undergrads was hopelessly lost. We were talking about simple regression, a regression model with only one predictor variable. She was stuck on understanding the regression coefficient (beta) and the intercept.
In most textbooks, the regression slope coefficient is denoted by β₁ and the intercept is denoted by β₀. But in the one we were using (and I’ve seen this in others) the regression slope coefficient was denoted by β (beta), and the intercept was denoted by α (alpha). I guess the advantage of this is to not have to include subscripts.
It was only after repeated probing that I realized she was logically trying to fit what we were talking about into the concepts of alpha and beta that we had already taught her–Type I and Type II errors in hypothesis testing.
Entirely. Different. Concepts.
With the same names.
Once I realized the source of the misunderstanding, I was able to explain that we were using the same terminology for entirely different concepts.
But as it turns out, there are even more meanings of both alpha and beta in statistics. Here they all are:
Hypothesis Testing
As I already mentioned, the definitions of alpha and beta that most learners of statistics come to first are about hypothesis testing.
α (Alpha) is the probability of Type I error in any hypothesis test–incorrectly rejecting the null hypothesis.
β (Beta) is the probability of Type II error in any hypothesis test–incorrectly failing to reject the null hypothesis. (1 – β is power).
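To make these two probabilities concrete, here is a minimal sketch that computes β and power for a one-sided z-test. The function name `type_ii_error` and the numbers passed to it (effect size, standard deviation, sample size) are assumptions made up for illustration; they do not come from any particular study.

```python
from statistics import NormalDist


def type_ii_error(effect, sigma, n, alpha=0.05):
    """Return beta, the Type II error probability, for a one-sided z-test.

    Tests H0: mu = 0 against H1: mu > 0 with known sigma, assuming the
    true mean is `effect`. All inputs here are illustrative assumptions.
    """
    z = NormalDist()  # standard normal distribution
    # Reject H0 when the z statistic exceeds this critical value.
    z_crit = z.inv_cdf(1 - alpha)
    # Under H1 the z statistic is centered here instead of at 0.
    shift = effect * n ** 0.5 / sigma
    # beta = P(fail to reject H0 | H1 is true)
    return z.cdf(z_crit - shift)


alpha = 0.05  # chosen by the analyst before looking at the data
beta = type_ii_error(effect=0.5, sigma=1.0, n=25, alpha=alpha)
power = 1 - beta
print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")
```

Note that α is something you choose, while β depends on α, the sample size, and how far the truth is from the null hypothesis.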
Population Regression Coefficients
In most textbooks and software packages, the population regression coefficients are denoted by β.
Like all population parameters, they are theoretical–we don’t know their true values. The regression coefficients we estimate from our sample are estimates of those parameter values. Most parameters are denoted with Greek letters and statistics with the corresponding Latin letters.
Most texts refer to the intercept as β₀ (beta-naught) and every other regression coefficient as β₁, β₂, β₃, etc. But as I already mentioned, some statistics texts will refer to the intercept as α, to distinguish it from the other coefficients.
If the β has a ^ over it, it’s called beta-hat and is the sample estimate of the population parameter β. And to make that even more confusing, sometimes instead of beta-hat, those sample estimates are denoted B or b.
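As a small illustration of the parameter-versus-estimate distinction, the sketch below computes the least-squares estimates for a simple regression by hand. The data are made up, and the names `fit_simple_regression`, `b0_hat`, and `b1_hat` are hypothetical labels chosen for this example.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (b0, b1): least-squares estimates of the intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # sample estimate of the slope (beta-hat)
    b0 = y_bar - b1 * x_bar   # sample estimate of the intercept
    return b0, b1


# Made-up sample data, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_hat, b1_hat = fit_simple_regression(x, y)
print(f"intercept estimate = {b0_hat:.2f}, slope estimate = {b1_hat:.2f}")
```

Whether a text calls the population versions of these two numbers β₀ and β₁ or α and β, the computed values are the same; only the labels differ.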
Standardized Regression Coefficient Estimates
For some reason, SPSS labels standardized regression coefficient estimates as Beta, despite the fact that they are statistics, measured on the sample, not the population.
And I can’t verify this, but I vaguely recall that Systat uses the same term. If you have Systat and can verify or negate this claim, feel free to do so in the comments.
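For readers who want to see what a standardized coefficient estimate is, here is a minimal sketch for the one-predictor case: the unstandardized slope rescaled by the ratio of the sample standard deviations. The function name `standardized_slope` and the data are assumptions for illustration, and this is not a reproduction of any package's internal code.

```python
from statistics import mean, stdev


def standardized_slope(x, y):
    """Return the standardized slope estimate for a simple regression."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx  # unstandardized slope estimate
    # Rescale so the slope is in standard-deviation units of x and y.
    return b * stdev(x) / stdev(y)


# Made-up sample data, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_std = standardized_slope(x, y)
print(f"standardized slope estimate = {beta_std:.4f}")
```

With a single predictor this value equals the sample correlation between x and y, and it does not change if x or y is measured in different units.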
Cronbach’s Alpha
Another, completely separate use of alpha is Cronbach’s alpha, aka Coefficient Alpha, which measures the reliability of a scale.
It’s a very useful little statistic, but should not be confused with either of the other uses of alpha.
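Here is a minimal sketch of the usual formula for Cronbach’s alpha: k/(k − 1) times one minus the ratio of the summed item variances to the variance of the total score, where k is the number of items. The function name `cronbach_alpha` and the item scores are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Return Cronbach's alpha for a scale.

    `items` is a list of columns, one list of scores per scale item,
    with respondents in the same order in every column.
    """
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Made-up scores for 5 respondents on a 3-item scale.
items = [
    [2, 4, 3, 5, 4],
    [3, 5, 3, 4, 4],
    [2, 5, 4, 5, 3],
]

cronbach = cronbach_alpha(items)
print(f"Cronbach's alpha = {cronbach:.3f}")
```

Nothing in this calculation involves a hypothesis test or a regression intercept, which is the point: it only shares the name.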
Beta Distribution and Beta Regression
You may have also heard of Beta regression, which is a generalized linear model based on the beta distribution.
The beta distribution is another distribution in statistics, just like the normal, Poisson, or binomial distributions. There are dozens of distributions in statistics, but some are used and taught more than others, so you may not have heard of this one.
The beta distribution has nothing to do with any of the other uses of the term beta.
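To add one more twist, the two shape parameters of the beta distribution are themselves conventionally written α and β. The sketch below evaluates the beta density on the open interval (0, 1); the function name `beta_pdf` and the example parameter values are assumptions chosen for illustration.

```python
from math import gamma


def beta_pdf(x, alpha, beta):
    """Return the beta density at x for shape parameters alpha, beta > 0.

    For simplicity this sketch returns 0.0 outside the open interval (0, 1).
    """
    if not 0 < x < 1:
        return 0.0
    # Normalizing constant, written with the gamma function.
    norm = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


# Here alpha and beta are just shape parameters. They are not error
# rates, regression coefficients, or reliability measures.
for a, b in [(1, 1), (2, 2), (2, 5)]:
    print(f"alpha={a}, beta={b}: density at 0.5 = {beta_pdf(0.5, a, b):.4f}")
```

Because the distribution lives between 0 and 1, it is a natural fit for outcomes such as proportions, which is where beta regression comes in.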
Other uses of Alpha and Beta
If you really start to get into higher-level statistics, you’ll see alpha and beta used quite often as parameters in different distributions. I don’t know if they’re commonly used simply because everyone knows those Greek letters. But you’ll see them, for example, as parameters of a gamma distribution. Relatedly, you’ll see alpha as a parameter of a negative binomial distribution.
If you think of other uses of alpha or beta, please leave them in the comments.
See the full Series on Confusing Statistical Terms.