R² is such a lovely statistic, isn’t it? Unlike so many of the others, it makes sense–the percentage of variance in Y accounted for by a model.

I mean, you can actually understand that. So can your grandmother. And the clinical audience you’re writing the report for.

A big R² is always good and a small one is always bad, right?

Well, maybe.

I’ve seen a lot of people get upset about small R² values, or any small effect size, for that matter. I recently heard a comment that no regression model with an R² smaller than .7 should even be interpreted.

Now, there may be a context in which that rule makes sense, but as a general rule, no.

Just because an effect size is small doesn’t mean it’s bad, unworthy of being interpreted, or useless. It’s just small. Even small effect sizes can have scientific or clinical significance. It depends on your field.

For example, in a dissertation I helped a client with many years ago, the research question was whether religiosity predicts physical health. (If you’ve been in any of my workshops, you’ll recognize this example–it’s a great data set.) The model used frequency of religious attendance as an indicator of religiosity and included a few personal and demographic control variables, such as gender, poverty status, and depression levels.

The model R² was about .04, although the model was significant.

It’s easy to dismiss the model as being useless. You’re only explaining 4% of the variation? Why bother?

But think about this. If you think about all of the things that might affect someone’s health, do you really expect religious attendance to be a *major* contributor?

Even though I’m not a health researcher, I can think of quite a few variables that I would expect to be much better predictors of health. Things like age, disease history, stress levels, family history of disease, job conditions.

And putting all of them into the model would indeed give better predicted values. If the *only* point of the model was prediction, my client’s model *would* do a pretty bad job. (Perhaps the 70% comment came from someone who only runs prediction models.)

But it wasn’t. The point was to see if there was a small, but reliable relationship. And there was.

Do small effect sizes require larger samples to find significance? Sure. But this data set had over 5000 people. Not a problem.
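To make this concrete, here is a minimal simulation sketch. It does not use the client’s data: the predictor, the outcome, and the slope of 0.2 are all made up so that the population R² is about .04, and the sample size of 5000 only roughly matches the data set described above. It shows how a relationship that explains very little variance can still be estimated reliably.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # roughly the sample size mentioned above

# Hypothetical standardized predictor and outcome. The slope (0.2) and the
# error variance (0.96) are assumptions chosen so the population R^2 is 0.04.
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(scale=math.sqrt(0.96), size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# R^2 = 1 - residual sum of squares / total sum of squares.
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Standard error and t statistic for the slope.
sigma2 = resid @ resid / (n - 2)
se_slope = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_slope = beta[1] / se_slope

# Two-sided p-value from the normal approximation (reasonable at n = 5000).
p_value = math.erfc(abs(t_slope) / math.sqrt(2))

print(f"R^2 = {r2:.3f}, slope = {beta[1]:.3f}, t = {t_slope:.1f}, p = {p_value:.2g}")
```

In a run like this the R² stays small while the slope is many standard errors away from zero, which is the "small but reliable" pattern described above.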

Many researchers turned to using effect sizes because evaluating effects using p-values alone can be misleading. But effect sizes can be misleading too if you don’t think about what they mean within the research context.

Sometimes being able to easily improve an outcome by 4% is clinically or scientifically important. Sometimes it’s not even close to enough. Sometimes it depends on how much time, effort, or money would be required to get a 4% improvement.

As much as we’d all love to have straight answers to what’s big enough, that’s not the job of any statistic. You’ve got to think about it and interpret accordingly.


hi colleagues,

I am also faced with the same R2 problem, using climate data to analyse the impact of climate change on groundwater levels. I used rainfall, temperature and evaporation as my independent variables and groundwater level as the dependent variable. My R2 is 48%. Should I interpret these results?

why i am getting low r square value of 0.0471 in nifty 50 and crude oil prices. and whether it is useful to accept the model.

Good day all, please i need help in my regression result. None of the IVs are significant and the Adjusted R2 is very low….35%. Please what should be done?

Hi guys. Here I have a question. When I run the regression with a sample size=99, the R squared is around 60%, but after I change the sample size into 270, the R squared suddenly changed to only about 1%. I wonder what happens here?

Hello,

I used a Zero Inflated Negative Binomial Model to predict duck presence and density based on a number of habitat covariates. I got very low R2 (0.03 in some cases). The statistician that helped me develop the model, said that a low R2 is not uncommon and the model can still be useful. I need to locate some primary literature references that support this statement. Please help.

Thank you

i m doing research on behavioral finance. r square value is very low 2.5 🙁 what can i do to improve it? plzzzz suggest

1) investigate the factors (you have probably skipped the most important ones)

2) you should probably use not a linear model but another type of model

Thank you for sharing your views on a widely debated topic. There is a lot of confusion regarding the use of small and big R2 values, you have surely made some good points related to it.

i have a model. who can help me to analyse it in EViews ?

my email is : armin.ghaznavi77@gmail.com

what does .78 r square value indicate?

is it a too high value?

r square change value *

It implies your model explains 78% of the variation in the outcome using the ‘explanatory’ variable you used.

As a rule of thumb, I learned that only models with an R-squared value above 80% are good for prediction.

Hope all will b good health.

My question is, please tell me about Beta value range regression analysis for high, moderate and low.

I am trying to find whether there is a relation between two variables.

I am using simple linear regression in which the model R2 is very low (0.0008), but the model p-value, which is the same as the feature p-value, is very small (1.592e-05). How should this model be interpreted?

Clearly, feature only explains 0.08 percent of variation in data but still that feature is very significant. Does this mean there is some relation b/w feature and output?
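A rough back-of-the-envelope check shows how these two numbers can coexist. In simple linear regression, t² = R² / (1 − R²) × (n − 2), so a tiny R² with a tiny p-value points to a very large sample. The sketch below solves that identity for n using the two figures from the comment; the sample size itself was not reported, so the result is only an implied value, and the normal approximation to the t distribution is an assumption.

```python
from statistics import NormalDist

r2 = 0.0008          # R-squared reported in the comment
p_value = 1.592e-05  # two-sided p-value reported in the comment

# Convert the two-sided p-value to a z score. This uses the normal
# approximation to the t distribution, which assumes a large sample.
z = NormalDist().inv_cdf(1 - p_value / 2)

# In simple regression, t^2 = R^2 / (1 - R^2) * (n - 2); solve for n.
n_implied = z ** 2 * (1 - r2) / r2 + 2

print(f"z = {z:.2f}, implied sample size of roughly {n_implied:,.0f}")
```

With tens of thousands of observations, even a relationship that explains 0.08% of the variance can be distinguished from zero. Whether an effect that small matters is a separate question about the research context.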

A high R2 is important if you want to predict a single observation (unit, patient, household, etc.). But if you are interested in the aggregate level (public health issues, economics, etc.), I wouldn’t worry too much about a low or very low R2. If the effect is large (and significant), then you can do prediction or inference at the higher level but not for the single sample member (patient, household, …).

What is wrong if my R square is 34% variance, having 100 sample of my variables?

Hi all

I have a question from my assignment that says to explain why the regression line (below) without referring to the numerical results cannot be the least squares line of best fit

Stature= -11.68 + 4.167 x Metacarpal length

The 2 variables measured were:

-Metacarpal bone length (mm)

– Stature- a persons standing height (cm)

Hi all,

I’m having unexpected problems with my analysis. All correlation indicators, such as R square, indicate there should not be a correlation, but I can visually see a correlation. One of the variables has low values (between 0.02 and 0.12) and the other varies between 48 and 56. I have a sample size of 24.

This is an analysis I have conducted on other sites with the same variables and same sample size and I am getting reliable correlation indicators.

Any ideas as to why I could visually see a correlation but it not be reflected in the output data?

Hello everyone, I have r’s of 0.0035. So according to this discussion this means that my dependent variable is influenced by things other than the independent variable, correct? This is only the first step, as I will include a moderator and control variables in the next step. So maybe indeed, the moderator will amplify the relationship; or other things cause the dependent variable. Did I get that right?

My model gives R square value .007 and adjusted R square .003 …Sample size has been increased upto 700, and the no. of independent variable is four ….worried ? plz help me out

same problem wd me .. my r square. 007 and adjusted r -.008 and r 0.081 im worried what I do

Thanks so much. I have gained a lot from all the conversations. My question is at what level is R-square considered a good-fit generally?

Thanks.

…I’ve always found Anscombe’s quartet a good illustration of the importance of visualizing data.
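For readers who have not seen it, the quartet is four small data sets with nearly identical regression lines and R² values but very different scatterplots. The sketch below fits a line to each set and prints the summary numbers; the data values are Anscombe’s published ones, and the variable names are just illustrative.

```python
import numpy as np

# Anscombe's four data sets; sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {}
for name, (x, y) in quartet.items():
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares line
    r2 = np.corrcoef(x, y)[0, 1] ** 2        # R^2 of the simple regression
    results[name] = (slope, intercept, r2)
    print(f"Set {name}: slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

All four sets give a slope near 0.5, an intercept near 3, and an R² near .67, yet only the first looks like a well-behaved linear relationship when plotted. The numbers alone cannot tell you that.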

please am finding it very difficult to explain adjusted R- square value of 41% my r- square value is almost 43%

Please guys, When does a model stop being useful for estimating and predicting future response for y given a future value of x

In my data, almost all significant effects are smaller than r²=.05.

That seems pretty depressing, but I guess, when my predictor is only one of a bazillion explanations, I can still go ahead and say “Well, 5% is not much, but there is at least a small portion to predict.” Right? 🙂

I am in love with this conversation by the way. It has reinforced a lot of my thoughts about R squared and model fit. Thanks to all contributors.

While context is definitely important, and a priori decided hypothesis should be backed with theory and depended on, there is a danger you are overlooking many variables explaining health–as you said–and omitted variable bias can occur, causing biased estimates and relationships.

One assumption of regression is that your model is theoretically the best model. You cannot and should not add or remove variables as you wish. That’s not how it works. If you end up with a lousy Rsquare value at the end, that just means that your model sucked in contrast to your theoretical support at the beginning. However, if you have something to explain at the end, you can order the value of the predictor variables, which was the actual purpose of your regression analysis. Which of my predictors is the best given that I included no more or less than all the relevant predictors in my model.

Respected Sir

In my multiple regression model the R square value is .92. Is my model significant or insignificant, and is it applicable in social sciences research?

An R-square value of .92 represents a good fit and the model is fine.

Unless it uses time series data. One can get an R2 above 0.9 and the model could still be wrong because of non-stationarity.
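The non-stationarity point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the series are simulated, each pair is independent by construction, and the series length and number of replications are arbitrary choices. It compares the R² from regressing one white-noise series on another with the R² from regressing one random walk on another.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_sims = 200, 500  # arbitrary series length and number of replications


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    return np.corrcoef(x, y)[0, 1] ** 2


r2_noise, r2_walks = [], []
for _ in range(n_sims):
    # Two independent white-noise series: unrelated by construction.
    e1, e2 = rng.normal(size=(2, n_obs))
    r2_noise.append(r_squared(e1, e2))
    # Their cumulative sums are independent random walks (non-stationary).
    r2_walks.append(r_squared(np.cumsum(e1), np.cumsum(e2)))

mean_noise = float(np.mean(r2_noise))
mean_walks = float(np.mean(r2_walks))
share_high = float(np.mean(np.array(r2_walks) > 0.5))

print(f"mean R^2, white noise:  {mean_noise:.3f}")
print(f"mean R^2, random walks: {mean_walks:.3f}")
print(f"share of random-walk pairs with R^2 > 0.5: {share_high:.2f}")
```

The random-walk pairs tend to show much larger R² values than the white-noise pairs even though neither has any real relationship, which is why a high R² on trending series deserves extra scrutiny.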

What could be done in a situation where an economic analysis includes variables such as national expenditure (dep. var), debt and income, and produces a model with R squares below 0.2?

why we always use R2 for comparison but not R?

what is the difference between model is fit & model is statistically significant?

R2 is the explained variance for the model you choose, and R is the correlation between IV and DV. If you use more than one IV, then R is the multiple correlation between the IVs taken together and the DV, and R2 is the variance explained by the IVs. When R2 is high, the model fits well, but be careful: if the model is not significant, a high R2 is not useful.

Hi!

I’m currently facing a similar experience – a very low R square for my model. I’m basically testing my model for causal- prediction and am using PLS methods for analysis. I need to somehow justify my results with some literature on this issue (low r square), but I find it difficult to find articles (journals) about this. Can anyone help?

Thanks!

Err…I should say, if you feel the need to “somehow justify your (low R2) results” with “some” literature, you’re taking a misled approach to this whole “science” thing. Sometimes hypotheses aren’t confirmed by experiment. If that is what you inadvertently proved, it’s your duty to report it as such.

Amen

Good read, thanks!

I came across the same thing while doing economic research on capital gains tax for my thesis. I am not that experienced so it’s nice to see my thoughts reinforced by someone much more credible than myself.

Dear all,

I would like to add some complementary information about R2 and regression in general. First of all, I would recommend that every researcher explore the data with basic statistics and plots before undertaking a regression analysis and interpreting the results. Actually, it is quite rare to find linear relations in nature (and in social science as well), as the phenomena are most of the time very complex. Instead of trying to prove linear relations even with a low R2 value and/or a low p-value, it might be interesting to think about non-linear relations (polynomial, exponential, logistic, etc.). This could be done by plotting the data. Most software offers alternatives to linear regression. I hope it helps!
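As a small illustration of the non-linearity point, the sketch below simulates a purely quadratic relationship (an assumption made for the example, not real data) and compares the R² of a straight-line fit with that of a quadratic fit. The helper name `poly_r2` is made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: y depends on x only through x squared, plus noise.
x = rng.uniform(-3, 3, size=300)
y = x ** 2 + rng.normal(scale=1.0, size=300)


def poly_r2(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


r2_linear = poly_r2(x, y, 1)
r2_quadratic = poly_r2(x, y, 2)

print(f"linear fit R^2:    {r2_linear:.3f}")
print(f"quadratic fit R^2: {r2_quadratic:.3f}")
```

Here the straight line explains almost nothing while the quadratic explains most of the variance. A scatterplot of x against y would show the curve immediately, which is the argument for plotting first.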

What if even after plotting the data, you still don’t know what is going on? The analysis that I’m working on has R2=0.04, but the model fit has p-value<0.05 for the linear model as well as for quadratic, cubic, exponential, and logarithmic models. In that sense, I should pick the simplest one, right? But R2=0.04 cannot imply a linear relationship. So what kind of relationship do they have? And how do I find out? Thanks, I’m glad I found this site and your reply!

Hi Huong,

Well, there may not be anything going on, or no discernible effects, anyway. Yes, start simple and see if you get an improvement in model fit with a more complicated model.

I would also suggest lots of graphing. Sometimes you can see the appropriate shape. Sometimes, not, though.

Hi Karen,

Do you think it is lack of space rather than that the residuals are not random? I am not sure, but this (small R2 values) may explain the conflicting findings of the various studies.

Coming back to explaining the past versus predicting (a critical difference) is where the value of R2 is important. An R2 of .04 may explain the past data in a statistical significant manner and may have some value in doing so, but its predictive ability is practically zero when wanting to extrapolate beyond the available data.

Have you seen a scatter plot for even an R2 of 0.7? The dispersion of the data around the regression line is so large that the model has little predictive value (the reason is that the prediction interval is so wide as to be of no practical value). I can assure you from my experience that any R2<0.5 has very little predictive value beyond describing the model data.
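One way to put numbers on the dispersion described here: for a least-squares fit, the residual standard deviation is approximately √(1 − R²) times the standard deviation of the outcome (ignoring the small degrees-of-freedom correction). The sketch below tabulates that ratio; the function name is made up for illustration.

```python
import math


def residual_sd_ratio(r2):
    """Approximate residual SD as a fraction of the outcome SD.

    Ignores the degrees-of-freedom correction, so it is a large-sample
    approximation rather than an exact small-sample value.
    """
    return math.sqrt(1 - r2)


for r2 in (0.04, 0.5, 0.7, 0.9):
    ratio = residual_sd_ratio(r2)
    print(f"R^2 = {r2:.2f}: residual SD is about {ratio:.0%} of the outcome SD")
```

At R² = .7 the scatter around the line is still a bit over half of the raw spread in the outcome, so individual predictions remain fairly imprecise. Whether that is "of no practical value" depends on how precise the predictions need to be.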

Sorry for getting into this discussion so late, but I am interested in the R2 values in medical studies, specifically in those dealing with hypertension research. Does anyone know their size, as no study mentions it? Also, in regression it is extremely difficult to make sure that the model residuals are random. Does anyone know if this is done? Again, no study reports anything about testing the residuals.

Hi Spyros,

Agreed. A low R-squared means the model is useless for prediction. If that is the point of the model, it’s no good.

I don’t know anything specifically about hypertension studies and typical R-square values. Anyone else want to comment?

And it’s a good point that most studies don’t mention assumption testing, which is too bad. I assume it’s because of space limitations in journals.

Karen

Great article – always nice when your own opinion is reinforced by someone who’s actually qualified in the area 🙂

Quick question – I have at times with multiple linear regression come across a significant independent association (beta value) of an independent var (that I’m interested in) with the dependent var. However – due to a small effect size, the model itself is not significant. My question is – can you report a significant independent association of 2 variables from a non-significant model?

The other thing to consider is that if the association between those 2 variables is the only thing you’re interested in (after controlling for other variables in the model), you could do a partial correlation. Correct me if I’m wrong, but I believe this would give the same result as your multiple linear regression beta value (and the same p-value), but you wouldn’t have a model R2 or p-value to report. Am I missing something?

Hi Julian,

Thanks, glad it was helpful.

To answer your question, if I were in that situation–a nonsignificant model but a significant coefficient on a key predictor–I would dig into it more to understand what is going on in the data. Run correlations on the predictors, run the model with and without the key predictor, run a bunch of scatterplots, both of the raw variables and of residuals.

In many fields, I’ve seen it’s the norm to ignore the overall model F and just report coefficients. So can you report it? Yes. Should you? Hmm, maybe not.

Yes, the partial correlation gives you a measure of the association. What you lose there is not just those statistics, but the conceptual idea that one variable is an outcome to be predicted and the ability to come up with predicted values. So are you really trying to describe a relationship or model data?

Karen
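The partial correlation question in this exchange can be checked numerically. The sketch below uses simulated data with made-up variable names (x is the predictor of interest, z a control). It compares the t statistic of the regression coefficient on x with the t statistic derived from the partial correlation of x and y given z, using t = r × √(df / (1 − r²)) with df = n − 3.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical data: z is a control variable, x the predictor of interest.
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 0.3 * x + 0.4 * z + rng.normal(size=n)


def ols(X, y):
    """Return OLS coefficients and residuals for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


# Multiple regression of y on x and z (with intercept).
X_full = np.column_stack([np.ones(n), x, z])
beta, resid = ols(X_full, y)
df = n - X_full.shape[1]
sigma2 = resid @ resid / df
se_x = math.sqrt(sigma2 * np.linalg.inv(X_full.T @ X_full)[1, 1])
t_regression = beta[1] / se_x

# Partial correlation of x and y given z: correlate the residuals left
# after regressing each of them on z.
Z = np.column_stack([np.ones(n), z])
_, x_res = ols(Z, x)
_, y_res = ols(Z, y)
partial_r = np.corrcoef(x_res, y_res)[0, 1]
t_partial = partial_r * math.sqrt(df / (1 - partial_r ** 2))

print(f"coefficient on x: {beta[1]:.3f}, partial r: {partial_r:.3f}")
print(f"t from regression: {t_regression:.4f}, t from partial r: {t_partial:.4f}")
```

The two t statistics agree, so the significance test is the same either way. The coefficient and the partial correlation are generally different numbers, though: one is in the units of the outcome, the other is a unitless measure of association.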

Thanks – of course you would always do the necessary background with scatterplots and checking that the findings are not driven by an outlier, etc.

I guess I am talking about describing a relationship rather than modelling data. For example, you identify a significant correlation between 2 variables and would like to see if this is independent of a potential confounder. In this context, my impression is that a significant coefficient is still of interest (assuming a pre-specified analysis) even if the overall model is not significant. After all, it’s not your fault if what you thought was a confounder actually wasn’t, right?

If I might ask a follow-up question, I’ve read of various guidelines regarding how many predictor variables can be included in a model. eg: 1 per 10 or 1 per 15 subjects in a dataset for linear regression (I’m in clinical research). If a model is ‘over-fitted’ (eg: 10 predictor variables for a sample of 20), how would that affect model significance?
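To give a feel for the over-fitting scenario in this question, here is a simulation sketch with 10 predictors and 20 subjects in which every variable is pure noise (an assumption chosen to isolate the effect of over-fitting). It tracks the ordinary R² and the adjusted R², computed as 1 − (1 − R²)(n − 1)/(n − k − 1).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, n_sims = 20, 10, 2000  # 20 subjects, 10 predictors, as in the question


def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X plus an intercept."""
    n_obs, n_pred = X.shape
    design = np.column_stack([np.ones(n_obs), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adjusted = 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)
    return r2, adjusted


# Predictors and outcome are independent noise: there is nothing to find.
sims = np.array([
    r2_and_adjusted(rng.normal(size=(n, k)), rng.normal(size=n))
    for _ in range(n_sims)
])
mean_r2, mean_adjusted = sims.mean(axis=0)

print(f"mean R^2 with pure-noise predictors: {mean_r2:.3f}")
print(f"mean adjusted R^2:                   {mean_adjusted:.3f}")
```

With nothing but noise, the average R² comes out near k/(n − 1), a bit over one half here, while the adjusted R² averages near zero. The overall F test accounts for the degrees of freedom used, so over-fitting inflates R² without making the model more likely to be significant; if anything, spending degrees of freedom on weak predictors makes real effects harder to detect.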

Dear Karen,

your tips are so useful, you are my virtual teacher in the hazardous world of data modeling.

Thanks!

It also depends on the type of model you run. In my experience time series models often get higher R2 than others. And if the dependent variable varies in magnitude a lot then the R2 will tend to be higher too. So conversely a poor model can quite happily get quite a respectable looking R2. It’s definitely not the whole story.

Good point, Tom.

It seems therefore that there is no hard rule to follow but it boils down to experience.

Thanks.

Hi Nico,

That’s true in almost all of statistics. Even hard rules like p<.05 indicating statistical significance aren't really hard.

So yes, experience always helps, especially in understanding your variables and research. But stopping and thinking about it helps at any level of experience.

Karen

The counterargument to this position is that if you believe that religiosity is only a small piece of the puzzle, your model should include a whole lot of things that you think are more important as controls, and check whether the broader model with religiosity included is a better model than the one with only the big predictors. Otherwise you could be misattributing another health predictor to religiosity (e.g., hereditary health is probably a big predictor, and it may well be that people with unhealthy parents are more likely to seek a religious community too).

A model that only *improves* by small amounts can still be useful (say going from .7 to .74), but a model that, in its entirety, only produces an R-sq of .04? I’d be worried that I haven’t even begun to properly model the relationship.

I agree (strongly) with the point about interpreting the result within the context in which the research is being conducted, though.
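The misattribution concern in this comment can be sketched with simulated data. Everything here is hypothetical: the variable `family_health` stands in for an unmeasured confounder, the coefficients are invented, and attendance is given no direct effect on health at all. The sketch compares the attendance coefficient with and without the confounder in the model, along with the change in R².

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# Hypothetical setup: family_health drives both attendance and health,
# and attendance has NO direct effect on health.
family_health = rng.normal(size=n)
attendance = -0.5 * family_health + rng.normal(size=n)
health = 0.6 * family_health + rng.normal(size=n)


def fit(predictors, y):
    """Return OLS coefficients (intercept first) and R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2


beta_naive, r2_naive = fit([attendance], health)
beta_controls, r2_controls = fit([family_health], health)
beta_full, r2_full = fit([family_health, attendance], health)

print(f"attendance only:      slope = {beta_naive[1]:.3f}, R^2 = {r2_naive:.3f}")
print(f"confounder only:      R^2 = {r2_controls:.3f}")
print(f"confounder + attend.: slope = {beta_full[2]:.3f}, R^2 = {r2_full:.3f}")
print(f"R^2 gained by adding attendance: {r2_full - r2_controls:.4f}")
```

In this made-up scenario the attendance-only model shows a small but clear slope, and that slope shrinks toward zero once the confounder is included. It illustrates the comment’s argument for comparing the model with and without the predictor of interest after the major controls are in. It says nothing about whether the original dissertation data behaved this way.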

Hi Serje,

Yes, I see your point. I agree, it’s always ideal to have more of the variation explained. And for an outcome that is generally well understood for the population being studied, there is a higher expectation of being able to explain most of the variation. You’re absolutely correct that it would be better to model this hypothesis in terms of the additional variation explained, and that not including the controls means you could be misattributing relationships.

However, there are some outcome variables (many in sociology, for example) for wide populations that just won’t ever be explained that much. So it’s not a matter of another variable that’s being left out of a model, but either so many competing variables each with a tiny effect that you can’t include them all or just randomness. (And I realize these are often the same thing).

Now it’s arguable that physical health isn’t one of those, and I concede that’s possible. But it’s possible that it is in certain populations. For example, you may be able to explain 70% of the variation in physical health in a clinical population, but not in a national population.

This is also true in more exploratory situations. If an outcome is a new construct that isn’t well known, it’s likely that data won’t have been collected on every possible control. In this case, it’s very possible that an effect of something like religiosity will later be explained away in another study. But that’s interesting–this effect we thought we had? Turns out it’s explained by X. If we never report the first small effect because we’re waiting for a model that explains everything, we may never know what needs to be built into the model.

Again, it’s the context.

Is there a way to quantify the ‘context’ in which one has to interpret R2?

Hi Nico,

I’m not exactly sure what you mean by quantifying the context, but I would think the answer is ‘no.’ It’s really about stopping and thinking about what information you really have.

Karen