R square

sri hari
Published in Nerd For Tech
4 min read · May 21, 2021

We all know about correlation (R). Correlation values close to 1 or -1 tell you that two quantitative variables are strongly (linearly) related, whereas a correlation close to 0 means there is little or no linear relationship between them.

What is R square, and why should we use it?

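R square is closely related to R, but it is easier to interpret. For example,

  • R = 0.7 is not 1.4 times as good as R = 0.5; measured by the variation it explains, it is about twice as good (0.49 versus 0.25)
  • but R square = 0.7 is exactly what it looks like: 1.4 times as good as R square = 0.5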
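The comparison above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are my own, and the numbers are the ones from the bullets.

```python
# Compare two correlations on the R scale and on the R square scale.
r_low, r_high = 0.5, 0.7

# Ratio of the raw R values.
ratio_r = r_high / r_low

# Ratio of the variation explained when those same R values are squared.
ratio_explained = r_high**2 / r_low**2

# Ratio when 0.5 and 0.7 are already R square values.
ratio_r_square = 0.7 / 0.5

print(f"R ratio: {ratio_r:.2f}")
print(f"Variation explained ratio (R squared): {ratio_explained:.2f}")
print(f"R square ratio: {ratio_r_square:.2f}")
```

The raw R values differ by a factor of 1.4, but the variation they explain differs by a factor of almost 2. With R square, the ratio you read off is the ratio of variation explained.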

Let's consider an example. Here we plot person ID on the x axis and weight on the y axis.

person ID vs weight (mean line)

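In the image above, I calculated the mean (average) of the weights and plotted it as a horizontal line through the data.

We can then calculate the variation around that mean, i.e. the sum of the squared distances between each weight and the mean (dividing by the number of points gives the variance).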
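Here is a minimal sketch of that calculation with NumPy. The weights are hypothetical numbers I made up for illustration, not the data from the image.

```python
import numpy as np

# Hypothetical weights in kg, listed in person ID order (made up for illustration).
weights = np.array([60.0, 56.0, 83.0, 64.0, 75.0, 52.0])

# The mean line is a horizontal line at the average weight.
mean_weight = weights.mean()

# Variation around the mean: sum of squared distances from each point to the mean line.
var_mean = np.sum((weights - mean_weight) ** 2)

print(mean_weight, var_mean)
```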

Now let's use height instead of ID on the x axis.

height vs weight (mean line)

Now the points are reordered along the x axis, but the mean and the variation are the same as in the previous plot. The vertical distances between the dots and the mean line do not change when the points are reordered.
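A small sketch can confirm that reordering does not matter. The heights and weights below are hypothetical values, and `variation_around_mean` is a helper name I chose for this example.

```python
import numpy as np

# Hypothetical data in person ID order (made up for illustration).
heights = np.array([166.0, 150.0, 180.0, 160.0, 172.0, 156.0])  # cm
weights = np.array([60.0, 56.0, 83.0, 64.0, 75.0, 52.0])        # kg

def variation_around_mean(values):
    """Sum of squared distances between each value and the mean."""
    return float(np.sum((values - values.mean()) ** 2))

# Reorder the weights the way the plot does when height replaces ID on the x axis.
order = np.argsort(heights)
weights_by_height = weights[order]

print(variation_around_mean(weights))
print(variation_around_mean(weights_by_height))
```

Both calls print the same number, because sorting only changes where each point sits on the x axis, not how far it is from the mean line.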

Question: Is the mean line the best way to predict weight?

Answer: No. We should fit a line to the data (regression).

height vs weight (mean and linear line)

Now we can predict the weight for a given height using the fitted linear line.
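As a sketch of how such a line could be fitted, the snippet below uses NumPy's `polyfit` with degree 1, which gives an ordinary least-squares straight line. The data and the `predict_weight` helper are assumptions of mine for illustration only.

```python
import numpy as np

# Hypothetical data (made up for illustration).
heights = np.array([166.0, 150.0, 180.0, 160.0, 172.0, 156.0])  # cm
weights = np.array([60.0, 56.0, 83.0, 64.0, 75.0, 52.0])        # kg

# Least-squares straight line: weight = slope * height + intercept.
slope, intercept = np.polyfit(heights, weights, deg=1)

def predict_weight(height_cm):
    """Predict weight from height using the fitted line."""
    return slope * height_cm + intercept

print(predict_weight(168.0))
```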

Question: Does the linear line fit the data better than the mean line? If so, how much better?

Answer: Just by looking at the image above, the linear line appears to fit the data better than the mean line. We can quantify that difference using R square.

R square = (Var(mean line) - Var(linear line)) / Var(mean line)

Var(mean line): the variation around the mean line, which we already calculated in the first image.

Var(linear line): the variation around the linear line, calculated in the same way as Var(mean line) but using the distances to the fitted line.

The formula above makes R square range from 0 to 1.

This is because, for a least-squares fit, the variation around the linear line is never greater than the variation around the mean line, and it is never less than 0.
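Putting the pieces together, here is a minimal sketch that computes both variations and R square. The data are hypothetical, and `var_mean` and `var_line` are my own names for Var(mean line) and Var(linear line).

```python
import numpy as np

# Hypothetical data (made up for illustration).
heights = np.array([166.0, 150.0, 180.0, 160.0, 172.0, 156.0])  # cm
weights = np.array([60.0, 56.0, 83.0, 64.0, 75.0, 52.0])        # kg

# Var(mean line): squared distances from each point to the mean line.
var_mean = np.sum((weights - weights.mean()) ** 2)

# Var(linear line): squared distances from each point to the least-squares line.
slope, intercept = np.polyfit(heights, weights, deg=1)
fitted = slope * heights + intercept
var_line = np.sum((weights - fitted) ** 2)

# R square: the fraction of the variation around the mean that the line removes.
r_square = (var_mean - var_line) / var_mean

print(var_mean, var_line, r_square)
```

Using sums of squares or dividing both by the number of points gives the same R square, since the divisor cancels in the ratio.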

Let's consider an example and try to interpret the R square value.

example

Interpretation:

There is 81% less variation around the linear line than around the mean line.

or

The height/weight relationship accounts for 81% of the variation in weight.

This means most of the variation in the data is explained by the height/weight relationship.
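To make the 81% reading concrete, here is the arithmetic with placeholder values. The two variations below are hypothetical numbers I picked so that the result is 0.81; they are not taken from the example image.

```python
# Hypothetical variations, chosen only so that R square comes out as 0.81.
var_mean = 100.0   # variation around the mean line
var_line = 19.0    # variation around the linear line

r_square = (var_mean - var_line) / var_mean

print(f"{r_square:.0%} less variation around the linear line than around the mean line")
```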

Now, if the R square value is 0.9, we can say, “Good! The relationship between these two variables explains 90% of the variation in the data. :-)”

If instead R square is 0.01, you can say, “Ohhh! It only accounts for 1% of the variation in the data! :-( Something else must explain the remaining 99%.”

Relating R and R square:

R square is just the square of R.

If someone says, “The R value is 0.9”,

you can say, “Good! The relationship between these two variables explains 81% of the variation in the data (if R is 0.9, then R square is 0.81). :-)”
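As a quick check of this relationship, the sketch below computes R with NumPy's `corrcoef` and compares its square to the R square obtained from the two variations. The data are hypothetical, and the equality holds for a least-squares straight line fitted to two variables.

```python
import numpy as np

# Hypothetical data (made up for illustration).
heights = np.array([166.0, 150.0, 180.0, 160.0, 172.0, 156.0])  # cm
weights = np.array([60.0, 56.0, 83.0, 64.0, 75.0, 52.0])        # kg

# Pearson correlation R between height and weight.
r = np.corrcoef(heights, weights)[0, 1]

# R square from the variation around the mean line and around the fitted line.
var_mean = np.sum((weights - weights.mean()) ** 2)
slope, intercept = np.polyfit(heights, weights, deg=1)
var_line = np.sum((weights - (slope * heights + intercept)) ** 2)
r_square = (var_mean - var_line) / var_mean

print(r, r**2, r_square)
```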

Finally,

R square doesn’t give the direction of the relationship. If only an R square value is given, all you can say is, “The two variables are either positively or negatively correlated with R square = …, but we can’t tell which direction (positive or negative).”
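The loss of direction can be seen in a small sketch. Here I mirror a set of hypothetical weights around their mean, which turns a rising trend into a falling one; the data and the name `weights_flipped` are my own assumptions.

```python
import numpy as np

# Hypothetical data (made up for illustration).
heights = np.array([166.0, 150.0, 180.0, 160.0, 172.0, 156.0])  # cm
weights = np.array([60.0, 56.0, 83.0, 64.0, 75.0, 52.0])        # kg

# Mirror each weight around the mean so that weight now falls as height rises.
weights_flipped = 2 * weights.mean() - weights

r_pos = np.corrcoef(heights, weights)[0, 1]
r_neg = np.corrcoef(heights, weights_flipped)[0, 1]

# The signs differ, but the squares are identical.
print(r_pos, r_neg)
print(r_pos**2, r_neg**2)
```

The two R values have opposite signs, yet squaring them gives the same R square, so the sign cannot be recovered from R square alone.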

“R square is the percentage of variation explained by the relationship between two variables. If someone gives you the plain R value, just square it”

Thank you :-)

About the author: sri hari is a student at Coimbatore Institute of Technology and an R and D engineer trainee.