Coefficient of Determination vs Correlation Coefficient

I did a search for Multiple R and R-squared, but I'm still having a little trouble understanding the two.

The correlation coefficient describes the relationship between the actual values of two variables (the independent and the dependent).

While the coefficient of determination describes how well the independent variable explains the variation in the dependent variable in a regression?

Does that sound correct?

Yes, that's correct: r and r^2, respectively.

However, in the 2012 CFAI EOC, Reading 12, question #20, it states that the correlation between the predicted and actual values of the dependent variable is the square root of the R-squared.

That sounds to me like the variation explained by the model, since you are comparing predicted and actual values. I thought the correlation coefficient looked at the relationship between the ACTUAL dependent and independent variables.

You are correct; they are just correlations between two different pairs of items. The correlation coefficient is a measure of how the independent and dependent variables move together.

The coefficient of determination can be thought of as answering "how strongly do the predicted values track the actual values?" (its square root is the correlation between the two). A high R-squared is, for the most part, a good thing: it means the model more or less closely resembles the data. Since the model is a linear representation and correlation is a measure of linear association, the two go hand in hand. Going through the math here is too time-consuming, but that's the general idea.
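If you'd rather see it numerically than algebraically, here's a minimal sketch with made-up data (numpy only; the numbers and variable names are just for illustration):

```python
import numpy as np

# Toy data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)

# Fit y = a + b*x by ordinary least squares
b, a = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]
y_hat = a + b * x

# R-squared from the residuals...
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# ...equals the squared correlation between predicted and actual y
corr_pred_actual = np.corrcoef(y_hat, y)[0, 1]

print(r2, corr_pred_actual ** 2)    # same number, up to rounding
```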

Yes, I do understand that. I'm more interested in the explanation provided by CFAI for the question I described above.

What part of it is confusing you?

Given a multiple R-squared of 0.36 and the question "The most appropriate interpretation of the multiple R-squared for Hansen's model is that:"

The answer is C: "correlation between predicted and actual values of the dependent variable is 0.60." And the explanation is that the correlation between the predicted and actual values of the dependent variable is the square root of the R-squared, or the square root of 0.36.

Doesn't the predicted value come from the model, so wouldn't the correlation between the predicted and actual values of the dependent variable be the linear model, or the multiple R-squared? I didn't think predicted values came into play in the Multiple R calculation, which I thought measured the relationship between the independent and dependent variables.

Might just be overthinking something trivial. Thanks!

I think you're on the right track: for simple regression they are essentially the same thing, but correlation can't be used as easily to find R-squared in multiple regression.

For simple regression you can calculate R-squared two ways:

1.) R-squared is simply the square of the correlation between the two variables. This sort of makes sense if you think about it: the only thing we know about how the two variables move together is their correlation, and simple regression captures that same linear relationship, so the correlation between the predicted values of Y and the actual values of Y depends entirely on the one independent variable.

2.) The ratio of explained variation to total variation. We can calculate the correlation between the two variables and compare it to the square root of that R-squared, and they should be the same.
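Here's a quick sketch of both methods on made-up data (numpy only; everything here is just illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 0.8 * x + rng.normal(size=500)

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

# Method 1: square the correlation between the two variables
r = np.corrcoef(x, y)[0, 1]
r2_method1 = r ** 2

# Method 2: explained variation divided by total variation
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2_method2 = ess / tss

print(r2_method1, r2_method2)           # agree up to floating-point error
```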

In multiple regression, only the second method is accurate for determining R-squared. This makes sense, because correlation is only defined between two variables or sets of data. You can't have a correlation among, say, four different independent variables and the dependent variable, so we don't have a single correlation measure among all the variables (i.e., there is no pairwise correlation you can square to get R-squared).

Finding R-squared in multiple regression is done by taking the explained variation (the squared deviations of the predicted values from the mean, which is the same as total variation minus the unexplained residual variation) divided by the total variation (the squared deviations of the actual values from their mean). This gives us a measure of overall "fit"; if we take the square root of that, we get the correlation between the predicted and the actual values.
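Here's a sketch of that for a multiple regression (again made-up data; the three independent variables and their coefficients are just placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 3))                        # three independent variables
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

# Ordinary least squares with an intercept column
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_hat = X_design @ beta

ess = np.sum((y_hat - y.mean()) ** 2)              # explained variation
tss = np.sum((y - y.mean()) ** 2)                  # total variation
r2 = ess / tss

print(np.sqrt(r2))                                 # "multiple R"
print(np.corrcoef(y_hat, y)[0, 1])                 # corr(predicted, actual) -- same thing
print(np.corrcoef(X[:, 0], y)[0, 1] ** 2)          # squaring one pairwise corr is NOT R-squared
```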

Hopefully that sheds some light

Given R2 and adjusted R2, which one should we use to calculate correlation? Is there such a thing?

Calvinclk: you use adjusted R2 when you add additional independent variables to the regression… But as long as there is more than one independent variable in the regression, R2 cannot be used to back out the correlation between the dependent variable and any single independent variable. Correlation is computed only between two variables.

Adjusted R-squared is a better measure because it accounts for the fact that you are adding more and more explanatory variables… i.e., you could add 100 variables to a model to try to raise the R-squared (and you will raise it), but that doesn't necessarily mean the results are better. Adjusted R-squared corrects for this.
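For reference, the standard adjustment with n observations and k independent variables is 1 − (1 − R^2) × (n − 1) / (n − k − 1); as a quick helper (the function name and the example n and k are just made up):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: n = observations, k = independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.36, n=60, k=3))   # example sample size and variable count are illustrative
```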

Calvin: Per the original poster's question, the square root of the R-squared would give you the correlation between the predicted values and the actual values of the dependent variable (a.k.a. Y), which in simple linear regression is also the correlation between the two variables.
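Plugging in the numbers from the question: R^2 = [corr(predicted Y, actual Y)]^2 = 0.36, so corr(predicted Y, actual Y) = sqrt(0.36) = 0.60, which is answer C.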

If they ask this, they’re really reaching into the curriculum