Detecting Multicollinearity - What's The R^2?

AndyPettitteIsGreat · June 2, 2010, 2:22am

So a high R^2 is a sign of multicollinearity. What is the cutoff for this? Is 0.69 not multicollinearity while 0.70 is??

cpk123 · June 2, 2010, 2:27am

I thought a high F with low t-stats on the regression coefficients was multicoll.

asg · June 2, 2010, 2:28am

I think a high R^2 in combination with insignificant coefficients for the independent variables is a red flag for multicollinearity, but doesn’t necessarily guarantee it... Of course, I’m not positive about this, but I think I remember reading this.

asg · June 2, 2010, 2:30am

nm, i think cpk is correct.

janakisri · June 2, 2010, 2:33am

CFAI Mock afternoon had exactly this question, and I used CPK’s logic to answer it. I didn’t even RTFQ. Of course I got it wrong, since there was no F-stat given. Instead, the fine print gave the R^2 between the TWO independent variables as 0.3. The answer claimed that this was low, hence no multicollinearity.

AndyPettitteIsGreat · June 2, 2010, 2:33am

Right, right, but I saw a question where t-stats were low and R^2 was 81%. I thought that only over like 90% was high? What’s the cutoff?

justinkc · June 2, 2010, 2:33am

R^2 is just how well the model as a whole explains the variation in the dependent variable.

cpk123 · June 2, 2010, 2:36am

What I was talking about was the F-stat for the multiple regression as a whole, while you are mentioning the R^2 between two independent variables. A high R^2 between two independent variables would indicate high correlation between them, and if both are present in the same equation, highly correlated variables cause multicollinearity. Here a separate simple linear regression is being run between the two indep. variables.

janakisri · June 2, 2010, 2:38am

But in a single-variable regression, R^2 can detect strong correlation. Then, if you use both variables from that regression (the dep. and the independent one) to explain a third, dependent variable, you should claim multicollinearity straightaway, because of the strong correlation in the first equation. Kind of basic stuff.

justinkc · June 2, 2010, 2:45am

I don’t think that is how R^2 is used. R^2 in a multiple regression is RSS/SST. It explains the variation in a model. It uses similar inputs to the F-test but is different. Multi-c is only tested with a significant F and insignificant t-tests.
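For what it's worth, the "similar inputs" point can be sketched in a few lines of Python. This assumes the CFA curriculum notation (RSS = regression sum of squares, SSE = sum of squared errors, SST = RSS + SSE). The function names and the sample size n = 60 are made up for illustration; 0.40 is the model R^2 quoted further down the thread.

```python
def r_squared(rss, sse):
    """R^2 = RSS / SST, where SST = RSS + SSE (CFA notation, assumed)."""
    return rss / (rss + sse)


def f_from_r_squared(r2, n, k):
    """F-stat for the regression as a whole, from R^2, n observations, k slopes."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical sums of squares that give the thread's R^2 of 0.40.
r2 = r_squared(rss=40.0, sse=60.0)

# n = 60 is an assumption; the thread never states the sample size.
f_stat = f_from_r_squared(r2, n=60, k=2)

print(f"R^2 = {r2:.2f}, F = {f_stat:.2f}")
```

Same sums of squares, different question: R^2 measures fit, while F tests whether the slope coefficients are jointly zero.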

janakisri · June 2, 2010, 2:49am

Try q. 42 in CFAI Mock 2010 Afternoon. You’ll know what I mean. Here's the question:

Because there are only two independent variables in her regression, Hamilton’s most appropriate conclusion is that multicollinearity is least likely a problem, based on the observation that the:

A. model R^2 is relatively low.
B. correlation between S&P500 and SPREAD is low.
C. model F-value is high and the p-values for S&P500 and SPREAD are low.

Guess the answer?

cpk123 · June 2, 2010, 2:57am

Definitive answer from the book: high R^2 and a significant F-statistic even though the t-statistics on the slope coefficients are themselves not significant (indicating inflated std. errors for the slope coeffs).

They also specifically write: the magnitude of pairwise correlations between independent variables has occasionally been suggested to assess multicoll., but is generally not adequate. It is not necessary that the pairwise correlations be high for there to be a multicoll. problem.

So based on the above, B) is eliminated. C) F-value is high, but p-values are low, which means that S&P500 and SPREAD are significant variables, so that too is out. Has to be A) by elimination.
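The symptom described in the book (high R^2, significant F, insignificant t-stats) can be reproduced with a contrived toy data set. The sketch below is plain Python with no libraries. The eight observations are invented so that the two regressors are almost identical, and the function name ols_two_regressors is hypothetical.

```python
from math import sqrt


def center(v):
    """Subtract the mean so the intercept drops out of the slope formulas."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols_two_regressors(y, x1, x2):
    """OLS of y on an intercept, x1 and x2; returns fit and test statistics."""
    n, k = len(y), 2
    yc, c1, c2 = center(y), center(x1), center(x2)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    det = s11 * s22 - s12 ** 2          # near zero when x1 and x2 are collinear
    b1 = (s22 * dot(c1, yc) - s12 * dot(c2, yc)) / det
    b2 = (s11 * dot(c2, yc) - s12 * dot(c1, yc)) / det
    resid = [yi - b1 * u - b2 * v for yi, u, v in zip(yc, c1, c2)]
    sse, sst = dot(resid, resid), dot(yc, yc)
    r2 = 1 - sse / sst
    mse = sse / (n - k - 1)             # residual variance estimate
    return {
        "r2": r2,
        "f": (r2 / k) / ((1 - r2) / (n - k - 1)),
        "t1": b1 / sqrt(mse * s22 / det),   # slope / its standard error
        "t2": b2 / sqrt(mse * s11 / det),
        "corr_x1_x2": s12 / sqrt(s11 * s22),
    }


# Invented data: x1 and x2 share a large common part and differ only slightly.
common = [10, 10, 10, 10, -10, -10, -10, -10]
wiggle = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noise = [1, 1, -1, -1, 1, 1, -1, -1]
x1 = [c + w for c, w in zip(common, wiggle)]
x2 = [c - w for c, w in zip(common, wiggle)]
y = [c + e for c, e in zip(common, noise)]

result = ols_two_regressors(y, x1, x2)
print(result)
```

On this made-up data R^2 is about 0.99 and F is about 250 (the 5% critical value for 2 and 5 degrees of freedom is roughly 5.79), yet both t-stats are about 0.22, far below the roughly 2.57 needed at 5 degrees of freedom. The model as a whole explains y well, but neither slope can be pinned down individually.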

justinkc · June 2, 2010, 2:57am

I looked it up: the correlation is .3, the R^2 is .40, and the answer is B. They don’t mention R^2 at all in the explanation. I don’t see a relation in the answer.

Paraguay · June 2, 2010, 2:58am

Should be B. The def. of multicollinearity is that the independent variables are highly correlated. Another sign is an R^2 that is high with an adjusted R^2 that is extremely low (very rare).

janakisri · June 2, 2010, 3:00am

Right answer is: B. correlation between S&P500 and SPREAD is low. The fine print gives the correlation between SPREAD and S&P to be 0.3, which supposedly is low.

Paraguay · June 2, 2010, 3:00am

If there are only 2 independent variables, pairwise correlation is all you need. When adding variables past two, you need to look for a significant F with non-significant t-values.
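One way to see why the two-variable case is special: with only two regressors, the variance of each slope estimate is inflated by 1 / (1 - r^2) relative to uncorrelated regressors, where r is their pairwise correlation. This factor (usually called the variance inflation factor) is not mentioned in the thread, so treat the sketch below as a side illustration. The function name is made up; 0.3 is the correlation quoted for the mock question, 0.7 is the rule-of-thumb cutoff suggested further down, and 0.9 is purely illustrative.

```python
from math import sqrt


def variance_inflation_two_regressors(r):
    """Variance inflation for either slope when the two regressors have correlation r."""
    return 1.0 / (1.0 - r ** 2)


for r in (0.3, 0.7, 0.9):
    vif = variance_inflation_two_regressors(r)
    # sqrt(vif) is the factor by which the slope's standard error grows.
    print(f"r = {r:.1f}: variance x {vif:.3f}, std. error x {sqrt(vif):.3f}")
```

At r = 0.3 the standard errors grow by only about 5%, which fits the mock answer calling that correlation low.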

cpk123 · June 2, 2010, 3:02am

The book specifically mentions, in the section “detecting multicoll.”, that going with correlations is not the recommended approach. And I have not done the mock. So this is a -1 for me, definitely…

Paraguay · June 2, 2010, 3:03am

Book mentions pairwise with multiple variables; read the fine print on only two independent variables :).

justinkc · June 2, 2010, 3:04am

I also believe the pairwise correlation needs to be .7 and up to be considered high.

justinkc · June 2, 2010, 3:20am

I just looked over it: R^2 in a linear regression with one independent variable is the coefficient of determination. Correlation is the square root of that (in absolute value), in a simple linear regression model.
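A quick check of that relationship in plain Python, on five invented data points (the helper names are made up too). It also shows why it matters whether the mock's 0.3 is a correlation or an R^2: a correlation of 0.3 corresponds to an R^2 of only 0.09, while an R^2 of 0.3 would mean a correlation of about 0.55.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_regression_r2(x, y):
    """R^2 from regressing y on an intercept and x, computed as 1 - SSE/SST."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


# Invented data, chosen only so the numbers come out round.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
r2 = simple_regression_r2(x, y)
print(f"r = {r:.2f}, r squared = {r * r:.2f}, regression R^2 = {r2:.2f}")

# The two readings of the mock's 0.3:
print(f"correlation 0.3 -> R^2 = {0.3 ** 2:.2f}")
print(f"R^2 0.3 -> |correlation| = {sqrt(0.3):.3f}")
```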