# Pairwise correlations - this is really tripping me up

This is really tripping me up. It's a question on the Paul Charlent Case Scenario in the practice questions. I'll summarize what's happening here: we are basically using some variables to explain the variation in SET index returns.

1. The first regression is

ln(1 + SET) = α + β_{1} × ln(1 + Libor) + ε

2. The second regression is

ln(1 + SET) = α + β_{1} × ln(1 + Libor) + β_{2} × ln(1 + Fed funds) + β_{3} × ($/£) + ε

3. As you can see, they have added two extra variables: the Fed funds rate and the $/£ exchange rate.

The first regression's *F*-statistic is 2.355.

The intercept is significant and Libor is not.

*R*^{2} = 0.0263

4. The second regression's *F*-statistic is 12.572.

*R*^{2} = 0.3072

Two variables are significant: the Fed funds rate and the exchange rate.

The pairwise correlation between Fed funds and Libor is 0.9814, and between Fed funds and the exchange rate it is 0.6798.

Geoffrey Small, a colleague of Charlent, comments on the results of the two regressions. Small states that the highly significant *F*-statistic of the second regression along with the increased *R*^{2} of the second regression means that the addition of the Fed funds rate and the $/£ exchange rate to the analysis provides more reliable estimates of linear associations than the first regression.
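For what it's worth, the standard way to examine Small's claim about the added variables is a partial (incremental) *F*-test, which can be built from the two *R*^{2} values quoted above. A minimal sketch in Python; note that the sample size n is entirely an assumption for illustration, since the excerpt does not state it:

```python
# Partial (incremental) F-test for whether the two added variables
# (Fed funds and the $/£ rate) jointly improve the fit, built from the
# two R^2 values quoted in the vignette summary.
# ASSUMPTION: n = 62 observations is made up for illustration only; the
# excerpt does not give the sample size, so the formula is the point here.
n = 62          # assumed sample size (not stated in the excerpt)
k_full = 3      # slope coefficients in the second regression
q = 2           # number of added slopes being tested
r2_restricted = 0.0263   # first regression
r2_full = 0.3072         # second regression

f_partial = ((r2_full - r2_restricted) / q) / ((1 - r2_full) / (n - k_full - 1))
print(round(f_partial, 2))
```

A partial *F* above its critical value would say the two added variables jointly add explanatory power; that is a separate question from whether multicollinearity distorts the individual coefficient estimates.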

The question asks:

Regarding Geoffrey Small’s statement about the second regression, which of the following is most accurate?

- It is true that the second regression has substantially greater explanatory power than the first regression.
- The second regression displays multicollinearity.
- The *F*-statistic of the second regression is likely underestimated.

The answer is 2.

This is really confusing me, for the following reasons.

1. A symptom of multicollinearity is a very high *F*-statistic and very high *R*^{2} with none of the independent variables being significant. But here two independent variables are significant and *R*^{2} is not that high.

2. The pairwise correlations are really high, agreed, but the CFA book specifically says: "high pairwise correlations among the independent variables are not a necessary condition for multicollinearity, and low pairwise correlations do not mean that multicollinearity is not a problem. The only case in which correlation between independent variables may be a reasonable indicator of multicollinearity occurs in a regression with exactly two independent variables." So why is multicollinearity a problem here? The justification given was that the pairwise correlation is high, but given the above statement, that really contradicts what the book has said.
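As a sanity check on point 1, the classic symptom pattern is easy to reproduce with simulated data (simulated values only, nothing from the vignette): two nearly identical regressors give a significant overall *F* and a respectable *R*^{2}, while collinearity inflates each individual slope's standard error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two nearly identical regressors, pairwise correlation ~0.99
# (loosely mimicking Libor vs. Fed funds; simulated, not real data).
x1 = rng.normal(size=n)
x2 = x1 + 0.15 * rng.normal(size=n)
y = 1.0 + 0.25 * x1 + 0.25 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
k = X.shape[1]
s2 = resid @ resid / (n - k)          # residual variance
cov = s2 * np.linalg.inv(X.T @ X)     # coefficient covariance matrix
t_stats = beta / np.sqrt(np.diag(cov))

ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

pair_corr = np.corrcoef(x1, x2)[0, 1]
print("corr(x1, x2):", round(pair_corr, 3))
print("R^2:", round(r2, 3), "F:", round(f_stat, 2))
print("slope t-stats:", np.round(t_stats[1:], 2))  # individually depressed by collinearity
```

The takeaway: a highly significant *F* with only a modest *R*^{2} is perfectly compatible with severe collinearity, so the symptom pattern in point 1 is a tendency, not a requirement.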

1) https://en.wikipedia.org/wiki/Coefficient_of_determination

2) https://stats.stackexchange.com/questions/93933/explanatory-power-of-variables-in-multiple-regression

3) http://blog.minitab.com/blog/adventures-in-statistics-2/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables

…. I can do this for days and so can you. The CFAI could also open a statistics textbook or journal article that discusses explanatory power… On any of those pages, control F for “explanatory” and it’ll take you to the relevant parts if you don’t want to read it all.

They argued that "explanatory power" refers to interpreting coefficients, which is hard under multicollinearity. That part is true, but "explanatory power" is a term typically reserved for interpreting R-squared.

The answer they initially flagged as correct was 3, and they tried to claim multicollinearity causes an overestimated *F*-statistic. So they still must believe that's correct, because the new choice says underestimated (meaning they think overestimated is either too confusing, or correct, so they changed it to underestimated). They really demonstrated a poor understanding through the emails we had, and they demonstrated extreme unwillingness to admit fault until they escalated it to the curriculum author, who then agreed with what I had said. (I believe there is at least one thread here where I discussed this.)

Again, the CFAI is not the place to learn statistics. Words in statistics have particular meanings that may be vastly different from their use in common English.

Multicollinearity is a term used to refer to a problematic level of interrelation between a group of x variables, so yes, it is generally incorrect to say that pairwise correlations alone are sufficient. You also need to assess the stability of the beta estimates (do they fluctuate vastly if one or more of the collinear variables is removed? Is the sign of the relationship different from theory or from what's expected given prior knowledge?).
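That stability check is easy to script. A hedged sketch (simulated data, not anything from the case) comparing slope estimates with and without one of two collinear regressors:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with an intercept column prepended."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)        # nearly collinear with x1
y = 2.0 + 1.0 * x1 + rng.normal(size=n)   # x2 has no true effect

full = ols(np.column_stack([x1, x2]), y)  # intercept, b1, b2
reduced = ols(x1.reshape(-1, 1), y)       # intercept, b1

# Under collinearity the individual slopes in the full model are unstable
# (they can swing wildly or flip sign across samples), while their SUM, and
# the slope in the reduced model, stay close to the true combined effect of 1.
print("full-model slopes:", np.round(full[1:], 2))
print("slope sum:", round(full[1] + full[2], 2))
print("x1 slope with x2 dropped:", round(reduced[1], 2))
```

This is the kind of evidence (unstable or sign-flipping betas) that the scenario would need to supply before multicollinearity could be called problematic with confidence.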

So just to confirm: does the above display multicollinearity or not? From the information given in the problem set, I would say it does not. Am I correct in my justification for why there is no multicollinearity? I.e., I posted:

“you cannot claim there is multicollinearity if at least one of the independent variables is statistically significant”

Is this a correct thing to say? It seems to be correct based on another problem which used this as a justification for why there is no multicollinearity, i.e. the Jordan Garfield Case Scenario, question 30 of 36, which asks: "In preparing his response to Samora's second question, Garfield's *most* appropriate conclusion is that the model:"

It may be correct that multicollinearity is present (again, a PROBLEMATIC degree of interrelation), but it depends on other information that's not provided. I think the question isn't a good question.

I don’t have access to cases anymore, so I can’t compare the two.

For the excerpt from the text, they're saying that MC can still be an issue even with low pairwise correlations. Only in the case of two independent variables can a pairwise correlation be used to get some kind of idea; even then, the MC may not be problematic, as that depends on the other things I discussed.
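That textbook caveat, serious multicollinearity despite only moderate pairwise correlations, can be demonstrated with variance inflation factors (VIFs), the more standard diagnostic. A sketch with simulated data, where a third regressor is nearly a linear combination of the first two:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2 is
    from regressing column j on the remaining columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        yj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1 - (resid @ resid) / ((yj - yj.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = (x1 + x2) / np.sqrt(2) + 0.05 * rng.normal(size=n)  # near-exact combination
X = np.column_stack([x1, x2, x3])

pairwise = np.corrcoef(X, rowvar=False)
vifs = vif(X)
print(np.round(pairwise, 2))  # no pairwise correlation much above ~0.7
print(np.round(vifs, 1))      # yet every VIF is far past the usual 5-10 cutoff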

On the exam, use their train of thought: explanatory power is impaired by multicollinearity (disgustingly untrue, according to people with degrees in statistics), and multicollinearity is present if any pair of x-variables has a large-magnitude Pearson correlation.

In a strict technical sense of the word, however, multicollinearity is defined by any relationship among a group of predictors (strong or weak). Practically, it’s used in the sense of a potentially problematic issue.