
Pairwise correlations - this is really tripping me up

This is really tripping me up. A question on the Paul Charlent Case Scenario in the practice questions. I'll summarize what's happening here: we are using some variables to explain the variation in SET index returns.

1. The first regression is:

ln(1 + SET) = α + β₁ × ln(1 + Libor) + ε

2. The second regression is:

ln(1 + SET) = α + β₁ × ln(1 + Libor) + β₂ × ln(1 + Fed Funds) + β₃ × ($/£) + ε

3. As you can see, they have added two extra variables for the Fed funds rate and the $/£ exchange rate.

The first regression's F-statistic is 2.355.

The intercept is significant and Libor is not.

R² = 0.0263

4. The second regression's F-statistic is 12.572.
R² = 0.3072

Two of the independent variables are significant: Fed funds and the exchange rate.

The pairwise correlation between Fed funds and Libor is 0.9814, and between Fed funds and the exchange rate it is 0.6798.

Geoffrey Small, a colleague of Charlent, comments on the results of the two regressions. Small states that the highly significant F-statistic of the second regression, along with its increased R², means that the addition of the Fed funds rate and the $/£ exchange rate to the analysis provides more reliable estimates of linear associations than the first regression.

the q asks

Regarding Geoffrey Small’s statement about the second regression, which of the following is most accurate?

  1. It is true that the second regression has substantially greater explanatory power than the first regression.
  2. The second regression displays multicollinearity.
  3. The F-statistic of the second regression is likely underestimated.

The answer is 2.

This is really confusing me for the following reasons:

1. The classic symptom of multicollinearity is a very high F-statistic and a very high R² with none of the independent variables being significant. But here two independent variables are significant and the R² is not that high.
2. The pairwise correlations are really high, agreed, but the CFA book specifically says: "high pairwise correlations among the independent variables are not a necessary condition for multicollinearity, and low pairwise correlations do not mean that multicollinearity is not a problem. The only case in which correlation between independent variables may be a reasonable indicator of multicollinearity occurs in a regression with exactly two independent variables"

So why is multicollinearity a problem here? The justification was that the pairwise correlation is high, but given the statement above, this really contradicts what has been said.
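For what it's worth, the diagnostic that gets at this most directly is the variance inflation factor (VIF), which the question never mentions. Here is a minimal numpy sketch on synthetic data loosely mimicking the scenario; the variable names, sample size, and noise levels are my own illustrative assumptions, not the case data:

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS regression of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j of X on the remaining columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        out.append(1.0 / (1.0 - r_squared(X[:, j], others)))
    return out

rng = np.random.default_rng(0)
n = 250
libor = rng.normal(0.03, 0.01, n)
fed_funds = libor + rng.normal(0.0, 0.002, n)    # nearly collinear with Libor
fx = 0.5 * fed_funds + rng.normal(0.0, 0.005, n) # moderately correlated

X = np.column_stack([libor, fed_funds, fx])
print(np.corrcoef(X, rowvar=False).round(3))
print([round(v, 1) for v in vif(X)])
```

With a pairwise correlation near 0.98, the VIFs for Libor and Fed funds come out well above the common rule-of-thumb cutoff of 10, which is the kind of evidence the curriculum prefers over raw pairwise correlations once there are more than two regressors.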


akrushn2 wrote:

This is really tripping me up. A question on the Paul Charlent Case Scenario in practice questions. […] So why is it that multicollinearity is a problem here? The justification was that the pairwise correlation is high, but given the above statement this really contradicts what has been said.

I can post more in a bit, but CFAI removed this question set a few years ago when I discussed the issues with it. It seems they reinstated it with a variation and it's still incorrect. Plain and simple, 1 is a correct statement as well. Google "regression explanatory power" and you will see that R-squared is how explanatory power is assessed. The literal interpretation of R-squared is the percentage of sample variation in the DV explained by the model using X1, X2, ….

1) https://en.wikipedia.org/wiki/Coefficient_of_determination

2) https://stats.stackexchange.com/questions/93933/explanatory-power-of-variables-in-multiple-regression

3) http://blog.minitab.com/blog/adventures-in-statistics-2/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables

I can do this for days and so can you. The CFAI could also open a statistics textbook or journal article that discusses explanatory power. On any of those pages, Ctrl+F for "explanatory" will take you to the relevant parts if you don't want to read it all.
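To make the R-squared point concrete, here is a toy numpy example (entirely synthetic; the coefficients and sample size are arbitrary choices of mine, not the case data). It shows the literal interpretation: adding a regressor that actually matters raises the share of sample variation in the DV the model explains, which is exactly the jump from 0.0263 to 0.3072 in the case:

```python
import numpy as np

def r_squared(y, X):
    """R^2: share of sample variation in y explained by the OLS fit."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 250
x1 = rng.normal(size=n)   # stand-in for ln(1 + Libor): weak driver
x2 = rng.normal(size=n)   # stand-in for the added regressors: strong driver
y = 0.1 * x1 + 1.0 * x2 + rng.normal(size=n)

r2_one = r_squared(y, x1[:, None])
r2_two = r_squared(y, np.column_stack([x1, x2]))
print(round(r2_one, 3), round(r2_two, 3))
```

Bear in mind that in nested OLS models R² can only rise when regressors are added, which is why adjusted R² (link 3 above) is the fairer basis for comparing the two specifications.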

They argued that "explanatory power" refers to interpreting coefficients, which is hard under multicollinearity. This is true, but "explanatory power" is a term typically reserved for interpreting R-squared.

The answer they initially flagged as correct was 3, and they tried to claim multicollinearity causes an overestimated F-statistic. So they still must believe that's correct, because the new choice says underestimated (meaning they think overestimated is either too confusing, or correct, so they changed it to underestimated). They demonstrated a poor understanding through the emails we exchanged, and an extreme unwillingness to admit fault, until they escalated it to the curriculum author, who then agreed with what I had said. (I believe there is at least one thread here where I discussed this.)

Again, the CFAI is not the place to learn statistics. Words in statistics have particular meanings that may be vastly different from their use in common English.

Multicollinearity is a term used to refer to a problematic level of interrelation among a group of x variables, so yes, it is generally incorrect to say that pairwise correlations alone are sufficient. You also need to assess the stability of the beta estimates (do they fluctuate vastly if one or more of the collinear variables is removed? Is the sign of the relationship different from theory or from prior knowledge?).
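That beta-stability check is easy to demonstrate. A hypothetical numpy sketch (synthetic data; the seeds and noise scales are my own choices): two nearly collinear regressors where only one truly drives y. The joint fit splits the effect between them unreliably, while dropping the near-duplicate gives a stable estimate:

```python
import numpy as np

def ols_betas(y, X):
    """Slope estimates from an OLS fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]  # drop the intercept

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.02, size=n)  # near-duplicate of x1
y = x1 + rng.normal(scale=0.5, size=n)    # only x1 truly drives y

both = ols_betas(y, np.column_stack([x1, x2]))  # effect split unreliably across the pair
alone = ols_betas(y, x1[:, None])               # stable once the duplicate is dropped
print(both.round(2), alone.round(2))
```

If the individual betas in the joint fit swing around or flip sign (while their sum stays close to the single-regressor estimate), that is the kind of corroborating evidence for problematic collinearity described above.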

So just to confirm: does the above display multicollinearity or not? From the information given in the problem set I would say it does not. Am I correct in my reasons for why there is no multicollinearity? i.e., I posted:

"you cannot claim there is multicollinearity if at least one of the independent variables is statistically significant"

Is this a correct thing to say? It seems to be correct based on another problem which used it as a justification for why there is no multicollinearity, i.e. the Jordan Garfield Case Scenario, question 30 of 36, which asks: "In preparing his response to Samora's second question, Garfield's most appropriate conclusion is that the model:

  1. has multicollinearity but not serial correlation.
  2. has serial correlation but not multicollinearity.
  3. does not have either multicollinearity or serial correlation." The answer is 1.

It may be correct that multicollinearity is present (again, a PROBLEMATIC degree of interrelation), but that depends on other information that's not provided. I don't think it's a good question.

I don’t have access to cases anymore, so I can’t compare the two.

For the excerpt from the text: they're saying that MC can still be an issue even with low pairwise correlations. Only in the case of two independent variables can a pairwise correlation be used to get some kind of idea; however, even then MC may not be problematic, as it depends on the other things I discussed.

On the exam, use their train of thought: explanatory power is impaired by multicollinearity (disgustingly untrue, according to people with degrees in statistics), and multicollinearity is present if any pair of x-variables has a large-magnitude Pearson correlation.

akrushn2 wrote:

I posted "you cannot claim there is multicollinearity if at least one of the independent variables is statistically significant" - is this a correct thing to say?

No, it is not. There are other impacts of multicollinearity as I mentioned earlier. Multicollinearity (as a problem) is a constellation of things, not just artificially small t-stats.

In the strict technical sense of the word, however, multicollinearity is defined by any linear relationship among a group of predictors (strong or weak). Practically, it's used in the sense of a potentially problematic issue.