Multicollinearity and SEs

keep_running · January 3, 2017, 4:56am

In the readings, they explain that multicollinear independent variables cause the standard errors to increase, while the F-statistic increases as well.

Do you know how this would happen? Can anyone explain why this happens (on an intuitive level)?

Thanks!

tickersu · January 3, 2017, 5:12pm

The F-statistic is not increased or decreased purely due to multicollinearity. I actually had this email discussion with the Institute a couple years ago. They vehemently disagreed at first, until they escalated it to the curriculum author, who then said he agreed with my position (that multicollinearity does not over or understate the F-statistic or R-squared). Assuming this is the same issue, they are incorrect. I’ll send you a PM.

The coefficient SEs can be inflated because they are multiplied by a factor of sqrt[1/(1 - R-squared_i)], where R-squared_i is the R-squared from a regression in which Xi is the dependent variable and all the other X variables are the independent variables:

Xi = b0 + baXa +…+bkXk

So a high R-squared for this regression would mean a strong relationship between Xi and the group of other independent variables (possibly indicating multicollinearity in the original regression of Y with Xa,…Xi,Xj, Xk…). If this R-squared is large, the denominator of sqrt[1/(1 - R-squared_i)] becomes small and the overall value of that factor increases (inflating the variance and SE of coefficient i). Note that the coefficient variances are inflated by a factor of 1/(1 - R-squared_i), which is known as the VIF for coefficient i. Also notice that this inflation only occurs for those variables that are related to other independent variables in the main regression (in other words, X2 has an unaffected SE/variance if X2 is unrelated to the other Xs).
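As a rough illustration of the auxiliary regression described above, here is a minimal Python sketch using NumPy. The helper names `r_squared` and `vif`, the simulated variables, and the coefficients 0.9 and 0.3 are all made up for the example; none of it comes from the curriculum.

```python
import numpy as np

def r_squared(y, X):
    """R-squared from an OLS regression of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / sst

def vif(X):
    """VIF for each column of X, computed as 1 / (1 - R-squared_i)."""
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        # Regress column i on all the other columns.
        others = np.delete(X, i, axis=1)
        out.append(1.0 / (1.0 - r_squared(X[:, i], others)))
    return np.array(out)

# Hypothetical data: x2 is built from x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # large for x1 and x2, close to 1 for x3
```

In this setup only x1 and x2 should show a large VIF, which matches the point that the inflation only hits variables related to the other Xs.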

Hope this helps!

keep_running · January 10, 2017, 2:42am

What exactly do you mean here? I am still confused. I did not read anywhere in the curriculum about multiplying the coefficients by that residual factor…

tickersu · January 10, 2017, 3:42am

Unfortunately, the book does a pretty poor job of covering statistics (I’ve said this before, and multicollinearity is one of their really weak points). Also, I wouldn’t really call it a “residual” factor (as it already has an unambiguous name, “Variance Inflation Factor”). It tells you how much the variance of coefficient i is inflated (and its standard error is inflated by the square root of the VIF).

I sent you a PM when I made the first post to try and further clarify what the text says. I’ll repost some of what I wrote before and try to explain it further.

The F-statistic (and R-squared) is not increased or decreased purely due to multicollinearity. This is because OLS is unbiased in the presence of multicollinearity (i.e. model fit is unaffected). Anyone telling you otherwise is mistaken.
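One way to see this numerically, as a sketch only: swap the correlated predictors for uncorrelated ones that span the same space and check that the fit does not move. The code below uses NumPy, and the helper `fit_stats`, the simulated data, and the QR-based orthogonalization are my own choices for the illustration, not something from the readings.

```python
import numpy as np

def fit_stats(y, X):
    """R-squared and overall F-statistic for OLS of y on X plus an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - (resid @ resid) / sst
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, f

# Hypothetical data with two strongly related predictors.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.2 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(size=n)

# QR of the centered predictors gives uncorrelated columns that carry
# the same information as x1 and x2 taken together.
Q, _ = np.linalg.qr(X - X.mean(axis=0))

r2_collinear, f_collinear = fit_stats(y, X)
r2_orth, f_orth = fit_stats(y, Q)
print(r2_collinear, r2_orth)
print(f_collinear, f_orth)
```

The two regressions have the same fitted values, so R-squared and F agree up to rounding even though one set of predictors is highly collinear and the other is not. What changes is how precisely the individual coefficients are estimated.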

The coefficient SEs can be inflated because they are multiplied by a factor of sqrt[1/(1 - R-squared_i)], where R-squared_i is the R-squared from a regression in which Xi is the dependent variable and all the other X variables are the independent variables.

If you are trying to predict Y with X1, X2, and X3, you can assess multicollinearity by running a regression with X1 as the dependent variable and X2 and X3 as the predictor variables.

X1 = b0 + b2X2 + b3X3

Get the R-squared value for this regression. This value will essentially tell you how much of the variation in X1 is shared with X2 and X3 together. Subtracting this value from 1 gives you the variation in X1 that is unique, and not shared with X2 and X3. The more unique information, the less the coefficient variance (and standard error) will be inflated in the original regression. To quantify the inflation, take the reciprocal of (1 - R-squared) from the regression we just did. This is a VIF (not mentioned in the curriculum, because the curriculum gives insufficient coverage of the topic).

If the VIF for X1 is 5, for example (which corresponds to an R-squared of 0.8 in the regression above), then the variance of coefficient b1 will be 5 times as large as it would be if X2 and X3 were not related at all to X1 in the regression of Y with X1, X2, X3 (and the standard error of b1 will be sqrt(5), or about 2.24, times as large).

You can (and the computer software does) repeat this for X2, with X1 and X3 as predictors, and then run a final regression for X3, with X1 and X2 as predictors.
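To check the sqrt(VIF) claim on actual numbers, here is a small Python sketch with NumPy. The data, the true coefficients, and the helper `aux_r2` are invented for the example. The "no collinearity" benchmark is the SE that coefficient would have if its X were unrelated to the other Xs, holding the residual variance fixed.

```python
import numpy as np

# Hypothetical data: x1 and x2 are related, x3 is not.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

# Main regression of Y on X1, X2, X3 with the usual OLS standard errors.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
s2 = (resid @ resid) / (n - A.shape[1])  # residual variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))[1:]  # drop the intercept

def aux_r2(i):
    """R-squared from regressing column i of X on the other columns."""
    others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
    g, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
    e = X[:, i] - others @ g
    return 1.0 - (e @ e) / np.sum((X[:, i] - X[:, i].mean()) ** 2)

# One auxiliary regression per predictor, as described above.
vifs = np.array([1.0 / (1.0 - aux_r2(i)) for i in range(X.shape[1])])

# Benchmark SE with no collinearity, then inflate it by sqrt(VIF).
sst = np.sum((X - X.mean(axis=0)) ** 2, axis=0)
se_no_collinearity = np.sqrt(s2 / sst)
se_from_vif = se_no_collinearity * np.sqrt(vifs)

print(vifs)
print(se)
print(se_from_vif)
```

The last two printed arrays should match, which is just the statement that each coefficient's SE equals its no-collinearity SE multiplied by the square root of its VIF.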

Let me know if it’s still unclear, and I can probably find another resource. It’s partially because I’m being lazy, and partially because it’s a pain to explain using only text.