Multicollinearity

Even though multicollinearity does not affect the consistency of the slope coefficients, the coefficient estimates themselves tend to be unreliable. Additionally, the standard errors of the slope coefficients are artificially inflated. Hence there is a greater probability that we will incorrectly conclude that a variable is not statistically significant.

How does this make the standard error greater? I am trying to picture this but it is not working.

Thanks

Furthermore, can someone explain (in a step by step manner) why this occurs:

The most common way to detect multicollinearity is the situation where t-tests indicate that none of the individual coefficients is significantly different from zero, while the F-test is statistically significant and the R² is high.

First, the coefficients are unreliable for practical interpretation, but the model will still have predictive power (i.e., the signs of the estimates may be different from what you expect, but the model’s predictive power will not necessarily be hindered by multicollinearity).

In multiple regression, there are diagnostic statistics called variance inflation factors (VIFs). These are calculated as VIFi = 1/(1 − R²i), where R²i is the R-squared from the regression of Xi on all of the other independent variables. Therefore, if Xi is highly correlated with the remaining independent variables, R²i will be high and the VIF will be large. The square root of the VIF is how many times LARGER the standard error is due to the multicollinearity. Short of going through a derivation of the least squares estimates, this is the easiest explanation I can use to sort of show how the standard errors are inflated due to multicollinearity.
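If it helps to see that calculation done, here is a minimal numpy sketch (the data and the vif helper are just my own made-up illustration): regress each Xi on the remaining independent variables, take that R²i, and form 1/(1 − R²i).

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress X_i on the other columns, then 1 / (1 - R²_i)."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        y = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])  # other X's plus intercept
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        ss_res = ((y - others @ beta) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2_i = 1.0 - ss_res / ss_tot
        out[i] = 1.0 / (1.0 - r2_i)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)   # x2 is nearly a copy of x1
x3 = rng.normal(size=500)              # x3 is unrelated to the others
print(vif(np.column_stack([x1, x2, x3])))   # roughly [100, 100, 1]
```

The first two VIFs come out huge because x1 and x2 carry nearly the same information; the square root of each VIF is the factor by which that coefficient’s standard error is inflated.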

Now for your second post:

If the standard errors are inflated on the t-tests (to a high degree), it is likely that the coefficients will all show non-significance, based on how the t-statistic is calculated. However, the F-statistic (the test for joint significance) is essentially comparing the complete model at hand to a model where only the intercept (y-bar) is used for prediction. It basically tells us how much better our model does than just using the average as the model (after accounting for degrees of freedom). If it is large enough (p-value less than the alpha level), we reject the null hypothesis C1 = C2 = … = Ck = 0 (where Ci is the ith slope coefficient) in favor of the alternative that at least ONE of the terms in the model is statistically different from zero.

So now, if the F-test says at least one variable is statistically useful, but all the t-tests say nothing is statistically useful, we have contradicting results. Given what we know about inflated variances and, therefore, inflated standard errors, we can say that multicollinearity is the likely cause. Also, the adjusted R-square gets at the same information: if it is high, the model can explain a large proportion of the sample variation in the dependent variable, which would contradict non-significant t-tests.
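If you want to see that contradiction in actual numbers, here is a small simulation (my own toy example, not from the curriculum): two nearly identical regressors that both truly matter, fit by ordinary least squares. The individual t-statistics are usually unimpressive while the F-statistic and R² are large.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # x2 is almost a copy of x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept
k = X.shape[1] - 1                           # number of slope coefficients
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
sse = resid @ resid
sst = ((y - y.mean()) ** 2).sum()
sigma2 = sse / (n - k - 1)                   # model variance (SEE squared)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t_stats = beta[1:] / se[1:]                  # t-test for each slope vs. zero
t_pvals = 2 * stats.t.sf(np.abs(t_stats), df=n - k - 1)
r2 = 1 - sse / sst
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1)) # joint test against the intercept-only model
f_pval = stats.f.sf(f_stat, k, n - k - 1)

print("slope t-stats:", t_stats, "p-values:", t_pvals)   # often both individually insignificant
print("R^2:", round(r2, 3), "F:", round(f_stat, 1), "p-value:", f_pval)  # jointly very significant
```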

Hope this helps!

Going back to this:

In multiple regression, there are diagnostic statistics called variance inflation factors (VIFs). These are calculated as VIFi = 1/(1 − R²i), where R²i is the R-squared from the regression of Xi on all of the other independent variables. Therefore, if Xi is highly correlated with the remaining independent variables, R²i will be high and the VIF will be large. The square root of the VIF is how many times LARGER the standard error is due to the multicollinearity. Short of going through a derivation of the least squares estimates, this is the easiest explanation I can use to sort of show how the standard errors are inflated due to multicollinearity.

Shouldn’t the standard error be smaller (as a proportion of total variation) if R² is high? So if the independent variables are highly correlated, a movement of a single variable will also move the other variables… and if all of them are predictors of the dependent variable as well, then the R² will increase (am I making sense here?). Now if the R² increases, the total variability stays the same (correct or no?), and this should increase the proportion of R² in the total variability, and thus the Standard Error of Estimate (SEE) should be smaller… Any clarity on this would be great.

I see the confusion. When I said standard error, I was referring to the standard error of a regression coefficient, and from now on I will say standard error only when referring to a coefficient. When I speak of the SEE, as you called it, I will say model standard deviation (the square root of the sum of squared residuals divided by its degrees of freedom).

First, you are correct in the idea that if the model standard deviation (the SEE) declines, the R-square value will increase. I would caution, though, against saying that correlated x-variables cause a movement in each other. Think of it more as the variables moving similarly, and also contributing similar information to the regression.

Now, going back to the paragraph about the VIF, look at the formula in this link (page 19). Sorry, my picture wouldn’t work, but Google did have some PDFs: http://fmwww.bc.edu/EC-C/F2012/228/EC228.F2012.nn04.pdf

Sigma squared (the SEE squared) is the model variance. We will consider it fixed for this explanation. The formula is Var(Bj-hat) = sigma² / [SSTj × (1 − R²j)]. Thinking logically about estimation, we know that we can (usually) obtain more precise estimates by increasing the size of a random sample (more sampling variation). In this case, if we want a more precise estimate of beta j (a smaller Var(Bj-hat)), we can increase SSTj (the sum of squared deviations of the jth independent variable about its mean) by taking a larger random sample.

Now let’s assume our sample is fixed and that xj is UNCORRELATED with all of the other regressors, i.e., R²j is zero. Recall that R²j comes from regressing xj on all of the other independent variables (with an intercept). Plugging R²j = 0 into the equation gives us some value, Var(Bj-hat). However, if xj is CORRELATED with some of the other independent variables, R²j will be different from zero; say 0.5 for this example (an arbitrary number). Since SSTj is then not all UNIQUE sampling variation, we must look at how much UNIQUE sampling variation in xj we actually have. If R²j is 0.5, we have half as much unique sampling variation in xj as we thought (plug it into the formula to see). This means Var(Bj-hat) is twice as large as it would be with uncorrelated regressors, and that factor of 2 is exactly the VIF. Finally, the standard error of a regression coefficient is the square root of its variance: SE(Bj-hat) = [Var(Bj-hat)]^0.5.
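To make the factor-of-2 point concrete, here is a quick arithmetic check of that formula with made-up values for sigma squared and SSTj (only the ratios matter, not the numbers themselves):

```python
import numpy as np

sigma2 = 4.0     # hypothetical model variance (SEE squared)
sst_j = 100.0    # hypothetical total sampling variation of x_j about its mean

# Var(Bj-hat) = sigma^2 / (SSTj * (1 - R^2_j))
for r2_j in (0.0, 0.5, 0.9):
    var_bj = sigma2 / (sst_j * (1.0 - r2_j))
    vif_j = 1.0 / (1.0 - r2_j)
    print(f"R²j = {r2_j}: Var(Bj-hat) = {var_bj:.3f}, SE(Bj-hat) = {np.sqrt(var_bj):.3f}, "
          f"VIF = {vif_j:.1f}, SE inflated by {np.sqrt(vif_j):.2f}x")
```

Going from R²j = 0 to R²j = 0.5 doubles the variance (VIF = 2), so the standard error grows by a factor of sqrt(2); at R²j = 0.9 the standard error is about 3.2 times larger.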

Let me know if this helps, or if I muddied the water. I’m hoping the former!

I apologize for the typos in here (I think all are fixed now), when I wrote this my mind was still groggy in the AM.

It’s amazing how many small things slip your mind early on a Sunday.

and you are only a level 1 candidate… are you a quant?

Here is a very simple question: How can the standard error increase if the independent variables are correlated? If they are correlated (say +1), then a movement of one will move the others in the same direction, and as that number increases, so will the independent variables… this shouldn’t increase the standard error since they are moving together, but in this case it says that the standard error increases and the t-statistic is small. Thoughts?

I’ll take the quant question as a compliment. I have had some exposure to statistics, though (very helpful). Keep in mind, we are not talking about the standard error of the estimate (model standard deviation), but the standard error for a regression coefficient that relates an x-variable to the dv.

I apologize that my previous post was a little long-winded in trying to explain this. If you have the time and haven’t done so already, I thought the PDF (pg 19 and the few following) was pretty concise to read.

So, it isn’t necessarily the fact that they are moving together that causes the standard error on the coefficient, SE(Bi-hat), to increase. In the case you gave (perfect correlation, or an R-squared of 1, between xi and the other independent variables), the model would not be estimable (the matrix used would be singular). Remember, an assumption for regression is no perfect collinearity among the independent variables.
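On that perfect-collinearity point, a tiny sketch with made-up data shows why the model cannot be estimated: the X'X matrix that least squares has to invert loses a rank.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=20)
x2 = 3.0 * x1                              # x2 is an exact linear function of x1
X = np.column_stack([np.ones(20), x1, x2]) # intercept, x1, x2
xtx = X.T @ X

print(np.linalg.matrix_rank(xtx))          # 2, not 3: X'X is rank-deficient
print(np.linalg.cond(xtx))                 # enormous condition number
# In exact arithmetic X'X has no inverse, so OLS cannot separate the
# coefficients on x1 and x2; software either errors out or drops a column.
```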

Try to focus on the idea that redundant information from the x-variables is the cause of this issue (correlated x-variables means redundant information going into the model). Let’s say that xi is highly correlated with the other x-variables (an R-squared of 0.9). That is saying that about 90% of the sampling variation in xi can be explained by the other independent variables. In a way, only 10% of the sampling variation in xi is unique to xi and not shared with (not contributed by) the other x-variables. Recall that unique (non-redundant) sampling variation in xi is what lets us estimate Bi precisely, i.e., it is what keeps Var(Bi-hat) small (we start with the variance to get to the standard error).

So, when we calculate Var(Bi-hat) and then the standard error of Bi-hat, we must look at only the unique (non-redundant) sampling variation of xi, which is only 10% of the total sampling variation in xi (start with total sampling variation for xi and take out the 90% redundant info, leaving a number 10% of the original). Looking at the formula for Var(Bi-hat), we can see the denominator is decreasing as the correlation between xi and the other x-variables increases, making Var(Bi-hat) and SE(Bi-hat) increase.

Based on how the t-statistic is calculated, ([Bi-hat] − 0)/[SE(Bi-hat)], we can see that an inflated standard error of Bi-hat will decrease the t-statistic.
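To tie the last few paragraphs together, here is a short simulation (made-up data again) that keeps everything fixed except the correlation between x1 and x2. As the correlation rises, SE(B1-hat) inflates and the t-statistic shrinks, even though the true coefficient never changes.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# same data-generating process each time; only the x1-x2 correlation changes
for noise in (2.0, 0.5, 0.1, 0.02):
    x1 = rng.normal(size=n)
    x2 = x1 + noise * rng.normal(size=n)   # smaller noise -> higher correlation
    y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x1, x2])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 3)                       # model variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

    t1 = (beta[1] - 0) / se[1]                             # t = (B1-hat - 0) / SE(B1-hat)
    corr = np.corrcoef(x1, x2)[0, 1]
    print(f"corr(x1, x2) = {corr:.3f}   SE(B1-hat) = {se[1]:.3f}   t = {t1:.2f}")
```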

Please, do let me know if I’m not helping so I don’t flood your mind with something that won’t help (I wouldn’t take offense).

Hmm… these answers make sense, but they confuse me a bit.

It’s simple… Here is a formula:

E(R) = Rf + C1(Variable X) + C2(Variable Y) + C3(Variable Z) + e

So in multicollinearity… Variables X, Y, and Z are correlated, correct? So if this happens, let’s say that X goes up 100%, and since the other variables are positively correlated to X, they all move up, so this causes E(R) to move up… but if this happens a lot, then it will continuously move up uniformly, thus the standard error should be small (similar to autocorrelation, but this is in the variables). So, why is the standard error greater when they are all moving together (if correlated)? That I don’t know. E stays the same in this case… It would make sense if, say, the variables were negatively correlated, but I don’t get it when it is positive.

If you can answer this, it would be great. Please use the previous formula as a base so I can visually understand.

I think that CFAI will not require you to know why, but I can’t be sure since I’m not studying that material yet. However, it is good to tackle these topics for personal understanding.

Aside from my bold text above, I am not sure how else I can explain it. I do apologize if I couldn’t be of more help.

WTF!

You got it! wink