Multicollinearity

I understand this, sort of. The only thing I don't get is why it's so bad. If none of the individual independent variables is significantly different from zero, but the regression as a whole has a high F-statistic and a high R², why not use it? Why is it a problem for the whole to be worth more than the parts? The text says it leads to Type II errors. How so?

garbage in, garbage out

A Type II error is accepting (failing to reject) a null hypothesis that is actually false. I think the book is saying that you might risk taking investment strategies that actually do not generate returns.

There are (at least) two goals of regression: explanation and prediction. Multicollinearity messes up explanation but not prediction. For example, suppose I run a regression of cars' MPG on number of cylinders, car weight, horsepower, and torque. Those independent variables are highly correlated, but it's very likely that my regression equation is going to be significant, with a decent R², etc. If I've fit this model and want to predict MPG for some other car (the new Alfa Romeo that I lust after), I should use the model and not worry about multicollinearity. My prediction is not affected by it. Now, if I want to determine whether torque is useful in predicting MPG, I've got a problem. The effect of torque (or any other independent variable) is obscured and mixed up with the effects of all the similar variables. It would be foolish for me to conclude from the t-statistic on torque that torque is not useful for predicting MPG. In fact, I just can't see its effect because the other variables are so similar. That's the Type II error they are talking about.
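
To make the Type II point concrete, here is a minimal Monte Carlo sketch in Python with NumPy. It uses simulated, made-up data rather than real car data: "horsepower" and "torque" are hypothetical stand-ins for two nearly redundant predictors, each with a true coefficient of 1 on MPG. The sketch repeats the regression many times and counts how often the t-test on torque fails to reject zero (a Type II error, since torque really does matter), how often the overall F-test is significant, and how much the torque coefficient and the prediction for one hypothetical new car vary across samples. The cutoffs 2.01 and 3.20 are approximate 5% critical values for 47 residual degrees of freedom, hard-coded (an assumption) to avoid a SciPy dependency.

```python
import numpy as np

# Simulated, made-up data: "horsepower" and "torque" are hypothetical
# stand-ins for two nearly redundant predictors of MPG.
rng = np.random.default_rng(0)

N_OBS = 50      # cars per simulated sample
N_SIMS = 1000   # number of repeated samples
T_CRIT = 2.01   # approx. two-sided 5% t critical value, 47 df
F_CRIT = 3.20   # approx. 5% F critical value, (2, 47) df


def fit_ols(X, y):
    """OLS with intercept; returns coefficients, t-statistics and overall F."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])       # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - k - 1
    sigma2 = resid @ resid / dof               # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    f_stat = (r2 / k) / ((1 - r2) / dof)
    return beta, beta / se, f_stat


def simulate(noise_sd):
    """Repeat the experiment with torque = horsepower + noise_sd * N(0, 1)."""
    misses = f_passes = 0
    torque_coefs, preds = [], []
    new_car = np.array([1.0, 1.0, 1.0])        # intercept, hp, torque (hypothetical car)
    for _ in range(N_SIMS):
        hp = rng.normal(size=N_OBS)
        torque = hp + noise_sd * rng.normal(size=N_OBS)
        # True model: both predictors matter, each with coefficient 1.
        mpg = hp + torque + rng.normal(size=N_OBS)
        beta, t, f_stat = fit_ols(np.column_stack([hp, torque]), mpg)
        misses += abs(t[2]) < T_CRIT           # Type II error on torque
        f_passes += f_stat > F_CRIT            # whole regression "works"
        torque_coefs.append(beta[2])
        preds.append(new_car @ beta)
    return (misses / N_SIMS, f_passes / N_SIMS,
            np.std(torque_coefs), np.std(preds))


results = {}
for label, sd in [("near-duplicate", 0.05), ("moderately correlated", 1.0)]:
    results[label] = simulate(sd)
    miss, f_ok, coef_sd, pred_sd = results[label]
    print(f"{label:>21}: torque t-test miss rate={miss:.2f}, "
          f"F significant={f_ok:.2f}, sd(torque coef)={coef_sd:.2f}, "
          f"sd(prediction)={pred_sd:.2f}")
```

With near-duplicate predictors (correlation about 0.999) you should expect the torque t-test to miss most of the time while the F-test almost always passes, and the torque coefficient to swing far more than the prediction does; with only moderately correlated predictors (correlation about 0.7) both problems largely go away. The prediction stays stable here because the hypothetical new car (horsepower = torque = 1) follows the same pattern as the simulated training data; a car whose torque and horsepower broke that pattern would not get the same protection.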

JoeyD, if you got a dollar for every good explanation you've posted on here, you would have that Alfa Romeo. So the big problem with multicollinearity is that someone might throw out a good regression equation because the independent variables don't look significant individually?

JoeyD doesn't know this, but last year for L1 I made a file of JoeyD responses and sorted them by topic. I just copied and pasted each question and his answer. It's like my own little Rolodex of JoeyD information. It was very useful.

Mike, I think the bigger picture with multicollinearity isn't so much that you'd throw away a good regression equation. It's more that even if the equation as a whole is pretty decent (yes, those variables probably are important in determining/predicting the dependent variable), you'd have a hard time figuring out which independent variable is really explaining the dependent variable, since they're all so closely linked to each other. JB, you are one OCD mo' fo. Enjoy the cheesesteaks!

You can't let the variables mess with each other. If we model x as a function of y and z, and y and z are already highly correlated (engine cc's and power, for example), then you mess yourself up, but your R² will still be good.
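
A smaller sketch of the R² point, again with simulated data and NumPy: using the post's notation, x is regressed on y and z, where z is constructed (an assumption for illustration) as y plus a little noise, standing in for engine cc's and power. It compares R² and coefficients for the model with both predictors against the models with only one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical setup: y ~ engine cc's, z ~ power, carrying nearly the same information.
y = rng.normal(size=n)
z = y + 0.05 * rng.normal(size=n)   # z is almost a copy of y
x = y + z + rng.normal(size=n)      # true model uses both, each with coefficient 1


def fit_r2(predictors, response):
    """OLS with intercept; returns (R^2, coefficient vector)."""
    A = np.column_stack([np.ones(len(response))] + predictors)
    beta, *_ = np.linalg.lstsq(A, response, rcond=None)
    resid = response - A @ beta
    r2 = 1 - resid @ resid / np.sum((response - response.mean()) ** 2)
    return r2, beta


r2_both, beta_both = fit_r2([y, z], x)
r2_y, beta_y = fit_r2([y], x)
r2_z, beta_z = fit_r2([z], x)

print(f"R^2 both: {r2_both:.3f}, y only: {r2_y:.3f}, z only: {r2_z:.3f}")
print(f"full model: b_y={beta_both[1]:.2f}, b_z={beta_both[2]:.2f}, "
      f"sum={beta_both[1] + beta_both[2]:.2f}")
print(f"y-only coef={beta_y[1]:.2f}, z-only coef={beta_z[1]:.2f}")
```

R² should barely move when either predictor is dropped, and the sum of the two full-model coefficients should stay near the true total of 2, while the split between y and z in the full model is poorly determined. That is the "whole is well determined, the parts are not" situation from the original question.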

I think I get what you guys are saying, thanks. I know what the texts are saying, and I think I get it for the test, but personally I just don't see why you should worry about WHICH independent variable is explaining the dependent variable. As long as the overall regression explains the dependent variable, who cares which independent variable is the best and which is just along for the ride? Thanks to all of you.

This is the point: if you fuck yourself like this, the overall regression will INDICATE that it works when in actuality it doesn't.