Omitted variable bias vs. multicollinearity

With multicollinearity, there are issues with the standard errors if the independent variables are correlated with one another. However, under omitted variable bias, it says

[content removed by moderator]

I don't understand this. On one hand, introducing two correlated independent variables can be a problem, and on the other hand, if an omitted variable is correlated with the included regressors, we also have an issue?

Right. In one case, the model is properly specified, but some independent variables are correlated with one another. That is not a terrible problem (especially if the model is only going to be used for prediction purposes). In the other case, you have left out a necessary piece of the puzzle. Now your x-variable is correlated with the error term, which is problematic. The regressors should be exogenous: the errors should have mean zero and be uncorrelated with the regressors, E(e|x) = 0. If you have multicollinearity, you can fix the issue in a few ways. The other issue is more limited in its remedies: to satisfy the assumption, you need to include the omitted variable.
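A quick way to see the omitted-variable problem is to simulate data where the true model has two correlated regressors, then fit a regression that leaves one of them out. This is a minimal sketch with made-up coefficients and sample size (not from the curriculum), assuming numpy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True model: y = 1 + 2*x1 + 3*x2 + e, with x1 and x2 positively correlated
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # corr(x1, x2) > 0
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

# Correctly specified model: includes both regressors
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Misspecified model: x2 omitted, so its effect is absorbed into the error,
# which is now correlated with x1 (E(e|x1) != 0)
X_short = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

print("coef on x1, full model :", b_full[1])    # close to 2 (unbiased)
print("coef on x1, x2 omitted :", b_short[1])   # close to 2 + 3*0.6 = 3.8 (biased)
```

The slope in the short regression lands near 2 + 3*0.6 = 3.8 rather than 2, which is the usual omitted-variable-bias formula: the true coefficient plus the omitted variable's coefficient times the slope from regressing x2 on x1.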

But multicollinearity is a huge issue: coefficients and standard errors are unreliable, and the recommended fix is to remove one of the variables (which is the complete opposite of the fix for omitted variable bias).

Yes, but multicollinearity (excluding perfect collinearity) doesn't violate any of the necessary assumptions. Multicollinearity can be mitigated by increasing the sample size to reduce the standard errors of the coefficients. Additionally, if the x-variables are so highly correlated, do you really need both? Combining the two into one variable seems pretty reasonable (if the variables are that correlated and similar).
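To see why a larger sample helps, here is a sketch (again with made-up numbers) comparing the standard error of a coefficient when the regressors are highly correlated versus nearly uncorrelated, at two sample sizes. It assumes numpy and statsmodels are available:

```python
import numpy as np
import statsmodels.api as sm

def fit_se(n, rho, seed=0):
    """Simulate y = 1 + 2*x1 + 3*x2 + e with corr(x1, x2) close to rho,
    then return the estimated standard error of the x1 coefficient."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    return sm.OLS(y, X).fit().bse[1]   # SE of the x1 coefficient

for n in (100, 10_000):
    print(f"n = {n:>6}:  SE(b1) at rho=0.10 -> {fit_se(n, 0.10):.3f},  "
          f"at rho=0.95 -> {fit_se(n, 0.95):.3f}")
```

With rho = 0.95 the standard error on x1 comes out a few times larger than with rho = 0.10, but at the larger sample size both shrink considerably, and the coefficient estimate itself stays centered on its true value of 2 either way.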

Also, only the variables that are highly correlated will be affected (unreliable); other variables in the model that are uncorrelated with them can still be estimated properly. If you are trying to interpret the partial effect of x1 on y, and x1 is uncorrelated (or only weakly correlated) with the other regressors, why fix something that isn't affecting your analysis?

(Edited; not necessary.) As you can see, there are many ways (more than these) to fix the problem of multicollinearity; some come at the expense of practical interpretations, some do not.

So as I said, multicollinearity is not that big of a deal if you are using the model only for prediction purposes (getting predicted values, confidence intervals, etc.). Once you try to make practical/economic interpretations from the coefficients, you might run into trouble, but then we have some solutions. It all comes down to evaluating the degree of multicollinearity, how it could be influencing your interpretations, and how you can fix the issue.

You are right, though, that dropping the variable solves one problem but creates another. The problem created by dropping a variable violates a basic regression assumption. This is usually perceived as a bigger deal (inconsistent estimators: the bias doesn't dissipate with large samples, so the estimator isn't even asymptotically unbiased) than some wonky coefficients and inflated variances…
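To illustrate the asymptotic point, using the same made-up data-generating process as the earlier sketch: when a relevant, correlated variable is omitted, adding more data does not pull the estimate back toward the truth.

```python
import numpy as np

def short_model_slope(n, seed=0):
    """Fit y on x1 alone when the true model also includes a correlated x2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  slope on x1: {short_model_slope(n):.3f}  (true value: 2)")
```

The slope settles near 3.8 rather than 2; a bigger sample just makes it converge to the wrong number more precisely, which is what inconsistency means here.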

tickersu: How is multicollinearity not a big deal if we are using the regression model only for prediction purposes? From what I understand, the whole point of regression models is to predict values.

From the tone of the CFAI curriculum on Misspecified Functional Form (Reading 10.5.1), where it discusses the problems associated with omitting a variable, it seems that one independent variable (X1) being correlated with another (X2) isn't as big a problem as omitting one of those variables. CFAI doesn't elaborate much on that. Is this true in practice?

Prediction purposes --> plugging in values for the independent variables and calculating a predicted value of the DV. That's just one application of a regression model. Some people might fit a model only to gain insight into variable relationships (examining the estimated coefficients to see how the price of milk changes with the price of gas after accounting for other relevant variables, for example). Another person might use regression modeling to analyze a designed experiment, such as determining the effect a "smart pill" has over a placebo on student exam scores while accounting for time studied, age, gender, etc., where all the researcher cares about is the difference in mean exam scores. There are many, many uses for regression models aside from plug-and-chug prediction.

Multicollinearity does not, in any way, shape, or form, bias the estimation of the coefficients. The estimated values of the DV are not biased either. Therefore, if you only want to use the regression for prediction purposes, multicollinearity won't pose an issue. If you want to look at the size and direction of individual coefficients, you might have an issue (as discussed in the CFAI text).
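One way to convince yourself of this is to fit a model with highly correlated regressors and compare the fitted values to the true conditional means: the individual slopes can wander, but the predictions stay on target. A small sketch under the same made-up setup as above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Highly correlated regressors (corr roughly 0.99)
x1 = rng.normal(size=n)
x2 = 0.99 * x1 + 0.14 * rng.normal(size=n)
true_mean = 1 + 2 * x1 + 3 * x2
y = true_mean + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients:", b)               # slopes can drift from (2, 3)
print("RMSE of fitted vs true mean:",
      np.sqrt(np.mean((X @ b - true_mean) ** 2))) # small: predictions are fine
```

With corr(x1, x2) around 0.99 and only 200 observations, the individual slope estimates can drift noticeably from 2 and 3, yet the root-mean-squared gap between the fitted values and the true means stays small relative to the noise in y.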

Multicollinearity (less than perfect) doesn't violate any regression assumptions. Omitting a necessary variable does violate a regression assumption, so it breaks one of the conditions needed for the model to be valid. From what I've seen (from econometricians and statisticians), multicollinearity isn't the end of the world: there are many ways to work with it, and it doesn't violate any assumptions.