Omitting a correlated variable - Parameter inconsistency

CFA Multiple Reg EOC - Q 22

The passage suggests that omitting a variable that is correlated with variables already included in the model causes coefficient estimates to be biased and inconsistent. Standard errors will also be inconsistent.

I am not sure why this is the case. Isn’t the issue with multicollinearity that we have an independent variable that is correlated and therefore needs to be removed?

Surely ‘omitting’ this correlated variable would cause the model to be better and therefore coefficients and standard errors to be more consistent ??

CFAI books (and Kaplan and whoever else) are garbage for stats.

Multicollinearity does not cause anything to be biased or inconsistent (remember these words have technical meanings). The parameter estimates can be unstable in that they can vary greatly in magnitude and even direction when you drop some of the correlated independent variables, if you take another sample, or if you leave out a few data points. The standard errors can be inflated. However, the properties of consistency and unbiasedness are unaffected.
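If it helps to see this rather than take it on faith, here is a minimal simulation sketch, assuming NumPy is available. Every number in it (the true betas, the correlation of 0.95, the sample size) is made up for illustration and is not from the curriculum. It repeatedly fits a correctly specified two-variable model and compares the estimates of beta1 when the regressors are uncorrelated versus highly correlated.

```python
import numpy as np

def simulate_collinear(rho, n_obs=100, n_sims=2000, seed=0):
    """Monte Carlo OLS estimates of beta1 when corr(x1, x2) = rho."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, 3.0])  # intercept, beta1, beta2 (hypothetical true values)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(n_sims)
    for i in range(n_sims):
        # Draw a fresh sample; both regressors stay in the fitted model.
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n_obs)
        X = np.column_stack([np.ones(n_obs), x])
        y = X @ beta + rng.normal(size=n_obs)
        b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[i] = b_hat[1]
    # Average estimate (should sit near the true beta1) and its spread across samples.
    return estimates.mean(), estimates.std()

mean_low, sd_low = simulate_collinear(rho=0.0)
mean_high, sd_high = simulate_collinear(rho=0.95)
print(f"rho=0.00: mean of beta1 estimates = {mean_low:.3f}, spread = {sd_low:.3f}")
print(f"rho=0.95: mean of beta1 estimates = {mean_high:.3f}, spread = {sd_high:.3f}")
```

Under these assumptions the average estimate should stay close to the true value of 2 in both cases, while the spread across samples is noticeably larger with rho = 0.95. That is the "unstable but still unbiased" behavior described above.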

The issue with multicollinearity is situational, but again, the CFAI does a terrible job at explaining this. It’s only really a concern when your purpose is to interpret and make inferences on the beta estimates. R2 and other model based statistics are unaffected by multicollinearity. Removing variables is not always necessary. For example, if you are using the model just to get predictions, multicollinearity isn’t really a concern. There are also many different and often better ways to handle multicollinearity than just dropping some of the collinear independent variables.

Omitting the correlated variable can open the model up to other issues as you noted earlier. However, removing the variable doesn’t make the model better in the sense of R2, the F-statistic, or the model standard deviation (for the error term). It would only serve to alleviate some of the issues that arise when trying to interpret the magnitude and direction of the involved coefficients. The coefficients and standard errors are not “improved” in terms of unbiasedness or consistency if you omit a collinear variable (again, remember that unbiased and consistent have specific meanings that are different from how you might define them in common conversation).
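A quick sketch of the R2 point, again assuming NumPy and using made-up coefficients and a made-up correlation of about 0.95. It fits the full model and the model with the collinear variable dropped on the same sample and compares R2. The helper name r_squared is just something I wrote for this example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X must already contain an intercept column)."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_hat
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n_obs = 500
x1 = rng.normal(size=n_obs)
# Build x2 so that corr(x1, x2) is roughly 0.95 (hypothetical value).
x2 = 0.95 * x1 + np.sqrt(1.0 - 0.95 ** 2) * rng.normal(size=n_obs)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n_obs)

ones = np.ones(n_obs)
r2_full = r_squared(np.column_stack([ones, x1, x2]), y)   # both regressors kept
r2_reduced = r_squared(np.column_stack([ones, x1]), y)    # collinear x2 dropped
print(f"R^2 with x1 and x2: {r2_full:.4f}")
print(f"R^2 with x1 only:   {r2_reduced:.4f}")
```

Because the reduced model is nested in the full one, its R2 cannot come out higher on the same data, so dropping the variable does not "improve" the fit in that sense.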

rexthedog,

The issue here is that we should not look at the model's weaknesses in isolation.

Omitting a variable that is correlated with the independent variables (INDVAR) already in the model is a violation of correct model specification. When you omit such a variable, its effect on the dependent variable is absorbed into the error term, so the error term becomes correlated with the INDVAR. That violates an important OLS assumption (the error term is uncorrelated with the INDVAR; the two vectors are orthogonal).

As tickersu said above, multicollinearity does not affect the consistency or unbiasedness of the estimators, so don't worry too much about dropping variables. Keep them and aim for a correctly specified model. This is more important.
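To make the inconsistency concrete, here is a rough sketch, assuming NumPy, with hypothetical true coefficients (beta1 = 2, beta2 = 3) and a hypothetical correlation of 0.6 between x1 and x2. The true model contains both variables, but the fitted model leaves x2 out. The function name slope_when_x2_omitted is made up for this example.

```python
import numpy as np

def slope_when_x2_omitted(rho, n_obs, seed=0):
    """Fit y on x1 only, although the true model also contains x2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n_obs)
    # x2 has unit variance and correlation rho with x1.
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n_obs)
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n_obs)
    # x2 is deliberately left out, so its effect ends up in the error term.
    X = np.column_stack([np.ones(n_obs), x1])
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b_hat[1]

for n_obs in (100, 10_000, 1_000_000):
    correlated = slope_when_x2_omitted(0.6, n_obs)
    uncorrelated = slope_when_x2_omitted(0.0, n_obs)
    print(f"n={n_obs:>9}: slope with rho=0.6 -> {correlated:.3f}, with rho=0.0 -> {uncorrelated:.3f}")
```

In this setup, with rho = 0.6 the slope on x1 settles near 2 + 3 * 0.6 = 3.8 rather than 2, and more data does not fix it, which is what inconsistent means. With rho = 0 the omitted variable only adds noise and the slope still heads toward 2.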

Good addition. For the test, follow the curriculum word for word, but in real life, you will weigh the benefits and risks of how to handle the issues. As I mentioned, MC might not even be anything to worry about depending on the goal of the model.

Yep, the treatment of issues will differ depending on the goal of the model.

I have been reading your replies, which are certainly correct and detailed, but since we have to pass a CFA test we should follow what the CFA curriculum says even where it is not really correct (and there are a lot of other topics where it isn’t). (OT: We could speak for days about the content of the curriculum… I would prefer fewer topics covered in more detail, instead of how it is now, with a lot of topics each done in a “bad” manner.)

Summarizing (correct me if I am wrong):

1. omitting a correlated variable: coefficients are biased and inconsistent (this is also clearly stated by CFA curriculum)

2. multicollinearity: if present, coefficients are still consistent and unbiased, but the estimates are unreliable (standard errors are inflated). It could be corrected by removing a variable, but then that seems inconsistent with the first point.