regression misspecification

nitinsiwach · May 28, 2015, 10:34am

When we drop a variable to remove multicollinearity, doesn’t that lead to model misspecification, where the error term is correlated with the remaining independent variables?

tickersu · May 28, 2015, 11:51am

Yes, if the dropped variable belongs in the true model, you will be in violation of the zero conditional mean assumption, and OLS will be biased and inconsistent.

Gebura · May 28, 2015, 12:09pm

Then why does the curriculum recommend omitting a variable instead of applying a different method (GLS, for example)?

And one unrelated question: if we can’t use the Durbin-Watson statistic for AR models, why do they continue to give us the Durbin-Watson stat in the provided table?

tickersu · May 28, 2015, 3:12pm

Gebura:

Then why does the curriculum recommend omitting a variable instead of applying a different method (GLS, for example)?

A recommendation for fixing MC is to remove one or more of the highly correlated variables. Essentially, this _is_ a solution for multicollinearity, but it causes another issue. This would be covered in a more comprehensive framework. There are other ways to “solve” or mitigate multicollinearity, but they all have tradeoffs (PCA is one way, increasing the sample size is another; there is no answer that is always right, it depends on your study). You would use GLS as a solution to heteroscedasticity, not as a fix for multicollinearity (GLS applies when we know the form of the heteroscedasticity; since we typically don’t, we use a feasible GLS approach that estimates it).

I have one minor point (it should make sense once you read it): if you include a variable Xn in your model that is highly correlated with one or more of the other independent variables, then it is OK to remove Xn from the regression as long as Xn doesn’t truly influence the DV (Bn=0). Basically, if you omit a variable that isn’t in the _real_ (population) regression, it doesn’t matter how correlated it is with the other variables.

Gebura:

And one unrelated question: if we can’t use the Durbin-Watson statistic for AR models, why do they continue to give us the Durbin-Watson stat in the provided table?

My guess is that they’re trying to see if you understand when it’s appropriate to use the DW test. Sometimes a software package will provide you with information you don’t need, so you should know when not to use it.

Gebura · May 28, 2015, 4:55pm

Thank you, tickersu.

tickersu · May 28, 2015, 5:06pm

No problem!

Harrogath · May 28, 2015, 6:56pm

If two variables are highly correlated, then it is redundant to include both, so it is better to drop one of them. You reduce MC while still explaining your dependent variable well (assuming the variables are correct for the specification).

About D-W, tickersu is right: more data is given than you need, and you have to select what is necessary from the whole pool.

tickersu · May 28, 2015, 7:34pm

Harrogath:

If two variables are highly correlated, then it is redundant to include both, so it is better to drop one of them. You reduce MC while still explaining your dependent variable well (assuming the variables are correct for the specification).

This isn’t always correct. It’s true they are redundant, but including both doesn’t violate any assumptions (if the collinearity is less than perfect). If both of the collinear x-variables are in the true population regression, but you omit one from the sample regression, you are violating the assumption E(U|X)=0; OLS will be biased and inconsistent. This is a crucial assumption, and violating it means we cannot rely on OLS. Dropping one of the variables is a _proposed_ solution, but it is _not necessary_. The solution depends on the purpose of your model. Overall, less than perfect collinearity does not violate any of the regression assumptions, but following the advice “you must drop one [of the collinear x-variables]” would cause you to violate a critical assumption. Respectfully, I suggest you reference a statistical/econometric text (the CFAI book doesn’t cover this) if you need further clarification.
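The omitted-variable argument above can be written out for the simplest two-regressor case. This is a textbook-style sketch, not something taken from the curriculum; the symbols (the betas, the short-regression slope with a tilde, and delta) are chosen here for illustration.

```latex
% Assumed true (population) model; symbols are illustrative
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \qquad E(u \mid x_1, x_2) = 0

% Short regression that omits x_2: regress y on x_1 only, slope \tilde{\beta}_1
\operatorname{plim} \tilde{\beta}_1 = \beta_1 + \beta_2 \, \delta_1,
\qquad \delta_1 = \frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
```

The extra term vanishes only if the omitted variable has no effect in the population regression (beta_2 = 0) or is uncorrelated with the retained regressor. It does not shrink as the sample grows, which is why the short regression is inconsistent and not merely noisy.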

Harrogath · May 28, 2015, 7:49pm

I fully accept what you are saying: less than perfect MC does not violate OLS assumptions, but it does affect the efficiency of the parameter estimators. If you add many correlated variables with the objective of raising explanatory power (assuming the model is correctly specified), you are just blowing up the variance, getting marginally better explanation in exchange for a stratospheric increase in variance… I have no expertise fixing MC in real life, but I think what you are saying fits a very uncommon case where the explanation objective is much more important than the extra variance added; in the rest of the cases we can just “drop” that variable and still have a good day.
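The efficiency cost mentioned here has a standard closed form, shown below as a sketch. It assumes a correctly specified model with homoskedastic errors; the notation (SST_j, R_j^2, VIF_j) follows common textbook usage and is not taken from the thread.

```latex
% R_j^2 is the R-squared from regressing x_j on the other regressors (with an intercept)
\operatorname{Var}(\hat{\beta}_j \mid X) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)},
\qquad SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2,
\qquad VIF_j = \frac{1}{1 - R_j^2}
```

As R_j^2 approaches 1 the variance of that coefficient grows without bound, but the estimator stays unbiased. Because SST_j grows with the number of observations, a larger sample pushes the variance back down, which is one reason more data is listed among the remedies for MC.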

tickersu · May 28, 2015, 9:12pm

Harrogath:

I fully accept what you are saying: less than perfect MC does not violate OLS assumptions,

So, are you saying it is better to violate a regression assumption in order to reduce the variance of a coefficient? If you’re violating a key regression assumption, you want to fix it before using your model…

Harrogath:

but it does affect the efficiency of the parameter estimators. If you add many correlated variables with the objective of raising explanatory power (assuming the model is correctly specified), you are just blowing up the variance, getting marginally better explanation in exchange for a stratospheric increase in variance…

What do you mean by explanation? Explaining variation in the DV, or explaining the relationship (magnitude and direction) Y has with Xn? If your goal is to explain the variation in the DV and predict Y, then MC is not an issue. If you want to look at the relationship Y has with Xn (assuming Xn is collinear with another X), then dropping variables still doesn’t get you there: if you leave the variables in, the parameter estimates could be terribly muddled (different signs, magnitudes), and if you remove one or more of the collinear variables (assuming it was supposed to be in the regression, as you said), then you have introduced bias and OLS is inconsistent. A larger sample won’t fix the bias from the omitted variable. Consistency is a _minimum_ requirement for an estimator, and you are eliminating this property by omitting a necessary variable that is correlated with any of the x-variables remaining in the regression. You will be violating regression assumptions that are needed…

Harrogath:

… I have no expertise fixing MC in real life,

There are many, many techniques. As I have said, it depends on your purpose and what is important in your study. That determines whether and how you mitigate MC.

Harrogath:

but I think what you are saying fits a very uncommon case where the explanation objective is much more important than the extra variance added;

It is quite common (so, not rare, as you said) to build a model for prediction purposes. Even if you don’t want to use it to predict Y, you still do not need to drop any collinear variables. Imagine you regress Y on X1, X2, and X3. Further, pretend that X1, X2, and X3 are the only variables that influence Y. Now, assume X2 and X3 are highly collinear, and X1 is uncorrelated with any combination of the others. Since we have fit our model with X1, X2, and X3, the model is properly specified, it doesn’t violate any critical regression assumptions, and the only variances and coefficients affected by MC are those for X2 and X3. The estimated coefficient on X1 and its variance are both unaffected by the collinearity. If your goal was to examine the coefficient on X1, then no problem. If your goal was to examine the coefficient on X2 (X3), then you should be cautious. If you drop X2 or X3, then you have violated regression assumptions and can no longer rely on OLS, as it is biased and inconsistent.

Harrogath:

and in the rest of the cases we can just “drop” that variable and still have a good day.

But you can’t, because you have eliminated the desirable properties of OLS, which are more important than efficiency in the variances of a few coefficients. I’m going to refer you to page 357 of the CFAI Text #1. It discusses omitted variable bias and explicitly says [paraphrase] we can’t use the estimated coefficients or the estimates of their standard errors, because OLS is biased and inconsistent. They don’t discuss the tradeoff between MC and OVB, but the priority here is pretty clear…
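The X1, X2, X3 thought experiment above can be checked with a small Monte Carlo. What follows is a minimal sketch, not a definitive implementation: the true coefficients, the correlation `rho = 0.95` between X2 and X3, the sample size, and the function names `ols` and `simulate` are all assumptions made for illustration.

```python
import numpy as np

# Assumed true coefficients (intercept, b1, b2, b3); illustrative values only.
TRUE_BETA = np.array([1.0, 2.0, 1.0, 1.0])

def ols(X, y):
    """OLS coefficients of y on X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate(n_obs=500, n_reps=1000, rho=0.95, seed=0):
    """Return per-replication estimates from the full model and from the model without X3."""
    rng = np.random.default_rng(seed)
    full = np.empty((n_reps, 4))   # columns: intercept, X1, X2, X3
    short = np.empty((n_reps, 3))  # columns: intercept, X1, X2
    ones = np.ones(n_obs)
    for r in range(n_reps):
        x1 = rng.standard_normal(n_obs)  # independent of X2 and X3
        x2 = rng.standard_normal(n_obs)
        # X3 has population correlation rho with X2
        x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal(n_obs)
        u = rng.standard_normal(n_obs)   # error term, independent of every X
        y = (TRUE_BETA[0] + TRUE_BETA[1] * x1
             + TRUE_BETA[2] * x2 + TRUE_BETA[3] * x3 + u)
        full[r] = ols(np.column_stack([ones, x1, x2, x3]), y)
        short[r] = ols(np.column_stack([ones, x1, x2]), y)
    return full, short

full, short = simulate()
print("full model, mean estimates: ", full.mean(axis=0).round(3))
print("full model, sd of estimates:", full.std(axis=0).round(3))
print("X3 dropped, mean estimates: ", short.mean(axis=0).round(3))
print("X3 dropped, sd of estimates:", short.std(axis=0).round(3))
```

Under these assumptions the full model should average close to the true values, with the X2 and X3 estimates noticeably more dispersed than the X1 estimate. Once X3 is dropped, the X2 coefficient should center near b2 + rho * b3 = 1.95 instead of 1: tighter, but around the wrong number.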

Harrogath · May 29, 2015, 12:02am

An omitted variable invalidates OLS because the error term captures that omitted variable, so the errors are not normal anymore; that’s OK. However, I don’t know how you could conclude, when building a model, (1) that an X variable highly correlated with another X variable is extremely necessary to the model, because the model is invalid without it (this looks like a coincidence to me; this case is rare); (2) to what extent that omitted variable can effectively destroy your model, knowing that both variables are highly correlated; and (3) how you can rely on parameters that have very large confidence intervals even at very low alphas.

I personally believe that if you encounter such a case in real life, the most responsible conclusion you can state is that your model has a very wide prediction range, so forecasting would be a headache.

tickersu · May 29, 2015, 3:18am

Harrogath:

An omitted variable invalidates OLS because the error term captures that omitted variable, so the errors are not normal anymore; that’s OK.

It has nothing to do with the errors being normally distributed, if that’s what you mean by normal. OLS is invalidated because the error term is correlated with some of the independent variables that are still in the model.

Harrogath:

However, I don’t know how you could conclude, when building a model, (1) that an X variable highly correlated with another X variable is extremely necessary to the model, because the model is invalid without it (this looks like a coincidence to me; this case is rare);

It is more common than you think… For example, a model predicting the carbon monoxide content of a cigarette from the cigarette’s weight, tar content, and nicotine content (this is a real example) showed strong indications of MC, because the regressors were (highly) correlated. And, through science, we know that these factors are individually related to increases in the carbon monoxide content. So, here is a clear example (there are many others) where you have highly correlated regressors that we know (from prior knowledge) are related to the DV.

Harrogath:

(2) to what extent that omitted variable can effectively destroy your model, knowing that both variables are highly correlated;

Read up on regression consistency and unbiasedness and then you’ll see how this works…

Harrogath:

(3) how you can rely on parameters that have very large confidence intervals even at very low alphas.

You shouldn’t, if you are looking at their size and direction, as I said earlier. If you want to use the model for prediction (many researchers do), you don’t have to worry about this at all. My question for you, then, is how you are implying that we can rely on parameters that are biased and inconsistent (a much bigger problem). As I said, you’re removing the desirable properties of OLS because you want lower variance, but your regression isn’t valid at that point.

Harrogath:

I personally believe that if you encounter such a case in real life, the most responsible conclusion you can state is that your model has a very wide prediction range, so forecasting would be a headache.

This is simply untrue. Multicollinearity does not, in any way, affect the predictions (forecasts) from our regression model (assuming all assumptions are satisfied and the model is correctly specified). You can find in elementary regression texts that multicollinearity does not affect the model fit: the SER is unaffected, the F-test is unaffected, and the R-squared is unaffected by the presence of MC.

I’ll leave it at this; I don’t think this back and forth is productive. You keep mentioning your beliefs on a topic (that you said you have limited experience with), and I’m pointing you to textbooks.
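The claim that collinearity leaves forecasts intact while inflating coefficient standard errors can be illustrated with a short simulation. This is a sketch under stated assumptions: the model is correctly specified, new observations come from the same process (so the X2-X3 correlation also holds out of sample), and the coefficients, sample sizes, and the function name `fit_and_score` are hypothetical.

```python
import numpy as np

def fit_and_score(rho, n_train=200, n_test=20000, seed=0):
    """Fit Y on X1, X2, X3 and return (SER, std. error of the X2 coefficient, out-of-sample RMSE)."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, 1.0, 1.0])  # assumed true coefficients

    def draw(n):
        x1 = rng.standard_normal(n)
        x2 = rng.standard_normal(n)
        # X3 has population correlation rho with X2
        x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x1, x2, x3])
        y = X @ beta + rng.standard_normal(n)  # error s.d. is 1
        return X, y

    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)
    b_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

    resid = y_tr - X_tr @ b_hat
    dof = n_train - X_tr.shape[1]
    ser = np.sqrt(resid @ resid / dof)            # standard error of the regression
    cov = ser**2 * np.linalg.inv(X_tr.T @ X_tr)   # classical OLS covariance matrix
    se_b2 = np.sqrt(cov[2, 2])                    # std. error of the X2 coefficient
    rmse = np.sqrt(np.mean((y_te - X_te @ b_hat) ** 2))
    return ser, se_b2, rmse

for rho in (0.0, 0.99):
    ser, se_b2, rmse = fit_and_score(rho)
    print(f"rho={rho}: SER={ser:.3f}, se(b2)={se_b2:.3f}, out-of-sample RMSE={rmse:.3f}")
```

In this setup the SER and the out-of-sample RMSE should both stay close to the error standard deviation of 1 at either correlation level, while the standard error of the X2 coefficient should be several times larger when `rho` is 0.99. Forecasts for new points that break the X2-X3 pattern are outside what this sketch covers.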