Multicollinearity/Serial Correlation/Heteroskedasticity Description Summary

This_is_not_easy · May 20, 2015, 4:28pm

I’ve run into a lot of questions about how each of these regression problems affects the coefficients, standard errors, etc. I’ve created a brief summary below to outline the characteristics of each and get a better grasp of it all.

I’m not confident in my interpretation and I’m looking for some verification/correction. Any and all help is immensely appreciated!

Conditional Heteroskedasticity

Coefficients: Unbiased & Consistent

Standard Errors: Biased, Consistent, and too small

t-Stat: Artificially High

Serial Correlation

Coefficients: Unbiased & Consistent

Standard Errors: Biased, Consistent, and too small

t-Stat: Artificially High

Multicollinearity

Coefficients: Unbiased & Consistent

Standard Errors: Biased, Consistent, and too large

t-Stat: Artificially Low

Model Misspecification

Coefficients are biased and inconsistent

Gebura · May 20, 2015, 5:27pm

I agree both conditional heteroskedasticity and serial correlation lead to biased standard errors, but the t-stat can be either too low or too high in these cases.

TheLakeHouse · May 20, 2015, 5:33pm

In which case can the t-stat be too low? As far as I know, the t-stat in the first 2 cases will be inflated because the standard errors will be underestimated.

@OP: You should add solutions for each case because I feel they are likely to be tested.

tickersu · May 20, 2015, 5:43pm

This is correct. The bias depends on the type of heteroscedasticity and the type of serial correlation.

The OP’s statement about serial correlation will be incorrect (OLS is biased) if we have a lagged dependent variable as an independent variable (a violation of regressor exogeneity when serial correlation is present).

Also, multicollinearity does NOT bias anything in the regression, including the standard errors. It is the standard errors of the coefficients that will be inflated (not biased or inconsistent) by the square root of a quantity known as a Variance Inflation Factor (VIF).

This_is_not_easy · May 20, 2015, 5:45pm

I have to agree with Gebura, I don’t understand your logic regarding the conditional heteroskedasticity and serial correlation standard errors. Please provide an example. Everything I’ve read states the standard errors will be biased downwards which leads to the artificially high t-stats.

Also, multicollinearity leads to artificially low t-stats because the standard errors will be too high. Doesn’t that indicate biased standard errors (biased upwards)?

TheLakeHouse · May 20, 2015, 5:51pm

Hmm. Is it outside the curriculum? I know heteroskedasticity doesn’t affect the consistency of the regression parameter estimates because those estimates are from OLS. But I don’t remember the serial correlation part (i.e., OLS being biased).

Gebura · May 20, 2015, 6:04pm

For serial correlation:

If it’s positive, the standard errors will be underestimated and the t-stat overestimated.

If it’s negative, the opposite: standard errors too high, t-stat too low.

For conditional heteroskedasticity: it’s written in the CFAI text that typically when dealing with financial data the standard errors will be underestimated but that sometimes it will be the opposite. As far as I remember no example was given.

TheLakeHouse · May 20, 2015, 6:11pm

Page 349 - reading 10: in the footnote they say OLS standard errors need not to be underestimates of actual standard errors if negative serial correlation is present in the regression. I am a bit confused with “… need not to be underestimates…”. Does it mean “overestimated”?

tickersu · May 20, 2015, 6:25pm

This_is_not_easy:

“I have to agree with Gebura,”

I’m agreeing with Gebura as well.

“I don’t understand your logic regarding the conditional heteroskedasticity and serial correlation standard errors.”

It’s the same as Gebura’s logic. This is also stated in the curriculum or other statistical texts. OLS is biased and can be inconsistent if we use a lagged DV as an IV in the presence of serial correlation.

“Please provide an example.”

Positive serial correlation will understate the standard errors. Negative serial correlation can lead to overstated or understated standard errors. This is beyond the curriculum, but you can certainly see this by deriving the true standard errors in the presence of serial correlation. The form of heteroscedasticity can cause the standard errors to be too high or too low. Again, this is outside the curriculum.
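
A sketch of the derivation mentioned above, under assumptions that are mine and not from the thread: a simple regression y_t = b0 + b1 x_t + u_t, with x_t measured as deviations from its sample mean and treated as fixed, and AR(1) errors u_t = rho u_{t-1} + e_t with |rho| < 1 and Var(u_t) = sigma^2.

```latex
% Illustrative assumptions: simple regression, x_t demeaned and fixed,
% AR(1) errors so that Cov(u_t, u_{t+j}) = \sigma^2 \rho^{j}.
\begin{align*}
\hat\beta_1 - \beta_1 &= \frac{\sum_{t=1}^{n} x_t u_t}{SST_x},
  \qquad SST_x = \sum_{t=1}^{n} x_t^2 \\
\operatorname{Var}(\hat\beta_1)
  &= \frac{1}{SST_x^2}\left[\sigma^2 \sum_{t=1}^{n} x_t^2
     + 2\sum_{t=1}^{n-1}\sum_{j=1}^{n-t} x_t x_{t+j}\operatorname{Cov}(u_t, u_{t+j})\right] \\
  &= \frac{\sigma^2}{SST_x}
     + \frac{2\sigma^2}{SST_x^2}\sum_{t=1}^{n-1}\sum_{j=1}^{n-t} \rho^{j}\, x_t x_{t+j}
\end{align*}
```

The first term is what the usual OLS variance formula targets; the second term is what it ignores. When rho is positive and the regressor is itself positively autocorrelated (the common case in financial data), the second term is positive and the usual standard error is too small. When rho is negative, rho^j alternates in sign, so the second term can go either way.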

“Everything I’ve read states the standard errors will be biased downwards which leads to the artificially high t-stats.”

The curriculum isn’t comprehensive, and isn’t anything close to a statistical text. Positive serial correlation is much more common than negative, which is (most likely) why it was focused on in the curriculum. In practice, the heteroscedasticity-robust standard errors are usually, but not always, larger than the ordinary OLS standard errors (hence, some people will say that OLS underestimates the standard errors with heteroscedasticity).

“Also, multicollinearity leads to artificially low t-stats because the standard errors will be too high. Doesn’t that indicate biased standard errors (biased upwards)?”

No. Unbiasedness is a statistical property meaning that, on average, our estimator is equal to the true parameter value. When calculating the coefficient variances (and standard errors) in multiple regression, we must account for the correlation between the regressors (to recognize the appropriate level of uncertainty arising in our estimation). This is included in the variance calculation as VIF_j = 1/(1 - R_j^2), where R_j^2 is the R-squared from a regression using all other independent variables to predict the jth independent variable (basically, it’s the R-squared from predicting X1 with X2, X3, …, Xn, and so on for each independent variable). This is also beyond the curriculum, but you can check it on something as basic as Wikipedia (or any entry-level econometrics/regression text, if you like credible sources).
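
To make the VIF definition concrete, here is a minimal sketch in Python with NumPy. The function name `vif`, the simulated regressors, and the 0.9 correlation are all illustrative assumptions, not something from the curriculum or this thread. It regresses each regressor on the others (plus an intercept) and applies 1/(1 - R-squared).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x k, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress regressor j on an intercept plus all the other regressors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2_j)
    return out

# Hypothetical data: x1 and x2 are highly correlated, x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
x3 = rng.normal(size=n)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)           # VIFs for x1, x2, x3
print(np.sqrt(vifs))  # factor by which each coefficient's standard error is inflated
```

With a correlation of about 0.9 between x1 and x2, their VIFs should land near 1/(1 - 0.81), while the VIF of x3 should stay close to 1.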

The statistics portion of the curriculum, when compared to the entire body of statistical knowledge (even on regression and time series alone), is infinitely dwarfed by the outside world.

tickersu · May 20, 2015, 6:32pm

Gebura:

“For serial correlation: if it’s positive the standard errors will be underestimated, the t-stat overestimated. If it’s negative the opposite: standard errors too high, t-stat too low.”

This isn’t always true. Negative serial correlation can make the normal OLS variance formulas overstate or understate the variance. I won’t go into the details (it’s way outside the curriculum rather than a smidge, and it’s also a little wordy or requires a bit of math typesetting).
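
For anyone who wants to see this without the math, here is a small Monte Carlo sketch in Python with NumPy. Everything in it is an illustrative assumption (the helper names `ar1` and `se_ratio`, the AR(1) parameters of 0.7 and -0.7, the sample size, the true coefficients). It compares the average standard error reported by the usual OLS formula with the actual spread of the slope estimates across simulated samples.

```python
import numpy as np

def ar1(n, rho, rng):
    """Stationary AR(1) series with unit variance."""
    e = rng.normal(scale=np.sqrt(1 - rho**2), size=n)
    z = np.empty(n)
    z[0] = rng.normal()
    for t in range(1, n):
        z[t] = rho * z[t - 1] + e[t]
    return z

def se_ratio(rho_x, rho_u, n=200, reps=1000, seed=0):
    """Mean OLS-reported slope SE divided by the empirical SD of the slope estimates."""
    rng = np.random.default_rng(seed)
    slopes, reported = [], []
    for _ in range(reps):
        x = ar1(n, rho_x, rng)   # autocorrelated regressor
        u = ar1(n, rho_u, rng)   # autocorrelated error term
        y = 1.0 + 2.0 * x + u
        X = np.column_stack([np.ones(n), x])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        s2 = resid @ resid / (n - 2)
        cov = s2 * np.linalg.inv(X.T @ X)   # usual OLS covariance formula
        slopes.append(b[1])
        reported.append(np.sqrt(cov[1, 1]))
    return np.mean(reported) / np.std(slopes)

both_pos = se_ratio(rho_x=0.7, rho_u=0.7)         # positive serial correlation
neg_err_pos_x = se_ratio(rho_x=0.7, rho_u=-0.7)   # negative errors, positive regressor
both_neg = se_ratio(rho_x=-0.7, rho_u=-0.7)       # negative errors, negative regressor
print(both_pos, neg_err_pos_x, both_neg)
```

A ratio below 1 means the usual formula understates the standard error, and a ratio above 1 means it overstates it. In this setup the two negative-serial-correlation cases should fall on opposite sides of 1, which is the point being made: the direction also depends on how the regressor behaves over time.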

“For conditional heteroskedasticity: it’s written in the CFAI text that typically when dealing with financial data the standard errors will be underestimated but that sometimes it will be the opposite. As far as I remember no example was given.”

To support you here, most outside statistics references will tell you that whether the standard errors are understated or overstated depends on the actual form of the heteroscedasticity.
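
Since no example was given, here is a hedged simulation sketch in Python with NumPy. The helper name `se_ratios` and the two variance patterns are assumptions chosen purely for illustration. In one pattern the error standard deviation grows as x moves away from its mean, in the other it shrinks. The code reports the usual OLS standard error and White's (HC0) robust standard error, each relative to the actual spread of the slope estimates.

```python
import numpy as np

def se_ratios(sigma_fn, n=200, reps=2000, seed=0):
    """Return (mean OLS SE, mean HC0 SE), each divided by the empirical SD of the slope."""
    rng = np.random.default_rng(seed)
    slopes, ols_se, hc0_se = [], [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        u = sigma_fn(x) * rng.normal(size=n)   # error SD depends on x
        y = 1.0 + 2.0 * x + u
        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        resid = y - X @ b
        # Usual OLS covariance, which assumes a constant error variance.
        ols_cov = (resid @ resid / (n - 2)) * XtX_inv
        # White (HC0) sandwich covariance using squared residuals.
        meat = X.T @ (X * resid[:, None] ** 2)
        hc0_cov = XtX_inv @ meat @ XtX_inv
        slopes.append(b[1])
        ols_se.append(np.sqrt(ols_cov[1, 1]))
        hc0_se.append(np.sqrt(hc0_cov[1, 1]))
    sd = np.std(slopes)
    return np.mean(ols_se) / sd, np.mean(hc0_se) / sd

grow = se_ratios(lambda x: np.abs(x))              # variance rises away from the mean of x
shrink = se_ratios(lambda x: 1.0 / (1.0 + x**2))   # variance falls away from the mean of x
print("variance grows with |x|:   OLS, HC0 =", grow)
print("variance shrinks with |x|: OLS, HC0 =", shrink)
```

The OLS ratio should come out below 1 in the first pattern (understated standard errors) and above 1 in the second (overstated), while the robust ratio should stay reasonably close to 1 in both.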

tickersu · May 20, 2015, 6:34pm

tickersu · May 20, 2015, 6:42pm

It’s loosely in the curriculum. Think about it this way: the errors are a function of the actual and predicted y-values. If the errors are correlated across time, then we can infer a correlation of Y-values through time. Now, if you remember, we need to have a zero conditional mean, that is, E(u|x)=0. If we have correlated errors (u or e, whatever your notation), and we use a past value of Y, say Y(t-1), as an independent variable, then we have created an X that can tell us about the error (an x-variable correlated with the error, so a loss of exogeneity). We have violated the zero conditional mean assumption, which is needed for unbiasedness.
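
The argument above can be checked with a quick simulation. This is only a sketch in Python with NumPy, and the model y(t) = beta*y(t-1) + u(t) with AR(1) errors, the parameter values, and the function name `lagged_dv_slope` are illustrative assumptions. It fits OLS on a long sample, once with uncorrelated errors and once with serially correlated errors.

```python
import numpy as np

def lagged_dv_slope(beta, rho, n=50_000, seed=0):
    """OLS slope from regressing y_t on y_{t-1} when u_t = rho * u_{t-1} + e_t."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    u = np.empty(n)
    y = np.empty(n)
    u[0] = e[0]
    y[0] = u[0]
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]    # serially correlated error
        y[t] = beta * y[t - 1] + u[t]   # lagged dependent variable as regressor
    X = np.column_stack([np.ones(n - 1), y[:-1]])
    b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return b[1]

b_iid = lagged_dv_slope(beta=0.5, rho=0.0)   # no serial correlation
b_sc = lagged_dv_slope(beta=0.5, rho=0.5)    # positive serial correlation
print("true beta = 0.5, estimate without serial correlation:", b_iid)
print("true beta = 0.5, estimate with serial correlation:   ", b_sc)
```

Even with 50,000 observations, the second estimate should sit well above 0.5, so the problem does not go away as the sample grows. Under these assumptions the large-sample value works out to (beta + rho)/(1 + beta*rho), which is 0.8 here.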

Let me know if this helps.

Gebura · May 20, 2015, 7:50pm

Thank you for the clarification, tickersu. It’s true that statistics is too huge a discipline to be represented correctly within the limited CFAI curriculum.

Harrogath · May 20, 2015, 8:20pm

Remember that negative heteroskedasticity is not common in economic and financial data, so it is highly unlikely that the exam will ask about it. Focus on positive het for the moment (just being simplistic right now).

This_is_not_easy · May 20, 2015, 9:23pm

I had meant to refer to “the Lake House,” not “Gebura.” Mixed up the usernames.

Unfortunately, I’m afraid my post here may have confused me more for the CFA L2 exam. It seems my mistake was trying to simplify a CFA curriculum topic that can’t be because in the broader world of statistics, these are in-depth topics that are barely touched on in CFAI curriculum.

You are obviously at least very knowledgeable regarding stat (whereas I’m not) so please correct me if I’m wrong in the following statements (it would be greatly appreciated):

My takeaway from your statements is that cond. het. and serial correlation lead to biased SEs and incorrect t-scores (in the context of the CFAI curriculum they are homing in on the downward-biased SEs that lead to artificially high t-scores). I should just operate under the assumption that the coefficients (when cond. het. and serial correlation are present) are consistent and unbiased (again, in the CFAI curriculum context).

Multicollinearity does not have biased SEs or inconsistent/biased coefficients.

I realize I’m ignoring the underlying logic and still trying to simplify everything. Quite frankly, there are 17 days left and I’m not sure how much room is left in my head.

tickersu · May 20, 2015, 11:07pm

This_is_not_easy:

“Unfortunately, I’m afraid my post here may have confused me more for the CFA L2 exam.”

Sorry about that!

“It seems my mistake was trying to simplify a CFA curriculum topic that can’t be simplified, because in the broader world of statistics these are in-depth topics that are barely touched on in the CFAI curriculum.”

You can simplify it, but you need a few stats classes first. To be fair, a lot of the curriculum is a basic overview of the covered topics (remember, there’s no calculus, linear algebra, or any other math that is frequently used in the academic setting for many of these topics).

“You are obviously at least very knowledgeable regarding stat (whereas I’m not) so please correct me if I’m wrong in the following statements (it would be greatly appreciated):”

I’ve just studied (part of) it longer.

“My takeaway from your statements is that cond. het. and serial correlation lead to biased SEs and incorrect t-scores (in the context of the CFAI curriculum they are homing in on the downward-biased SEs that lead to artificially high t-scores). I should just operate under the assumption that the coefficients (when cond. het. and serial correlation are present) are consistent and unbiased (again, in the CFAI curriculum context).”

If you have time, try to revisit these areas in the CFAI textbook for a general idea. If not, you can oversimplify it, but this carries some risk that they present you with a situation you didn’t study. In my opinion, they’re going to focus on positive serial correlation understating the standard errors and on conditional heteroscedasticity having a tendency to understate them. This is just my opinion, though. At a broad level, you should know the standard errors calculated with OLS are incorrect in either case. Think about it like this: we have assumptions for OLS, and these assumptions are used to calculate the regular standard errors; when these assumptions hold, the regular standard errors are good. If we have a “funky” error term (a violated regression assumption, i.e., serial correlation or heteroscedasticity), then we know the regular OLS standard error calculation doesn’t accurately account for the “funky” error term, and in this case the standard error can’t be correct.

“Multicollinearity does not have biased SEs or inconsistent/biased coefficients.”

If you want, google the phrase “OLS BLUE” for further background on this. As long as there is no perfect collinearity, OLS is perfectly fine to use (there are some things to be careful about depending on the intent of your research, though). The standard errors (only) for the coefficients are “inflated” because we need to account for the fact that some information is redundant between the correlated independent variables (this is okay and can be mitigated by increasing the sample size).
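
A rough simulation sketch of that last point, in Python with NumPy; the function name `simulate`, the correlation of 0.95, the sample sizes, and the true coefficients are all assumptions made for illustration. It checks three things for the coefficient on x1: whether the estimate is centered on the true value, whether the reported standard error matches the actual spread of the estimates, and how the standard error responds to a larger sample.

```python
import numpy as np

def simulate(corr, n, reps=2000, seed=0):
    """Return (mean estimate, mean reported SE, empirical SD) for the coefficient on x1."""
    rng = np.random.default_rng(seed)
    est, se = [], []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)   # true coefficient on x1 is 2
        X = np.column_stack([np.ones(n), x1, x2])
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        resid = y - X @ b
        s2 = resid @ resid / (n - 3)
        est.append(b[1])
        se.append(np.sqrt(s2 * XtX_inv[1, 1]))
    return np.mean(est), np.mean(se), np.std(est)

uncorrelated = simulate(corr=0.0, n=100)
collinear = simulate(corr=0.95, n=100)
collinear_big_n = simulate(corr=0.95, n=1600)
for label, res in [("corr=0.00, n=100 ", uncorrelated),
                   ("corr=0.95, n=100 ", collinear),
                   ("corr=0.95, n=1600", collinear_big_n)]:
    print(label, "mean estimate, mean SE, empirical SD =", res)
```

In each case the mean estimate should be close to 2 and the reported standard error should be close to the empirical standard deviation, so nothing is biased. The standard error is simply larger when the regressors are highly correlated, and it comes back down as the sample size increases.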

“I realize I’m ignoring the underlying logic and still trying to simplify everything. Quite frankly, there are 17 days left and I’m not sure how much room is left in my head.”

I can’t blame you. I haven’t had the time or the motivation to keep studying (life has other plans). I studied for basically 1.5 months (end of February through March), with April until now basically off from studying (maybe one extra week of studying in there). There are other things that need to be done, but hopefully I can eke out some more studying to get me into a fighting spot for the 6th.

tickersu · May 20, 2015, 11:15pm

You’re welcome! My opinion is that the CFA program is geared more towards application, rather than theory. When you boil it down to teaching for application you might not be able to cover all possible scenarios, so you cover the most likely.

This_is_not_easy · May 21, 2015, 2:21am

Thanks for taking the time, tickersu. Good luck on the 6th!

TheLakeHouse · May 21, 2015, 6:15am

@tickersu: Thanks for your explanation! I read somewhere in this forum that you are a quantitative analyst? No surprise that you know the material in this much detail ;).

Btw, how is the Excel score sheet you mentioned earlier going?