# I keep getting Heteroskedasticity, Serial Correlation, and Multicollinearity mixed up!

Below are definitions of heteroskedasticity, serial correlation, and multicollinearity. But I keep getting them confused! For example, conditional heteroskedasticity is heteroskedasticity that is correlated with the values of the independent variables. Isn't that the same thing as multicollinearity, which is high correlation of the independent variables in multiple regression? Could somebody please explain some of the fundamental differences between heteroskedasticity, serial correlation, and multicollinearity? How do you tell them apart? Many thanks!

Heteroskedasticity is the situation in which the variance of the residuals is not constant across all observations. Conditional heteroskedasticity is heteroskedasticity that is correlated with the values of the independent variables.

Serial correlation refers to the situation in which the residual terms are correlated with one another. Serial correlation is a relatively common problem with time series data.

Multicollinearity occurs when two or more independent variables are highly but not perfectly correlated.

1. Heteroskedasticity means the variance of the residuals is not constant. If this non-constancy has some definite relationship with the independent variables, it is called "conditional"; if it's random, it is called "unconditional". 2. Serial correlation refers to the situation in which the residual terms are correlated with one another. In linear models it affects the model specification. In time series data it's normal - if the period N+1 data had no relationship with period N, how could you build a model using Xn to explain Xn+1? 3. Multicollinearity is not like heteroskedasticity. You can imagine the independent variables as "fighting over" the explanatory effect. Under this circumstance, although the overall explanatory power of the combined independent variables stays the same, the explanatory effect of each single independent variable becomes weak. That's also why the F-test is significant - the whole "b0 + b1X1 + b2X2 + ..." still has strong explanatory power for Y - while the t-test for the coefficient of each individual X becomes less significant.
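The "F-test significant but individual t-tests weak" symptom described in point 3 is easy to reproduce. Here's a minimal numpy-only sketch (not from any CFA reading - the seed and data are illustrative assumptions): two nearly identical regressors inflate each slope's standard error while the overall fit stays strong.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two nearly identical regressors: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Classical OLS standard errors and t-statistics.
k = X.shape[1]
s2 = resid @ resid / (n - k)
cov = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))
t_stats = beta / se

# Overall F-test of the regression.
tss = ((y - y.mean()) ** 2).sum()
rss = resid @ resid
f_stat = ((tss - rss) / (k - 1)) / (rss / (n - k))

print(f_stat)      # large: the model as a whole explains y well
print(se[1:])      # slope SEs are inflated by the near-collinearity
print(t_stats[1:]) # far weaker than they would be without collinearity
```

Dropping either regressor brings the surviving slope's standard error back down, which is the "eliminate the interdependent variable" fix mentioned later in the thread.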

Short answer: Conditional means there is a dependence on the LEVEL of the independent variable, i.e. the error variance changes with its value. Multicollinearity means the independents are correlated with each other, resulting in a significant model overall even though the individual coefficients may not be significant (F-test significant but t-tests of individual coefficients insignificant).

You're jumping across topics: multicollinearity is dealt with in the multiple regression portion of CFAI; serial correlation is dealt with in time-series analysis and has to do with errors being correlated; conditional heteroskedasticity is mainly dealt with in regression (i.e. single-independent-variable regression), although it's covered briefly again in multiple regression.

I don't remember what either is since I studied it in January! Arghhhh, it's going to be a rough May!

Janak… I am pretty sure that serial correlation is not exclusive to time series; it also comes up in multiple regression. Also, heteroskedasticity shows up in multiple regression a fair amount.

Serial correlation in time series is autocorrelation. Serial correlation is a multiple-regression thing.

This is how I look at it.

Talking about the residual error terms:
1. heteroskedasticity - variance of the residual error terms is not constant (bad); note: only conditional heteroskedasticity is bad
2. serial correlation - residual terms are correlated with one another (bad)

Talking about the independent variables:
3. multicollinearity - the independent variables are correlated with one another (bad)
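A rough way to see the "conditional" in conditional heteroskedasticity with data, sketched in numpy only (the simulated data and seed are illustrative assumptions): make the error variance grow with x, then regress the squared residuals on x - the idea behind the Breusch-Pagan test. If the variance were unconditional, this auxiliary regression would explain nothing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

x = rng.uniform(1, 5, size=n)
# Error standard deviation grows with x: conditional heteroskedasticity.
e = rng.normal(size=n) * x
y = 2.0 + 3.0 * x + e

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Breusch-Pagan idea: regress squared residuals on x.
# Under homoskedasticity this auxiliary R^2 should be near zero.
u2 = resid ** 2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
aux_r2 = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
bp_stat = n * aux_r2  # ~ chi-squared(1) under the null of homoskedasticity

print(bp_stat)  # here, well above the chi2(1) 5% cutoff of 3.84
```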

OK, maybe this helps, but I need some feedback: serial correlation is when the residual terms are correlated - got that (no independent variables mentioned). But WHAT'S THE DIFFERENCE BETWEEN CONDITIONAL HETEROSKEDASTICITY AND MULTICOLLINEARITY? It seems to me that they both involve correlation with the independent variables???

Conditional heteroskedasticity - relates the error term to the independent variable. Multicollinearity - relates one independent variable to another; no error term involved here.


in multicollinearity: independent variables are not correlated. multicollinearity - one or more independent variables are functions of the others - so they really are NOT INDEPENDENT variables. As a result you have too many independent variables in your regression. You would get a much better picture if you eliminated those interdependent independent variables.

cpk123 Wrote: ------------------------------------------------------- > in multicollinearity: independent variables are > not correlated. > multicollinearity - one or more independent > variables are functions of the others - so they > really are NOT INDEPENDENT variables. As a result > you have too many independent variables in your > regression. You would get a much better picture if > you eliminated those interdependent independent > variables. Are you sure? First sentence from wikipedia reads: “Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated.”

ok. take that back. my bad. the linear combination of one or more independent variables causes them to be highly correlated with each other.

Multicollinearity specifies correlation between INDEPENDENT VARIABLE and INDEPENDENT VARIABLE. Conditional heteroskedasticity focuses on the relationship between INDEPENDENT VARIABLE and ERROR TERM (specifically, the error variance). Serial correlation means correlation among ERROR TERMS. So: if it's correlation among independent variables, multi; if it's correlation among error terms, serial; if it's a relationship between the two, hetero. By the way, I am not quite clear on serial correlation - is it bad? How do you deal with it to get reliable predictive power?

Hansen/White-corrected errors again, I believe, for SC too. [I am speaking from memory - I badly need to review this, but I'm not getting enough time for it.]

Thanks guys, good luck come game day. I just took a CFA sample exam and got spanked; went to bed last night and practically cried.

cpk123 Wrote: ------------------------------------------------------- > hansen/white corrected errors again I believe for > SC too. According to Secret Sauce, Hansen should be used for SC and could be used for CH, even though the White method is preferred if CH is the only problem.
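For what the White correction actually does: it replaces the classical OLS covariance with a "sandwich" whose middle term uses each squared residual separately, so it stays valid when the error variance changes with x. This is a from-scratch numpy sketch of the basic HC0 version (the simulated data are illustrative assumptions, and this isn't the curriculum's notation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

x = rng.uniform(1, 5, size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n) * x ** 2  # strongly heteroskedastic errors

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS covariance assumes one constant error variance.
s2 = resid @ resid / (n - X.shape[1])
se_ols = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) sandwich: the "meat" weights each x_i x_i' by its own
# squared residual instead of a single pooled variance estimate.
meat = X.T @ (X * resid[:, None] ** 2)
cov_white = XtX_inv @ meat @ XtX_inv
se_white = np.sqrt(np.diag(cov_white))

print(se_ols)
print(se_white)  # the slope's corrected SE is larger in this setup
```

The Hansen (Newey-West) version extends the same sandwich with lagged cross-products of residuals, which is why it also handles serial correlation.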

This is how I view it. You have a company with 1,000 employees. You decide to see what the relationship between years of experience and annual salary is. You put years of experience along your X axis, ranging from 0 to 30 years, with salary as the dependent variable. You notice that salaries of those with 2-5 years of experience are quite high, from 5-10 years a little low, between 10-13 years very high again, and the rest get higher with more years of experience.

You find a regression equation which describes this linear relationship, but you note that there is heteroskedasticity in those early years because the variance of the residuals (the gap between what your equation says and what the actual points say) is quite large in those years, and small in other years. There is no pattern in this variance (it's not rising and falling every, say, 5 years). So it's not conditional heteroskedasticity but unconditional: the variance is not constant, but it isn't related to the level of X either.

However, you notice that those errors seem to move together in consecutive periods. For example, your equation says salaries for 2-5 years of experience go from \$50k-\$75k in a rising linear fashion, while actual salaries are \$50k-\$60k, also rising linearly over that period, so the difference between your equation's output and the actual points is positive for all four of those years. That's serial correlation.

Multicollinearity would occur if you decide to add another X variable, and you choose employee age. The problem is that years of experience and age are highly correlated, so your R² might stay high even though the individual coefficients become unreliable. You should drop age or find some other variable. Chime in if you don't agree.
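The "errors positive for several consecutive years" pattern in the salary example is exactly what the Durbin-Watson statistic picks up. A small numpy sketch with simulated AR(1) residuals (rho = 0.8 and the seed are illustrative assumptions, not anything from the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Build residuals that are positively serially correlated (AR(1)).
rho = 0.8
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()

# Durbin-Watson statistic: near 2 means no serial correlation,
# well below 2 means positive serial correlation.
dw = (np.diff(e) ** 2).sum() / (e ** 2).sum()
print(dw)  # approximately 2 * (1 - rho) = 0.4 in large samples
```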

Dreary Thanks man, but still I’m a little off. I think memorizing the definitions, memorizing the effects (high R^2, high t-stats, low standard errors, etc) is my best solution.