ANOVA

Kerry1 · January 23, 2008, 12:52pm

Hey there, I’m trying to get to grips with the ANOVA formula: F = (RSS / 1) / (SSE / (n - 2)), so it’s average explained variance divided by average unexplained variance. If I understand this correctly, you are calculating what % of unexplained variance is explained?? (Something doesn’t quite fit there.) I make sense of the formula this way: the explained variance is the difference between your regression line and the mean value, where the independent variable is fixed at X, for instance. The distance between the two is fixed and therefore your result is an average. The unexplained variance (the difference between a plotted point and the regression line), where the independent variable is fixed at X, would probably have more than one point along that line, and therefore you need to divide by n. So basically I am trying to get my head around why RSS divided by 1 is an average, and why SSE is divided by n - 2. If someone could clear that all up for me it would be appreciated. Kerry.

JoeyDVivre · January 23, 2008, 3:11pm

This is quite beyond the scope of the CFA exam, but…

1) Start with SST = Sum (x[i] - x-bar)^2. If the x[i] are all from the same normal distribution with variance sigma^2, then SST/sigma^2 is a chi-square random variable with n - 1 df. That’s a result that would be proven in, say, a junior-year mathematical stats class taught at most universities.

2) Decompose SST into SSR + SSE. Now we use Cochran’s theorem (which is a little deep) to show that SSR/sigma^2 and SSE/sigma^2 are independent (!) chi-square r.v.'s. Clearly, if SSE is very large compared to SSR then this is evidence that the regression didn’t work. So in the usual hypothesis-testing sense, we look for some way of creating a test statistic from two independent chi-square r.v.'s whose distribution is known or knowable.

3) Show that the df of SSE/sigma^2 is n - 2, which is a proof of very similar flavor to the proof in 1).

4) So now we have chi-square (n - 1 df) = chi-square with unknown df + chi-square (n - 2 df), and that means the unknown df = 1, which you could show with moment-generating functions.

5) Now the definition of an F distribution is (chisquare1/df1) divided by (chisquare2/df2), where chisquare1 and chisquare2 are independent. That gives us a very natural test statistic, (SSR/1) / (SSE/(n - 2)), under the null hypothesis that the regression analysis did nothing for us and the observations are just independent normals.

And then proceed to test the hypothesis…
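To make the bookkeeping concrete, here is a minimal Python sketch of the sums of squares and the resulting F statistic for a one-regressor model. The data are made up for illustration, and the function name `anova_f` is just a label chosen here. What the steps above call SSR is the same quantity the original question calls RSS (the regression, or explained, sum of squares).

```python
# Minimal sketch: ANOVA F statistic for a simple (one-regressor) regression.
# The data and the name anova_f are illustrative assumptions only.

def anova_f(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    # Total, explained (regression) and unexplained (error) sums of squares.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    # Degrees of freedom: 1 for the regression, n - 2 for the error.
    df_regression = 1
    df_error = n - 2

    # Each sum of squares is divided by its own df, not by a count of points.
    f_stat = (ssr / df_regression) / (sse / df_error)
    return {"sst": sst, "ssr": ssr, "sse": sse,
            "df_regression": df_regression, "df_error": df_error, "f": f_stat}


x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
result = anova_f(x, y)
print(result)
```

The point of the sketch is that the "1" and the "n - 2" are degrees of freedom rather than numbers of observations: SSR and SSE add up to SST, and their df (1 and n - 2) add up to n - 1.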

Kerry1 · January 23, 2008, 4:16pm

Phew, thanks for the effort, Joey… guess I’ll just learn that formula!!