# ANOVA residual

I’m sure I’m having a brain freeze, but if someone could clarify this in a way I can remember easily, it would be much appreciated. When looking at ANOVA table results, what exactly is the difference between the residual SS (usually shown along with the regression SS and the MSS for both, as well as the F-stat) and the residual standard error (usually shown at the bottom of ANOVA tables along with multiple R-squared and observations)? Many thanks.

OK, adalfu is probably right, and my understanding doesn’t disagree, but I want to explain the way it’s recorded in my brain and see if anyone tells me I’m a moron. Regression SS is the sum of squares that is explained by your model. Regression MSS is regression SS divided by regression df (degrees of freedom). Residual SS is the sum of squares that is not explained by your model. Residual MSS is residual SS divided by residual df. The F-statistic is the ratio of regression MSS to residual MSS. This exposes the ability of your model to explain the distribution. If your model explained the distribution perfectly, there would be near-zero residual SS and MSS, and your F-stat would approach infinity. If your model was complete crap and unassociated with the distribution, you would have near zero regression SS and MSS and your F statistic approaches zero. Does that make sense?
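Not exam material, but the decomposition above is easy to reproduce by hand. A quick numpy sketch (made-up data and my own variable names) for a simple linear regression:

```python
import numpy as np

# Hypothetical data with a strong linear relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n, k = len(y), 1  # n observations, k slope coefficients

# OLS fit: slope and intercept
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_reg = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ss_res = np.sum((y - y_hat) ** 2)         # unexplained (residual) SS
mss_reg = ss_reg / k                      # regression df = k
mss_res = ss_res / (n - k - 1)            # residual df = n - k - 1
f_stat = mss_reg / mss_res                # near-zero ss_res => huge F

print(ss_reg, ss_res, f_stat)
```

Because the fake data here is nearly perfectly linear, the residual SS is tiny and the F-stat is huge, just as described above.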

Furthermore… the standard error is the square root of the residual MSS. If you look up the equation for the Standard Error of the Estimate (SEE) you’ll see why this makes sense in the context of ANOVA. R-squared (much like the F-stat) tells you how much of the variation is explained by your model. It’s computed as the regression SS divided by total SS (regression + residual). If the regression explained variation perfectly, the value of R^2 would be nearly 1. If it were total crap and didn’t explain a thing, the value would be near 0.
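Those two identities can be checked with nothing more than the ANOVA-table numbers. A minimal sketch (the SS values and df here are invented for illustration):

```python
import math

# Hypothetical ANOVA-table numbers (made up for illustration)
ss_reg, ss_res = 80.0, 20.0
n, k = 25, 1                      # observations, slope coefficients
mss_res = ss_res / (n - k - 1)    # residual mean square

see = math.sqrt(mss_res)          # residual standard error (SEE)
r2 = ss_reg / (ss_reg + ss_res)   # R^2 = explained SS / total SS

print(round(see, 4), r2)          # r2 is 0.8 here
```

So the residual standard error is just the residual MSS put back on the scale of y (by taking the square root), while R^2 is a unitless share of total SS.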

In some sick twisted way, I am starting to like this L2 Quant.

Residual standard error is like the standard deviation of the errors (actual minus predicted), aka SEE. It’s interesting to note the similarity between the correlation coefficient (r_xy) and the slope coefficient, b_1: r_xy = covariance divided by the std devs of x and y, i.e., r_xy = COV_xy/(s_x s_y), and b_1 = COV_xy/(s_x s_x) = COV_xy/s_x^2. Comments?
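The only difference between the two formulas is one factor in the denominator (s_y vs s_x), which is easy to verify numerically. A sketch with a made-up sample (the identities hold for any data):

```python
import numpy as np

# Hypothetical sample; the identities hold for any data
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 4.0, 8.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]
s_x, s_y = np.std(x, ddof=1), np.std(y, ddof=1)

r_xy = cov_xy / (s_x * s_y)   # correlation coefficient
b_1 = cov_xy / s_x**2         # OLS slope of y on x

# Dividing one by the other: b_1 = r_xy * (s_y / s_x)
print(r_xy, b_1)
```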

Yeah, I would say that L2 quant is 30% as hard as L1’s.

Dreary, I think the minor difference you’re talking about relates to declaring one variable as independent. The correlation coefficient includes both in the denom. since you’re just calculating the strength of the relationship. What really got me was the old adage: slope = rise/run. Multiply the correlation coefficient by s_y/s_x and you get the slope coefficient.

philip.platt Wrote: ------------------------------------------------------- > In some sick twisted way, I am starting to like > this L2 Quant. I thought the same exact thing until I got through time series analysis! Post back when you’re done with that section.

I finished Time Series, but I did it through Schweser so I’m sure it’s much easier than what’s in the CFA curriculum. While I didn’t find it hard, I certainly couldn’t see how I could apply it. They just give you numbers and you plug in formulas. Everything is done for you. The concept checkers were also too easy. Does anyone know how I could ACTUALLY build a real time series with correct lags for seasonality, or for that matter multiple regressions? What software packages can I use in Excel? How do I actually go about building one?
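The curriculum doesn’t prescribe a package, so take this as one possible sketch: you can build an AR model with a seasonal lag yourself in plain numpy by constructing the lagged columns and running OLS (in practice people use Excel’s Data Analysis ToolPak or a stats package instead; the series and lag choices below are made up for a quarterly example):

```python
import numpy as np

# Toy quarterly series: trend + seasonality + noise (made-up numbers)
rng = np.random.default_rng(0)
t = np.arange(60)
y = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 4) + rng.normal(0, 0.5, 60)

# AR model with lag 1 and a seasonal lag 4:
#   y_t = b0 + b1 * y_{t-1} + b2 * y_{t-4}
target = y[4:]          # y_t, starting where all lags exist
lag1 = y[3:-1]          # y_{t-1}
lag4 = y[:-4]           # y_{t-4}, the seasonal lag for quarterly data
X = np.column_stack([np.ones_like(target), lag1, lag4])

# Ordinary least squares via lstsq
coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
fitted = X @ coefs
print(coefs)            # [b0, b1, b2]
```

The same column-stacking idea extends to multiple regression: each extra explanatory variable is just another column in X.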

> Multiply the correlation coefficient by sy/sx and you get the slope coeff. Interesting. If you know the slope coefficient, then the correlation coefficient is the slope coefficient times s_x/s_y. Also, if two variables are perfectly correlated (+1), their slope coefficient is simply the std dev of y divided by the std dev of x, which means s_y/s_x = rise/run. Weird, but looks correct.
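The perfectly-correlated case is easy to confirm: make y an exact linear function of x and the slope collapses to s_y/s_x. A small sketch with invented numbers:

```python
import numpy as np

# Hypothetical perfectly correlated pair: y is an exact linear function of x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 2.0                           # correlation is exactly +1

s_x, s_y = np.std(x, ddof=1), np.std(y, ddof=1)
b_1 = np.cov(x, y, ddof=1)[0, 1] / s_x**2   # OLS slope

# With r = +1, slope = r * s_y/s_x reduces to s_y/s_x
print(b_1, s_y / s_x)                       # both equal 3.0
```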

dlpicket Wrote: ------------------------------------------------------- > If your model was > complete crap and unassociated with the > distribution, you would have near zero regression > SS and MSS and your F statistic approaches zero. > > > Does that make sense? All good right until the last sentence: since you are fitting the model, the expected value of the F statistic doesn’t go to 0; it goes to denom df/(denom df - 2).
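That correction matches the mean of the F distribution itself: under the null, the F-stat follows F(num df, denom df), whose mean is denom df/(denom df - 2) for denom df > 2. A quick check with scipy (the df values here are just an example):

```python
from scipy.stats import f

# Under the null the F-stat is F(num_df, denom_df) distributed;
# its mean is denom_df / (denom_df - 2), not 0 (for denom_df > 2)
num_df, denom_df = 1, 23
print(f.mean(num_df, denom_df))      # 23/21, about 1.095
print(denom_df / (denom_df - 2))
```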