a regression question

total variation = explained variation + unexplained variation

total variation = Σ(Y − Ȳ)²
explained variation = Σ(Ŷ − Ȳ)²
unexplained variation = Σ(Y − Ŷ)²

Am I right about the equation for explained variation?

That is correct…
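One way to convince yourself is to compute the three sums on any small sample. A minimal sketch (the toy data and variable names are mine, not from the textbook):

```python
# Fit a simple OLS line and check that
# total variation = explained variation + unexplained variation.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up sample
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # OLS slope
a = y.mean() - b * x.mean()                     # OLS intercept
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
print(sst, ssr + sse)                  # the two numbers agree
```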

Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)². This is not mathematically intuitive to me. How do you explain this equation?

Y − Ȳ = (Y − Ŷ) + (Ŷ − Ȳ). Now you are summing the above across all observations, and, given the Y's you have, Ȳ and Ŷ are known values. When you sum the squares, the middle (cross) term Σ(Y − Ŷ)(Ŷ − Ȳ) is exactly zero for an OLS line with an intercept, because the residuals are uncorrelated with the fitted values. That is why the two squared sums add up to the total.
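A quick numerical check of that cross term (again a sketch; toy data and names are mine):

```python
# For an OLS fit with an intercept, sum((y_hat - y_bar) * (y - y_hat)) = 0,
# so expanding sum(((y_hat - y_bar) + (y - y_hat))^2) leaves no middle term.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up sample
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # OLS slope
y_hat = y.mean() - b * x.mean() + b * x         # fitted values a + b*x

cross = np.sum((y_hat - y.mean()) * (y - y_hat))
print(cross)  # ~0, up to floating-point noise
```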

wolwol Wrote:
> total variation = explained variation + unexplained variation
> total variation = Σ(Y − Ȳ)²
> explained variation = Σ(Ŷ − Ȳ)²
> unexplained variation = Σ(Y − Ŷ)²
> Am I right about the equation for explained variation?

Don't think so. Y − Ȳ = (Y − Ŷ) + (Ŷ − Ȳ) (this is true, we all know), and summing it across observations also holds, since it is true at each individual point. But the squares of these quantities are not equal at individual points, i.e. (Y − Ȳ)² ≠ (Y − Ŷ)² + (Ŷ − Ȳ)², and hence their sums will not be equal (in all cases).

Let's take just one point in the regression: Y = 10, Ȳ = 6, Ŷ = 4. Then (Y − Ȳ)² = 16, while (Y − Ŷ)² + (Ŷ − Ȳ)² = 36 + 4 = 40, so 16 ≠ 40.
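The pointwise inequality is real; the textbook identity only claims the *sums* are equal, because the cross terms cancel in aggregate. A sketch showing both facts at once (made-up data again):

```python
# Per observation, (Y - Ybar)^2 generally differs from
# (Y - Yhat)^2 + (Yhat - Ybar)^2; only the sums over all observations agree.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up sample
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
y_hat = y.mean() - b * x.mean() + b * x

lhs = (y - y.mean()) ** 2                         # total, per point
rhs = (y - y_hat) ** 2 + (y_hat - y.mean()) ** 2  # unexplained + explained, per point
for l, r in zip(lhs, rhs):
    print(f"{l:8.4f}  vs  {r:8.4f}")  # unequal point by point
print(lhs.sum(), rhs.sum())           # equal in total
```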

I see your point, but that equation is in the textbook.

Unexplained variation is *not* Σ(Y − Ŷ)². Unexplained variation = total variation − explained variation.

That is the definition of unexplained variation in the textbook (Book 1, p. 249).

rahulv Wrote:
> Unexplained variation is *not* Σ(Y − Ŷ)².
> Unexplained variation = total variation − explained variation.

I am looking at page 249 again. They call Σ(Yi − Ȳ)² the total variation, and they call Σ(Yi − Ŷi)² the unexplained variation. The book then goes on to say explained variation = total variation − unexplained variation. Nowhere do they say that explained variation = Σ(Ŷ − Ȳ)².

Yes, the equation is correct. An explanation for it is to think of total variation as your good old variance (without dividing by n − 1 as usual) of the actual points. So, let's say you have a list of actual Y values; nothing is being estimated, just the values that you have from a sample. The total variation of these Y values is the variance, i.e., the measure of how each one of those Y values differs from the mean, Ȳ. That's straightforward. Nothing new, just good old plain variance.

Next, you run a linear regression and you find that the best line that fits those actual Y values is one whose graph is Ŷ. (Remember, the only reason you are running this regression is to find a formula which you can use to find any future Y value based on the sample Y values you have.) The regression technique will guarantee that the line you get from this formula is the best possible line, i.e., it is the line which gets as close as possible to your original Y values. It's the line that results in the least amount of error, measured as difference from actual values. Good so far?

Now if you go back to your original Y values and see how they differ from this estimated line (which is the best line we can find), you will be looking at how "bad" this best-fit line is compared to your actual points. That's the residual error (Y − Ŷ). OK so far?

Finally, if you look at how the points on the best-fit line (the Ŷ graph) differ from the mean of the actual values (Ȳ), then you are looking at the variance (without the denominator, i.e., only the sum-of-squared-differences part) between the estimated line and the *average* actual Y value. That's the power of your regression, as you will see next.

Let's say you find that the estimated points obtained by your newly discovered equation (the Ŷ) are exactly the same as your actual Y values! Wow, that means you've come up with an equation which fits 100% to your actual values. That means two things:

1. The actual Y values = estimated Y values (Y − Ŷ = 0), and the residual error is zero. That means that your total variation (how much your actual points differ from their mean) is the same as that obtained by the derived equation. No error.

2. The estimated Y values differ from the *mean* *actual* Y value by the exact same amount your actual Y values differ from their mean. So the estimated Y values are just as good as the actual values, *and* their variance is the same as the original data variance. They explain 100% of the actual variance in your original data.

Your conclusion then is that all the variation observed in the actual data is contained in the regression part (Ŷ − Ȳ). That's a perfect coefficient of determination (R²)… all the variation in your actual data is explained by your equation.

On the other hand, assume that you find that the estimated points obtained by your newly discovered equation (the Ŷ) are drastically different from your actual Y values. What does that mean? Let's leave that as homework.
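To put numbers behind that punchline: R² can be computed either as explained/total or as 1 − unexplained/total, and the two agree precisely because total = explained + unexplained. A sketch (same made-up data as the earlier snippets):

```python
# R^2 two ways: SSR/SST and 1 - SSE/SST give the same number
# because SST = SSR + SSE for an OLS fit with an intercept.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up sample
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
y_hat = y.mean() - b * x.mean() + b * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
print(ssr / sst, 1 - sse / sst)  # identical R^2 either way
```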

Check out page 259: TSS = SSE + RSS.

cpk123 Wrote:
> I am looking at page 249 again. They call Σ(Yi − Ȳ)² the total variation, and they call Σ(Yi − Ŷi)² the unexplained variation. The book then goes on to say explained variation = total variation − unexplained variation. Nowhere do they say that explained variation = Σ(Ŷ − Ȳ)².