Hi guys, I've got a question about regression when there's only 1 independent variable. This is part of question 7 in the EOC of Reading 9. Why is SSE = unexplained variation? And why is the sample variance of the dependent variable = total variation / (n-1)? Please shed some light. Thanks!
Recall that an error is the difference between the actual y-value and the y-value predicted by the population regression function (true regression line). We estimate this line from our sample, and we estimate the errors as the difference between the actual y-value and our predicted y; this is called a residual. Since this is the difference between actual y-values and predicted y-values, we can say these errors (residuals) are the parts of the dependent variable observations that are unexplained by the regression function; if we had perfect predictions (deterministic relationship), there would be no errors. SSE is the sum of these squared residuals, so it measures the variation in the dependent variable that the regression leaves unexplained. When you estimate the variance of the error term from it, you are estimating the variance of the dependent variable that is unexplained by the regression function.
The total variation of the dependent variable (the sum of squared deviations from its mean) can be split into two parts: the part explained by the regression function, and the part that is not explained by the regression function. When we just calculate the variance of the dependent variable, we are using that total variation of the DV (both explained and unexplained). You divide this sum of squares by n-1 rather than n, since you have already used the sample mean as an estimated parameter in your calculation, which costs one degree of freedom.
Hope this helps.
The explained variation is the variation of the estimated y-values: the y-values that fall on the regression line. (That’s why it’s called “explained”: the explanation is that, “they’re on the line”.) The unexplained variation is the variation of the real y-values compared to the estimated y-values; if you ask the analyst, “Why aren’t they on the line?”, he’ll reply, “Got me; I can’t explain it.” When you add the squared differences of the real y-values and the estimated y-values, that’s SSE: the unexplained variation.
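If it helps to see the split with numbers, here is a rough sketch in Python. The five data points are made up purely for illustration (they are not from the reading), and the helper names `fit_line` and `variation_parts` are my own. It fits a one-variable least-squares line, then adds up the unexplained, explained, and total variation.

```python
# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Ordinary least squares with one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def variation_parts(x, y):
    """Return (unexplained, explained, total) variation."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]  # points on the line
    # Unexplained: actual y versus estimated y (this is SSE).
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    # Explained: estimated y versus the mean of y.
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    # Total: actual y versus the mean of y.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, ssr, sst


sse, ssr, sst = variation_parts(x, y)
print(sse, ssr, sst)  # approximately 2.4, 3.6, 6.0 (floating point)
```

For this made-up data the unexplained and explained pieces add up to the total, up to floating-point rounding.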
Sample variance is Σ(Yi – Y-bar)² / (n – 1); you remember this from Level I. Well, Σ(Yi – Y-bar)² is the total variation. QED
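A quick way to convince yourself of that last point is the small Python sketch below. The y-values are invented for illustration and `total_variation` is just a name I picked. It computes Σ(Yi – Y-bar)², divides by n – 1, and compares the result with the standard library's `statistics.variance`, which computes the sample variance.

```python
import statistics

# Hypothetical dependent-variable observations, for illustration only.
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def total_variation(values):
    """Sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


n = len(y)
sst = total_variation(y)
sample_variance = sst / (n - 1)  # total variation / (n - 1)

print(sst, sample_variance, statistics.variance(y))
```

The last two printed numbers should agree, since the sample variance is the total variation divided by n – 1.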