# why SSE?

Why do we want to minimize the sum of squared differences when doing a linear regression, i.e. minimize SSE? Why not find the line that minimizes the sum of the differences, without squaring them? Wouldn't such a line be better at predicting?

…because sometimes you're above the estimate and sometimes below it, so the difference could be plus or minus.

why squared… why not try to draw the line that simply minimizes the sum of the differences, without squaring them…

If you do not square the differences, negative and positive errors will cancel each other out. The sum of the differences can be 0 even though each point is wildly different from the predicted value; the errors just coincidentally cancel. E.g., prediction line A gives Y = 6, 7, 8 when X = 1, 2, 3, while prediction line B gives Y = 1, 7, 13 when X = 1, 2, 3. If Y really is 6, 7, 8 when X = 1, 2, 3, then the total errors of both A and B sum to 0, but the two lines have very different SSEs.
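The example above can be checked directly. A minimal sketch (the function names `sum_of_errors` and `sse` are just for illustration):

```python
# Actual values and the two candidate prediction lines from the example.
actual = [6, 7, 8]    # true Y when X = 1, 2, 3
line_a = [6, 7, 8]    # prediction line A: fits perfectly
line_b = [1, 7, 13]   # prediction line B: wildly off, but errors cancel

def sum_of_errors(pred, y):
    """Plain sum of (actual - predicted); positives and negatives cancel."""
    return sum(yi - pi for yi, pi in zip(y, pred))

def sse(pred, y):
    """Sum of squared errors; squaring prevents cancellation."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred))

print(sum_of_errors(line_a, actual), sse(line_a, actual))  # 0 0
print(sum_of_errors(line_b, actual), sse(line_b, actual))  # 0 50
```

Both lines look equally good under the plain sum of errors (both 0), but SSE correctly distinguishes the perfect fit (SSE = 0) from the bad one (SSE = 50).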

^Great, but how about minimizing the absolute value of the differences? It seems like the same issue as MAD (mean absolute deviation) vs. standard deviation, but if you draw the line that minimizes the absolute value of the differences, wouldn't that be better to use for prediction?

I think QQQbee was onto the same line of reasoning before the CFAI had him taken out.

if QQQbee's line of reasoning was to ask why you do things this way, rather than just take a formula and plug numbers into it, so be it.

You can do either Least Squares or Maximum Likelihood. This curriculum deals only with OLS (ordinary least squares). Look outside the curriculum if you want further explanations of this topic; Gujarati's book 'Basic Econometrics' is excellent. It covers everything in a normal econometrics course, going way beyond the Quantitative Methods section in the CFAI.

FYI: digging into this would be a complete waste of time at this point in your exam studying; minimizing the sum of the squared differences is the only approach to look at in the curriculum (well, I'm pretty sure lol). Just think about it; the other people explained it perfectly.

My best advice is to take a look at the graphs that give Y and Yhat, with the deviations above and below the line. You want to minimize the summation of the differences between the predicted and actual endogenous variable, but you need to accurately reflect those differences by squaring them. Minimizing the absolute differences doesn't make sense when you need to calculate all the other relationships in regressions. Think about even calculating a standard deviation: you need the sum of the squared differences between Xbar and each Xi. I don't know if any of this made sense lol, but hopefully.

"You want to minimize the summation of the differences between the predicted and actual endogenous variable… but need to accurately reflect the differences by squaring them." If this means what I think it means, then that is all I needed to know. So basically, by minimizing the squared differences, you are effectively minimizing the differences themselves: the squares can't cancel each other out, so driving their sum down forces each individual error toward zero. Come to think of it in math terms, what I am saying seems simple and should be true. Just someone verify it please.

> gulfcfa wrote:
> why squared… why not try to draw the line that simply minimizes the sum of the differences, without squaring them…

Do you mean drawing a line by looking at the graph? That would be subjective: the line you draw will be different from the line I draw. There needs to be a procedure, and the procedure is minimizing SSE.

We square the errors because each error/deviation may be positive or negative, and we do not want the errors to cancel out. Squaring keeps them additive, analogous to variances, which are also additive.

The regression line with minimum SSE gives the best fit representing all the points/data. This line gives us a sense of how the variables are moving and of the strength/slope. By looking at the scatter plot we could easily guess the direction of the line, but the regression line with minimum SSE gives the best estimate of the strength/slope.

Sum of squares is the most robust choice here. It is more optimal than a sum of absolute deviations (MAD) because differences from values further away are given more weight, resulting in a more predictive regression line. A sum of fourth powers, e.g., is suboptimal because it lets outliers influence the regression too much. And obviously, as mentioned above, any method where differences can cancel each other out results in a less predictive regression line.
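The weighting argument can be seen by comparing how each loss function penalizes a small residual vs. a large one. A rough sketch with made-up residuals of 1 and 5:

```python
# How much more a residual of 5 "counts" than a residual of 1,
# under absolute, squared, and fourth-power loss functions.
errors = [1, 5]
for p, name in [(1, "|e| (MAD-style)"), (2, "e^2 (SSE)"), (4, "e^4")]:
    small, large = (abs(e) ** p for e in errors)
    print(f"{name}: large error counts {large / small}x the small one")
# |e| -> 5x, e^2 -> 25x, e^4 -> 625x
```

Squaring penalizes the far point 25x instead of 5x, pulling the line toward it; fourth powers push that to 625x, so a single outlier can dominate the whole fit.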

thanks everyone!

bid_shark is on the money with his explanation. The other aspect is mathematical elegance: being able to relate everything in terms of variances. Squared differences are variances, and we can take the square root to get standard deviations. Having standard deviations lets us branch out into confidence intervals and other statistical analysis built on that platform.

The point of least squares is that we're measuring covariance between the independent and dependent variables, since we will only have the independent variable when making predictions going forward. (Recall that with one variable, the correlation squared is R^2, our measure of fit.) In this mathematical environment we can measure predictability and strength better.

With just the mean difference, I'm not sure of the ability to test the significance of the estimated parameters (in explaining the dependent variable) or the impact of adding others; the statistical framework is lost. With MAD, great, you've minimized distance, but is the X you are using even predictive? We need to measure dispersion for strength.
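The R^2 claim above can be verified numerically: with a single independent variable, the squared correlation between X and Y equals R^2 = 1 - SSE/SST from the minimum-SSE line. A minimal sketch with made-up data:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data for illustration

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# OLS slope and intercept (the minimum-SSE line).
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

# R^2 from the fitted line: explained share of total variation.
yhat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - sse / sst

# Pearson correlation between X and Y.
corr = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        / math.sqrt(sum((xi - xbar) ** 2 for xi in x)
                    * sum((yi - ybar) ** 2 for yi in y)))

print(abs(r2 - corr ** 2) < 1e-9)  # True: R^2 equals correlation squared
```

This identity is exactly why the squared-error framework ties fit, correlation, and variance together; nothing analogous falls out so cleanly from minimizing absolute deviations.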