Calculating Regression Line

So am I crazy, or do we not actually need to know how to establish the regression line from a data set? It seems like it’s always given to us.

I understand that the regression line minimizes the sum of the squared residuals (if I’m correct), but if you gave me a data set I wouldn’t know how to get the regression line other than getting Excel to do it for me. And for multiple regression I don’t even know how to use Excel to get the regression line. Is this an issue?

I think you just need to know that the regression line = the line of best fit.

Also, know what the formula of the line is for simple and multiple regression.

Y (dependent variable) = intercept + slope coefficient × X (independent variable) + error term.
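If it helps, here’s a minimal sketch (plain Python, made-up numbers purely for illustration) of the textbook formulas for simple regression: slope = cov(X, Y) / var(X), and intercept = mean(Y) − slope × mean(X).

```python
# Sketch of the simple-regression formulas on made-up data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sum of squared deviations of X, and sum of cross-products of X and Y
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope = s_xy / s_xx                      # slope = cov(X, Y) / var(X)
intercept = mean_y - slope * mean_x      # line passes through the point of means

print(f"Y = {intercept:.3f} + {slope:.3f} * X")
```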

I went back and found the formulas for the slope coefficient and intercept. Can’t believe I didn’t see this before.

But it leads me to another question. How can the intercept have a standard error? I understand how the slope coefficients can have standard errors, because there are many values of the independent variables and each observation has a residual with respect to the dependent variable. But how can the intercept have a standard error when there is only one intercept? Where is the information coming from to get a standard error for it?

It’s a straightforward calculus problem: compute the partial derivatives of the sum of the squared residuals with respect to each of the regression coefficients (intercept and each of the slopes), set those partials equal to zero, solve the linear system.
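To make that concrete, here’s a small sketch (NumPy, made-up data and names, just an illustration) of the linear system those zeroed partial derivatives produce: the normal equations (XᵀX)b = Xᵀy, solved for the intercept and two slopes of a multiple regression.

```python
import numpy as np

# Made-up data for a regression with two independent variables (illustrative only).
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = np.array([5.1, 5.9, 10.2, 11.1, 14.0])

# Add a column of ones so the intercept is estimated along with the slopes.
X_design = np.column_stack([np.ones(len(y)), X])

# Setting the partial derivatives of the sum of squared residuals to zero
# gives the normal equations (X'X) b = X'y; solve that linear system for b.
coeffs = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

intercept, slope1, slope2 = coeffs
print(f"Y = {intercept:.3f} + {slope1:.3f}*X1 + {slope2:.3f}*X2")
```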

Voilà!

There is also only one slope coefficient for each independent variable. If you can have a standard error for each slope coefficient, it stands to reason that you can have a standard error for the intercept as well.

Ok that makes sense, but here’s where I get confused…

The standard errors for the slope coefficients come from, if I’m not mistaken, the calculated covariance of the dependent variable with each specific independent variable.

However, my understanding is that there is only one value of the intercept. There are not multiple intercept data points with which to establish a covariance that could give us a standard error. Obviously this thinking is incorrect, but I am not sure where.

Among the properties of a least-squares best-fit line is that it passes through the point (μx, μy), the means of the x- and y-values. Thus, if the y-values vary at all, the line will move up or down, which, in turn, will change the intercept. This is seen most easily when μx = 0, so that μy is the intercept; any change in any y-value will obviously change the intercept.
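To see that concretely, here is a small simulation sketch (NumPy, made-up numbers, purely an illustration): the fitted line always passes through the point of means, and redrawing the noise in the y-values moves the intercept around. The spread of those simulated intercepts is exactly the kind of variability the intercept’s standard error describes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# The fitted line passes through (mean of x, mean of y).
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.size)
b0, b1 = fit(x, y)
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # True

# Redraw the noise many times: the intercept moves each time, and the
# spread of those intercepts is what its standard error is measuring.
intercepts = []
for _ in range(1000):
    y_sim = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.size)
    intercepts.append(fit(x, y_sim)[0])

print(f"std of simulated intercepts: {np.std(intercepts):.3f}")
```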

Ah yes that makes sense. Thank you so much.

Good to hear.

You’re quite welcome.