Degrees of freedom

Help! I basically skipped quant altogether in L1 and it’s not making much sense to me. Putting criticism aside, can someone please explain degrees of freedom to me? I understand why a regression line is used, and the error term, and most of everything else. The part of the puzzle that doesn’t fit for me is degrees of freedom. Why is it (n-2) degrees of freedom for a single variable? Shouldn’t it be n-1, since we can determine the average with all but 1 of the data points? Help!!

Put simply, calculating the sample variance requires the sample mean. Suppose our sample size is 30 and the mean is 100. In this case, 29 observations can take on any values at all, because we can always choose the 30th observation so that the average equals 100. This is the idea (more or less) of degrees of freedom: how many independent observations do we have when calculating a statistic?
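Here is a minimal sketch of that idea in Python, in case seeing it with numbers helps. The helper name `last_value_for_mean` and the randomly drawn values are made up for illustration; only the sample size of 30 and the mean of 100 come from the example above.

```python
import random

def last_value_for_mean(free_values, target_mean):
    """Return the one value that forces the overall mean to target_mean."""
    n = len(free_values) + 1
    return target_mean * n - sum(free_values)

random.seed(0)  # arbitrary seed, only so the run is repeatable

# 29 observations are free to be anything (illustrative random draws).
free = [random.gauss(100, 15) for _ in range(29)]

# The 30th is not free: it is pinned down by the required mean of 100.
last = last_value_for_mean(free, 100)
sample = free + [last]

mean = sum(sample) / len(sample)
deviations = [x - mean for x in sample]

# The deviations from the mean always sum to zero (one constraint),
# so only n - 1 = 29 of them are independent; hence the n - 1 divisor.
sample_variance = sum(d * d for d in deviations) / (len(sample) - 1)

print(round(mean, 6), round(sum(deviations), 6), len(sample) - 1)
```

Whatever the 29 free values are, the last one is forced, which is why the variance divides by n - 1 rather than n.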

So, when you’re calculating the estimated variance in simple linear regression (with an intercept), we actually have two estimates: the estimated y-intercept and the estimated slope coefficient. The degrees of freedom are the sample size less the number of estimated parameters, so n-2. This can also be written as n-k-1 or n-(k+1), where n is the sample size, k is the number of independent variables, and the 1 is for the y-intercept.

Also, remember that a residual (estimate of the error) is calculated as (Y(i) - Y-hat(i)). Here, Y(i) is the actual y for observation i, and Y-hat (i) is the predicted value of Y for observation i, which depends on the estimated slope and y-intercept (2 estimators, from your example of simple linear regression).
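To make the "two estimators" point concrete, here is a rough sketch in plain Python. The data values and the helper `fit_line` are invented for illustration. It shows that the least-squares residuals obey two constraints, one from the intercept and one from the slope, which is where the two lost degrees of freedom come from.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed-form solution)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data, not taken from the thread.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Constraint from estimating the intercept: residuals sum to zero.
constraint_intercept = sum(residuals)
# Constraint from estimating the slope: residuals are uncorrelated with x.
constraint_slope = sum(x * e for x, e in zip(xs, residuals))

n, k = len(xs), 1                     # k = number of independent variables
sse = sum(e * e for e in residuals)   # sum of squared residuals
residual_variance = sse / (n - k - 1) # divide by n - 2, not n - 1

print(constraint_intercept, constraint_slope, n - k - 1)
```

Both constraint sums come out as zero up to rounding, so once you know n - 2 of the residuals, the other two are determined.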

Hope this helps!

I wrote an article on degrees of freedom that might help: http://financialexamhelp123.com/degrees-of-freedom/

Thanks to both of you, this is very helpful (especially the blog post, which has great explanations). One additional question: why is the slope coefficient considered a known estimator? Maybe I am going back a little too far to my “y=mx+b” days, but isn’t the slope coefficient just a multiplier in the estimated regression? I completely get why the y-intercept is an estimator, as the least squares line has to pass through that point, but the slope doesn’t have a point it passes through?

Thanks again for your help I greatly appreciate it!

I’m not certain that I understand your question. The objective in linear regression is to determine which of (uncountably) infinitely many lines fits the data best, in the sense of minimizing the sum of the squares of the differences between the actual y values and the estimated y values. It’s a straightforward 2-variable calculus problem, where the 2 variables are the slope and the intercept.

If you’re looking for another point through which the line passes to get that slope, note that it passes through the centroid of the data: (X-bar, Y-bar).
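If it helps, the centroid claim is easy to check numerically. This is only a sketch with made-up data, and `fit_line` is a hypothetical helper using the usual closed-form least-squares formulas.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Illustrative data, not taken from the thread.
xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [3.1, 4.8, 6.2, 7.9, 11.0]

b0, b1 = fit_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# The fitted value at X-bar should equal Y-bar.
fitted_at_centroid = b0 + b1 * x_bar
print(fitted_at_centroid, y_bar)
```

So the fitted line is pinned by two things, the centroid and the slope, and both are computed from the data.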

I’m sorry, but this is very much not clicking for me at the moment. The degrees of freedom are the total number of points that can vary in a distribution to arrive at a particular mean. If that is the case, why is it (n-2) for a simple linear regression? Shouldn’t it be (n-1), since the only known point is the y-intercept?

You compute the slope (one degree of freedom) and the intercept (one more degree of freedom); you’ve lost two degrees of freedom.

Let’s take a trivial example: I give you the points (1,1) and (2,2). You create the regression line:

y = x

So _b_0 = 0 and _b_1 = 1.

How many degrees of freedom do you have? Put another way, given two new points (1, _y_1) and (2, _y_2), can _y_1 or _y_2 vary freely so that the line has _b_0=0 and _b_1=1? It should be pretty clear that neither _y_1 nor _y_2 has any freedom: _y_1 = 1 and _y_2 = 2 are the only values that’ll work. Zero degrees of freedom. And n = 2, so n – 2 = 0.
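The two-point example can be run directly. This is a sketch only: `fit_line` is a hypothetical helper, and the second set of y values (1.0 and 2.5) is invented to show what happens when a point moves.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

xs = [1.0, 2.0]

# The points (1,1) and (2,2) from the example give y = x.
b0, b1 = fit_line(xs, [1.0, 2.0])

# Move one y value (made-up number) and the coefficients change,
# so neither y can vary while keeping b0 = 0 and b1 = 1.
moved_b0, moved_b1 = fit_line(xs, [1.0, 2.5])

# With n = 2 the line goes through both points, so every residual is zero
# and nothing is left over to estimate the error variance.
moved_residuals = [y - (moved_b0 + moved_b1 * x) for x, y in zip(xs, [1.0, 2.5])]

degrees_of_freedom = len(xs) - 2
print(b0, b1, moved_b0, moved_b1, moved_residuals, degrees_of_freedom)
```

With only two points the fit is always perfect, which is the same thing as saying there are zero degrees of freedom left for the error.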

Edit: Just saw Magician’s example, so I’ll let that sit with you…

thank you all, that was very helpful!

i always knew the concept, but not to this degree of clarity. much appreciated! 🙂

My pleasure.

Good to hear, and I’m glad to help!