Cheers guys. Why is it that the degrees of freedom used everywhere in the book (chapter 11) is n-2? I don't get it; I thought degrees of freedom should be n-1. Any help is appreciated.
You're probably looking at simple regression. That n-2 is in fact n-k-1, and since k=1, it works out to n-2.
Thanks, but it's been a while. Where did you get the k from? What is k? Is it something I should have studied at L1 and didn't? Also, I am looking at "testing the significance of the correlation coefficient"; I haven't even got to simple regression yet.
oh great, pepp is back…
The way I understand it, degrees of freedom is how many variables are allowed to vary in a given N-dimensional random vector. So in the case of a correlation coefficient hypothesis test (is r = 0?), I see only one variable. How did the textbook then arrive at n-2 degrees of freedom??? I am obviously missing something here.
In a multiple regression, the formula for df is n - k - 1. n is the number of observations (but you knew that already), and k is the number of independent variables (usually 1). So with that said, n - k - 1 would be n - 1 - 1, or n - 2. It's been a while since I've looked at quant, but this is what comes to mind now.
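If it helps, here is a quick Python sketch of that bookkeeping (not from the book; the helper name residual_df is made up):

def residual_df(n, k):
    # residual degrees of freedom: one df lost per slope (k of them)
    # plus one for the intercept
    return n - k - 1

print(residual_df(30, 1))  # simple regression: 30 - 1 - 1 = 28
print(residual_df(30, 3))  # multiple regression: 30 - 3 - 1 = 26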
My question is not related to degrees of freedom with respect to linear regression or multiple regression. It is about the hypothesis test for the correlation coefficient, to see if it is actually different from 0. To do that hypothesis test you have to compute your t-statistic, and for that the book uses n-2 degrees of freedom (pp. 238-239, examples 9 and 10, book 1). I just can't figure out why n-2.
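For reference, the calculation the book walks through looks like this in Python (toy numbers, not the book's; scipy is assumed available and only used for the critical value):

import math
from scipy import stats

r, n = 0.475, 32          # sample correlation and number of pairs
df = n - 2                # the n-2 I am asking about

t_stat = r * math.sqrt(df) / math.sqrt(1 - r**2)
t_crit = stats.t.ppf(0.975, df)   # two-tailed test at the 5% level

print(t_stat, t_crit)  # reject H0: rho = 0 if abs(t_stat) > t_crit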
pepp wrote:
> My question is not related to degrees of freedom with respect to linear regression or multiple regression.

Yes it is: you are testing the slope of a regression line.
Sharp, so are you suggesting that the correlation coefficient is the slope of the regression line?
a) correlation coefficient: r = Cov(x,y) / [s(x)*s(y)]
b) slope: b = Cov(x,y) / Var(x)
How is (a) == (b)?? By the way, I have not even studied regression lines, so I don't know why everyone thinks this question is related to regression. I am on page 237 and lost with degrees of freedom. I can't see any regression lines here.
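A quick numeric check (made-up data; numpy assumed) shows (a) and (b) are not equal, but they differ only by the scale factor s(y)/s(x):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(size=100)

cov_xy = np.cov(x, y)[0, 1]   # sample covariance (ddof=1 by default)
s_x = np.std(x, ddof=1)
s_y = np.std(y, ddof=1)

r = cov_xy / (s_x * s_y)            # (a)
slope = cov_xy / np.var(x, ddof=1)  # (b)

print(np.isclose(slope, r * s_y / s_x))  # True: same thing up to s(y)/s(x)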
It is n - k - 1, as so many others have pointed out before. k = number of independent variables; here, in a simple regression, you have k = 1, so n - 2. It was always n-2, even in the L1 material; go back and check on it.
I remember that you use n-1 only when you are dealing with a set of observations where you need to subtract 1 from your n observations to take out the effect of the one value you are comparing against, namely the mean of the observations. In your example above you have x and y, so you need to take out 1 for each of them, a total of 2. Munch on this explanation till JoeyD comes back on board again.
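One toy way to see those two lost degrees of freedom (Python, made-up data): with only n = 2 pairs, the sample correlation is mechanically +/-1, because any two points lie exactly on a line. There is nothing left to test, which is exactly df = n - 2 = 0:

import numpy as np

rng = np.random.default_rng(42)
for _ in range(3):
    x = rng.normal(size=2)
    y = rng.normal(size=2)
    print(round(np.corrcoef(x, y)[0, 1], 6))  # always 1.0 or -1.0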
I appreciate all your opinions, but I am not convinced. i) All of you are talking in relation to regression lines, and yes, I agree that for a simple regression, df = n-2. ii) However, the correlation coefficient is r = Cov(x,y) / [s(x)*s(y)], and if you construct a hypothesis test of r = 0, you compute the t-statistic; I don't see why df = n-2 there. I've already referred to L1, book 1, pages 436-439, and I am wondering how L2, book 1, pages 237-239 (examples 7, 8, 9) have n-2 degrees of freedom. THERE IS NO MENTION OF REGRESSION LINES!! It is about CORRELATION, not REGRESSION.
Correlation is a simple regression issue: you are comparing the movements of two variables, so deduct 1 df for each variable. Even I know that.
Dreary, what you say makes sense, but I have no clue what is going on. Do I want to take this exam?? lol.
lol, I remember you now. You had the bombardment of Qs & topics on this board last time for L1 June 08. Good luck.
pepp wrote:
> Dreary, what you say makes sense, but I have no clue what is going on. Do I want to take this exam?? lol.

Let me take a crack at it. The reason it's n-2 is that two degrees of freedom are lost in the calculation: one for Y, and the other for the slope of the regression coefficient. Now, wait a sec; though there is no mention of regression lines and slopes (and yes, as you pointed out pepp, the slope isn't even used in the calculation), regression-line theory does fold into the significance test for the correlation coefficient. Specifically, the slope of the regression line comes in through the sum of products used in the r calc: Σ[(xi - x̄)(yi - ȳ)]. So, basically, r = Cov(x,y)/[s(x)*s(y)] = [slope*Var(x)]/[s(x)*s(y)]. The readings (and most readings on the topic) gloss over this linkage, but it is buried in there. So, short story: the 2 df lost are 1 for Y and 1 for Σ[(xi - x̄)(yi - ȳ)]. Hope that helps.
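You can verify that linkage numerically in Python (made-up data; numpy assumed), and then compute the book's t-statistic on n-2 degrees of freedom:

import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

cov_xy = np.cov(x, y)[0, 1]   # sample covariance (ddof=1 by default)
s_x = np.std(x, ddof=1)
s_y = np.std(y, ddof=1)

slope = cov_xy / s_x**2              # slope = Cov(x,y)/Var(x)
r = slope * s_x**2 / (s_x * s_y)     # r recovered via the slope, as above
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True: same r either way

t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)     # the book's t-stat, df = n-2
print(t)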