For the Quantophiles

Here go the questions:

1) An analyst is estimating a regression equation with three independent variables, and calculates the R², the adjusted R², and the F-statistic. The analyst then decides to add a fourth variable to the equation. Which of the following is most accurate?
A) The R² and F-statistic will be higher, but the adjusted R² could be higher or lower.
B) The adjusted R² will be higher, but the R² and F-statistic could be higher or lower.
C) The R² will be higher, but the adjusted R² and F-statistic could be higher or lower.

2) A dependent variable is regressed against a single independent variable across 100 observations. The mean squared error is 2.807, and the mean regression sum of squares is 117.9. What is the correlation coefficient between the two variables?
A) 0.55.
B) 0.30.
C) 0.99.

1: A
2: MSE = 2.807, MSR = 117.9, so total variation = 120.70. R² = (120.70 − 2.807)/120.70 = 0.9767. Correlation coefficient = sqrt(R²) = 0.9883. So C is the answer.

Actually for part 2, all that computation was not necessary… UGH, I HATE MYSELF FOR NOT READING THE CUES in the question: 117.9 and 2.807.

Let’s hear everyone’s reasoning for Q1, please. If the addition of another variable is actually good for the model, then what is the impact on the F-stat? If the addition of another variable introduces collinearity, then what is the impact on the F-stat? No matter what, we know R² will go up, hence eliminate B because it says it could go lower.

1) A, 2) B. For 2 you have to multiply the MSE by 98 to convert it to the SSE.

Pepp: R² = (correlation coefficient)², and R² = RSS/SST. You need to multiply MSR and MSE by the relevant multiples (k and n − k − 1, respectively) to get RSS and SSE.
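Spelled out, the identities are (standard ANOVA notation, with k slope coefficients and n observations):

```latex
SSE = MSE \times (n - k - 1), \qquad
RSS = MSR \times k, \qquad
R^2 = \frac{RSS}{RSS + SSE}
```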

Oh SHOOOTTT!! You are right. FML FML!

For 2 the answer is A) 0.55. MSE = 2.807, so SSE = 2.807 × 98 = 275.086. TSS = 275.086 + 117.9 = 392.986. R² = 117.9/392.986 = 0.30. R = 0.5477 ≈ 0.55.
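A quick sanity check of that arithmetic in Python (just a sketch; n = 100 and k = 1 come from the question, the variable names are mine):

```python
# Sanity check for Q2: rebuild the sums of squares from the mean squares.
n, k = 100, 1                  # 100 observations, 1 independent variable
mse, msr = 2.807, 117.9        # given in the question

sse = mse * (n - k - 1)        # SSE = MSE * (n - k - 1) = 2.807 * 98 = 275.086
rss = msr * k                  # RSS = MSR * k (k = 1, so RSS = 117.9)
sst = rss + sse                # TSS = 392.986

r_squared = rss / sst          # R^2 = RSS / TSS
r = r_squared ** 0.5           # correlation coefficient for a one-variable regression

print(f"{r_squared:.4f} {r:.4f}")  # 0.3000 0.5477, i.e. answer A
```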

1 - C, 2 - C. For 1, you can throw out B as we know adding variables will definitely increase R²… however, we don’t know anything about how the new variable will affect F (MSR/MSE) or the adjusted R²…

I think it is 1) A, 2) A, and for 2 I have given the proof above…

For 2, I say A too. I go with cpk’s calculations. That’s essentially what I did, except I forgot to multiply the MSE by its df to get the SSE.

CP is right on the 1st one; I had just breezed through it, though I also got C… definitely a READ THE QUESTION question, to note it was MSE not SSE. The 2nd one is A, I think: adding variables always helps the R² and F, no? But the adjusted R² you couldn’t say for sure. I think, anyway.

Hey CP, could you give your reasoning for 1) A? I mean, how can we arrive at a conclusion about the F-statistic and adjusted R²?

Here is the answer for the 1st one…

The correct answer was C) The R² will be higher, but the adjusted R² and F-statistic could be higher or lower.

The R² will always increase as the number of variables increases. The adjusted R² specifically adjusts for the number of variables, and might not increase as the number of variables rises.

As the number of variables increases, the regression sum of squares will rise and the residual sum of squares will fall, and this will tend to make the F-statistic larger. However, the numerator degrees of freedom will also rise, and the denominator degrees of freedom will fall, which will tend to make the F-statistic smaller. Consequently, like the adjusted R², the F-statistic could be higher or lower.

Adjusted R² is always less than R²… as we keep adding independent variables, the difference decreases… This is what I read. Can anyone substantiate the solution?
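One way to substantiate the “could be higher or lower” part is a quick simulation; here’s a sketch on made-up data (statsmodels assumed available; a pure-noise fourth regressor typically nudges R² up while the adjusted R² and F-statistic slip):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X3 = rng.normal(size=(n, 3))                       # three "real" regressors
y = X3 @ np.array([1.0, 0.5, -0.7]) + rng.normal(size=n)

# Same data plus a fourth regressor that is pure noise
X4 = np.column_stack([X3, rng.normal(size=n)])

fit3 = sm.OLS(y, sm.add_constant(X3)).fit()
fit4 = sm.OLS(y, sm.add_constant(X4)).fit()

print(f"R^2:     {fit3.rsquared:.4f} -> {fit4.rsquared:.4f}")          # never falls
print(f"adj R^2: {fit3.rsquared_adj:.4f} -> {fit4.rsquared_adj:.4f}")  # often falls
print(f"F-stat:  {fit3.fvalue:.2f} -> {fit4.fvalue:.2f}")              # often falls
```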

The F-statistic tells us whether at least one of the coefficients is non-zero. With the addition of another independent variable, one would think that the F-statistic should improve, but remember that the coefficients change when another variable is added, and hence the F-statistic could be higher or lower. I think the best explanation is through the formula, and that’s the way to go.
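For reference, the formula (standard notation):

```latex
F = \frac{MSR}{MSE} = \frac{RSS / k}{SSE / (n - k - 1)}
```

Adding a variable raises RSS and lowers SSE, which pushes F up, but it also raises k and lowers n − k − 1, which pushes F down; the net direction depends on how much the new variable actually explains.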