A minor confusion on assumptions of linear regression

naivejoe · October 29, 2007, 6:33pm

Assume that the independent variable in a linear regression is represented by X, and that the dependent variable is represented by Y. All of the following are assumptions underlying linear regression except:

A. Y values for each X are normally distributed.
B. The means of the distributions of Y values lie on the regression line.
C. The standard deviations of the distributions of Y values are equal.
D. Y values are statistically significant.

I think A and D are both wrong. Please verify!

sv102307 · October 29, 2007, 6:38pm

I am not sure what D means, but I would say A is wrong, since in linear regression the relation between Y and X is linear (not normally distributed); otherwise you would not be able to use the linear regression equation Y = b0 + b1X + e.

naivejoe · October 29, 2007, 6:56pm

That’s what I was thinking. It is not logical to say Y is normally distributed; only the residual (se) is normally distributed. But the answer is D, and it says the option should read “Y values are statistically independent”. I am pretty confused. Can I confidently say that the answer is wrong?

dinesh.sundrani · October 29, 2007, 7:06pm

Yes, the residuals are assumed to be normally distributed… but option B is also not an assumption of linear regression (that’s what I think)… though there might be a (rare) case where Y-bar coincidentally plots on the regression line Y = b0 + b1X in the scatter plot. Also, the sum of squared differences between Y-bar (the mean of all Y’s) and Y-cap (the estimated Y on the regression line) is what EXPLAINS the variation of the dependent variable w.r.t. the independent variable; that’s what we call the Sum of Squares Regression (SSR), and nowhere in linear regression do we assume that SSR = 0. - Dinesh S

sv102307 · October 29, 2007, 7:17pm

I think B will be true for a linear regression. Y1 = b0 + b1X1 + e1. Y2 = b0 + b1X2 + e2. Adding these up and dividing by 2, Ybar = (Y1 + Y2)/2 = b0 + b1(X1 + X2)/2 + (e1 + e2)/2, which also satisfies the regression equation.
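The averaging argument above can be checked numerically. The following is a minimal sketch, not part of the original thread: the coefficients, noise level, and sample size are made-up illustration values. It fits a least squares line with an intercept, then checks that the fitted line passes through the point (X-bar, Y-bar) and that the total variation splits into an explained part (the SSR mentioned earlier) and a residual part.

```python
import numpy as np

# Hypothetical data: the true line and noise level are arbitrary choices.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=50)
y = 1.5 + 0.8 * x + rng.normal(0.0, 2.0, size=50)

# np.polyfit returns coefficients from highest degree down: slope, then intercept.
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
y_hat = b0_hat + b1_hat * x

# Value of the fitted line at X-bar; for a fit with an intercept it equals Y-bar.
y_on_line_at_xbar = b0_hat + b1_hat * x.mean()

# Variation explained by the regression, residual variation, and total variation.
ss_regression = np.sum((y_hat - y.mean()) ** 2)
ss_error = np.sum((y - y_hat) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)

print("fitted line at X-bar:", y_on_line_at_xbar, " Y-bar:", y.mean())
print("SSR + SSE:", ss_regression + ss_error, " SST:", ss_total)
```

Note that this only shows the sample mean point lies on the fitted line. Option B is a statement about the mean of Y at each individual X, which is a separate (model) assumption.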

dinesh.sundrani · October 29, 2007, 7:17pm

I read it wrong… “means of the distributions of Y values” is what is being considered, and it’s assumed to lie on the regression line: true. What I thought was “the mean of the Y values, i.e. Y-bar”… which is a horizontal line on the scatter plot, so I was wondering how it’s assumed to lie on the regression line… sorry for the confusion… The answer should be ‘A’, as the relation between the independent variable (X) and the dependent variable (Y) is a linear relation, since its equation is the slope-intercept form of a straight line. - Dinesh S

JoeyDVivre · October 29, 2007, 7:56pm

naivejoe wrote:

> Assume that the independent variable in a linear regression is represented by X, and that the dependent variable is represented by Y. All of the following are assumptions underlying linear regression except:
> A. Y values for each X are normally distributed

Well, if the residuals are normally distributed then the conditional distribution of Y given X is Normal. The least squares estimators have all kinds of good properties even if the residuals aren’t normal, but most of the test statistics you learn in CFA studies (t’s and F’s) are based on normal residuals. They probably think this one is true.

> B. the means of the distributions of Y values lie on the regression lines.

Unambiguously true if the model is right.

> C. the standard deviations of the distribution of Y values are equal

This is kind of an assumption of least squares regression, but even if the s.d.’s aren’t equal the regression estimators have good properties.

> D. Y value are statistically significant.

Say what? Test statistics are statistically significant, not dependent variables. This one is definitely wrong.

> I think A & D both wrong. Please verify!
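The points in the reply above about options A, B, and C can be illustrated with a small simulation. This is a sketch under stated assumptions, not anything from the original posters: the values of b0, b1, sigma, and the X levels are hypothetical. It draws Y = b0 + b1X + e with normal residuals at a few fixed X values and summarizes the distribution of Y at each X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model parameters, chosen only for illustration.
b0, b1, sigma = 2.0, 0.5, 1.0
x_levels = [1.0, 5.0, 10.0]
n_draws = 20000

summary = {}
for x in x_levels:
    # Y at a fixed X is the point on the line plus a normal residual.
    y = b0 + b1 * x + rng.normal(0.0, sigma, size=n_draws)
    mean_y = y.mean()
    sd_y = y.std(ddof=1)
    # Share of draws within one s.d. of the mean; about 0.683 for a normal distribution.
    within_one_sd = np.mean(np.abs(y - mean_y) <= sd_y)
    summary[x] = (mean_y, sd_y, within_one_sd)
    print(f"X={x}: mean of Y={mean_y:.3f} (line: {b0 + b1 * x:.3f}), "
          f"s.d.={sd_y:.3f}, share within 1 s.d.={within_one_sd:.3f}")
```

At each X the mean of Y should sit close to the regression line (option B), the standard deviation should be close to the same sigma (option C), and the shape should be consistent with a normal distribution (option A), because Y at a fixed X is just a constant plus the normal residual.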

naivejoe · October 29, 2007, 8:06pm

Quote JoeyDVivre: “Well, if the residuals are normally distributed then the conditional distribution of Y given X is Normal.”

This explanation makes sense! Thank you very much.