A minor confusion on assumptions of linear regression

Assume that the independent variable in a linear regression is represented by X and that the dependent variable is represented by Y. All of the following are assumptions underlying linear regression except:
A. Y values for each X are normally distributed.
B. The means of the distributions of Y values lie on the regression line.
C. The standard deviations of the distributions of Y values are equal.
D. Y values are statistically significant.
I think A and D are both wrong. Please verify!

I am not sure what D means, but I would say A is wrong, since in linear regression the relation between Y and X is linear (not normally distributed); otherwise you would not be able to use the linear regression equation Y = b0 + b1X + e.

That’s what I was thinking. It is not logical to say Y is normally distributed; only the residual (se) is normally distributed. But the given answer is D, and the explanation says the assumption should read “Y values are statistically independent”. I am pretty confused. Can I confidently say that the answer is wrong?

Yes, the residuals are assumed to be normally distributed… but option B is also not an assumption of linear regression (that’s what I think)… though this might be a (rare) case where Y-bar coincidentally plots on the regression line Y = b0 + b1X in the scatter plot. Also, the sum of squared differences between Y-bar (the mean of all Y’s) and Y-cap (the estimated Y on the regression line) is the part of the variation in the dependent variable that is EXPLAINED by the independent variable. That’s what we call the Sum of Squares Regression (SSR), and nowhere in linear regression do we assume that SSR = 0. - Dinesh S
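To make the SSR point concrete, here is a minimal Python sketch that fits a simple least-squares line and checks that the total variation around Y-bar splits into the explained part (SSR, from Y-cap to Y-bar) and the residual part (SSE). The five data points and the helper names `ols_fit` and `sums_of_squares` are hypothetical, chosen only for illustration.

```python
# Sketch: SST = SSR + SSE for a simple least-squares fit.
# The data below are made up purely for illustration.


def ols_fit(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares fit of ys on xs."""
    b0, b1 = ols_fit(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [b0 + b1 * x for x in xs]                     # fitted values (Y-cap)
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation around Y-bar
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained (residual) variation
    return sst, ssr, sse


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssr, sse = sums_of_squares(xs, ys)
print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  R^2={ssr / sst:.4f}")
```

Since SSR/SST is R², a fit with SSR = 0 would mean the line explains none of the variation in Y; on these illustrative data SSR is most of SST.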

I think B will be true for a linear regression: Y1 = b0 + b1X1 + e and Y2 = b0 + b1X2 + e, so Ybar = (Y1 + Y2)/2 = b0 + b1(X1 + X2)/2 + e, which also satisfies the regression equation (just add the two equations up).
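A closely related property can be checked numerically. Assuming ordinary least squares, the intercept is b0 = Y-bar − b1·X-bar, so the fitted line always passes through the point (X-bar, Y-bar) and the residuals sum to zero. The sketch below simulates data from an arbitrary line (1.5 + 0.8X, chosen only for illustration) and uses a hypothetical `ols_fit` helper. Note that this concerns the overall sample means, not the conditional means of Y at each X that option B refers to.

```python
# Sketch: the least-squares line passes through (X-bar, Y-bar),
# and the residuals sum to zero. Data are simulated from an
# arbitrary line chosen only for illustration.
import random


def ols_fit(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # intercept implied by the normal equations
    return b0, b1


rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(50)]
ys = [1.5 + 0.8 * x + rng.gauss(0.0, 1.0) for x in xs]  # made-up "true" line plus noise

b0, b1 = ols_fit(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

gap_at_means = y_bar - (b0 + b1 * x_bar)                       # zero up to rounding
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))  # zero up to rounding
print(f"b0={b0:.3f}  b1={b1:.3f}  gap={gap_at_means:.1e}  sum of residuals={residual_sum:.1e}")
```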

I read it wrong… “means of the distributions of Y values” is what is being considered, and it is assumed to lie on the regression line, which is true. What I thought it meant was “the mean of the Y values, i.e. Y-bar”, which is a horizontal line on the scatter plot, so I was wondering how it could be assumed to lie on the regression line… Sorry for the confusion… The answer should be ‘A’, as the relation between the independent variable (X) and the dependent variable (Y) is a linear relation, since its equation is the slope-intercept form of a straight line. - Dinesh S

naivejoe Wrote:
-------------------------------------------------------
> Assume that the independent variable in a linear regression is represented by X and that the dependent variable is represented by Y.
> All of the following are assumptions underlying linear regression except:
> A. Y values for each X are normally distributed.

Well, if the residuals are normally distributed, then the conditional distribution of Y given X is normal. The least squares estimators have all kinds of good properties even if the residuals aren’t normal, but most of the test statistics you learn in CFA studies (t’s and F’s) are based on normal residuals. They probably think this one is true.

> B. The means of the distributions of Y values lie on the regression line.

Unambiguously true if the model is right.

> C. The standard deviations of the distributions of Y values are equal.

This is kind of an assumption of least squares regression, but even if the s.d.’s aren’t equal, the regression estimators have good properties.

> D. Y values are statistically significant.

Say what? Test statistics are statistically significant, not dependent variables. This one is definitely wrong.

> I think A and D are both wrong. Please verify!
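A small simulation can make options A, B and C concrete. It assumes the classical model Y = b0 + b1X + e with e drawn from N(0, σ²) independently of X; the parameter values and the `simulate_y_given_x` helper are hypothetical choices for illustration, not anything from the CFA curriculum. At each fixed X, the simulated Y values should be centered on the line (option B), have roughly the same spread σ (option C), and about 68% of them should fall within one σ of the line, as a normal distribution implies (option A).

```python
# Sketch of options A, B and C under the classical model
#   Y = B0 + B1 * X + e,  e ~ N(0, SIGMA^2), independent of X.
# Parameter values are arbitrary and chosen only for illustration.
import random
import statistics

B0, B1, SIGMA = 2.0, 0.5, 1.5  # hypothetical "true" parameters
N_DRAWS = 20_000               # simulated Y values at each fixed X


def simulate_y_given_x(x, n, rng):
    """Draw n values of Y at a fixed X under the classical model."""
    return [B0 + B1 * x + rng.gauss(0.0, SIGMA) for _ in range(n)]


rng = random.Random(42)
results = {}
for x in (0.0, 5.0, 10.0):
    ys = simulate_y_given_x(x, N_DRAWS, rng)
    line = B0 + B1 * x
    mean_y = statistics.fmean(ys)  # option B: should sit on the line
    sd_y = statistics.stdev(ys)    # option C: should be close to SIGMA at every x
    within_1sd = sum(abs(y - line) < SIGMA for y in ys) / N_DRAWS  # option A: ~0.683 if normal
    results[x] = (mean_y, sd_y, within_1sd)
    print(f"X={x:4.1f}  line={line:5.2f}  mean(Y|X)={mean_y:5.2f}  "
          f"sd(Y|X)={sd_y:4.2f}  share within 1 sd={within_1sd:.3f}")
```

Dropping the normal draw for a skewed error with the same mean and variance would leave B and C intact but break the 68% check, which is the sense in which A is a separate assumption.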

JoeyDVivre Wrote:
> Well, if the residuals are normally distributed, then the conditional distribution of Y given X is normal.

This explanation makes sense! Thank you very much.
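In symbols, assuming the classical setup where the error has mean zero and constant variance σ² given X: once X = x is fixed, b0 + b1x is just a constant, and adding a constant to a normal variable shifts its mean without changing its variance.

```latex
\begin{aligned}
Y &= b_0 + b_1 X + \varepsilon, \qquad \varepsilon \mid X \sim N(0, \sigma^2) \\
\Longrightarrow \quad Y \mid X = x &\sim N\!\left(b_0 + b_1 x,\ \sigma^2\right)
\end{aligned}
```

The mean of this conditional distribution lies on the regression line (option B), its variance σ² is the same for every x (option C), and its shape is normal (option A).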