A question about the F-test

In Correlation and Regression, it says the F-statistic tests whether the slope coefficient in a regression is 0. But later it says the F-test is the ratio of average RSS to average SSE. I understand the second statement, but I’m not sure why it is a test for determining whether b1 is 0 or not. If b1 = 0, then in Y = b0 + b1X + e, b1X = 0, meaning X explains none of Y?

The Correlation and Regression chapter teaches simple linear regression only (not multivariate). So for simple regression (only one X variable) the F-test is just the t-test squared; in this case the F-test and the t-test test the same thing, H0: b1 = 0, because there are no other independent variables.
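
Here is a minimal numerical sketch with simulated data (assuming numpy is available; the variable names are just illustrative) showing that in a simple regression the F-statistic is exactly the squared t-statistic for the slope:

```python
# Minimal sketch with simulated data (assumes numpy is installed):
# in a simple regression, the F-statistic equals the squared t-statistic
# for the slope, since both test H0: b1 = 0.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.8 * x + rng.normal(size=n)            # true b0 = 2, b1 = 0.8

# OLS estimates for Y = b0 + b1*X + e
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

rss = np.sum((y_hat - y.mean()) ** 2)             # explained variation
sse = np.sum((y - y_hat) ** 2)                    # unexplained (residual) variation
mse = sse / (n - 2)                               # one slope + intercept -> n - 2 df

F = (rss / 1) / mse                               # average RSS / average SSE
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))
t_b1 = b1 / se_b1                                 # t-test of H0: b1 = 0

print(F, t_b1 ** 2)                               # the two values coincide
```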

For multivariate regression, the F-test has H0: b1 = b2 = … = bn = 0. In this case the F-test is no longer the squared t-test for each coefficient.
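
In symbols, using the curriculum’s labels (RSS = explained sum of squares, SSE = residual or unexplained sum of squares, k slope coefficients, n observations), the ratio is

$$
F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{RSS}/k}{\text{SSE}/(n-k-1)}, \qquad H_0: b_1 = b_2 = \dots = b_k = 0.
$$

With k = 1 (simple regression), this F is the square of the t-statistic for b1.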

Remember that we look to reject or fail to reject the null hypothesis (bi = 0); we do not state outright that a coefficient is zero. If we reject H0, that independent variable is good at explaining the dependent variable; if we fail to reject H0, we conclude that this independent variable should be replaced with another until we find one that works.

Thanks for the reply! I know that the F-test is the squared t-test. But why do they want to test whether b1 = 0? You said that if H0 (b1 = 0) is rejected, it means that independent variable explains the dependent variable well. Why is that?

You’re interested in whether the model as a whole is worth anything.

If it fails the F-test, then you cannot reject the null hypothesis that the slope coefficient(s) is (are) zero: the model may very well be useless.

Thanks. That’s not what I meant. The F-test is a ratio of two variances, and for a one-variable regression it tests whether the explained variation is high compared to the unexplained variation. If it is high enough and produces a statistic exceeding the critical value, then H0 (b1 = 0) is rejected, which means that a high explained variation is the same thing as “b1 ≠ 0”. I just want to know why having a high RSS/SSE is related to b1 ≠ 0.

If you have a high RSS compared to SSE, it means that the variable (or set of variables) does a good job of explaining the dependent variable. Remember that RSS + SSE = SST, which is the total variation of the dependent variable. So it is logical that if your set of independent variables produces a high RSS, they explain the dependent variable well. Therefore AT LEAST ONE of them is statistically significant in explaining the dependent variable. If F is high, at least one t is high.
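
To make that link explicit with the same labels (k slope coefficients, n observations):

$$
\text{SST} = \text{RSS} + \text{SSE}, \qquad R^2 = \frac{\text{RSS}}{\text{SST}}, \qquad F = \frac{\text{RSS}/k}{\text{SSE}/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)},
$$

so a high RSS relative to SSE, a high R², and a high F-statistic are all the same statement.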

If there is no high multicollinearity, this is what happens. But if high multicollinearity is present, the t-tests reveal no individual significance while there is high overall significance (the F-test strongly rejects the null) and a high R², which looks contradictory. Multicollinearity is a violation of the OLS assumptions and must be corrected by dropping one variable.
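
A quick simulated sketch of that symptom (assuming numpy and statsmodels; the data are made up and the exact numbers will vary): two nearly identical regressors give a significant F and a high R² while the individual t-statistics look weak:

```python
# Hedged sketch with simulated data (assumes numpy and statsmodels):
# two nearly collinear regressors can produce a large F and a high R^2
# while the individual t-statistics look insignificant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)          # x2 is almost a copy of x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.rsquared)       # high R^2
print(fit.fvalue)         # large F: the model as a whole is significant
print(fit.tvalues[1:])    # slope t-stats: typically individually insignificant here
```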

Regards

Only perfect collinearity is a violation of an OLS assumption. Any other level of multicollinearity is not a violation of any OLS assumption.

You also do not necessarily need to drop one or more of the correlated indep. variables. It is a proposed remedy, but it depends on the purpose and intent of your research.

Still don’t understand… :indecision:

Think of it like this:

The F-statistic is a measure of how much the model explains compared to how much it doesn’t explain. Given a model with two slope parameters and an intercept, for example, we would like to see how well our model accounts for the variation in the DV. The F-statistic shows us how the explained variation in the DV compares to the unexplained variation in the DV for this particular model. A high ratio indicates that our model is doing a pretty good job of explaining the variation in the DV (compared to what isn’t being explained). In other words, at least one of the parameters we estimated (two, aside from the intercept) is useful for explaining some variation in the DV.

It follows that as our independent variable values change, the value of the DV also changes in a nonrandom manner: at least one of the (non-intercept) parameters is statistically different from zero. If the parameters did a bad job of explaining the DV, that is, a low ratio of explained variance to unexplained variance, then we would conclude that changes in our IVs are unrelated to changes in our DV (random). In that case, we cannot say the estimated (non-intercept) parameters are different from zero.

It might help to think of this in the context of the regression equation you are using. Write it down (use the example above) and ask: “If b1 and/or b2 are not zero, what can I expect of how this model explains the DV? It must mean that the explained variation in the DV is statistically large enough, compared to the unexplained variation in the DV, to conclude that b1 and/or b2 are different from zero, statistically speaking.” Then think of the alternative: “If b1 and b2 are both zero, what can I expect of this model in terms of explaining the DV? Well, if I plug zero into the regression for b1 and b2, I can see that they don’t influence the DV. In this case, the ratio of explained variation in the DV to unexplained variation is relatively low.”
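
A tiny simulation along those lines (purely illustrative data, assuming numpy and statsmodels) shows the two scenarios side by side:

```python
# Illustrative sketch with simulated data (assumes numpy and statsmodels):
# the F-statistic when the slopes are truly zero vs. when one slope is not.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=(2, n))
X = sm.add_constant(np.column_stack([x1, x2]))

y_null = 1.0 + rng.normal(size=n)                 # b1 = b2 = 0: the IVs explain nothing
y_alt = 1.0 + 0.7 * x1 + rng.normal(size=n)       # b1 != 0: x1 explains part of the DV

print(sm.OLS(y_null, X).fit().fvalue)   # small F: fail to reject H0: b1 = b2 = 0
print(sm.OLS(y_alt, X).fit().fvalue)    # large F: reject H0, at least one slope != 0
```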

Hope this example helps!

Thanks! I thought the same as you did. As it’s not explained in the curriculum, I assumed this is how it is. And it’s right based on what you said.

Glad to help!

Yes, you’re right about that. The curriculum covers the topic of regression as if you already knew many things; some of them come from Level 1 and others you must already know from other sources, but it doesn’t go into much depth, I think.

Yes, if you have perfect multicollinearity you have, indeed, no regression calculation at all; the result is an error (a singular matrix), nothing. That’s why it is a violation.

But severe multicollinearity is a violation in a practical sense. High multicollinearity (roughly r > 0.7 among many of the variables) is just not healthy for your regression; worse than unhealthy, it renders the regression useless. We already explained the problems of having high multicollinearity.

If your research purpose requires an X matrix with high multicollinearity, and you need that X matrix at all costs, well, then you cannot conclude anything. That will be your conclusion.

Excerpt:

"If your research purpose needs a X matrix with high M, and you need that X matrix at all cost; well, you can not conclude nothing so. That will be your conclussion. I’m not sure what you’re saying here. The only time multicollinearity is an issue with the matrices is when you have perfect collinearity (singular matrix), as we both said earlier."


I meant that if multicollinearity is high, the results are inaccurate and unreliable, so what can you conclude from a model with this problem? Certainly not much, regardless of your research purpose. In practice, if we use a set of independent variables with high multicollinearity (an X matrix with high multicollinearity), we need to drop variables until we reach an acceptable level of multicollinearity, at least to the point where the F-test, R², and t-tests show the desired behavior.

Anyway, I agree with you that there is no hard threshold for detecting multicollinearity, but we must keep an eye on the problem of high levels of it.

What you discussed gives me the impression that you are quantitative majors. Reading through the curriculum and understanding the major points is enough to tackle the practice problems and the exam, but really digesting each detail, like being able to explain the why behind each phenomenon, is not quite possible for me, since I have zero background knowledge and have only read the Quantitative sections of Levels 1 and 2 so far.

For example, the book states all the consequences of violating each assumption of multiple regression models, but it can leave readers wondering why each consequence arises or how it can be explained. Multicollinearity (MC) makes regression coefficient estimates extremely unreliable, meaning that the relationship between the DV and each individual IV may not be correctly captured by the slope coefficients, even though the equation as a whole fits rather well. I took some time to digest this and finally settled on the earlier example of the relationship between vocabulary, height, and age. It is maybe a crude analogy, but it helps to some degree. It could be an MC case with vocabulary as the DV and height and age as the IVs. So it is not that vocabulary has a direct relationship with height; rather, the tight relationship between height and age makes the estimated relationship between vocabulary and height very unreliable, yet the model as a whole might still generate the expected predictions.
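
Your analogy can even be simulated with made-up numbers (assuming numpy and statsmodels; nothing here is real data): let age drive both height and vocabulary, then regress vocabulary on height and age. The fit is decent, but the individual slopes are very imprecise:

```python
# Hedged sketch of the vocabulary/height/age analogy with made-up numbers
# (assumes numpy and statsmodels): age drives both height and vocabulary,
# so height and age are highly collinear regressors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
age = rng.uniform(3, 12, size=n)                       # years
height = 80 + 6 * age + rng.normal(scale=1, size=n)    # cm, tightly tied to age
vocab = 500 * age + rng.normal(scale=1000, size=n)     # words, driven by age only

X = sm.add_constant(np.column_stack([height, age]))
fit = sm.OLS(vocab, X).fit()

print(fit.rsquared, fit.fvalue)   # the equation as a whole fits reasonably well
print(fit.bse[1:])                # slope standard errors are badly inflated
print(fit.tvalues[1:])            # individual slopes can look insignificant/unreliable
```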

Conditional heteroskedasticity (CH) results in consistent coefficient estimates but biased standard errors for them. This is harder to digest, because standard errors take more effort to understand. It is easy enough to imagine two IVs being related, but thinking about how standard errors relate to the IVs is brain-exhausting… The curriculum did not explain it with more easily understood examples, and I guess that is to compress the material and avoid making the already bulky book bulkier.
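
One way to see the standard-error point concretely is to simulate heteroskedastic errors and compare the conventional standard errors with White’s robust ones (a sketch only, assuming numpy and statsmodels):

```python
# Sketch only, with simulated data (assumes numpy and statsmodels):
# under conditional heteroskedasticity the coefficients are still consistent,
# but the conventional standard errors are biased; White's robust (HC)
# standard errors adjust for it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n) * (0.5 + np.abs(x))        # error variance grows with |x|
y = 1.0 + 0.5 * x + e
X = sm.add_constant(x)

naive = sm.OLS(y, X).fit()                        # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC0")         # White heteroskedasticity-robust

print(naive.params, robust.params)                # identical coefficient estimates
print(naive.bse, robust.bse)                      # the standard errors differ
```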

To understand that, it is first necessary to understand what the standard errors of the coefficients really mean. With no background education, I can only imagine what they mean, and I cannot verify it.

Also, serial correlation leads to small standard errors for the regression coefficients. It’s a very abstract piece of information.

Admittedly, the exams are not responsible for making each person feel that what they learned from the books is comparable to four years in college. They are more of an open door to further exploration for people who want to explore more.

Lisaliu, you are doing well, keep it up! Combine reading, understanding, and practice to be ready for the exam; we will succeed :smiley:

Think of standard errors as standard deviations for the estimator of a parameter. For example: let’s say you wanted to estimate the slope relating y to x1, B1 (the true value). Your regression would estimate b1 (the estimator). Now, if we did this many times (taking all possible samples from the population) to calculate all possible estimates of B1, we would have a distribution of b1 values. This distribution would have a variance and a standard error (again, think standard deviation).
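
That thought experiment is easy to simulate (a minimal sketch assuming numpy): draw many samples, re-estimate b1 each time, and look at the spread of the estimates:

```python
# Minimal simulation sketch (assumes numpy): the standard error of b1 is the
# standard deviation of its sampling distribution across repeated samples.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 5000
b1_estimates = []

for _ in range(reps):
    x = rng.normal(size=n)
    y = 2.0 + 0.8 * x + rng.normal(size=n)        # true B1 = 0.8
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b1_estimates.append(b1)

print(np.mean(b1_estimates))   # close to the true B1 = 0.8
print(np.std(b1_estimates))    # this spread is what a single regression's
                               # reported standard error of b1 is estimating
```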

A small note about serial correlation: it doesn’t necessarily lead to underestimated standard errors (that is only for positive serial correlation). If the serial correlation is negative, the standard errors can be overestimated. The reason for underestimation (positive serial correlation) and overestimation (negative serial correlation) is that the traditional calculation of the standard error does not account for the autocorrelation.
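
A hedged sketch of that point (assuming numpy and statsmodels, with made-up AR(1) data): generate positively autocorrelated regressors and errors, then compare the conventional standard errors with Newey-West (HAC) standard errors, which account for the autocorrelation:

```python
# Hedged sketch with simulated data (assumes numpy and statsmodels):
# with positive serial correlation in both the regressor and the errors,
# conventional OLS standard errors tend to be understated; Newey-West (HAC)
# standard errors account for the autocorrelation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()          # AR(1) regressor
    e[t] = 0.8 * e[t - 1] + rng.normal()          # AR(1) errors, positive serial corr.

y = 1.0 + 0.5 * x + e
X = sm.add_constant(x)

conventional = sm.OLS(y, X).fit()
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 10})

print(conventional.bse)   # typically too small under positive serial correlation
print(hac.bse)            # usually larger once the autocorrelation is accounted for
```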

Hope this helps!

:smiley: I certainly hope so! Add oil!