A question about the F-test

In Correlation and Regression, it says the F-statistic tests whether the slope coefficient in a regression is 0. But later it says the F-test is the ratio of average RSS to average SSE. I understand the second statement, but I’m not sure why it is a test for determining whether b1 is 0 or not. If b1 = 0, then in Y = b0 + b1X + e, b1X = 0, meaning X explains none of Y?

The Correlation and Regression chapter teaches simple linear regression only (not multivariate). So for simple regression (only one X variable) the F-test is just the t-test squared; in this case the F-test and the t-test test the same thing, H0: b1 = 0, because there are no other independent variables.
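
Here is a minimal numerical sketch with simulated data (assuming numpy is available; the variable names are just illustrative) showing that in a simple regression the F-statistic is exactly the squared t-statistic for the slope:

```python
# Minimal sketch with simulated data (assumes numpy is installed):
# in a simple regression, the F-statistic equals the squared t-statistic
# for the slope, since both test H0: b1 = 0.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.8 * x + rng.normal(size=n)            # true b0 = 2, b1 = 0.8

# OLS estimates for Y = b0 + b1*X + e
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

rss = np.sum((y_hat - y.mean()) ** 2)             # explained variation
sse = np.sum((y - y_hat) ** 2)                    # unexplained (residual) variation
mse = sse / (n - 2)                               # one slope + intercept -> n - 2 df

F = (rss / 1) / mse                               # average RSS / average SSE
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))
t_b1 = b1 / se_b1                                 # t-test of H0: b1 = 0

print(F, t_b1 ** 2)                               # the two values coincide
```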

For multivariate regression, the F-test has H0: b1 = b2 = … = bn = 0. In this case the F-test is no longer the squared t-test for each coefficient.
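
In symbols, using the curriculum’s labels (RSS = explained sum of squares, SSE = residual or unexplained sum of squares, k slope coefficients, n observations), the ratio is

$$
F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{RSS}/k}{\text{SSE}/(n-k-1)}, \qquad H_0: b_1 = b_2 = \dots = b_k = 0.
$$

With k = 1 (simple regression), this F is the square of the t-statistic for b1.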

Remember that we look to reject or fail to reject the null hypothesis (bi = 0); we do not state outright that a coefficient is zero. If we reject H0, that independent variable is good at explaining the dependent variable; if we fail to reject H0, we conclude that this independent variable should be replaced with another until we find one that works.

Thanks for the reply! I know that the F-test is the squared t-test. But why do they want to test whether b1 = 0? You said that if H0 (b1 = 0) is rejected, it means that independent variable explains the dependent variable well. Why is that?

You’re interested in whether the model as a whole is worth anything.

If it fails the F-test, then you cannot reject the null hypothesis that the slope coefficient(s) is (are) zero: the model may very well be useless.

Thanks. That’s not what I meant. The F-test is a ratio of two variances, and for a one-variable regression it tests whether the explained variation is high compared to the unexplained variation. If it is high enough and produces a statistic exceeding the critical value, then H0 (b1 = 0) is rejected, which means that a high explained variation is the same thing as “b1 ≠ 0”. I just want to know why having a high RSS/SSE is related to b1 ≠ 0.

If you have a high RSS compared to SSE, it means that the variable (or set of variables) does a good job of explaining the dependent variable. Remember that RSS + SSE = SST, which is the total variation of the dependent variable. So it is logical that if your set of independent variables produces a high RSS, they explain the dependent variable well. Therefore AT LEAST ONE of them is statistically significant in explaining the dependent variable. If F is high, at least one t is high.
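
To make that link explicit with the same labels (k slope coefficients, n observations):

$$
\text{SST} = \text{RSS} + \text{SSE}, \qquad R^2 = \frac{\text{RSS}}{\text{SST}}, \qquad F = \frac{\text{RSS}/k}{\text{SSE}/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)},
$$

so a high RSS relative to SSE, a high R², and a high F-statistic are all the same statement.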

If there is no high multicollinearity, this is what happens. But if high multicollinearity is present, the t-tests reveal no individual significance while there is high overall significance (the F-test strongly rejects the null) and a high R², which looks contradictory. Multicollinearity is a violation of the OLS assumptions and must be corrected by dropping one variable.
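
A quick simulated sketch of that symptom (assuming numpy and statsmodels; the data are made up and the exact numbers will vary): two nearly identical regressors give a significant F and a high R² while the individual t-statistics look weak:

```python
# Hedged sketch with simulated data (assumes numpy and statsmodels):
# two nearly collinear regressors can produce a large F and a high R^2
# while the individual t-statistics look insignificant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)          # x2 is almost a copy of x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.rsquared)       # high R^2
print(fit.fvalue)         # large F: the model as a whole is significant
print(fit.tvalues[1:])    # slope t-stats: typically individually insignificant here
```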

Regards

Only perfect collinearity is a violation of an OLS assumption. Any other level of multicollinearity is not a violation of any OLS assumption.

You also do not necessarily need to drop one or more of the correlated indep. variables. It is a proposed remedy, but it depends on the purpose and intent of your research.

Still don’t understand… :indecision:

Think of it like this:

The F-statistic is a measure of how much the model explains compared to how much it doesn’t explain. Given a model with two slope parameters and an intercept, for example, we would like to see how well our model accounts for the variation in the DV. The F-statistic shows us how the explained variation in the DV compares to the unexplained variation in the DV for this particular model. A high ratio indicates that our model is doing a pretty good job of explaining the variation in the DV (compared to what isn’t being explained). In other words, at least one of the parameters we estimated (two, aside from the intercept) is useful for explaining some variation in the DV.

It follows that as our independent variable values change, the value of the DV also changes in a nonrandom manner: at least one of the (non-intercept) parameters is statistically different from zero. If the parameters did a bad job of explaining the DV, that is, a low ratio of explained variance to unexplained variance, then we would conclude that changes in our IVs are unrelated to changes in our DV (random). In that case, we cannot say the estimated (non-intercept) parameters are different from zero.

It might help to think of this in the context of the regression equation you are using. Write it down (use the example above) and ask: “If b1 and/or b2 are not zero, what can I expect of how this model explains the DV? It must mean that the explained variation in the DV is statistically large enough, compared to the unexplained variation in the DV, to conclude that b1 and/or b2 are different from zero, statistically speaking.” Then think of the alternative: “If b1 and b2 are both zero, what can I expect of this model in terms of explaining the DV? Well, if I plug zero into the regression for b1 and b2, I can see that they don’t influence the DV. In this case, the ratio of explained variation in the DV to unexplained variation is relatively low.”
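
A tiny simulation along those lines (purely illustrative data, assuming numpy and statsmodels) shows the two scenarios side by side:

```python
# Illustrative sketch with simulated data (assumes numpy and statsmodels):
# the F-statistic when the slopes are truly zero vs. when one slope is not.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.normal(size=(2, n))
X = sm.add_constant(np.column_stack([x1, x2]))

y_null = 1.0 + rng.normal(size=n)                 # b1 = b2 = 0: the IVs explain nothing
y_alt = 1.0 + 0.7 * x1 + rng.normal(size=n)       # b1 != 0: x1 explains part of the DV

print(sm.OLS(y_null, X).fit().fvalue)   # small F: fail to reject H0: b1 = b2 = 0
print(sm.OLS(y_alt, X).fit().fvalue)    # large F: reject H0, at least one slope != 0
```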

Hope this example helps!

Thanks! I thought the same as you did. As it’s not explained in the curriculum, I assumed this is how it is. And it’s right based on what you said.

Glad to help!

Yes, you’re right about that. The curriculum covers the topic of regression as if you already knew many things; some of them come from Level 1 and others you must already know from other sources, but it doesn’t go into much depth, I think.

Yes, if you have perfect multicollinearity you have, indeed, no regression calculation at all; the result is an error (a singular matrix), nothing. That’s why it is a violation.

But severe multicollinearity is a violation in a practical sense. High multicollinearity (roughly r > 0.7 among many of the variables) is just not healthy for your regression; worse than unhealthy, it renders the regression useless. We already explained the problems of having high multicollinearity.

If your research purpose requires an X matrix with high multicollinearity, and you need that X matrix at all costs, well, then you cannot conclude anything. That will be your conclusion.

Excerpt:

"If your research purpose needs a X matrix with high M, and you need that X matrix at all cost; well, you can not conclude nothing so. That will be your conclussion. I’m not sure what you’re saying here. The only time multicollinearity is an issue with the matrices is when you have perfect collinearity (singular matrix), as we both said earlier."


I meant that if multicollinearity is high, the results are inaccurate and unreliable, so what can you conclude from a model with this problem? Certainly not much, regardless of your research purpose. In practice, if we use a set of independent variables with high multicollinearity (an X matrix with high multicollinearity), we need to drop variables until we reach an acceptable level of multicollinearity, at least to the point where the F-test, R², and t-tests show the desired behavior.

Anyway, I agree with you that there is no hard threshold for detecting multicollinearity, but we must keep an eye on the problem of high levels of it.

What you discussed gives me the impression that you are quantitative majors. Reading through the curriculum and understanding the major points is enough to tackle the practice problems and the exam, but really digesting each detail, like being able to explain the why behind each phenomenon, is not quite possible for me, since I have zero background knowledge and have only read the Quantitative sections of Levels 1 and 2 so far.

For example, the book states all the consequences of violating each assumption of multiple regression models, but it can leave readers wondering why each consequence arises or how it can be explained. Multicollinearity (MC) makes regression coefficient estimates extremely unreliable, meaning that the relationship between the DV and each individual IV may not be correctly captured by the slope coefficients, even though the equation as a whole fits rather well. I took some time to digest this and finally settled on the earlier example of the relationship between vocabulary, height, and age. It is maybe a crude analogy, but it helps to some degree. It could be an MC case with vocabulary as the DV and height and age as the IVs. So it is not that vocabulary has a direct relationship with height; rather, the tight relationship between height and age makes the estimated relationship between vocabulary and height very unreliable, yet the model as a whole might still generate the expected predictions.
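
Your analogy can even be simulated with made-up numbers (assuming numpy and statsmodels; nothing here is real data): let age drive both height and vocabulary, then regress vocabulary on height and age. The fit is decent, but the individual slopes are very imprecise:

```python
# Hedged sketch of the vocabulary/height/age analogy with made-up numbers
# (assumes numpy and statsmodels): age drives both height and vocabulary,
# so height and age are highly collinear regressors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
age = rng.uniform(3, 12, size=n)                       # years
height = 80 + 6 * age + rng.normal(scale=1, size=n)    # cm, tightly tied to age
vocab = 500 * age + rng.normal(scale=1000, size=n)     # words, driven by age only

X = sm.add_constant(np.column_stack([height, age]))
fit = sm.OLS(vocab, X).fit()

print(fit.rsquared, fit.fvalue)   # the equation as a whole fits reasonably well
print(fit.bse[1:])                # slope standard errors are badly inflated
print(fit.tvalues[1:])            # individual slopes can look insignificant/unreliable
```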

Conditional heteroskedasticity (CH) results in consistent coefficient estimates but biased standard errors for them. This is harder to digest, because standard errors take more effort to understand. It is easy enough to imagine two IVs being related, but thinking about how standard errors relate to the IVs is brain-exhausting… The curriculum did not explain it with more easily understood examples, and I guess that is to compress the material and avoid making the already bulky book bulkier.
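
One way to see the standard-error point concretely is to simulate heteroskedastic errors and compare the conventional standard errors with White’s robust ones (a sketch only, assuming numpy and statsmodels):

```python
# Sketch only, with simulated data (assumes numpy and statsmodels):
# under conditional heteroskedasticity the coefficients are still consistent,
# but the conventional standard errors are biased; White's robust (HC)
# standard errors adjust for it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n) * (0.5 + np.abs(x))        # error variance grows with |x|
y = 1.0 + 0.5 * x + e
X = sm.add_constant(x)

naive = sm.OLS(y, X).fit()                        # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC0")         # White heteroskedasticity-robust

print(naive.params, robust.params)                # identical coefficient estimates
print(naive.bse, robust.bse)                      # the standard errors differ
```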

To understand that, it is first necessary to understand what the standard errors of the coefficients really mean. With no background education, I can only imagine what they mean, and I cannot verify it.

Also, serial correlation leads to small standard errors for the regression coefficients. It’s a very abstract piece of information.

Admittedly, the exams are not responsible for making each person feel that what they learned from the books is comparable to four years in college. They are more of an open door to further exploration for people who want to explore more.

Lisaliu, you are doing well, keep it up! Combine reading, understanding, and practice to be ready for the exam; we will succeed :smiley:

Think of standard errors as standard deviations for the estimator of a parameter. For example: let’s say you wanted to estimate the slope relating y to x1, B1 (the true value). Your regression would estimate b1 (the estimator). Now, if we did this many times (taking all possible samples from the population) to calculate all possible estimates of B1, we would have a distribution of b1 values. This distribution would have a variance and a standard error (again, think standard deviation).
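
That thought experiment is easy to simulate (a minimal sketch assuming numpy): draw many samples, re-estimate b1 each time, and look at the spread of the estimates:

```python
# Minimal simulation sketch (assumes numpy): the standard error of b1 is the
# standard deviation of its sampling distribution across repeated samples.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 5000
b1_estimates = []

for _ in range(reps):
    x = rng.normal(size=n)
    y = 2.0 + 0.8 * x + rng.normal(size=n)        # true B1 = 0.8
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b1_estimates.append(b1)

print(np.mean(b1_estimates))   # close to the true B1 = 0.8
print(np.std(b1_estimates))    # this spread is what a single regression's
                               # reported standard error of b1 is estimating
```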

A small note about serial correlation: it doesn’t necessarily lead to underestimated standard errors (that is only for positive serial correlation). If the serial correlation is negative, the standard errors can be overestimated. The reason for underestimation (positive serial correlation) and overestimation (negative serial correlation) is that the traditional calculation of the standard error does not account for the autocorrelation.
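
A hedged sketch of that point (assuming numpy and statsmodels, with made-up AR(1) data): generate positively autocorrelated regressors and errors, then compare the conventional standard errors with Newey-West (HAC) standard errors, which account for the autocorrelation:

```python
# Hedged sketch with simulated data (assumes numpy and statsmodels):
# with positive serial correlation in both the regressor and the errors,
# conventional OLS standard errors tend to be understated; Newey-West (HAC)
# standard errors account for the autocorrelation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()          # AR(1) regressor
    e[t] = 0.8 * e[t - 1] + rng.normal()          # AR(1) errors, positive serial corr.

y = 1.0 + 0.5 * x + e
X = sm.add_constant(x)

conventional = sm.OLS(y, X).fit()
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 10})

print(conventional.bse)   # typically too small under positive serial correlation
print(hac.bse)            # usually larger once the autocorrelation is accounted for
```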

Hope this helps!

:smiley: I certainly hope so! Add oil!