Quant concepts

What does the statement “test whether an estimated slope coefficient is significantly different from zero” mean? And why do we build a confidence interval for that?

It means exactly what it says. You are testing whether the true population parameter is different from zero, given what the sample data from running the regression tell you about it.

If you construct a confidence interval and the confidence interval CONTAINS zero, you can say that there is not sufficient evidence to reject the null hypothesis that the slope coefficient is zero.

If the confidence interval DOES NOT CONTAIN zero, you can reject the null hypothesis that the slope coefficient is zero. This is because the evidence (confidence interval) says that (1-alpha)*100% of the time, the values of the point estimates of the slope coefficients will not lie within that interval (which means it will not be zero since the interval does not contain zero).
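Here's a minimal sketch of the mechanics of that check in Python (the data and numbers are invented just for illustration; statsmodels is assumed):

```python
# A minimal sketch of the CI check described above: fit a simple regression
# on made-up data and see whether the 95% CI for the slope contains zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)      # true slope is 2 in this toy setup

fit = sm.OLS(y, sm.add_constant(x)).fit()
lo, hi = fit.conf_int(alpha=0.05)[1]    # row 1 is the slope; row 0 the intercept

if lo <= 0.0 <= hi:
    print(f"({lo:.2f}, {hi:.2f}) contains 0: fail to reject H0: slope = 0")
else:
    print(f"({lo:.2f}, {hi:.2f}) excludes 0: reject H0: slope = 0")
```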

Thanks, dwheats. It really helped.

Can you also share one simple, everyday example of autocorrelation? Also, please explain how autocorrelation is a problem and distorts the regression model. I am working on clearing up my concepts, so I am digging into the details.

Autocorrelation violates the assumptions of the Gauss-Markov theorem, so your estimates are no longer the Best Linear Unbiased Estimators (BLUE). It tends to produce underestimated standard errors for the coefficients, but usually not biased coefficients themselves. This inflates the t-value of a hypothesis test and increases the rate of Type I error (rejecting the null when it is true).
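If a toy demonstration helps, here's a small simulation (my own made-up setup, not from the curriculum): when both the regressor and the errors are positively autocorrelated, the standard error that OLS reports comes out noticeably smaller than the actual spread of the slope estimates across replications.

```python
# Toy simulation of the understated standard errors: compare the SE that OLS
# reports with the actual standard deviation of the slope estimate across
# many replications, when regressor and errors are both AR(1).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, reps, rho = 200, 1000, 0.8

def ar1(n):
    """Generate an AR(1) series: z_t = rho * z_{t-1} + u_t."""
    z = np.zeros(n)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + rng.normal()
    return z

slopes, reported_se = [], []
for _ in range(reps):
    x, e = ar1(n), ar1(n)
    fit = sm.OLS(2.0 * x + e, sm.add_constant(x)).fit()
    slopes.append(fit.params[1])
    reported_se.append(fit.bse[1])

print(f"average OLS-reported SE:    {np.mean(reported_se):.3f}")
print(f"actual SD of the estimates: {np.std(slopes):.3f}")  # noticeably larger
```

That gap between the reported SE and the actual variability is exactly what inflates the t-values.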

Any example of autocorrelation, please?

Monthly sales for an established retailer – Amazon, say – probably have a high positive autocorrelation at a lag of 12 months: they probably see a surge in sales in November and December, followed by an ebb in January and, possibly, February.
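Here's a quick toy illustration of that lag-12 effect (simulated numbers, not real Amazon data):

```python
# Simulated monthly sales with a November/December surge: each month is
# highly correlated with the same month a year earlier (lag-12 autocorrelation).
import numpy as np

rng = np.random.default_rng(0)
months = np.tile(np.arange(12), 10)                     # 10 years, Jan = 0
surge = np.where(np.isin(months, [10, 11]), 50.0, 0.0)  # Nov/Dec spike
sales = 100.0 + surge + rng.normal(scale=5.0, size=months.size)

lag = 12
r = np.corrcoef(sales[:-lag], sales[lag:])[0, 1]        # lag-12 autocorrelation
print(f"lag-12 autocorrelation: {r:.2f}")               # close to +1 here
```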

If you have zero in the confidence interval, you have insufficient evidence to support the alternative hypothesis that the coefficient is different from zero. In other words, you have insufficient evidence to conclude that the true slope coefficient is different from zero. Your wording is confusing.

Also, this part is _incorrect_ and confusing: “This is because the evidence (confidence interval) says that (1-alpha)% of the time, the slope coefficient’s population value will not lie within that interval (which means it will not be zero since the interval does not contain zero).”

Just above, you gave a common, _incorrect_ interpretation of a confidence interval. The true slope is either in the interval you calculate or it is not; this is a binary situation. If, say, the true slope is 2, it is always 2, 100% of the time, not 99%, not 95% (or any (1-alpha)*100%) of the time. So, for a given calculated interval, the true slope is either contained in the interval, or it is not.

Theoretically, the level of confidence means that (1-alpha)*100% of all possible similarly created confidence intervals will contain the true slope. In other words, if we replicate our sample from that population again and again and again (for all possible samples), same size and identical sampling, (1-alpha)*100% of the intervals we calculate will contain the value of the true slope, and alpha*100% will not capture the true slope value.
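Here's a small simulation of that repeated-sampling idea (my own toy setup, just to make the concept concrete):

```python
# Simulation of the repeated-sampling interpretation: draw many samples from
# the same population, build a 95% CI for the slope each time, and count how
# often the interval captures the true slope. The hit rate should be near 0.95.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
true_slope, n, reps = 2.0, 50, 2000

hits = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(size=n)
    lo, hi = sm.OLS(y, sm.add_constant(x)).fit().conf_int(alpha=0.05)[1]
    hits += lo <= true_slope <= hi

print(f"coverage: {hits / reps:.3f}")   # roughly 0.95
```

No single interval is "95% right"; it's the procedure, applied over and over, that captures the true slope about 95% of the time.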

To the original question: Why would you do this test?

Your objective is to see if the given independent variable has a statistically significant relationship with the dependent variable. In other words, can this independent variable explain changes in the dependent variable?

Wow, having a very hard time editing my comment. I think it should be fixed now.

Thanks for catching the error, tickersu.

Tickersu - A very simple and comprehensive answer. You should be a professor, man!

Thanks a lot!

One thing I haven’t been able to understand up till now: say there is autocorrelation and today’s stock price is correlated with the previous day’s stock price. What is the actual problem here that stops the regression estimates from being BLUE? Why does this autocorrelation distort the regression equation? And what does the corrective procedure do to solve this autocorrelation? Please elaborate on this point for me.

I don’t intend to sound rude, so please don’t take offense.

This statement is not correct: “This is because the evidence (confidence interval) says that (1-alpha)*100% of the time, the values of the point estimates of the slope coefficients will not lie within that interval (which means it will not be zero since the interval does not contain zero).”

This is what I was hoping to convey. The level of confidence, (1-alpha)*100%, does not refer to any single interval exactly, and it does not refer to the values of the point estimates (point estimate values are known with certainty, since we observe and calculate them; true values are not known with certainty, hence we use a measure of reliability when we make a statement about them). The level of confidence is a theoretical idea that refers to all possible intervals that are calculated from samples of equal size from the same population. Of these possible intervals (infinitely many), (1-alpha)*100% of them will have the true (population) value of the parameter of interest (the slope coefficient in this case) contained within them.

For any single interval we calculate, the true value is either in the interval or it is not; we do not know for certain. However, given the theoretical concept of a confidence level, we can say we are (1-alpha)*100% confident that the true value of the parameter is within our single interval, and we can make a conclusion about the value of the parameter. So, we can see that the (random) interval we calculate has a probability of capturing the true parameter value. This probability is equal to the confidence level.

Edit: I just wanted to make sure I tied together the theoretical idea of a confidence level to the practical idea.

^ You are right. It is better to say, then, that if the sampling is repeated, the calculated confidence intervals will encompass the true population parameter (1-alpha)*100% of the time.

No problem, glad I could help!

Say there is autocorrelation and today’s stock price is correlated with the previous day’s stock price. What is the actual problem here that stops the regression estimates from being BLUE? Why does this autocorrelation distort the regression equation?

Serial (auto) correlation usually leads to underestimated standard errors for the estimated coefficients (given positive autocorrelation, which is most common). This is because the dependency of the errors causes them to be less random in their distribution, which reduces the apparent variation in their distribution (and therefore understates their variance, which is used to calculate standard errors for the coefficients).

Separately, if you use yesterday’s stock price to predict today’s stock price, you will (likely) cause an independent variable to become correlated with the error term. This violates the regression assumption that the error term is unrelated to the independent variables, and it introduces bias into the estimated slope coefficient that is correlated with the error term. (Going into this further would be easier with a more quantitative treatment, which would require deriving the OLS slope coefficients to show the bias in the presence of a lagged dependent variable.) Given the biased estimates and the deflated standard errors for the estimates, OLS is no longer BLUE, and the statistical significance will appear distorted.
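To make the lagged-dependent-variable bias concrete, here's a toy simulation (my own setup, not from the CFAI material): regressing today's value on yesterday's when the errors are themselves AR(1) pushes the OLS slope well away from the truth, no matter how much data you have.

```python
# Toy demonstration of the lagged-dependent-variable problem: y depends on
# its own lag, the errors are AR(1), and OLS lands well away from the true
# slope of 0.5 because the lagged regressor is correlated with the error term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n, b_true, rho = 5000, 0.5, 0.5

e = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()   # AR(1) errors
    y[t] = b_true * y[t - 1] + e[t]        # y depends on its own lag

fit = sm.OLS(y[1:], sm.add_constant(y[:-1])).fit()
print(f"true slope: {b_true}, OLS estimate: {fit.params[1]:.3f}")  # about 0.8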

and then what does the corrective procedure do to solve this autocorrelation?

A common method to solve this problem is to fit an autoregressive model. If we have first-order autocorrelation, we can fit a first-order autoregressive model. Essentially, our new model will have a component that accounts for the correlation in the error terms (a model within our model, if you will). There are other methods that also work, such as differencing. I am unsure what methods the CFAI LII material covers.
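As a concrete sketch of that idea (statsmodels' GLSAR is one implementation of the "model within our model" correction; the data here are invented):

```python
# One correction for first-order autocorrelation: statsmodels' GLSAR re-fits
# the regression while modeling the error term as an AR(1) process.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = np.zeros(n)
e = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal()  # persistent regressor
    e[t] = 0.8 * e[t - 1] + rng.normal()  # AR(1) errors
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                       # naive fit
glsar = sm.GLSAR(y, X, rho=1).iterative_fit()  # AR(1)-corrected fit

print(f"OLS slope SE:   {ols.bse[1]:.4f}")
print(f"GLSAR slope SE: {glsar.bse[1]:.4f}")   # typically larger, more honest
```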

Guys… when do we use the test statistic versus the hypothesized value for hypothesis testing on the population values of the regression coefficients? I’m having trouble with Example 16 on p. 291, testing the null hypothesis that the slope = 1. The test statistic is outside the confidence interval, but they don’t reject the null because the hypothesized value is within the confidence interval. What’s the rule for when to use what?

I’m not sure exactly what the question says, but you are not checking whether the t-statistic is inside or outside of the CI. You compare the t-statistic to a rejection region. More simply, though, just look up the p-value and compare it to the selected/given alpha. When you look at the confidence interval, you compare your hypothesized true value to the confidence interval.

You can use either a confidence interval or a test statistic/p-value to get the answer. If done properly, you will get the same conclusion. If you are using the test statistic, you usually need to calculate it yourself when you are testing for a “true” parameter value different from zero. The regression output t-statistics are typically calculated to determine whether the coefficient is statistically different from zero. If you want to see if it is different from 1, for example, you must calculate:

t-statistic = (beta coefficient − 1) / (standard error of the beta coefficient)
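In Python, that hand calculation looks like this (made-up data; statsmodels and scipy assumed):

```python
# Test H0: slope = 1 by hand, using the estimate and standard error from a
# fitted regression (two-sided test).
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 1.3 * x + rng.normal(size=100)        # true slope is 1.3 in this toy setup

fit = sm.OLS(y, sm.add_constant(x)).fit()
b, se = fit.params[1], fit.bse[1]

t = (b - 1.0) / se                        # (beta coefficient - 1) / SE
p = 2 * st.t.sf(abs(t), df=fit.df_resid)  # two-sided p-value
print(f"t = {t:.2f}, p = {p:.4f}")
```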

The confidence interval is better, though. The confidence interval does not need to be adjusted based on the hypothesized true parameter value. You can interpret the confidence interval, and it will tell you more information than the t-statistic alone.

For example, a 95% CI of (1.5, 2.2) [just made up for the example] allows us to conclude with 95% confidence that the true slope coefficient is different from zero and is also greater than 1, since neither value is inside the interval. In fact, we are 95% confident that the true slope is between 1.5 and 2.2.

However, if my interval were (0.95, 1.5) [again, chosen for my example], we could not conclude that the true slope is any different from 1 (if we were testing to see whether it was different from 1). In fact, we cannot say the true slope is any different from any of the values between 0.95 and 1.5, at a 95% level of confidence.

Hope this helps!

So if we use test statistics, we compare them against the critical t-values, and if we are comparing hypothesized values, we compare them against the confidence intervals…?

More or less… Assuming you are doing a t-test: t-statistics are standardized values, so compare them against the critical t-values, as you said. A confidence interval gives you a range where the true parameter value is contained (at a given level of confidence), so you can compare your hypothesis for the true parameter value to the confidence interval, again as you said.
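Here are the two routes side by side, with made-up numbers (scipy assumed for the critical value):

```python
# Route 1: compare the t-statistic to the critical value.
# Route 2: compare the hypothesized value to the confidence interval.
import scipy.stats as st

alpha, df = 0.05, 98                      # e.g., n = 100, two coefficients
t_crit = st.t.ppf(1 - alpha / 2, df)      # two-sided critical value, ~1.98

t_stat = 2.5                              # hypothetical t-statistic
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")

ci = (1.5, 2.2)                           # hypothetical 95% CI for the slope
hypothesized = 1.0
print("reject H0" if not (ci[0] <= hypothesized <= ci[1]) else "fail to reject H0")
```

Done properly, both routes give the same conclusion for the same hypothesis and alpha.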

Dude how the f*ck are you a level 1 candidate lol?