Quant concepts

I appreciate the compliment! But seriously, I’ve had a good deal of exposure to statistics, so I try to help where I can. The topics I’ve spent time with just happen to be what the CFAI likes to test!

Tickersu - strong concepts I must say.

Just to clarify my confusion once again: by testing the significance of the slope coefficient, we are asking whether the independent variable significantly explains the variation in the dependent variable. To test this significance, we employ two methods, i.e., the confidence interval and the t-stat. For the CI, if the interval does not include zero, then we can say with x% confidence (whatever the CI level is) that the slope coefficient is significantly different from zero, which in turn means that it significantly explains the variation in the dependent variable. In the case of the t-test, if the calculated t-stat exceeds the critical t-value, then we can say the independent variable significantly explains the variation in the dependent variable. Am I correct?

I want to discuss autocorrelation a bit further.

Suppose yesterday’s stock price is $10. Today, based on yesterday’s stock price, the price predicted by regression is $12. Now, can you please share with me two sets of data that show the prices of stock X from day 1 to day 7? One data set should have no autocorrelation and the other data set should reflect autocorrelated behavior. That will help me compare the difference between the two.

The difference between R-squared and the F-stat is that R-squared explains the variation in the dependent variable due to one independent variable, whereas the F-stat tells us about the significance of a group of independent variables as a whole. Correct?

In heteroskedasticity (HR), the variance of the residual is not constant across all observations. I am sharing an example; please let me know if my concept is right. First, I want to tell you what exactly variance is. If the variance of a stock price is 4, it means that the price of the stock can move +/- 4. If the variance is 6, it means the price can move +/- 6; hence the variance has increased from 4 to 6. Now coming to HR: by this we mean that if the variance of the observations changes, then HR is present in the regression. Here one question arises: why does the variance need to be constant? If the variance is changing, what effect will it have on the regression that makes it unfit for forecasting? Also, tell me what is actually meant by the variance of the residual? A very simple, daily-life example would help me a lot here.
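To make the question concrete, here is a small simulated sketch (Python; the numbers are made up) of what I understand “non-constant residual variance” to mean: in one series the residuals have the same spread everywhere, while in the other the spread grows as X grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200

x = np.linspace(1, 100, n)

# Homoskedastic: residuals have the same spread at every x
resid_constant = rng.normal(scale=5.0, size=n)

# Heteroskedastic: the spread of the residuals grows with x
resid_changing = rng.normal(scale=0.1 * x, size=n)

# Compare the spread of residuals for small vs. large x
for name, r in [("constant variance", resid_constant),
                ("changing variance", resid_changing)]:
    print(f"{name}: std at low x = {r[:50].std():.2f}, "
          f"std at high x = {r[-50:].std():.2f}")
```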

In multicollinearity (MC), we say that the independent variables are correlated with one another. Can somebody please share an example of this case? How does the correlation of the independent variables affect the coefficients and their standard errors? Why are the standard errors artificially inflated when the MC problem is present? Going through an example is the best way for me to understand any concept. Try and share a daily-life example.
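Again, to make my question concrete, here is a rough simulation (Python; made-up data) of two independent variables that are nearly copies of each other, which is what I understand multicollinearity to be, and of how the standard errors of their slope estimates compare with the uncorrelated case.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200

x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                       # unrelated to x1
x2_collinear = x1 + rng.normal(scale=0.05, size=n)  # almost a copy of x1

for label, x2 in [("uncorrelated IVs", x2_indep),
                  ("highly correlated IVs", x2_collinear)]:
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()
    # bse holds the standard errors of the coefficient estimates
    print(f"{label}: slope std errors = {fit.bse[1]:.3f}, {fit.bse[2]:.3f}")
```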

What is an ‘estimator’ in a regression equation?

What is meant by an unbiased estimator and a consistent estimator? Please give examples of each.

To answer the question “Is X1 (IV) a statistically useful predictor of Y (DV)?”, we can use either the t-test or the confidence interval, or both. The confidence interval gives you more information because it gives you a range for the true value, whereas the test statistic merely lets you make a conclusion for your hypothesis test.

So a confidence interval can let you avoid doing any t-tests. For example, you want to see if the slope is different from zero, so you do a t-test. Significant-- now you want to see if the slope is greater than 1, and you conduct another t-test. Your chance of making a Type I Error (compound) has gone up since you are conducting more tests. However, a confidence interval would tell us the answer to both questions in one step.
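Here is a rough sketch in Python (using numpy and statsmodels on made-up data) of both approaches on the same regression: the t-statistic and p-value answer “is the slope different from zero?”, while the confidence interval answers that and also lets you check other values (like 1) without running another test.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: one independent variable x and a dependent variable y
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=100)

X = sm.add_constant(x)              # add the intercept column
fit = sm.OLS(y, X).fit()

t_stat = fit.tvalues[1]             # t-statistic for the slope
p_val = fit.pvalues[1]              # two-sided p-value (slope = 0?)
ci_low, ci_high = fit.conf_int(alpha=0.05)[1]   # 95% CI for the slope

# The t-test answers one question: is the slope different from 0?
print(f"t = {t_stat:.2f}, p = {p_val:.4f}")

# The CI answers that question (does it contain 0?) and more
# (e.g., does it contain 1?), without conducting another test.
print(f"95% CI for slope: ({ci_low:.2f}, {ci_high:.2f})")
```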

I think you have the idea!

Not sure entirely what you mean here. The important thing about autocorrelation is that it can only occur in time-series data. However, the tests (e.g., Durbin-Watson) can “detect” autocorrelation in non-time-series data (a false alarm). This is why it is important to ONLY test for autocorrelation in time-series data that are likely to exhibit autocorrelation.

A good way to see autocorrelation is to look at a plot of the model errors (residuals) against time. Since time series data are ordered through time, a plot of the residuals will show a trend if strong autocorrelation is present. For example, the plot may show a long positive trend (positive residuals), followed by a long negative trend (negative residuals), and this can reverse. This would be an example of positive autocorrelation. If the data do not exhibit autocorrelation, the residuals plotted against time would likely not show a pattern like this (again autocorrelation can “appear” in cross sectional data, but this is a false alarm-- only do this with time-series data to avoid incorrect conclusions).
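If it helps, here is a small simulation (Python; made-up numbers) of the two cases you asked about: one set of residuals with no autocorrelation and one with strong positive (AR(1)-style) autocorrelation. The lag-1 correlation makes the difference visible even without a plot.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Case 1: no autocorrelation -- residuals are independent noise
resid_independent = rng.normal(scale=1.0, size=n)

# Case 2: positive autocorrelation -- each residual carries over
# part of the previous one (AR(1) with phi = 0.9)
phi = 0.9
resid_autocorr = np.zeros(n)
for t in range(1, n):
    resid_autocorr[t] = phi * resid_autocorr[t - 1] + rng.normal(scale=1.0)

# Lag-1 sample correlation of each residual series:
# near 0 for the independent case, near phi for the AR(1) case.
for name, r in [("independent", resid_independent),
                ("autocorrelated", resid_autocorr)]:
    lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]
    print(f"{name:>15}: lag-1 correlation = {lag1:.2f}")
```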

R-squared tells us the percentage of sample variation in the DV explained by the model, however many independent variables are in the model (one or many).

The F-statistic for the entire model could be used for a model with one IV or many; it just tells us, in a statistical sense, how much “better” our model (the IVs) is at explaining the DV than a model using only the average DV value for prediction.

Both can be referring to one or many. However, the F-statistic in regression typically is used for joint hypotheses, as you are saying, many IVs, as a group.
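A rough illustration (Python, statsmodels, made-up data): both numbers come from the same fitted model, R-squared as the share of sample variation explained and the F-statistic as the joint test that all slopes are zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100

# Two hypothetical independent variables and a dependent variable
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# R-squared: share of sample variation in Y explained by the model
print(f"R-squared = {fit.rsquared:.3f}")

# F-statistic: joint test that all slope coefficients are zero,
# i.e. whether the model beats just using the mean of Y
print(f"F = {fit.fvalue:.2f}, p = {fit.f_pvalue:.4f}")
```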

I’ve made some posts on these in the past (long posts). Try using the search function, or filter through my posts first. If you still have some questions, let me know. This would be a little easier since I am at work.

It is our “best guess and attempt” at the true value of the parameter of interest (the true intercept, the true slope, etc). An unbiased estimator implies that the expected value of the estimator is equal to the true value of the parameter of interest–

E(Beta i-hat) = Beta i

where E(Beta i-hat) is the expected value of the estimated slope relating the DV to Xi (ith independent variable) and Beta i is the true value of the slope relating the DV to Xi, the ith independent variable.

A consistent estimator means that as the sample size, n, approaches infinity, the value of the estimator gets closer and closer to the parameter’s true value. In other words, the difference between the estimator and the true value approaches zero. This is because as n grows, the standard error on the sampling distribution shrinks (inversely related to square root of n), and the sampling distribution becomes tighter, eventually converging (in probability) to the true value. Basically, the probability that our estimator is very (very very) close to the true value of the parameter increases towards 1 as n gets larger.

If an estimator is biased AND the bias approaches zero as n grows, then the estimator is still consistent.
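A quick simulated sketch (Python; the true mean of 5 is made up) of both ideas, using the sample mean as the estimator: averaging the estimator over many repeated samples lands on the true value (unbiasedness), and a single estimate from an ever-larger sample gets closer and closer to it (consistency).

```python
import numpy as np

rng = np.random.default_rng(3)
true_mean = 5.0

# Unbiasedness: the average of the estimator across many repeated
# samples is (approximately) the true parameter value.
estimates = [rng.normal(loc=true_mean, scale=2.0, size=30).mean()
             for _ in range(10_000)]
print(f"average of sample means (n=30): {np.mean(estimates):.3f}")

# Consistency: as n grows, a single sample mean gets closer to the
# true value (its standard error shrinks like 1/sqrt(n)).
for n in (10, 100, 10_000, 1_000_000):
    est = rng.normal(loc=true_mean, scale=2.0, size=n).mean()
    print(f"n = {n:>9,}: sample mean = {est:.4f}")
```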

You’re conflating two ideas here.

A consistent estimator is one for which the standard error approaches zero as the sample size approaches infinity. However, _a consistent estimator does not have to converge to the value of the population parameter it is estimating_.

An _unbiased_ estimator’s value converges to the value of the population parameter it is estimating.

For example, the statistic that CFA Institute calls the sample standard deviation (in which the denominator is n – 1) is properly known as the _bias-adjusted sample standard deviation_; the (not bias-adjusted) sample standard deviation has a denominator of n. Both are consistent, but only the bias-adjusted sample standard deviation converges to the value of the population’s standard deviation.

See below…

Yes, the expected value of an unbiased estimator is the value of the parameter it estimates. I was trying to follow your lead about sample size increasing.

I see-- but I mentioned the value of the estimator, not its expected value. I was referring to an increasing sample size with respect to consistency, as it applies to consistency rather than (un)biasedness. If I recall, the decreasing standard error implies that our estimator is getting closer to the true value.

I’ve also found another source, without using limits, showing that a consistent estimator is an estimator that does converge to the true value of the parameter. https://people.richland.edu/james/lecture/m170/ch08-def.html

Can you narrate one example in which the slope coefficient is statistically significant but not practically/economically useful, as you mentioned in your previous post?

I searched a bit but didn’t find relevant posts.

It is okay, take your time. I can wait for the answer. But please do take out some time for this. Quant is a tough topic for me to handle.

First, getting statistical significance is much “easier” as you use larger samples. It’s a result that stems from how standard errors are calculated. So, you might find that a result is statistically significant simply because you used a really large sample (shrinking the standard error and increasing the test statistic). So you should look into (give a meaningful interpretation of) the result you find.

For example: We are predicting an exam score measured in points out of 100 (DV) based on the number of hours a student spends studying (X). Let’s say the estimated slope is 0.001 and we find it is statistically significant. But what does the slope tell us? It says: we expect (on average) a student’s exam score to increase by 0.001 points out of 100 for each additional hour the student studies for the exam. In other words, the student (on average) can expect a 1-point increase (out of 100 points total) on his or her exam score by studying an additional 1,000 hours for the exam (I just multiplied the slope estimate by 1,000 to make it a whole number). Now, we have a statistically significant slope estimate, but the practical implication is that studying has nearly no effect on the exam score (unlikely, but this is just my example). Economic significance is the same idea as practical significance. Is there much value in what we found (the impact of X on Y)? Does it really help us better understand what influences the DV?
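A sketch of the same point with simulated numbers (Python; the slope, noise, and sample size are invented): with a very large sample even a 0.001 slope is statistically significant, yet the practical effect on the exam score is negligible.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1_000_000                       # very large sample

hours = rng.uniform(0, 300, size=n)                         # hours studied
score = 70 + 0.001 * hours + rng.normal(scale=10, size=n)   # tiny true effect

fit = sm.OLS(score, sm.add_constant(hours)).fit()
slope, p_val = fit.params[1], fit.pvalues[1]

# Statistically significant (tiny p-value) but practically trivial:
# roughly a 0.001-point gain per extra hour of study.
print(f"slope = {slope:.5f}, p-value = {p_val:.2e}")
```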