Quant-Regression

Harrogath wrote:

Tickersu, are you saying that the book is wrong in calculating the Standard Error of a Regression Coefficient (i.e., SEE)?

I don’t have the book, so I can’t see it, but it wouldn’t surprise me at all. They have been wrong on plenty of things in the past. When I skimmed answer solutions, many of their answers or interpretations of confidence intervals or p-values were wrong: they often suggested that there was a 95% chance the true value is in a specific CI, say (2, 10), and they had also suggested that p-values represent the probability that the null hypothesis is true or that a Type I error was made. These are both elementary mistakes that would be covered in early undergraduate coursework. However, the SEE is different from the standard error of, say, the slope relating Y and X1. The SE for a slope is the model standard deviation of the errors (SEE) divided by the square root of the sum of squared deviations of X1 around its mean (basically the variance of X1, scaled by n - 1).

Harrogath wrote:
Because the sources I can consult say that $SE_x$ can be calculated as $SD(x) / n^{0.5}$ (for 1 independent variable regression).
You can share some or send me a picture of the book (if it’s a book) in a PM. The standard error of X-bar would be the standard deviation of X divided by the square root of N. As far as I know, this isn’t correct for the standard error of X’s regression coefficient (I’ve never seen anything suggest that).
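To make the distinction concrete, here is a minimal NumPy sketch (the data and function names are hypothetical, made up for illustration) that computes both quantities on the same data. Note that $SD(x)/n^{0.5}$ never looks at y at all, so it cannot measure the uncertainty of a slope whose precision depends on how tightly y scatters around the fitted line.

```python
import numpy as np

def se_of_mean(x):
    """Standard error of the sample mean x-bar: SD(x) / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / np.sqrt(len(x))

def se_of_slope(x, y):
    """Standard error of the OLS slope in simple regression: SEE / sqrt(SSxx)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    ss_xx = np.sum((x - x.mean()) ** 2)            # sum of squared deviations of x
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    see = np.sqrt(np.sum(resid ** 2) / (n - 2))    # standard error of estimate (SEE)
    return see / np.sqrt(ss_xx)

x = [1, 2, 3, 4, 5, 6]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]   # hypothetical data, close to a line
y_noisy = [1.0, 6.0, 3.5, 9.5, 7.0, 13.0]    # hypothetical data, much noisier

print(se_of_mean(x))               # same value whichever y you pair with x
print(se_of_slope(x, y_tight))     # small: points hug the line
print(se_of_slope(x, y_noisy))     # larger: more scatter around the line
```

The same x gives one fixed "SE of x-bar" but very different slope standard errors, which is exactly why the two cannot be the same formula.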

Harrogath wrote:
How would you use “model standard deviation for error” to arrive at the coefficient standard error? Can you describe the calculation in more detail?
If $S_e$ is the estimated standard deviation of the model error (some people call it SEE), and $SS_{xx} = \sum_i (X_i - \bar{X})^2$ is the sum of squared deviations of X around its mean, then for the regression

$E(Y) = \beta_0 + \beta_x X$, the standard error of $\hat{\beta}_x$ comes from $\operatorname{Var}(\hat{\beta}_x) = S_e^2 / SS_{xx}$ and $SE(\hat{\beta}_x) = [\operatorname{Var}(\hat{\beta}_x)]^{0.5} = (S_e^2 / SS_{xx})^{0.5}$.
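A short NumPy sketch (hypothetical data and helper names) that follows the simple-regression formula step by step and cross-checks it against the general OLS covariance matrix $S_e^2 (X^\top X)^{-1}$, whose slope entry should give the same number:

```python
import numpy as np

def slope_se_by_formula(x, y):
    """SE(beta_x) = (S_e^2 / SS_xx)^0.5 for simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    ss_xx = np.sum((x - x.mean()) ** 2)
    beta_x = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
    beta_0 = y.mean() - beta_x * x.mean()
    s_e2 = np.sum((y - beta_0 - beta_x * x) ** 2) / (n - 2)   # S_e^2 with n - 2 df
    var_beta_x = s_e2 / ss_xx
    return beta_x, np.sqrt(var_beta_x)

def slope_se_by_matrix(x, y):
    """Same quantity from the diagonal of S_e^2 (X'X)^-1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])     # intercept column + X
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s_e2 = resid @ resid / (len(y) - X.shape[1])
    cov = s_e2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])

x = [2.0, 3.5, 4.0, 5.5, 7.0, 8.5, 9.0]
y = [1.8, 3.1, 3.9, 4.2, 6.3, 7.1, 8.4]
print(slope_se_by_formula(x, y))
print(slope_se_by_matrix(x, y))   # should agree up to floating-point error
```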

This is for simple linear regression. In multiple regression, the denominator of each estimated coefficient variance is also multiplied by $(1 - R_j^2)$, where $R_j^2$ is the $R^2$ from a regression of $X_j$ (as the dependent variable) on all the other independent variables from the main model. This makes it really straightforward to see why and how higher degrees of multicollinearity between $X_j$ and the other IVs (larger $R_j^2$) inflate the variance and standard error of $\hat{\beta}_j$.
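The $(1 - R_j^2)$ adjustment can be checked numerically. A minimal sketch, assuming simulated data (the coefficients, seed, and helper name `ols_resid` are hypothetical), that computes the variance of the first slope two ways: from the general matrix formula, and from $S_e^2 / (SS_{11}(1 - R_1^2))$ using an auxiliary regression of $X_1$ on the other IV:

```python
import numpy as np

def ols_resid(X, y):
    """Least-squares fit of y on the columns of X; returns the residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)      # built to be correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
resid = ols_resid(X, y)
s_e2 = resid @ resid / (n - X.shape[1])       # S_e^2 with n - k - 1 df

# 1) General matrix formula: Var(b_1) = S_e^2 * [(X'X)^-1]_11
var_matrix = s_e2 * np.linalg.inv(X.T @ X)[1, 1]

# 2) Auxiliary-regression form: Var(b_1) = S_e^2 / (SS_11 * (1 - R_1^2))
aux_resid = ols_resid(np.column_stack([np.ones(n), x2]), x1)
ss_11 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1 - (aux_resid @ aux_resid) / ss_11
var_aux = s_e2 / (ss_11 * (1 - r2_1))

print(r2_1)                   # R_1^2: how well x2 predicts x1
print(var_matrix, var_aux)    # identical up to rounding
print(1 / (1 - r2_1))         # variance inflation factor for b_1
```

With no other IVs, $R_j^2 = 0$ and this collapses back to the simple-regression formula $S_e^2 / SS_{xx}$.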

https://stats.stackexchange.com/questions/88461/derive-variance-of-regre...

https://stats.stackexchange.com/questions/342632/how-to-understand-se-of...

Harrogath wrote:

Have you ever tried to derive the standard error of the Pearson correlation coefficient? Have you seen the links I provided? Those are not intro calculations :)

I haven’t tried to derive the SE for a Pearson correlation, no. I have had to derive other things in regression (variances, slope estimators), but mostly these are things an undergraduate statistics major would be expected to do in class. However, I was not saying they should put a derivation in the book; I was saying they should use the simple algebraic manipulation and substitution that you, S2000, and I have used in posts to show why the t-statistic for rho is not so weird and different. It’s just that the CFAI hasn’t taken two seconds to show the algebra to the reader.
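The substitution is easy to check numerically. A minimal sketch (hypothetical data and function names), using $r\sqrt{n-2}/\sqrt{1-r^2}$ as the t-statistic for the correlation and $b_x / (S_e/\sqrt{SS_{xx}})$ for the slope:

```python
import numpy as np

def t_slope(x, y):
    """t = b_x / SE(b_x), with SE(b_x) = S_e / sqrt(SS_xx)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    ss_xx = np.sum((x - x.mean()) ** 2)
    b_x = np.sum((x - x.mean()) * (y - y.mean())) / ss_xx
    b_0 = y.mean() - b_x * x.mean()
    s_e = np.sqrt(np.sum((y - b_0 - b_x * x) ** 2) / (n - 2))
    return b_x / (s_e / np.sqrt(ss_xx))

def t_corr(x, y):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for the Pearson correlation."""
    r = np.corrcoef(x, y)[0, 1]
    n = len(x)
    return r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

x = [1.2, 2.4, 2.9, 4.1, 5.0, 6.3, 7.7, 8.1]
y = [3.0, 2.7, 4.4, 4.1, 6.2, 5.9, 8.0, 7.4]
print(t_slope(x, y), t_corr(x, y))   # the two agree up to floating-point error
```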

Harrogath wrote:
I think we have had this discussion before, and here we go again: the CFAI quant book is not meant for statisticians, but rather for students who will probably use regressions and other related calculations in their professional careers. We will probably use software that does the calculations for us, so the problem lies in correctly interpreting the results a machine provides, understanding real life, and translating that reality into numbers as correctly as possible.
I agree the CFAI is not “for statisticians,” but that doesn’t give them a pass to present incorrect material as correct, especially given the whole “we strive for excellence and world-class education” line. Why not hire one legitimate statistician (honestly, someone with an undergraduate statistics degree would suit most of their purposes, but ideally at least an MS in statistics) to review the quant curriculum? This is what universities do when they hire statisticians to teach intro stats to non-stats majors, and it works well when you pick someone who can teach. I agree with you that this is for non-statisticians, but the CFAI shouldn’t be puffing out its chest trying to show more mathematical detail when they do it poorly (through either carelessness or an inability to do it better). If they’re going to do it, it should be done correctly. People often “run stats” without knowing which way is up, only how to get a p-value that they likely misinterpret as an error probability or as the probability that the null is true. When the software gives a warning or an error, they haven’t the slightest clue how to go about fixing it, and I have seen people just use what the program gave despite a serious error (until they can’t move forward with a project anymore and end up asking for help). No more rant :) but I just think the CFAI likes to puff out its chest, and this is one area they need to improve if they want to claim a “world-class education.”

Harrogath wrote:
Are you an actuarial analyst btw?
I am not. :p

Can’t recall if I mentioned this, but the standard error of x-bar would be $SD_x \cdot n^{-0.5}$, where $SD_x$ is the standard deviation of X, but each sample statistic has its own calculation for standard error, as a general rule.
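What $SD_x \cdot n^{-0.5}$ actually measures is the spread of x-bar from sample to sample. A minimal Monte Carlo sketch, assuming normally distributed data with a hypothetical population SD, sample size, and seed:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, n, reps = 3.0, 25, 20_000     # hypothetical population SD, sample size, repetitions

# Draw many samples of size n and keep each sample's mean (x-bar)
samples = rng.normal(loc=10.0, scale=sigma, size=(reps, n))
xbars = samples.mean(axis=1)

empirical_se = xbars.std(ddof=1)     # spread of x-bar across repeated samples
theoretical_se = sigma / np.sqrt(n)  # SD_x * n^-0.5 = 0.6 here

# In practice sigma is unknown, so one sample's SD stands in for it
one_sample = samples[0]
plug_in_se = one_sample.std(ddof=1) / np.sqrt(n)

print(empirical_se, theoretical_se, plug_in_se)
```

The empirical spread of the sample means lands very close to $\sigma/\sqrt{n}$; nothing in this calculation involves a second variable or a fitted slope.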

tickersu wrote:

Can’t recall if I mentioned this, but the standard error of x-bar would be $SD_x \cdot n^{-0.5}$, where $SD_x$ is the standard deviation of X, but each sample statistic has its own calculation for standard error, as a general rule.

LOL, you just misread my original post. You’ve come back to where we started.

I agree with most of your comments btw; however, keep in mind that the CFAI tries to train students across a wide range of disciplines rather than only one discipline, the one you have the most experience in. I know it is easy to spot definitions here that are incomplete or not rigorous enough.

The souls of all men are immortal, but the souls of the just are immortal and divine.
Socrates

Harrogath wrote:

I would not try to go deeper into this demonstration unless you are really interested in advanced statistics. For the scope of the L2 exam, we rely on the formulas provided.

In this case, the book says that the T-statistic for the slope coefficient (…1) of a single variable regression (1 indep variable) should be equal to the T-statistic of the correlation coefficient (…2) between the dependent variable and the independent variable.

The calculation of (1) is:

$T_{slope} = (b_x - 0) / SE_x$

where: $SE_x$ = Standard Error of x = $SD(x) / n^{0.5}$. The $SE_x$ is always given in an ANOVA table………

I’m quoting this to show that I don’t believe I misread your initial post. You clearly assert that the standard error for a slope coefficient in linear regression is equal to $SD(x) / n^{0.5}$; it’s right in your post. This is incorrect. That formula is not how you calculate the standard error of any slope coefficient in linear regression. I stand by my posts explaining all of this. My most recent post was only to show that this is the standard error of the mean (x-bar), so it is one calculation for a standard error of something, but not of a regression slope; that something is x-bar. If they claim that is the formula for a slope standard error, they’re flat out wrong. It’s not incomplete; it’s incorrect.

I would also say again that the CFAI is trying to teach the material, so it should be correct. Saying that they want to teach a broad curriculum does not excuse them from doing it correctly. It’s inappropriate to claim you’re world-class yet not have the right people in the right places. It’s even worse when they won’t admit it when they have things outright wrong according to any resource they could use to verify. Mistakes happen, but not correcting them or admitting to them is far worse.

tickersu wrote:

Harrogath wrote:

I would not try to go deeper into this demonstration unless you are really interested in advanced statistics. For the scope of the L2 exam, we rely on the formulas provided.

In this case, the book says that the T-statistic for the slope coefficient (…1) of a single variable regression (1 indep variable) should be equal to the T-statistic of the correlation coefficient (…2) between the dependent variable and the independent variable.

The calculation of (1) is:

$T_{slope} = (b_x - 0) / SE_x$

where: $SE_x$ = Standard Error of x = $SD(x) / n^{0.5}$. The $SE_x$ is always given in an ANOVA table………

I’m quoting this to show that I don’t believe I misread your initial post. You clearly assert that the standard error for a slope coefficient in linear regression is equal to $SD(x) / n^{0.5}$; it’s right in your post. This is incorrect. That formula is not how you calculate the standard error of any slope coefficient in linear regression. I stand by my posts explaining all of this. My most recent post was only to show that this is the standard error of the mean (x-bar), so it is one calculation for a standard error of something, but not of a regression slope; that something is x-bar. If they claim that is the formula for a slope standard error, they’re flat out wrong. It’s not incomplete; it’s incorrect.

I would also say again that the CFAI is trying to teach the material, so it should be correct. Saying that they want to teach a broad curriculum does not excuse them from doing it correctly. It’s inappropriate to claim you’re world-class yet not have the right people in the right places. It’s even worse when they won’t admit it when they have things outright wrong according to any resource they could use to verify. Mistakes happen, but not correcting them or admitting to them is far worse.

With all due respect, you have misunderstood the whole thread. We were talking about T-statistics.

OP asked why the T-statistic value of the slope coefficient in a simple regression equation is exactly (or very nearly) equal to the T-statistic value of Pearson’s correlation coefficient between the same variables used in the simple regression.
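For reference, the algebra behind this equality is short. A sketch using the notation from earlier in the thread ($S_e$, $SS_{xx}$, $b_x$) plus $s_x$, $s_y$ for the sample standard deviations and $r$ for the sample correlation; it relies on the standard simple-regression identities $b_x = r\,s_y/s_x$, $SS_{xx} = (n-1)s_x^2$, and $SSE = (1-r^2)(n-1)s_y^2$, so that $S_e^2 = SSE/(n-2)$:

$$
\begin{aligned}
SE(b_x) &= \frac{S_e}{\sqrt{SS_{xx}}}
        = \frac{\sqrt{(1-r^2)(n-1)s_y^2/(n-2)}}{\sqrt{n-1}\,s_x}
        = \frac{s_y}{s_x}\sqrt{\frac{1-r^2}{n-2}} \\
T_{slope} &= \frac{b_x - 0}{SE(b_x)}
        = \frac{r\,s_y/s_x}{(s_y/s_x)\sqrt{(1-r^2)/(n-2)}}
        = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = t_r
\end{aligned}
$$

So in simple regression the two statistics are algebraically identical, and the standard error that makes this work is $S_e/\sqrt{SS_{xx}}$.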

If you carefully reread what I wrote:

$T_{slope} = (b_x - 0) / SE_x$

where: $SE_x$ = Standard Error of x = $SD(x) / n^{0.5}$

At no point did I say that $SE_x$ is the SE for a slope coefficient as a general rule, only that it is a part of the calculation of the T-statistic for the slope coefficient in simple regression.

Just to say it in your own notation: any “x” you see in the formulas above is “x-bar,” which is a vector of data observations. Do we agree on that?

Also, in your first reply you said the derivation of the variance of Pearson’s correlation coefficient was an introductory calculation that any student would be asked to derive in class; however, you haven’t calculated it so far [in your life]. I must tell you that it is, in fact, advanced statistics.

Hope this helps.


Harrogath wrote:

With all due respect, you have misunderstood the whole thread. We were talking about T-statistics.

There is no misunderstanding here. The t-statistic for the slope coefficient is not calculated in the manner you claim. It does not use the standard error of x-bar.

Harrogath wrote:
OP asked why the T-statistic value of the slope coefficient in a simple regression equation is exactly (or very nearly) equal to the T-statistic value of Pearson’s correlation coefficient between the same variables used in the simple regression.
Again, there is no misunderstanding here.

Harrogath wrote:
If you carefully reread what I wrote:

$T_{slope} = (b_x - 0) / SE_x$

where: $SE_x$ = Standard Error of x = $SD(x) / n^{0.5}$

At no point did I say that $SE_x$ is the SE for a slope coefficient as a general rule, only that it is a part of the calculation of the T-statistic for the slope coefficient in simple regression.

Your final sentence is unclear. Further, the bold is factually, mathematically incorrect regarding the t-stat for a slope (re: standard error calculation). There is no disputing what you have written because you defined the variable in question (and have done so incorrectly).

Harrogath wrote:
Just to say it in your own notation: any “x” you see in the formulas above is “x-bar,” which is a vector of data observations. Do we agree on that?
X-bar usually represents the sample mean, a scalar, not a vector of observed values. In either case, read again the variables you defined above.

Harrogath wrote:
Also, in your first reply you said the derivation of the variance of Pearson’s correlation coefficient was an introductory calculation that any student would be asked to derive in class; however, you haven’t calculated it so far [in your life]. I must tell you that it is, in fact, advanced statistics.

Hope this helps.

I believe I said an undergraduate statistics major would be asked to derive it. I was not a statistics major and my lack of derivation of this item does not make it advanced statistics. Open an actual statistical text used at the undergraduate level for statistics majors; it’s in there. Your argument of “tickersu hasn’t done it, so it’s advanced” is flattering, although illogical.

I won’t comment further on this. The solution is to read what you wrote earlier and how you defined your variables. Then look at a statistical resource that shows the standard error of a slope in OLS is different from what you’ve said. https://stats.stackexchange.com/questions/91750/how-is-the-formula-for-t...