Quant-Regression

I was referring to the problem below: problem 4, p. 398, Multiple Regression.

In early 2001, US equity marketplaces started trading all listed shares in minimum increments (ticks) of $0.01 (decimalization). After decimalization, bid–ask spreads of stocks traded on the NASDAQ tended to decline. In response, spreads of NASDAQ stocks cross-listed on the Toronto Stock Exchange (TSE) tended to decline as well. Researchers Oppenheimer and Sabherwal (2003) hypothesized that the percentage decline in TSE spreads of cross-listed stocks was related to company size, the predecimalization ratio of spreads on NASDAQ to those on the TSE, and the percentage decline in NASDAQ spreads. The following table gives the regression coefficient estimates from estimating that relationship for a sample of 74 companies. Company size is measured by the natural logarithm of the book value of the company’s assets in thousands of Canadian dollars.

The average company in the sample has a book value of assets of C$900 million and a predecimalization ratio of spreads equal to 1.3. Based on the above model, what is the predicted decline in spread on the TSE for a company with these average characteristics, given a 1 percentage point decline in NASDAQ spreads?

Now here, a percentage is given, but in the solution we used 1… and not 1%…

Percentage decline in TSE spread = −0.45 + 0.05 ln(900,000) − 0.06(1.3) + 0.29(1). Because I used −0.01, my answer was wrong. So I understood that since a decline is asked for, I should not use −1. But even if I use 0.01, my answer is still incorrect.
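For anyone checking the arithmetic, here is a minimal sketch of the plug-in (Python, using the coefficient values quoted above). Remember that company size enters as the natural log of book value in thousands of Canadian dollars, so C$900 million becomes 900,000:

```python
import math

# Coefficients as quoted in the problem: intercept -0.45, size 0.05,
# predecimalization spread ratio -0.06, NASDAQ spread decline 0.29.
size = math.log(900_000)  # C$900 million, expressed in thousands of C$

# A 1 percentage point decline in NASDAQ spreads enters as 1, not 0.01,
# because the variable is already measured in percent.
predicted_decline = -0.45 + 0.05 * size - 0.06 * 1.3 + 0.29 * 1

print(round(predicted_decline, 2))  # about 0.45, i.e. a 0.45% decline
```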

You need to pay attention to how each variable is built. The variable is “Percentage change in NASDAQ spreads,” so if you look at the raw data for that variable you will see 2, 1, −5, 4, etc., because the units are already in %. The same goes with company size: the raw data are not dollar amounts like 900,000 or 1,200,000, but the Napierian logarithm: ln(900,000).

Pay attention to the description of the variables in order to interpret well the outcomes of the model.

Hope this helps.

This is a natural logarithm.

Properly, a Napierian logarithm is defined as:

NapLog(x) = −10^7 · ln(x / 10^7)

Indeed, my friend. Wanted to say natural log :stuck_out_tongue:

Got that, thanks…

Hi, I have a rather foolish doubt. What is the difference between the slope of a line and the slope of a regression line? I think the concept is the same, so why the difference in the formula? The slope of a line is the rate of change in Y over the rate of change in X. Then why, for the slope of a regression line, do we have the covariance formula…?

Not a foolish question.

A line is just that: a line. A known and defined line that passes through at least 2 points of the (n-dimensional) space.

A regression line, on the other hand, is a simplification of a scatter of observations and is subject to error. This is why the slope of a regression line is calculated from the relative relation of dispersion measures (covariances and variances).

Thanks… :slight_smile:

Hi, I have a rather foolish doubt. What is the difference between the slope of a line and the slope of a regression line? I think the concept is the same, so why the difference in the formula? The slope of a line is the rate of change in Y over the rate of change in X. Then why, for the slope of a regression line, do we have the covariance formula…?

Calculus.

When you determine the formula for the line that minimizes the sum of the squared deviations from the line to the data points, the slope you get is given by the covariance/variance formula.
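A quick way to convince yourself of this, using made-up data (numpy assumed available), is to compare the covariance/variance ratio with the slope from an ordinary least-squares fit:

```python
# Check that the least-squares slope equals cov(x, y) / var(x), which is
# exactly what minimizing the sum of squared deviations gives you.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)  # made-up linear relationship plus noise

slope_from_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
slope_from_fit = np.polyfit(x, y, 1)[0]  # ordinary least-squares slope

print(np.isclose(slope_from_cov, slope_from_fit))  # True
```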

This is not the calculation of the standard error for a regression coefficient. The standard error for the coefficient is given by the residual standard deviation (SEE) divided by the square root of the sum of squared deviations of that x-variable about its mean: basically, the model standard deviation of the errors is scaled by the total variation in that particular x-variable to arrive at the standard error for that x-variable’s coefficient estimate. As a general rule, standard errors for different statistics are calculated differently.

I wouldn’t call this “advanced statistics”; it’s just that the CFAI is pretty pathetic, and you can tell by 1) the way they ask questions, 2) the way they try to phrase explanations, and 3) the way they argue to defend incorrect topics/questions despite real statistical references. I don’t think it’s advanced for the books; I just don’t think they saw the connection easily, otherwise they would have made a note to explain the quick connection (as they do in other areas).

By the way, I remember when I floated a really simple way to show why the t-stat for the Pearson correlation has the exact same form as the other test statistics.

Finally, I can’t recall, but do they also point out that the t-statistic for a slope in SLR is the same as the square root of the F-statistic in that same SLR model?
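For what it’s worth, both identities discussed here (the t-stat for the slope equals the t-stat for Pearson’s r, and t^2 equals F in simple linear regression) can be checked numerically. This is just a sketch with made-up data, not the CFAI’s presentation:

```python
# Verify numerically that in simple linear regression:
#   t(slope) == t(Pearson r)   and   t(slope)^2 == F
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)  # made-up data

# OLS slope and intercept
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

ss_xx = ((x - x.mean()) ** 2).sum()
se2 = (resid ** 2).sum() / (n - 2)        # SEE squared (error variance)
t_slope = b1 / np.sqrt(se2 / ss_xx)       # t-stat for the slope

r = np.corrcoef(x, y)[0, 1]
t_r = r * np.sqrt((n - 2) / (1 - r ** 2)) # t-stat for Pearson correlation

ssr = (((b0 + b1 * x) - y.mean()) ** 2).sum()
f_stat = ssr / se2                        # F-stat with 1 and n-2 df

print(np.isclose(t_slope, t_r), np.isclose(t_slope ** 2, f_stat))
```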

Tickersu, are you saying that the book is wrong in calculating the standard error of a regression coefficient (i.e., the SEE)? Because the sources I can consult say that SEx can be calculated as SD(x) / n^0.5 (for a regression with one independent variable).

How would you use the “model standard deviation for error” to arrive at the coefficient standard error? Can you describe the calculation in more detail?

Have you ever tried to derive the standard error of Pearson correlation coefficient? Have you seen the links I provided? Those are not intro calculations :slight_smile:

I think we have had this discussion before, and here we go again: the CFAI quant book is not meant for statisticians, but rather for students who will probably use regressions and other related calculations in their professional careers. We will probably use software that does the calculations for us, so the real problem lies in correctly interpreting the results a machine provides, understanding real life, and translating that reality into numbers as correctly as possible.

Are you an actuarial analyst btw?

I don’t have the book, so I can’t see it, but it wouldn’t surprise me at all. They have been wrong about plenty of things in the past. Many of their answers or interpretations of confidence intervals or p-values were wrong when I skimmed the solutions: they often suggested that there is a 95% chance the true value lies in a specific CI, say (2, 10), and they have also suggested that p-values represent the probability that the null hypothesis is true, or the probability that a Type I error was made. These are both elementary mistakes that would be covered in early undergraduate coursework. However, the SEE is different from the standard error of, say, the slope relating Y and X1. The SE for a slope is the model standard deviation of the errors (SEE) divided by the square root of the (basically) variance of X1.

You can share some, or send me a picture of the book (if it’s a book) in a PM. The standard error of x-bar would be the standard deviation of X divided by the square root of N. As far as I know, this is not correct for the standard error of X’s regression coefficient (I’ve never seen anything that suggests it is).

If Se is the estimated standard deviation of the model error (what some people call the SEE), and SSxx is the sum of squared deviations of X around its mean, Σ(X − X̄)^2, then to calculate the standard error of beta_x in the regression

E(Y) = b0 + beta_x(X), we have Var(beta_x) = Se^2 / SSxx and SE(beta_x) = [Var(beta_x)]^0.5 = [Se^2 / SSxx]^0.5
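Here is a small numerical sanity check of that formula, with made-up data, against scipy’s linregress (which reports the slope standard error as stderr):

```python
# Check SE(beta_x) = sqrt(Se^2 / SSxx) against scipy.stats.linregress.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 0.8 * x + rng.normal(size=30)  # made-up data

res = stats.linregress(x, y)
resid = y - (res.intercept + res.slope * x)

se2 = (resid ** 2).sum() / (len(x) - 2)   # Se^2: estimated error variance
ss_xx = ((x - x.mean()) ** 2).sum()       # SSxx: sum of squared deviations of X
se_slope = np.sqrt(se2 / ss_xx)

print(np.isclose(se_slope, res.stderr))  # True
```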

This is for simple linear regression. In multiple regression, the denominator of each estimated coefficient’s variance is also adjusted by (1 − Rj^2), where Rj^2 is the R^2 from a regression with Xj as the dependent variable predicted by all the other independent variables from the main model. This makes it really straightforward to see why and how higher degrees of multicollinearity between Xj and the other IVs (larger Rj^2) inflate the variance and standard error of beta_j.
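A hedged sketch of that adjustment, with made-up data: regress Xj on the other IVs to get Rj^2, then compute the variance inflation factor 1 / (1 − Rj^2):

```python
# Sketch of the multicollinearity adjustment: Var(beta_j) is inflated by
# 1 / (1 - Rj^2), where Rj^2 comes from regressing Xj on the other IVs.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)  # x2 deliberately correlated with x1

# Regress x1 on the "other" IV (x2, plus an intercept) to get R1^2.
X = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
fitted = X @ coef
r2_j = 1 - ((x1 - fitted) ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()

vif = 1 / (1 - r2_j)  # variance inflation factor for beta_1
print(vif > 1)        # True: collinearity inflates Var(beta_1)
```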

https://stats.stackexchange.com/questions/88461/derive-variance-of-regression-coefficient-in-simple-linear-regression

https://stats.stackexchange.com/questions/342632/how-to-understand-se-of-regression-slope-equation

I haven’t tried to derive the SE for a Pearson correlation, no. I have had to derive other things in regression (variances, slope estimators), but mostly these are things an undergraduate statistics major would be expected to do in class. However, I was not saying they should put a derivation in the book; I was saying they should use the simple algebraic manipulation and substitution that you, S2000, and I have used in posts to show why the t-statistic for rho is not so weird and different. It’s just that the CFAI hasn’t taken two seconds to show the algebra to the reader.

I agree the CFAI is not “for statisticians,” but that doesn’t give them a pass to present incorrect material as correct, especially given the whole “we strive for excellence and world-class education.” Why not hire one legitimate statistician (honestly, someone with an undergraduate statistics degree would suit most of their purposes, but ideally at least an MS in statistics) to review the quant curriculum? This is how universities hire statisticians to teach intro stats to non-stats majors, and it works well when you pick someone who can teach.

I agree with you that this is for non-statisticians, but the CFAI shouldn’t be puffing out its chest trying to show more mathematical detail when they do it poorly (whether through carelessness or inability to do it better). If they’re going to do it, it should be done correctly. People often “run stats” without knowing which way is up, only how to get a p-value that they likely misinterpret as an error probability or as the probability that the null is true. When the software gives a warning or an error, they haven’t the slightest clue how to go about fixing it, and I have seen people just use what the program gave them despite a serious error (until they can’t move forward with a project anymore and end up asking for help).

No more rant :slight_smile: but I just think the CFAI likes to puff out its chest, and this is one area they need to improve if they want to purport “world-class education.”

I am not. :stuck_out_tongue:

Can’t recall if I mentioned this, but the standard error of x-bar would be SD(x) / n^0.5, where SD(x) is the standard deviation of X; but each sample statistic has its own calculation for standard error, as a general rule.

LOL, you just misread my original post. You’ve arrived right back at the beginning.

I agree with most of your comments, btw; however, keep in mind that the CFAI tries to train students in a wide range of disciplines rather than in only the one discipline in which you have the most experience. I know it is easy to spot incomplete or insufficiently rigorous definitions in that case.

I’m quoting this to show that I don’t believe I misread your initial post. You clearly make the assertion that the standard error for a slope coefficient in linear regression is equal to SD(x) / n^0.5; it’s right in your post. This is incorrect. That formula is not how you calculate the standard error for any slope coefficient in linear regression. I stand by my posts explaining all of this. My most recent post was only to show that this is the standard error of the mean (x-bar), so it is a valid calculation for the standard error of something, but that something is x-bar, not a regression slope. If they claim that is the formula for a slope standard error, they’re flat out wrong. It’s not merely lacking rigor. It’s not merely incomplete. It’s incorrect.

I would also say that again, the CFAI is trying to teach the material; it should be correct. Saying that they want to teach a broad curriculum does not excuse them from doing it correctly. It’s inappropriate to claim you’re world class yet don’t have the right people in the right places. It’s even worse when they won’t admit to when they have things outright incorrect by any resource they could use to verify. Mistakes happen, but not correcting them or admitting to them is far worse.

With all due respect, you have misunderstood the whole thread. We were talking about t-statistics.

The OP asked why the t-statistic of the slope coefficient in a simple regression equation is exactly (or very nearly) equal to the t-statistic of the Pearson correlation coefficient between the same variables used in the simple regression.

If you read carefully again what I wrote:

T(slope) = (bx − 0) / SEx

where: SEx = standard error of x = SD(x) / n^0.5

At no point did I say that SEx is the SE for a slope coefficient as a general rule, only that it is part of the calculation of the t-statistic for the slope coefficient in a simple regression.

Just to say it in your own notation: any “x” you see in the formulas above is “x-bar,” which is a vector of data observations. Do we agree on that?

Also, in your first reply you said the derivation of the variance of Pearson’s correlation coefficient was an introductory calculation, and that any student would be asked to derive it in class; however, you haven’t calculated it so far [in your life]. I must tell you that it is, in fact, advanced statistics.

Hope this helps.

There is no misunderstanding here. The t-statistic for the slope coefficient is not calculated in the manner you claim. It does not use the standard error of x-bar.

Again, there is no misunderstanding here.

Your final sentence is unclear. Further, the bolded claim is factually and mathematically incorrect regarding the t-stat for a slope (re: the standard error calculation). There is no disputing what you have written, because you defined the variable in question (and did so incorrectly).

X-bar usually represents the sample mean, a scalar, not a vector of observed values. In either case, read again the variables you defined above.

I believe I said an undergraduate statistics major would be asked to derive it. I was not a statistics major and my lack of derivation of this item does not make it advanced statistics. Open an actual statistical text used at the undergraduate level for statistics majors; it’s in there. Your argument of “tickersu hasn’t done it, so it’s advanced” is flattering, although illogical.

I won’t comment further on this. The solution is to read what you wrote earlier and how you defined your variables. Then look at a statistical resource that shows the standard error of a slope in OLS is different from what you’ve said. https://stats.stackexchange.com/questions/91750/how-is-the-formula-for-the-standard-error-of-the-slope-in-linear-regression-deri