Quant-Regression

I was referring to the problem below: problem 4, p. 398, Multiple Regression.

In early 2001, US equity marketplaces started trading all listed shares in minimum increments (ticks) of $0.01 (decimalization). After decimalization, bid–ask spreads of stocks traded on the NASDAQ tended to decline. In response, spreads of NASDAQ stocks cross-listed on the Toronto Stock Exchange (TSE) tended to decline as well. Researchers Oppenheimer and Sabherwal (2003) hypothesized that the percentage decline in TSE spreads of cross-listed stocks was related to company size, the predecimalization ratio of spreads on NASDAQ to those on the TSE, and the percentage decline in NASDAQ spreads. The following table gives the regression coefficient estimates from estimating that relationship for a sample of 74 companies. Company size is measured by the natural logarithm of the book value of the company’s assets in thousands of Canadian dollars.

The average company in the sample has a book value of assets of C$900 million and a predecimalization ratio of spreads equal to 1.3. Based on the above model, what is the predicted decline in spread on the TSE for a company with these average characteristics, given a 1 percentage point decline in NASDAQ spreads?

Now here, a percentage is given, but in the solution we used 1… and not 1%…

Percentage decline in TSE spread = −0.45 + 0.05 ln(900,000) − 0.06(1.3) + 0.29(1). Because I used −0.01, my answer was wrong. So I understood that since a decline is asked for, I should not use −1. But even if I use 0.01, my answer is still incorrect.
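For anyone checking the arithmetic, here is a minimal sketch of the plug-in (Python, using the coefficient values quoted above). Remember that company size enters as the natural log of book value in thousands of Canadian dollars, so C$900 million becomes 900,000:

```python
import math

# Coefficients as quoted in the problem: intercept -0.45, size 0.05,
# predecimalization spread ratio -0.06, NASDAQ spread decline 0.29.
size = math.log(900_000)  # C$900 million, expressed in thousands of C$

# A 1 percentage point decline in NASDAQ spreads enters as 1, not 0.01,
# because the variable is already measured in percent.
predicted_decline = -0.45 + 0.05 * size - 0.06 * 1.3 + 0.29 * 1

print(round(predicted_decline, 2))  # about 0.45, i.e. a 0.45% decline
```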

You need to pay attention to how each variable is built. The variable is “Percentage change in NASDAQ spreads,” so if you look at the raw data for that variable you will see 2, 1, −5, 4, etc., because the units are already in %. The same goes with company size: the raw data are not dollar amounts like 900,000 or 1,200,000, but the Napierian logarithm: ln(900,000).

Pay attention to the description of the variables in order to interpret well the outcomes of the model.

Hope this helps.

This is a natural logarithm.

Properly, a Napierian logarithm is defined as:

NapLog(x) = −10^7 · ln(x / 10^7)

Indeed, my friend. Wanted to say natural log :stuck_out_tongue:

Got that, thanks…

Hi, I have a rather foolish doubt. What is the difference between the slope of a line and the slope of a regression line? I think the concept is the same, so why the difference in the formula? The slope of a line is the rate of change in Y over the rate of change in X. Then why, for the slope of a regression line, do we have the covariance formula…?

Not a foolish question.

A line is just that: a line. A known and defined line that passes through at least 2 points of the (n-dimensional) space.

A regression line, on the other hand, is a simplification of a scatter of observations and is subject to error. This is why the slope of a regression line is calculated from the relative relation of dispersion measures (covariances and variances).

Thanks… :slight_smile:

Hi, I have a rather foolish doubt. What is the difference between the slope of a line and the slope of a regression line? I think the concept is the same, so why the difference in the formula? The slope of a line is the rate of change in Y over the rate of change in X. Then why, for the slope of a regression line, do we have the covariance formula…?

Calculus.

When you determine the formula for the line that minimizes the sum of the squared deviations from the line to the data points, the slope you get is given by the covariance/variance formula.
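A quick way to convince yourself of this, using made-up data (numpy assumed available), is to compare the covariance/variance ratio with the slope from an ordinary least-squares fit:

```python
# Check that the least-squares slope equals cov(x, y) / var(x), which is
# exactly what minimizing the sum of squared deviations gives you.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)  # made-up linear relationship plus noise

slope_from_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
slope_from_fit = np.polyfit(x, y, 1)[0]  # ordinary least-squares slope

print(np.isclose(slope_from_cov, slope_from_fit))  # True
```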

This is not the calculation of the standard error for a regression coefficient. The standard error for the coefficient is given by the residual standard deviation (SEE) divided by the square root of the sum of squared deviations of that x-variable about its mean: basically, the model standard deviation of the errors is scaled by the total variation in that particular x-variable to arrive at the standard error for that x-variable’s coefficient estimate. As a general rule, standard errors for different statistics are calculated differently.

I wouldn’t call this “advanced statistics”; it’s just that the CFAI is pretty pathetic, and you can tell by 1) the way they ask questions, 2) the way they try to phrase explanations, and 3) the way they argue to defend incorrect topics/questions despite real statistical references. I don’t think it’s advanced for the books; I just don’t think they saw the connection easily, otherwise they would have made a note to explain the quick connection (as they do in other areas).

By the way, I remember when I floated a really simple way to show why the t-stat for the Pearson correlation has the exact same form as the other test statistics.

Finally, I can’t recall, but do they also point out that the t-statistic for a slope in SLR is the same as the square root of the F-statistic in that same SLR model?
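For what it’s worth, both identities discussed here (the t-stat for the slope equals the t-stat for Pearson’s r, and t^2 equals F in simple linear regression) can be checked numerically. This is just a sketch with made-up data, not the CFAI’s presentation:

```python
# Verify numerically that in simple linear regression:
#   t(slope) == t(Pearson r)   and   t(slope)^2 == F
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)  # made-up data

# OLS slope and intercept
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

ss_xx = ((x - x.mean()) ** 2).sum()
se2 = (resid ** 2).sum() / (n - 2)        # SEE squared (error variance)
t_slope = b1 / np.sqrt(se2 / ss_xx)       # t-stat for the slope

r = np.corrcoef(x, y)[0, 1]
t_r = r * np.sqrt((n - 2) / (1 - r ** 2)) # t-stat for Pearson correlation

ssr = (((b0 + b1 * x) - y.mean()) ** 2).sum()
f_stat = ssr / se2                        # F-stat with 1 and n-2 df

print(np.isclose(t_slope, t_r), np.isclose(t_slope ** 2, f_stat))
```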

Tickersu, are you saying that the book is wrong in calculating the standard error of a regression coefficient (i.e., the SEE)? Because the sources I can consult say that SEx can be calculated as SD(x) / n^0.5 (for a regression with one independent variable).

How would you use the “model standard deviation for error” to arrive at the coefficient standard error? Can you describe the calculation in more detail?

Have you ever tried to derive the standard error of Pearson correlation coefficient? Have you seen the links I provided? Those are not intro calculations :slight_smile:

I think we have had this discussion before, and here we go again: the CFAI quant book is not meant for statisticians, but rather for students who will probably use regressions and other related calculations in their professional careers. We will probably use software that does the calculations for us, so the real problem lies in correctly interpreting the results a machine provides, understanding real life, and translating that reality into numbers as correctly as possible.

Are you an actuarial analyst btw?

I don’t have the book, so I can’t see it, but it wouldn’t surprise me at all. They have been wrong about plenty of things in the past. Many of their answers or interpretations of confidence intervals or p-values were wrong when I skimmed the solutions: they often suggested that there is a 95% chance the true value lies in a specific CI, say (2, 10), and they have also suggested that p-values represent the probability that the null hypothesis is true, or the probability that a Type I error was made. These are both elementary mistakes that would be covered in early undergraduate coursework. However, the SEE is different from the standard error of, say, the slope relating Y and X1. The SE for a slope is the model standard deviation of the errors (SEE) divided by the square root of the (basically) variance of X1.

You can share some, or send me a picture of the book (if it’s a book) in a PM. The standard error of x-bar would be the standard deviation of X divided by the square root of N. As far as I know, this is not correct for the standard error of X’s regression coefficient (I’ve never seen anything that suggests it is).

If Se is the estimated standard deviation of the model error (what some people call the SEE), and SSxx is the sum of squared deviations of X around its mean, Σ(X − X̄)^2, then to calculate the standard error of beta_x in the regression

E(Y) = b0 + beta_x(X), we have Var(beta_x) = Se^2 / SSxx and SE(beta_x) = [Var(beta_x)]^0.5 = [Se^2 / SSxx]^0.5
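Here is a small numerical sanity check of that formula, with made-up data, against scipy’s linregress (which reports the slope standard error as stderr):

```python
# Check SE(beta_x) = sqrt(Se^2 / SSxx) against scipy.stats.linregress.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 0.8 * x + rng.normal(size=30)  # made-up data

res = stats.linregress(x, y)
resid = y - (res.intercept + res.slope * x)

se2 = (resid ** 2).sum() / (len(x) - 2)   # Se^2: estimated error variance
ss_xx = ((x - x.mean()) ** 2).sum()       # SSxx: sum of squared deviations of X
se_slope = np.sqrt(se2 / ss_xx)

print(np.isclose(se_slope, res.stderr))  # True
```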

This is for simple linear regression. In multiple regression, the denominator of each estimated coefficient’s variance is also adjusted by (1 − Rj^2), where Rj^2 is the R^2 from a regression with Xj as the dependent variable predicted by all the other independent variables from the main model. This makes it really straightforward to see why and how higher degrees of multicollinearity between Xj and the other IVs (larger Rj^2) inflate the variance and standard error of beta_j.
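A hedged sketch of that adjustment, with made-up data: regress Xj on the other IVs to get Rj^2, then compute the variance inflation factor 1 / (1 − Rj^2):

```python
# Sketch of the multicollinearity adjustment: Var(beta_j) is inflated by
# 1 / (1 - Rj^2), where Rj^2 comes from regressing Xj on the other IVs.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)  # x2 deliberately correlated with x1

# Regress x1 on the "other" IV (x2, plus an intercept) to get R1^2.
X = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
fitted = X @ coef
r2_j = 1 - ((x1 - fitted) ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()

vif = 1 / (1 - r2_j)  # variance inflation factor for beta_1
print(vif > 1)        # True: collinearity inflates Var(beta_1)
```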

https://stats.stackexchange.com/questions/88461/derive-variance-of-regression-coefficient-in-simple-linear-regression

https://stats.stackexchange.com/questions/342632/how-to-understand-se-of-regression-slope-equation

I haven’t tried to derive the SE for a Pearson correlation, no. I have had to derive other things in regression (variances, slope estimators), but mostly these are things an undergraduate statistics major would be expected to do in class. However, I was not saying they should put a derivation in the book; I was saying they should use the simple algebraic manipulation and substitution that you, S2000, and I have used in posts to show why the t-statistic for rho is not so weird and different. It’s just that the CFAI hasn’t taken two seconds to show the algebra to the reader.

I agree the CFAI is not “for statisticians,” but that doesn’t give them a pass to present incorrect material as correct, especially given the whole “we strive for excellence and world-class education.” Why not hire one legitimate statistician (honestly, someone with an undergraduate statistics degree would suit most of their purposes, but ideally at least an MS in statistics) to review the quant curriculum? This is how universities hire statisticians to teach intro stats to non-stats majors, and it works well when you pick someone who can teach.

I agree with you that this is for non-statisticians, but the CFAI shouldn’t be puffing out its chest trying to show more mathematical detail when they do it poorly (whether through carelessness or inability to do it better). If they’re going to do it, it should be done correctly. People often “run stats” without knowing which way is up, only how to get a p-value that they likely misinterpret as an error probability or as the probability that the null is true. When the software gives a warning or an error, they haven’t the slightest clue how to go about fixing it, and I have seen people just use what the program gave them despite a serious error (until they can’t move forward with a project anymore and end up asking for help).

No more rant :slight_smile: but I just think the CFAI likes to puff out its chest, and this is one area they need to improve if they want to purport “world-class education.”

I am not. :stuck_out_tongue:

Can’t recall if I mentioned this, but the standard error of x-bar would be SD(x) / n^0.5, where SD(x) is the standard deviation of X; but each sample statistic has its own calculation for standard error, as a general rule.

LOL, you just misread my original post. You’ve arrived right back at the beginning.

I agree with most of your comments, btw; however, keep in mind that the CFAI tries to train students in a wide range of disciplines rather than in only the one discipline in which you have the most experience. I know it is easy to spot incomplete or insufficiently rigorous definitions in that case.

I’m quoting this to show that I don’t believe I misread your initial post. You clearly make the assertion that the standard error for a slope coefficient in linear regression is equal to SD(x) / n^0.5; it’s right in your post. This is incorrect. That formula is not how you calculate the standard error for any slope coefficient in linear regression. I stand by my posts explaining all of this. My most recent post was only to show that this is the standard error of the mean (x-bar), so it is a valid calculation for the standard error of something, but that something is x-bar, not a regression slope. If they claim that is the formula for a slope standard error, they’re flat out wrong. It’s not merely lacking rigor. It’s not merely incomplete. It’s incorrect.

I would also say that again, the CFAI is trying to teach the material; it should be correct. Saying that they want to teach a broad curriculum does not excuse them from doing it correctly. It’s inappropriate to claim you’re world class yet don’t have the right people in the right places. It’s even worse when they won’t admit to when they have things outright incorrect by any resource they could use to verify. Mistakes happen, but not correcting them or admitting to them is far worse.

With all due respect, you have misunderstood the whole thread. We were talking about t-statistics.

The OP asked why the t-statistic of the slope coefficient in a simple regression equation is exactly (or very nearly) equal to the t-statistic of the Pearson correlation coefficient between the same variables used in the simple regression.

If you read carefully again what I wrote:

T(slope) = (bx − 0) / SEx

where: SEx = standard error of x = SD(x) / n^0.5

At no point did I say that SEx is the SE for a slope coefficient as a general rule, only that it is part of the calculation of the t-statistic for the slope coefficient in a simple regression.

Just to say it in your own notation: any “x” you see in the formulas above is “x-bar,” which is a vector of data observations. Do we agree on that?

Also, in your first reply you said the derivation of the variance of Pearson’s correlation coefficient was an introductory calculation, and that any student would be asked to derive it in class; however, you haven’t calculated it so far [in your life]. I must tell you that it is, in fact, advanced statistics.

Hope this helps.

There is no misunderstanding here. The t-statistic for the slope coefficient is not calculated in the manner you claim. It does not use the standard error of x-bar.

Again, there is no misunderstanding here.

Your final sentence is unclear. Further, the bolded claim is factually and mathematically incorrect regarding the t-stat for a slope (re: the standard error calculation). There is no disputing what you have written, because you defined the variable in question (and did so incorrectly).

X-bar usually represents the sample mean, a scalar, not a vector of observed values. In either case, read again the variables you defined above.

I believe I said an undergraduate statistics major would be asked to derive it. I was not a statistics major and my lack of derivation of this item does not make it advanced statistics. Open an actual statistical text used at the undergraduate level for statistics majors; it’s in there. Your argument of “tickersu hasn’t done it, so it’s advanced” is flattering, although illogical.

I won’t comment further on this. The solution is to read what you wrote earlier and how you defined your variables. Then look at a statistical resource that shows the standard error of a slope in OLS is different from what you’ve said. https://stats.stackexchange.com/questions/91750/how-is-the-formula-for-the-standard-error-of-the-slope-in-linear-regression-deri