# Quant-Regression

In an F-test, if we have to test significance at 5% and 2.5%, and we reject the null at 2.5%, does that mean we will also reject at 5%?


Got it. If the null is rejected at the 2.5% level (97.5% confidence), then it is surely also rejected at the 5% level (95% confidence)…

Yup. You can also see this from another perspective.

The lower the threshold (alpha), the narrower the confidence interval (CI) for the true value of the parameter to be within the interval. If the calculated statistic (F, T, Z, etc) rejects the null (you are inside the CI), then you are also inside a wider CI.

The souls of all men are immortal, but the souls of the righteous are immortal and divine.
Socrates

Got it. Have you also studied Black and Scholes? I am not quite getting the intuition. What should I do?

You meant, lower the alpha, wider the CI… :)
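To make the nesting concrete, here is a minimal sketch. It uses two-sided standard normal critical values from Python's standard library rather than F critical values (an assumption made only to keep the example dependency-free). The same ordering holds for an F-test, because a smaller alpha always pushes the critical value further into the tail. The test statistic values are made up for illustration.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Critical z value for a two-sided test at significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def rejects(test_stat: float, alpha: float) -> bool:
    """Reject the null when the statistic lies beyond the critical value."""
    return abs(test_stat) > two_sided_critical_value(alpha)


crit_5 = two_sided_critical_value(0.05)     # roughly 1.96
crit_2_5 = two_sided_critical_value(0.025)  # roughly 2.24

# Lower alpha -> larger critical value -> wider confidence interval.
print(crit_5 < crit_2_5)

# Hypothetical statistic that rejects at 2.5%: it must also reject at 5%.
print(rejects(2.5, 0.025), rejects(2.5, 0.05))

# The reverse does not hold: 2.0 rejects at 5% but not at 2.5%.
print(rejects(2.0, 0.05), rejects(2.0, 0.025))
```

Rejecting at 2.5% means the statistic cleared the higher hurdle, so it necessarily clears the lower 5% hurdle too.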

gargijain wrote:
Got it. Have you also studied Black and Scholes? I am not quite getting the intuition. What should I do?

Do you want intuition, or do you want understanding?

Simplify the complicated side; don't complify the simplicated side.

Financial Exam Help 123: The place to get help for the CFA® exams
http://financialexamhelp123.com/

gargijain wrote:

You meant, lower the alpha, wider the CI… :)

Oops, you got me. I think you need no help ; )


You are both magicians… Please help me here…

Please help me with Black and Scholes…

It is mainly about volatility of stock prices, if I remember well.


CFAI EOC question 6, page 316: Suppose that you deleted several of the observations that had small residual values. If you re-estimated the regression equation using this reduced sample, what would likely happen to the standard error of the estimate and the R-squared?

| | Standard Error of the Estimate | R-Squared |
|---|---|---|
| A | Decrease | Decrease |
| B | Decrease | Increase |
| C | Increase | Decrease |

The answer is C. Is it because the observations had small residual values? Had the residual values been larger, would it have been the other way round, i.e., the SEE would decrease and the R-squared would increase?

Or is it because, as n decreases, SEE increases, and as SEE increases, SSE increases, which further increases total variation, and hence R-squared decreases?

It is mainly because the smaller the residual for an observation, the lower the SEE of the whole model and the higher the R2 (a lower error of estimate implies a better fit).

If we remove the best errors (the smallest ones), then the SEE of the model will be higher, which means a worse fit and a lower R2.

To make this more intuitive, recall the scatter plot of the observations with an “average line” running through the cloud of points. This line is the linear model. The distance between any observation and the line is that observation's error. Now take out the observations closest to the line (the smallest errors). The sum of squared errors barely changes, but it is now spread over fewer observations, so the typical error (i.e., the SEE) is higher and the R2 is lower.

Also, the standard deviation of the errors will increase.
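A small numerical sketch of this effect follows. The data are a made-up toy sample, built so that five observations sit exactly on the fitted line and four sit one unit away. The helper name `fit_simple_ols` and the 0.5 cut-off for a "small" residual are illustrative assumptions, not anything from the CFAI text.

```python
from math import sqrt


def fit_simple_ols(x, y):
    """Fit y = a + b*x by least squares; return residuals, SEE and R-squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)       # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)   # total variation
    see = sqrt(sse / (n - 2))                  # standard error of estimate
    r_squared = 1 - sse / sst
    return residuals, see, r_squared


# Toy sample: y = 1 + 2x, with errors of +1 or -1 at x = 2, 4, 6, 8 only.
x_full = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_full = [3, 6, 7, 8, 11, 12, 15, 18, 19]

residuals, see_full, r2_full = fit_simple_ols(x_full, y_full)

# Drop the observations with small residuals (assumed cut-off: 0.5).
kept = [(xi, yi) for xi, yi, r in zip(x_full, y_full, residuals) if abs(r) >= 0.5]
x_reduced = [xi for xi, _ in kept]
y_reduced = [yi for _, yi in kept]

_, see_reduced, r2_reduced = fit_simple_ols(x_reduced, y_reduced)

print(f"SEE: {see_full:.3f} -> {see_reduced:.3f}")
print(f"R2 : {r2_full:.3f} -> {r2_reduced:.3f}")
```

In this sample SSE is unchanged by the deletion, but it is divided by fewer degrees of freedom (so SEE rises) and compared against a smaller total variation (so R-squared falls), which matches answer C.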


Thank you so much Harrogath….:)


Black and Scholes: it is about pricing a call option, and the formula uses the probabilities N(d1) and N(d2). My question was about N(d1), which is the delta of the call option…
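For reference, a minimal sketch of the standard Black-Scholes call formula on a non-dividend-paying stock, with a numerical check that N(d1) behaves as the delta (the change in call price per unit change in the stock price). All input values are hypothetical, and the function name `bs_call` is just an illustrative choice.

```python
from math import exp, log, sqrt
from statistics import NormalDist

norm_cdf = NormalDist().cdf  # standard normal cumulative distribution N(.)


def bs_call(spot, strike, rate, vol, maturity):
    """Return (call price, N(d1)) under Black-Scholes with no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    price = spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)
    return price, norm_cdf(d1)


# Hypothetical inputs: at-the-money call, 5% rate, 20% volatility, 1 year.
spot, strike, rate, vol, maturity = 100.0, 100.0, 0.05, 0.20, 1.0
price, delta = bs_call(spot, strike, rate, vol, maturity)

# Bump the stock price slightly and see how much the call price moves.
bump = 0.01
price_up, _ = bs_call(spot + bump, strike, rate, vol, maturity)
price_down, _ = bs_call(spot - bump, strike, rate, vol, maturity)
finite_diff_delta = (price_up - price_down) / (2 * bump)

print(f"call price      : {price:.4f}")
print(f"N(d1)           : {delta:.4f}")
print(f"bump-and-reprice: {finite_diff_delta:.4f}")
```

The bump-and-reprice slope agrees with N(d1), which is one way to see why N(d1) is read as the hedge ratio of the call.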


CFAI, p. 324, solution to EOC problem 12: For a regression with one independent variable, the t-value (and significance) for the slope coefficient should equal the t-value (and significance) of the correlation coefficient.

Why???


I would not try to go deeper in this demonstration unless you are really interested in advanced statistics. For the scope of the L2 exam, we rely on the formulas provided.

In this case, the book says that the t-statistic for the slope coefficient (1) of a single-variable regression (one independent variable) should be equal to the t-statistic of the correlation coefficient (2) between the dependent variable and the independent variable.

The calculation of (1) is:

$$t_{\text{slope}} = \frac{b_x - 0}{SE_x}$$

where $SE_x$ = standard error of x = $SD(x) / n^{0.5}$. The $SE_x$ is always given in an ANOVA table.

The calculation of (2) is exactly the same as (1):

$$t_r = \frac{r - 0}{SE_r}$$

where the standard error of r is $SE_r = [(1 - r^2)/(n-2)]^{0.5}$. The book does not specify this formula because it goes too deep into advanced statistics.

Substituting $SE_r$ into $t_r$, we get the same t-statistic the book shows:

$$t_r = \frac{r - 0}{[(1 - r^2)/(n-2)]^{0.5}} = \frac{r\,(n-2)^{0.5}}{(1 - r^2)^{0.5}}$$

I know, we are back at the starting point: why do those t-statistics have the same value? The demonstration relies on the derivation of the standard error of r, which is not an easy task and probably not something you need for now, unless you are really interested in that kind of demonstration. You can check this link for help:

https://www.jstor.org/stable/2277400?seq=1#page_scan_tab_contents

I recommend that you stick to the CFAI books and do no more research for now. Train for the exam instead.

BR.


Noted with thanks…

I have a question about calculating the percent decline in the dependent variable given a percent increase in the independent variable (say 2%). In the equation, some problems use 2 while others use 0.02. Please advise.

I have the same concern. However, it is usually mentioned in the problem. They will either say in percent or in decimals. This is what I have been seeing in the EOC questions in the multiple regression chapter. Hopefully it is the same thing in the exam.

I was referring to the below problem 4, pg 398, Multiple regression.

In early 2001, US equity marketplaces started trading all listed shares in minimal increments (ticks) of \$0.01 (decimalization). After decimalization, bid–ask spreads of stocks traded on the NASDAQ tended to decline. In response, spreads of NASDAQ stocks cross-listed on the Toronto Stock Exchange (TSE) tended to decline as well. Researchers Oppenheimer and Sabherwal (2003) hypothesized that the percentage decline in TSE spreads of cross-listed stocks was related to company size, the predecimalization ratio of spreads on NASDAQ to those on the TSE, and the percentage decline in NASDAQ spreads. The following table gives the regression coefficient estimates from estimating that relationship for a sample of 74 companies. Company size is measured by the natural logarithm of the book value of company’s assets in thousands of Canadian dollars.

The average company in the sample has a book value of assets of C\$900 million and a predecimalization ratio of spreads equal to 1.3. Based on the above model, what is the predicted decline in spread on the TSE for a company with these average characteristics, given a 1 percentage point decline in NASDAQ spreads?

Now here, a percentage is given, but in the solution we used 1, not 1%…

Percentage decline in TSE spread = –0.45 + 0.05(ln 900,000) – 0.06(1.3) + 0.29(1). Because I used -0.01, my answer was wrong. So I understood that, since a decline is asked for, I should not use -1. But even if I use 0.01, my answer is still incorrect.


You need to pay attention to how the variable is constructed. The variable is “Percentage change in NASDAQ spreads”, so the raw data for that variable look like 2, 1, -5, 4, etc., because the unit is already in %. The same goes for company size: the raw data are not in dollars, like 900,000 or 1,200,000, but in napierian logarithm: ln(900,000)

Pay attention to the description of the variables in order to interpret well the outcomes of the model.
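As a quick check of the arithmetic, here is a sketch that plugs the coefficients quoted in the question into the equation, once with the NASDAQ decline entered as 1 (percentage points, the variable's own unit) and once as 0.01. The function and variable names are illustrative.

```python
from math import log


def predicted_tse_decline(assets_thousands, spread_ratio, nasdaq_decline):
    """Regression equation quoted in the thread for the TSE spread decline."""
    return (-0.45
            + 0.05 * log(assets_thousands)  # size = ln(assets in C$ thousands)
            - 0.06 * spread_ratio
            + 0.29 * nasdaq_decline)


# C$900 million of assets is 900,000 when measured in thousands.
with_percent_units = predicted_tse_decline(900_000, 1.3, 1)     # 1 = one percentage point
with_decimal_units = predicted_tse_decline(900_000, 1.3, 0.01)  # wrong unit for this variable

print(f"NASDAQ decline entered as 1   : {with_percent_units:.4f}")
print(f"NASDAQ decline entered as 0.01: {with_decimal_units:.4f}")
```

Entering 1 gives a predicted decline of about 0.45, read in the same percent units as the dependent variable, while 0.01 gives about 0.16, which is why that attempt did not match the book.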

Hope this helps.


Harrogath wrote:
… napierian logarithm: ln(900,000)

This is a natural logarithm.

Properly, a Napierian logarithm is defined as:

$$\operatorname{NapLog}(x) = -10^{7} \ln\left(\frac{x}{10^{7}}\right)$$



Indeed, my friend. Wanted to say natural log


Got that, thanks…


Hi, I have a rather foolish question. What is the difference between the slope of a line and the slope of a regression line? I think the concept is the same, so why is the formula different? The slope of a line is the change in Y divided by the change in X. Then why does the slope of a regression line use the covariance formula?


Not a foolish question.

A line is just that, a line: a known, fully defined line that passes through at least two points in space (n-dimensional).

A regression line, on the other hand, is a simplification of a scatter of observations and is subject to error. This is why the regression slope is calculated from the ratio of dispersion measures (covariance and variance).


Thanks… :)


gargijain wrote:

Hi, I have a rather foolish question. What is the difference between the slope of a line and the slope of a regression line? I think the concept is the same, so why is the formula different? The slope of a line is the change in Y divided by the change in X. Then why does the slope of a regression line use the covariance formula?

Calculus.

When you determine the formula for the line that minimizes the sum of the squared deviations from the line to the data points, the slope you get is given by the covariance/variance formula.
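A short sketch of that statement, using made-up data: the slope is computed as sample covariance over sample variance, and any nudge away from it raises the sum of squared deviations. With only two points, the same formula collapses to the familiar rise over run. Function names are illustrative.

```python
def ols_slope_intercept(x, y):
    """Least-squares line: slope = cov(x, y) / var(x)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_errors(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical scatter of five observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = ols_slope_intercept(x, y)
best_sse = sum_squared_errors(x, y, slope, intercept)

# Any other slope (same intercept) fits worse.
sse_steeper = sum_squared_errors(x, y, slope + 0.1, intercept)
sse_flatter = sum_squared_errors(x, y, slope - 0.1, intercept)
print(best_sse < sse_steeper, best_sse < sse_flatter)

# With exactly two points the formula is just rise over run.
two_point_slope, _ = ols_slope_intercept([1, 3], [2, 6])
print(two_point_slope, (6 - 2) / (3 - 1))
```

So the two "slopes" are the same concept: covariance over variance is what rise over run becomes when the points do not all sit on one line.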


Harrogath wrote:

I would not try to go deeper in this demonstration unless you are really interested in advanced statistics. For the scope of the L2 exam, we rely on the formulas provided.

In this case, the book says that the t-statistic for the slope coefficient (1) of a single-variable regression (one independent variable) should be equal to the t-statistic of the correlation coefficient (2) between the dependent variable and the independent variable.

The calculation of (1) is:

$$t_{\text{slope}} = \frac{b_x - 0}{SE_x}$$

where $SE_x$ = standard error of x = $SD(x) / n^{0.5}$. The $SE_x$ is always given in an ANOVA table.

This is not the calculation of the standard error for a regression coefficient. The standard error for the coefficient is given by the residual standard deviation (SEE) divided by the square root of the sum of squares of that x-variable about its mean: basically, the model standard deviation for error is scaled by the total variation in that particular x-variable to arrive at the standard error for that x-variable’s coefficient estimate. As a general rule, standard errors for different statistics are calculated differently.
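A numerical sketch of this point, on made-up data: it uses the standard simple-regression result that the slope's standard error is SEE divided by the square root of the sum of squared deviations of x, and then compares the slope t-statistic with the correlation t-statistic from the formula quoted earlier in the thread. Variable names are illustrative.

```python
from math import sqrt

# Hypothetical sample of five observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)   # sum of squares of x about its mean
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation (SEE) with n - 2 degrees of freedom.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
see = sqrt(sse / (n - 2))

# Standard error of the slope, then its t-statistic against H0: slope = 0.
se_slope = see / sqrt(sxx)
t_slope = slope / se_slope

# t-statistic for the Pearson correlation against H0: rho = 0.
r = sxy / sqrt(sxx * syy)
t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(f"t(slope)       = {t_slope:.4f}")
print(f"t(correlation) = {t_corr:.4f}")
```

The two statistics agree to floating-point precision, which is the equality the CFAI solution refers to.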

Harrogath wrote:

The calculation of (2) is exactly the same as (1):

$$t_r = \frac{r - 0}{SE_r}$$

where the standard error of r is $SE_r = [(1 - r^2)/(n-2)]^{0.5}$. The book does not specify this formula because it goes too deep into advanced statistics.

Substituting $SE_r$ into $t_r$, we get the same t-statistic the book shows:

$$t_r = \frac{r - 0}{[(1 - r^2)/(n-2)]^{0.5}} = \frac{r\,(n-2)^{0.5}}{(1 - r^2)^{0.5}}$$

I know, we are back at the starting point: why do those t-statistics have the same value? The demonstration relies on the derivation of the standard error of r, which is not an easy task and probably not something you need for now, unless you are really interested in that kind of demonstration. You can check these links for help:

I wouldn’t call this “advanced statistics”; it’s just that the CFAI is pretty pathetic, and you can tell by 1) the way they ask questions, 2) the way they try to phrase explanations, and 3) the way they argue to defend incorrect topics/questions despite real statistical references. I don’t think it’s too advanced for the books. I just don’t think they saw the connection so easily, otherwise they would have added a note explaining the quick connection (as they do in other areas).

By the way, I remember when I floated this really simple way to show why the t-stat for the Pearson correlation is the exact same form of the other test statistics.

Finally, I can’t recall, but do they also point out that the t-statistic for a slope in SLR is the same as the square root of the F-statistic in that same SLR model?
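For what it is worth, that last relationship follows in a few lines. The sketch below assumes the usual simple linear regression notation, which is not spelled out in the thread: $\hat{b}_1$ is the estimated slope, $S_{xx} = \sum (x_i - \bar{x})^2$, $SEE^2 = SSE/(n-2)$, and the regression sum of squares is $SSR = \hat{b}_1^2 S_{xx}$ with one degree of freedom.

$$
F = \frac{MSR}{MSE} = \frac{SSR / 1}{SSE / (n-2)} = \frac{\hat{b}_1^{2}\, S_{xx}}{SEE^{2}} = \left( \frac{\hat{b}_1}{SEE / \sqrt{S_{xx}}} \right)^{2} = t_{\text{slope}}^{2}
$$

So with one independent variable, the F-test and the two-sided t-test on the slope always lead to the same decision.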


Tickersu, are you saying that the book is wrong in calculating the standard error of a regression coefficient (i.e., SEE)? Because the sources I can consult say that $SE_x$ can be calculated as $SD(x) / n^{0.5}$ (for a regression with one independent variable).

How would you use the “model standard deviation for error” to arrive at the coefficient standard error? Can you describe the calculation in more detail?


Have you ever tried to derive the standard error of the Pearson correlation coefficient? Have you seen the links I provided? Those are not intro calculations :)

I think we have had this discussion before, and here we go again: the CFAI quant book is not meant for statisticians but for students who will probably use regressions and related calculations in their professional careers. We will probably use software that does the calculation for us, so the challenge lies in correctly interpreting the results the machine provides, understanding real life, and translating that reality into numbers as accurately as possible.

Are you an actuarial analyst, by the way?
