1. John has just performed a hypothesis test and has calculated the p-value to be 0.13. Which of the following is most likely to be true?
A: Do not reject the null hypothesis at the 10% significance level, and do not reject the null hypothesis at the 5% significance level.
B: Do not reject the null hypothesis at the 10% significance level, and reject the null hypothesis at the 5% significance level.
C: Reject the null hypothesis at the 10% significance level, and do not reject the null hypothesis at the 5% significance level.
D: Reject the null hypothesis at the 10% significance level, and reject the null hypothesis at the 5% significance level.
2. If John tosses a two-sided coin 6 times, which of the following is closest to the probability of obtaining exactly three heads?
A: 0.02 B: 0.16 C: 0.31 D: 1.88
3. In testing the significance of the correlation coefficient of 0.4, assuming that there are 90 observations, which of the following is most likely to be true?
A: A t-statistic of 5.09 and 88 degrees of freedom
B: A t-statistic of 5.09 and 89 degrees of freedom
C: A t-statistic of 4.09 and 88 degrees of freedom
D: A t-statistic of 4.09 and 89 degrees of freedom
4. Which of the following is least likely to be required in order to perform an F-test?
A: The sum of the squared errors
B: The total number of observations
C: The mean regression sum
D: The total number of parameters to be estimated
5. Which of the following best describe observations that significantly reduce what would otherwise be a high correlation?
A: Outliers B: Non-linear relations C: Independent variables D: Spurious correlations

If John tosses a two-sided coin 6 times, which of the following is closest to the probability of obtaining exactly three heads?
A: 0.02 B: 0.16 C: 0.31 D: 1.88
6C3 * 0.5^3 * 0.5^3 = 0.3125, so Choice C.
In testing the significance of the correlation coefficient of 0.4, assuming that there are 90 observations, which of the following is most likely to be true?
A: A t-statistic of 5.09 and 88 degrees of freedom
B: A t-statistic of 5.09 and 89 degrees of freedom
C: A t-statistic of 4.09 and 88 degrees of freedom
D: A t-statistic of 4.09 and 89 degrees of freedom
t-stat = r * sqrt(n-2) / sqrt(1-r^2) = 0.4 * sqrt(88) / sqrt(1 - 0.4^2) = 4.094, therefore Choice C: a t-statistic of 4.09 and 88 dof.
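For anyone who wants to check the arithmetic on the correlation question, here is a minimal Python sketch. The function name `correlation_t_stat` is made up for illustration; the formula is the one quoted in the post above, t = r * sqrt(n-2) / sqrt(1-r^2) with n-2 degrees of freedom.

```python
import math

def correlation_t_stat(r, n):
    """Return (t, df) for testing whether a sample correlation r differs from zero."""
    df = n - 2                                   # degrees of freedom
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

t_stat, dof = correlation_t_stat(0.4, 90)
print(f"t = {t_stat:.2f}, df = {dof}")           # t = 4.09, df = 88
```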

John has just performed a hypothesis test and has calculated the p-value to be 0.13. Which of the following is most likely to be true?
A: Do not reject the null hypothesis at the 10% significance level, and do not reject the null hypothesis at the 5% significance level.
B: Do not reject the null hypothesis at the 10% significance level, and reject the null hypothesis at the 5% significance level.
C: Reject the null hypothesis at the 10% significance level, and do not reject the null hypothesis at the 5% significance level.
D: Reject the null hypothesis at the 10% significance level, and reject the null hypothesis at the 5% significance level.
Given that the p-value is the smallest level of significance at which the null hypothesis can be rejected, and 0.13 is larger than both 0.10 and 0.05, Choice A (do not reject at 10% and do not reject at 5%) seems to be the choice.
Which of the following is least likely to be required in order to perform an F-test?
A: The sum of the squared errors
B: The total number of observations
C: The mean regression sum
D: The total number of parameters to be estimated
For an F-test, should there be 2 parameters? I am guessing D would not be required.
Which of the following best describe observations that significantly reduce what would otherwise be a high correlation?
A: Outliers B: Non-linear relations C: Independent variables D: Spurious correlations
B: Non-linear relations.

1. ?? Don’t know what a p-value is.
2. C: 6C3 * 0.5^3 * 0.5^3 = 0.3125
3. C: t-statistic = r * sqrt(n-2) / sqrt(1-r^2) = 0.4 * sqrt(90-2) / sqrt(1 - 0.4^2) = 4.094; DoF = n - 2 = 90 - 2 = 88
4. ?? Is the F-test related to regression? Is this ANOVA? Do we have it for L1?
5. A: The presence of outliers (extreme observations, either +ve or -ve) reduces the correlation coefficient.
- Dinesh S

CPK and dinesh, could u guys pls explain how do u derive the answer for that coin tossing question in detail? Which LOS is this and any reading material that may help? I am totally confused. Thanks in advance.

Coin tossing is the binomial probability distribution. For the correlation coefficient, this article outlines correlation: http://irp.savstate.edu/irp/glossary/correlation.html
These were the two passages there, with my comments preceded by CPK below (maybe Joey can help identify the right answer):
"If the relationship is curvilinear, the “r” will give false and misleading readings that substantially underestimate the relationship."
CPK: Based on the above, a non-linear relationship would reduce an otherwise STRONG relationship.
"The easy way to test and see whether the relationship is linear is to plot a scatter diagram and see if the “points” scatter in a more or less linear direction. On a scatter diagram, the coefficient measures the slope of the general pattern of points plotted and the width of the ellipse that encloses those points. The width of the ellipse indicates the extent of the relationship and hence, the magnitude, or absolute value of the coefficient. Some analysts advise removing any “outlier” cases from consideration and treat them a priori as aberrations so that they do not bias the relationship remaining among the more “normal” cases."
CPK: An OUTLIER biases the relation. It could end up being a HIGHER CORRELATION if there are outliers.
And Dinesh: the p-value is the smallest level of significance at which the null hypothesis can be rejected.
HTH
CPK

First three are handled above, yes?
Which of the following is least likely to be required in order to perform an F-test?
A: The sum of the squared errors
B: The total number of observations
C: The mean regression sum
D: The total number of parameters to be estimated
A, B, and D are almost certainly required. I don’t even know what C is, so it’s not required.
Which of the following best describe observations that significantly reduce what would otherwise be a high correlation?
A: Outliers B: Non-linear relations C: Independent variables D: Spurious correlations
Answer has got to be B because that certainly lowers the correlation, but A isn’t a bad answer either. Outliers can either increase or decrease the correlation depending on what kind of outlier we are talking about. C and D are definitely out.

These are the answers guys:
1) A
The p-value is the smallest value of alpha for which the null hypothesis is rejected. If the p-value is greater than or equal to the significance level, the null hypothesis is not rejected. If the p-value is smaller than the significance level, the null hypothesis is rejected. As the p-value in this question is 0.13, the null hypothesis is not rejected at the 5% or 10% significance level.
2) C
Using the binomial distribution: Probability of obtaining exactly three heads = 6C3 x 0.5^3 x 0.5^3 (where 6C3 is the number of combinations drawing 3 from 6). Probability of obtaining exactly three heads = 20 x 0.125 x 0.125 = 0.31
3) C
When testing the correlation coefficient: Degrees of freedom = n-2 = 90-2 = 88. T-statistic = [r(n-2)^1/2]/[(1-r^2)^1/2] = [0.4(90-2)^1/2]/[(1-0.4^2)^1/2] = 4.09
4) C
The F-test uses the regression sum of squares, not the mean regression sum.
5) A
Outliers are observations that would significantly reduce what would otherwise be a high correlation.
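The decision rule in answer 1 is short enough to write down as code. This is only a sketch of the rule as stated in the answer key (reject when the p-value is smaller than the significance level); the function name `decide` and the extra 15% level are my own additions for illustration.

```python
def decide(p_value, alpha):
    """Apply the rule from the answer key: reject H0 only if p-value < alpha."""
    return "reject" if p_value < alpha else "do not reject"

p_value = 0.13
# 0.05 and 0.10 come from the question; 0.15 is an extra level added for contrast.
decisions = {alpha: decide(p_value, alpha) for alpha in (0.05, 0.10, 0.15)}

for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision} the null hypothesis")
```

With a p-value of 0.13, the null survives at 5% and 10% and would only be rejected at a level above 0.13.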

They are wrong on #5. Outliers have an ambiguous effect on correlation. Non-linearity always reduces correlation.
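A small numeric experiment makes the outlier versus non-linearity argument easier to see. This is a sketch with made-up data sets, not anything from the question bank, and `pearson_r` is a hand-rolled helper written for this post.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Perfectly linear data, then the same data with one off-trend outlier.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x for x in xs]
r_linear = pearson_r(xs, ys)
r_off_trend = pearson_r(xs + [9], ys + [0])

# A loose cloud, then the same cloud with one far-away point on the trend.
cloud_x = [1, 2, 3, 4]
cloud_y = [2, 1, 4, 3]
r_cloud = pearson_r(cloud_x, cloud_y)
r_on_trend = pearson_r(cloud_x + [20], cloud_y + [20])

# An exact but non-linear (quadratic) relationship on symmetric x values.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = pearson_r(curve_x, curve_y)

print(f"linear: {r_linear:.2f}, with off-trend outlier: {r_off_trend:.2f}")
print(f"cloud: {r_cloud:.2f}, with on-trend outlier: {r_on_trend:.2f}")
print(f"quadratic relationship: {r_curved:.2f}")
```

In these toy sets the off-trend outlier drags a perfect correlation down, the on-trend outlier pushes a moderate one up, and the exact quadratic relationship shows essentially zero linear correlation.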

Thanks cpk, so the p-value is the smallest value of alpha for which the null hypothesis is rejected.
--------- FAIL TO REJECT ---------|------- REJECTION REGION -------
------|-------------|-------------|--------------------------------
----0.05----------0.10----------0.13-------------------------------
Since our alpha1 of 0.05 (5%) and alpha2 of 0.10 (10%) are both below 0.13 (our p-value), we do not reject the Null. At any significance level beyond 0.13, the Null is rejected. So answer ‘A’ makes sense.
maparam, Question 2 is a BRV, where I labeled getting a HEAD as the success, with probability ‘p’:
x successes = 3
n total tries = 6
n - x = 6 - 3 = 3
P(Head) = 1/2 = 0.5
P(Tail) = 1/2 = 0.5
----------------------------------------------------------------------
P(X = x) = (number of ways to choose x from n) * p^x * (1 - p)^(n-x)
----------------------------------------------------------------------
It’s more intuitive to reason it through than to remember this messy formula.
Joey, non-linear relations are sure to reduce the correlation, but what they probably want us to assume here, for this question, is that we are in a capsule of L1 and know no more than linear regression. A similar question on L2 (where we have the non-linear regression stuff) would surely have an answer of B.
Guys, I still don’t understand how an F-test is related to linear regression… Schweser notes never talk of this… is there something that I am missing?
- Dinesh S

joey, since the question specifically asks:
Which of the following best describe observations that significantly reduce what would otherwise be a high correlation?
A: Outliers B: Non-linear relations C: Independent variables D: Spurious correlations
If it is asking which best describes the observations, would Outliers be the answer, because nothing else corresponds to observations themselves?

That’s a point. It’s hard to take the grammar of these very seriously though. So Dinesh - The idea is that in simple linear regression you just check to see if the slope is significant using a t-test. In every other kind of regression (e.g., multivariate regression, polynomial regression) you need a test that asks whether the whole regression model is significant. You use an F-statistic to do this in most cases (assume normal errors, blah, blah).
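To make the F-test for the whole model concrete, here is a rough sketch of the usual textbook form, F = MSR / MSE, where the mean squares come from dividing each sum of squares by its degrees of freedom. The function name and all the numbers below are made up for illustration, and k is assumed to count slope coefficients only (intercept excluded).

```python
def regression_f_statistic(rss, sse, n, k):
    """F = MSR / MSE for a regression with k slope coefficients and n observations.

    rss: regression (explained) sum of squares
    sse: sum of squared errors (residuals)
    """
    msr = rss / k               # mean regression sum of squares, df = k
    mse = sse / (n - k - 1)     # mean squared error, df = n - k - 1
    return msr / mse

# Hypothetical ANOVA figures, not taken from any question in this thread.
f_stat = regression_f_statistic(rss=80.0, sse=20.0, n=25, k=4)
print(f"F = {f_stat:.1f} with (4, 20) degrees of freedom")
```

The result would then be compared against a critical value from the F distribution with (k, n - k - 1) degrees of freedom.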

JoeyDVivre Wrote:
-------------------------------------------------------
> So Dinesh - The idea is that in linear regression
> you just check to see if the slope is significant
> using a t-test. In every other kind of regression
> (e.g., multivariate regression, polynomial
> regression) you need a test that asks whether or
> not the whole regression model is significant or
> not. You use an F-statistic to do this in most
> cases (assume normal errors, blah, blah).
Thanks Joey, so Schweser has just explained to us the t-test for detecting the significance of the slope/intercept of the regression line, but they have never talked about the significance of the complete model. I’ll need to check if this and ANOVA tables are there for the L1 LOS.
- Dinesh S

I don’t think it’s part of the L1 LOS and I think they have even phased it out on the LII LOS. Don’t take my word for it though.

maybe I am missing something here guys… but in question 2 how did you get 20 = 6 c 3 Thanks

There are 6C3 ways to choose which 3 of the 6 tosses come up heads. - Dinesh S
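On where the 20 comes from: the count of combinations is nCr = n! / (r! * (n-r)!), so 6C3 = 720 / (6 * 6) = 20. A quick check in Python, where `n_choose_r` is just an illustrative helper and `math.comb` is the standard-library equivalent (Python 3.8+):

```python
import math

def n_choose_r(n, r):
    """Number of ways to pick r items out of n when order does not matter."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

ways = n_choose_r(6, 3)            # 720 // (6 * 6)
print(ways, math.comb(6, 3))       # 20 20
```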

Dinesh… thanks… I understand that… but I was wondering if there is a formula for this or is this common sense that I am not picking up on… Thanks

There is a formula available too…
----------------------------------------------------------------------
P(X = x) = (number of ways to choose x from n) * p^x * (1 - p)^(n-x)
----------------------------------------------------------------------
but hope you don’t use it too much… it’s better to understand ‘why’ than to plug-and-chug.
- Dinesh S

I used 6C3 / 2^6 But I realise this only works because the probabilities are equal. I don’t get how the formula above works
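The shortcut and the general formula can be compared side by side. A minimal sketch, with `binomial_pmf` as an illustrative helper name and p = 0.2 as an arbitrary unfair-coin example: when p = 0.5, p^x * (1-p)^(n-x) collapses to 1 / 2^n, which is why 6C3 / 2^6 works for a fair coin only.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

shortcut = math.comb(6, 3) / 2 ** 6      # only valid when p = 0.5
general = binomial_pmf(3, 6, 0.5)        # same answer from the general formula
biased = binomial_pmf(3, 6, 0.2)         # hypothetical unfair coin, shortcut no longer applies

print(f"shortcut = {shortcut}, general = {general}")
print(f"p = 0.2 gives {biased:.5f}")
```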

Consider something which is binomial, with probability of success = 0.2 and probability of failure = 0.8. Now you have 3 chances to repeat the experiment. Your experiment outcomes are:
SSS SSF SFS FSS SFF FSF FFS FFF
3 successes: SSS = 1 * 0.2^3 = 3C3 * (0.2^3) * (0.8^0). Wherever the S appears, there is only 1 way that you can get 3 S.
2 successes: SSF, SFS, FSS = 3 * (0.2^2) * (0.8^1) = 3C2 * (0.2^2) * (0.8^1). This needs to be multiplied by 3 because SSF, SFS and FSS are 3 distinct possibilities.
And so on.
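The same counting argument can be brute-forced: list every sequence of S and F, multiply the per-trial probabilities, and add up the sequences that share the same number of successes. This is only a sketch of that idea using the 0.2 / 0.8 example above; the variable names are my own.

```python
import itertools
import math
from collections import defaultdict

p_success = 0.2
n_trials = 3

# Sum the probability of every S/F sequence, grouped by its number of successes.
prob_by_successes = defaultdict(float)
for outcome in itertools.product("SF", repeat=n_trials):
    k = outcome.count("S")
    prob_by_successes[k] += p_success ** k * (1 - p_success) ** (n_trials - k)

# Compare the enumeration against nCk * p^k * (1-p)^(n-k).
for k in range(n_trials + 1):
    formula = math.comb(n_trials, k) * p_success ** k * (1 - p_success) ** (n_trials - k)
    print(f"{k} successes: enumerated {prob_by_successes[k]:.3f}, formula {formula:.3f}")
```

The nCk factor in the formula is just the number of sequences that land in each group, which is the point of the SSF, SFS, FSS example.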