Hypothesis testing: two-tailed test

Example: A researcher gathered data on a portfolio of call options over a recent 250-day period. The mean daily return is 0.1%, and the sample standard deviation of daily portfolio returns is 0.25%. The researcher believes the mean daily portfolio return is not equal to zero.

So this should be a two-tailed test of H0: μ = 0 versus HA: μ ≠ 0, where μ is the true mean daily portfolio return.

At the 5% level of significance, the critical z-values for a two-tailed test are ±1.96, so we reject H0 if the test statistic is < -1.96 or > +1.96.
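The ±1.96 cutoffs come from the inverse CDF of the standard normal distribution, with half of the significance level placed in each tail. Here is a minimal sketch using Python's standard-library `statistics.NormalDist` (an illustrative choice, not part of the original example). The function name `two_tailed_critical_z` is hypothetical. The sketch also prints the cutoffs for the 10% and 1% levels raised in question 1.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Return the positive critical z-value for a two-tailed test at level alpha.

    Half of alpha goes in each tail, so we look up the (1 - alpha/2) quantile
    of the standard normal distribution.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z_crit = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if |z| > {z_crit:.3f}")

# Output:
# alpha = 0.10: reject H0 if |z| > 1.645
# alpha = 0.05: reject H0 if |z| > 1.960
# alpha = 0.01: reject H0 if |z| > 2.576
```

A smaller significance level gives a wider "do not reject" band. The test statistic in this example (about 6.3) exceeds even 2.576, so the conclusion would be the same at any of these three levels.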

The standard error of the sample mean for a sample size of 250 is 0.0025 / √250 = 0.000158 (that is, 0.0158%).

Our test statistic is (0.1% - 0) / 0.0158% ≈ 6.33 (6.32 if the standard error is not rounded). Since 6.33 > 1.96, we reject the null hypothesis.
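The standard error and test statistic can be reproduced from the summary figures in the example. Here is a minimal sketch in Python. It assumes the 250 daily returns are independent, which is the usual assumption behind s/√n. The function name `z_test_mean` is hypothetical, and the two-tailed p-value it also returns is not reported in the original example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, sample_sd, n, mu0=0.0, alpha=0.05):
    """Two-tailed z-test of H0: mean == mu0 from summary statistics."""
    std_error = sample_sd / sqrt(n)                   # standard error of the sample mean
    z = (sample_mean - mu0) / std_error               # distance from mu0 in standard errors
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # e.g. 1.96 for alpha = 0.05
    p_value = 2 * NormalDist().cdf(-abs(z))           # P(|Z| >= |z|) if H0 is true
    reject = abs(z) > z_crit
    return std_error, z, z_crit, p_value, reject

# Figures from the example: mean 0.1%, sd 0.25%, 250 days, H0 mean = 0
se, z, z_crit, p, reject = z_test_mean(0.001, 0.0025, 250)
print(f"standard error = {se:.6f}")                               # standard error = 0.000158
print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, reject H0: {reject}")
# z = 6.32, critical = +/-1.96, reject H0: True
print(f"two-tailed p-value = {p:.1e}")
```

The p-value is computed as `2 * cdf(-|z|)` rather than `2 * (1 - cdf(|z|))`. This avoids subtracting two nearly equal numbers when z is large.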

OK, there are two things I don't understand in this example. Please help me out:

  1. How did they come up with a significance level of 5%, and why not 10% or 1%?

  2. Why divide the mean daily return by the standard error (0.1% / 0.0158% = 6.33)? Shouldn't the comparison be between 0.000158 and 1.96?

Hypothesis testing is driving me crazy. Also, what is the real-life implication of this test?

Thanks a lot in advance to those who comment.

  1. The choice of significance level is subjective. One could also test the hypothesis at the 10% or 1% significance level (that is, at 90% or 99% confidence). This example simply uses the 5% level, which is a common convention.

  2. The null hypothesis here is that the mean daily return = 0, and the researcher is testing whether that hypothesis can be rejected. Because the test concerns the mean daily return (not individual daily returns), what matters is the sampling distribution of the sample mean. Its standard deviation, the standard error, equals (sample standard deviation of daily portfolio returns) / sqrt(sample size). By the central limit theorem, the sample mean is also approximately normally distributed for a sample this large. In other words, suppose you took many random 250-day return samples and computed the mean of each. Those means would be approximately normally distributed, with expected value = 0 (per the null hypothesis) and standard deviation = 0.0158% in this example. The observed mean (0.1%) is 6.33 standard deviations away from the expected mean of 0, and that distance is exactly what the test statistic 0.1% / 0.0158% measures. Since 6.33 is well beyond the critical z-value of 1.96, a sample mean of 0.1% would be very unlikely if the true mean were zero. The result is therefore statistically significant at the 5% level and unlikely to be due to random chance alone, so the null hypothesis is rejected. Comparing 0.0158% to 1.96 doesn't make sense: 0.0158% is a standard deviation measured in return units, whereas 1.96 is a distance from the mean measured in standard deviations. Dividing the sample mean by the standard error converts it into that same unit, so the two numbers can be compared.
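The sampling-distribution argument in answer 2 can be checked with a quick simulation. This sketch rests on an assumption the example does not make: daily returns are drawn as independent normal values with mean 0 (the null hypothesis) and standard deviation 0.25%. The seed and the number of simulated samples are arbitrary choices.

```python
import random
from statistics import fmean, stdev

random.seed(42)          # arbitrary seed so the run is reproducible
N_DAYS = 250             # sample size from the example
DAILY_SD = 0.0025        # 0.25% daily standard deviation (assumed normal)
N_SAMPLES = 5_000        # number of simulated 250-day samples (arbitrary)

# Under H0 the true mean daily return is 0: simulate many 250-day samples
# and record the mean of each one.
sample_means = [
    fmean(random.gauss(0.0, DAILY_SD) for _ in range(N_DAYS))
    for _ in range(N_SAMPLES)
]

# The spread of the sample means should be close to 0.0025 / sqrt(250) = 0.000158.
print(f"stdev of simulated means: {stdev(sample_means):.6f}")

# Roughly 95% of the means should fall within +/-1.96 standard errors of 0.
band = 1.96 * DAILY_SD / N_DAYS ** 0.5
inside = sum(abs(m) <= band for m in sample_means) / N_SAMPLES
print(f"share within +/-1.96 standard errors: {inside:.3f}")

# How often does chance alone produce a sample mean at least as far from 0 as 0.1%?
extreme = sum(abs(m) >= 0.001 for m in sample_means)
print(f"samples with |mean| >= 0.1%: {extreme} of {N_SAMPLES}")
```

The ±1.96 band contains about 95% of the simulated means, which is the 5% rejection rule from the example. Under this simulated null, none of the samples comes anywhere near the observed 0.1% mean.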