Confidence intervals

Hi guys, I am confused by the confidence interval formula. I don’t understand why the standard error is used rather than the population variation. At first I thought it might be because the population variance is unknown, but the standard error contains the population standard deviation, so the variance could easily be found and would therefore be known. Does anyone know the difference and why it is used?

Initially it is stated as x ± stddev × variance,

sometimes as x ± stddev × standard error.

I think you might be confusing terms and formulas here. There are 2 cases that need to be distinguished:

  1. Sample Distribution: An example would be, you ask 100 people in this forum what their score was after writing the CFA exam. Say the mean is 50% and the standard deviation is 10%. Then your 95% confidence interval is computed as (I assume you know where the 1.96 comes from):

mean ± std × 1.96, or 50% ± 10% × 1.96

  2. Sampling Distribution: Imagine a couple of people (let’s say 10 guys) in this forum have already done the work for you; that is, over the last few years, around June, each of them asked exactly 100 people about their exam score, so you have 10 samples, each with a sample size of 100. Each of the 10 samples provides you with a mean, so you have 10 means. Those 10 means then give you a sampling distribution. Here the confidence interval is given by:

mean (of the means) ± std error × 1.96

What is the standard error? The standard error is the standard deviation / sqrt(n), where n equals 100, the size of each sample. We use the standard error when we are dealing with sampling distributions, and the standard deviation when dealing with the actual samples.
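To see the two formulas side by side, here is a minimal Python sketch; the numbers are just the illustrative values from above, and 1.96 is the z-value for a 95% interval:

```python
import math

z = 1.96  # z-value for a 95% confidence interval

# Case 1: sample distribution (100 forum members, mean 50%, std dev 10%)
mean, std = 50, 10
ci_sample = (mean - z * std, mean + z * std)
print("Sample-distribution CI:", ci_sample)      # roughly (30.4, 69.6)

# Case 2: sampling distribution of the mean (each sample has n = 100)
n = 100
std_error = std / math.sqrt(n)                   # 10 / sqrt(100) = 1
ci_sampling = (mean - z * std_error, mean + z * std_error)
print("Sampling-distribution CI:", ci_sampling)  # roughly (48.0, 52.0)
```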

Check out this site for a beautiful illustration of the difference between the two:

http://onlinestatbook.com/stat_sim/sampling_dist/

Hi Tartaglia, thanks for this. This may seem like a stupid question, but how would I know whether I am meant to use the standard deviation or the standard error of the sample mean? In the examples I have seen, it isn’t really stated whether a sampling distribution is involved or when the standard error should be used.

That is actually a very good question, because in all of the questions I encountered so far, you were supposed to figure that out for yourself.

You will be able to tell what you are dealing with by looking at the description of the data. Here is an example off the top of my head (a variation of what I mentioned above):

  1. Sample: Suppose the mean IQ of all CFA candidates is 100 and the population variance is 25. You select a random sample of 50 candidates from this population, so you have a distribution of 50 IQs. Compute the mean of those 50 IQs and you have your sample mean.

  2. Sampling Distribution: Suppose the mean IQ of all CFA candidates is 100 and the population variance is 25. You select a random sample of 50 candidates from this population and compute the sample mean. You repeat this exercise 100 times and compute the mean of all 100 sample means. You now have the mean of means (aka the grand mean). If you want to compute the standard error, you need to divide the standard deviation by the square root of the sample size (which is 50, not 100); see the sketch below.
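Here is a minimal Python simulation of that second case, purely to illustrate the idea; it assumes a normal population with mean 100 and variance 25, i.e. a standard deviation of 5:

```python
import math
import random

random.seed(1)

pop_mean, pop_std = 100, 5   # population variance 25 -> standard deviation 5
n, n_samples = 50, 100       # 100 samples, each of size 50

# Draw 100 samples of size 50 and keep the mean of each sample
sample_means = [
    sum(random.gauss(pop_mean, pop_std) for _ in range(n)) / n
    for _ in range(n_samples)
]

grand_mean = sum(sample_means) / n_samples  # the mean of the means
std_error = pop_std / math.sqrt(n)          # 5 / sqrt(50), roughly 0.71

print("Grand mean:    ", round(grand_mean, 2))
print("Standard error:", round(std_error, 2))
```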

Let me know if this is not clear.

The example I was looking at went along the lines of: ‘A CFA test was given to 40 candidates. The mean score was 75. Assuming a population standard deviation equal to 20, construct and interpret a 99% confidence interval for the mean score of the exam for the 40 candidates.’ Surely, since only one test is taken, they should be using the standard deviation rather than the standard error of the mean? The answer, however, uses the standard error. Why is this?

Can you post the exact question and answer? Also, what study material is the question from?

This is a good example. You are asked to provide the confidence interval for the mean score, thus you are again in the world of the sampling distribution. Why is that?

Remember, the sampling distribution is the distribution of all the means that you collected (if you check out the link I gave you above, you will see how the individual sample means drop down to form the sampling distribution; I highly encourage you to take a look at it). In this question you only collected one mean, but that mean is still an element of the sampling distribution (that is, of all possible means that could have been collected). Now you want to know what the confidence interval around this mean is within the sampling distribution, and thus you use the standard error.
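For that specific question, a minimal sketch of the arithmetic (assuming the usual 2.576 z-value for a 99% interval):

```python
import math

mean, pop_std, n = 75, 20, 40
z_99 = 2.576                        # z-value for a 99% confidence interval

std_error = pop_std / math.sqrt(n)  # 20 / sqrt(40), roughly 3.16
lower = mean - z_99 * std_error     # roughly 66.85
upper = mean + z_99 * std_error     # roughly 83.15

print(f"99% CI for the mean score: ({lower:.2f}, {upper:.2f})")
```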

The following two pages also illustrate very well when you should use which (highly recommended):

http://pages.wustl.edu/montgomery/articles/2664

http://pages.wustl.edu/montgomery/articles/2757

I have looked into the notes and agree with everything. So surely the example links to your first statement about CFA results? If you only asked once, then that is a sample distribution rather than a sampling distribution?

Is the sample size allowed to change?

No, the question you cited above relates to the sampling distribution.

If you ask 50 people in this forum about their exam score, then you have one sample and a sample distribution, where each point represents one person in this forum.

Now if you compute the mean of that sample, that mean is an element of the sampling distribution, that is, the collection of all possible means (again, the first link illustrates this in a nice java application).

You are then asked to make a statement about the mean, thus you have to look at the sampling distribution (although you only collected one sample).

For whatever it’s worth, you _never_ use variance in the construction of a confidence interval. The units are wrong.

Standard deviation? Sure!

Standard error? Of course!

Variance? Don’t be absurd!

For our purposes, you are typically given one sample with a certain sample size, or you are told you have 100 samples, each of size 50. So the sample size typically does not change (obviously you could create a sampling distribution any way you like; that is, you could gather 100 samples, each having a different size).

Thanks Tartaglia, I’ll let you know if I have any other questions.

Sure thing!

It takes a while to digest the whole sampling vs. sample thing; as usual, repetition helps.

Not in the case of a given sampling distribution.

Respectfully, this isn’t true. A sampling distribution is partly defined by the sample size used to calculate the statistic. The sampling distribution of x-bar, for example, is x-bar’s distribution based on repeated random samples of size N. If you change N, you’re working with a different sampling distribution for the given statistic (i.e., 100 samples of unequal sizes correspond to 100 different sampling distributions).

The simulator you posted will actually help with this. Create your own wacky distribution and take 10,000 samples of N=2 for the mean. Observe the sampling distribution. Next, keeping everything else the same, take 10,000 samples of N=25 for the mean. Notice the difference in distributions.
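If you don’t have the applet handy, here is a rough Python sketch of the same experiment; it assumes a normal population just to keep the code short, rather than a hand-drawn ‘wacky’ distribution:

```python
import random
import statistics

random.seed(42)

def spread_of_sample_means(sample_size, n_samples=10_000):
    """Std dev of the sample mean across many repeated samples of one fixed size."""
    means = [
        statistics.mean(random.gauss(100, 15) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

# The sampling distribution for N=2 is much wider than the one for N=25
print("Spread of means, N=2: ", round(spread_of_sample_means(2), 2))   # about 15/sqrt(2),  roughly 10.6
print("Spread of means, N=25:", round(spread_of_sample_means(25), 2))  # about 15/sqrt(25), roughly 3.0
```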

Hope this helps.

Thank you tickersu for correcting this.

I did notice that the simulator always uses the same sample size, but I wrongly assumed that there was no requirement for the sample sizes to be equal each time. What you said makes sense, of course: since the sampling distribution is defined (among other things) by the sample size of the individual samples, if you change the sample size for one sample, that sample becomes part of a different sampling distribution.