Central Limit Theorem

According to the Central Limit Theorem, if samples of size n (where n >= 30) are taken repeatedly from a Population, the mean of each sample is calculated, and those means are plotted in a graph with probability on the Y axis and the sample mean on the X axis, the plot takes the (approximate) form of a Normal Distribution. My question is: how many times do these samples need to be taken? Is there any formula that guides this decision? Also, if we calculate the Mean and Standard Deviation of the Normal Distribution of sample means thus formed, the Mean of the distribution will be equal to the underlying Population mean, and the Standard Deviation of the distribution will be the Population SD divided by the square root of the sample size, right? Please confirm, I am a bit lost.
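The repeated-sampling procedure described in the question can be simulated in a few lines. This is a minimal Python sketch under illustrative assumptions: an exponential population with mean 1 and SD 1 (chosen only because it is clearly non-normal), n = 30, and 10,000 repetitions. The repetition count is arbitrary; the theorem does not prescribe one.

```python
import random
import statistics

def simulate_sample_means(n=30, repetitions=10_000, seed=42):
    """Draw `repetitions` samples of size n from an Exponential(rate=1) population
    (skewed, mean 1, SD 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]

means = simulate_sample_means()
# Mean of the sample means should be close to the population mean (1.0).
print(f"mean of sample means: {statistics.fmean(means):.3f}")
# SD of the sample means should be close to 1 / sqrt(30), roughly 0.183.
print(f"SD of sample means:   {statistics.stdev(means):.3f}")
```

A histogram of `means` would show the roughly bell-shaped curve the question describes, even though the population itself is strongly skewed.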

sasankm, you have got the concept right. The answer to your first question is already in your next statement: just by knowing the sample size (and the Population SD), you can determine the SD of the distribution of sample means, and you already know the mean of that distribution (it is the Population mean). So you know both parameters needed to define (plot) a normal distribution: its Mean and its SD. The answer to your first question is therefore: the number of samples drawn from a population is irrelevant to the shape of the distribution of sample means. Drawing many samples only lets you visualize a distribution that is already determined by the Population Mean, the Population SD and n. As for your second question, your statement is correct. To recap, this is all you need to know about the Central Limit Theorem. For samples taken from any Population (normal or non-normal, as long as its variance is finite) with a sample size of at least 30 (a common rule of thumb):
1) The means of these samples will approximately follow a Normal Distribution.
2) The Mean of that distribution will be the Mean of the Population.
3) The SD of that distribution will be the SD of the Population divided by the square root of the sample size.
Hope this helps.
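The point that the number of repetitions does not matter can be made concrete: once the population mean and SD are known, the CLT approximation is fully specified without drawing any repeated samples. A minimal Python sketch using the standard library's `statistics.NormalDist`; the population values (mean 50, SD 12) and the sample size n = 36 are hypothetical.

```python
import math
from statistics import NormalDist

def sampling_distribution(pop_mean, pop_sd, n):
    """CLT approximation for the distribution of the sample mean:
    Normal with mean = population mean and SD = population SD / sqrt(n)."""
    return NormalDist(mu=pop_mean, sigma=pop_sd / math.sqrt(n))

# Hypothetical population: mean 50, SD 12; samples of size 36.
dist = sampling_distribution(50, 12, 36)
print(dist.mean, dist.stdev)  # 50.0 2.0

# Probability that one sample mean lands within one SD of the population mean (48 to 52).
print(round(dist.cdf(52) - dist.cdf(48), 4))  # 0.6827
```

No loop over repetitions appears anywhere: the shape of the distribution follows from the three inputs alone.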

Thanks rus1bus for your reply. But I did not get the first part: you mean to say that from just one sample drawn (with sample size = 30) if i calculate the mean of that sample, i can assume that, that mean will be the Population mean as well? I thought I had to draw many samples of 30 each, form a Normal Distribution from their means, and that the mean of that distribution would then be equal to the mean of the Population! Please pardon my ignorance, I am still confused on this part.

“i can assume that, that mean will be the Population mean as well” No. You can, however, assume that the EXPECTED value of your sample mean equals the population mean. Moreover, your sample mean will tend to get closer to the population mean as the sample size increases. Sigma / sqrt(n) is the standard error of your sample mean, not the standard deviation of your sample.
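The difference between the SD of a sample and the standard error of its mean can be shown with a single sample. A minimal Python sketch, assuming sigma is unknown and is estimated by the sample SD s (so the standard error is estimated as s / sqrt(n)); the five data values are hypothetical and kept tiny only so the arithmetic is easy to follow.

```python
import math
import statistics

def mean_and_standard_error(sample):
    """Return (sample mean, sample SD, estimated standard error of the mean).

    The sample SD s uses the n - 1 denominator; s / sqrt(n) estimates
    sigma / sqrt(n) when the population SD sigma is unknown.
    """
    n = len(sample)
    s = statistics.stdev(sample)
    return statistics.mean(sample), s, s / math.sqrt(n)

# Hypothetical sample of 5 measurements.
sample = [4.0, 6.0, 5.0, 7.0, 3.0]
m, s, se = mean_and_standard_error(sample)
print(f"mean={m:.2f}, sample SD={s:.3f}, standard error={se:.3f}")
# mean=5.00, sample SD=1.581, standard error=0.707
```

The sample SD describes the spread of individual observations; the standard error describes how much the sample mean itself would vary from sample to sample.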

H M Walrus is exactly right. I am adding more in an attempt to elaborate further.
1. First, let's understand that if you know the Mean and SD of a variable, and you know it has a normal distribution, you can graphically define the shape of its bell.
2. Next, you can define the shape of the distribution of Sample Means for samples of a given fixed size, if you know the Population Mean and Population Standard Deviation. (This is from the Central Limit Theorem, right?) The SD of that distribution is Pop SD / sqrt(sample size).
3. What this distribution tells you is that if you collect any sample of that fixed size from your population, the Mean of that sample behaves like a random draw from the Bell you have defined; for example, about 95% of such sample means fall within 1.96 SDs of the Population Mean.
4. Now, to answer your question: “you mean to say that from just one sample drawn (with sample size = 30) if i calculate the mean of that sample, i can assume that, that mean will be the Population mean as well?”. You can use it as an estimate of the Population Mean, but you also have to know that your Sample Mean may not be exactly equal to the Population Mean. Your Sample Mean could lie anywhere within the distribution curve you plotted in step 2. This is why a large sample size is recommended: the larger the sample size, the lower the SD of the distribution of sample means, the thinner your distribution Bell, and the closer your Sample Mean is likely to be to the actual Population Mean.
5. But a larger sample size means more research cost in terms of time and effort, so there is a trade-off between accuracy and cost.
6. So, you see, you need only 1 sample to ESTIMATE your Population Mean, but that sample should be as large as can be afforded and should represent the Population as CLOSELY as possible. That is why it is very important to understand and eliminate the various Sample Biases that can creep into your sample collection process.
Hope this helps. If you have queries we can discuss further.
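Points 3 to 5 can be sketched numerically: the band that holds about 95% of sample means narrows as n grows, but only with the square root of n. A minimal Python sketch; the population (mean 100, SD 20) and the sample sizes are hypothetical, and 1.96 is the usual two-sided 95% normal value.

```python
import math

def standard_error(pop_sd, n):
    """SD of the distribution of sample means (the standard error): sigma / sqrt(n)."""
    return pop_sd / math.sqrt(n)

def interval_95(pop_mean, pop_sd, n):
    """Range expected to contain about 95% of sample means of size n
    (population mean +/- 1.96 standard errors, CLT normal approximation)."""
    half_width = 1.96 * standard_error(pop_sd, n)
    return pop_mean - half_width, pop_mean + half_width

# Hypothetical population: mean 100, SD 20.
for n in (25, 100, 400):
    low, high = interval_95(100, 20, n)
    print(f"n={n:>3}: SE={standard_error(20, n):.1f}, "
          f"95% of sample means in [{low:.2f}, {high:.2f}]")
# n= 25: SE=4.0, 95% of sample means in [92.16, 107.84]
# n=100: SE=2.0, 95% of sample means in [96.08, 103.92]
# n=400: SE=1.0, 95% of sample means in [98.04, 101.96]
```

Each fourfold increase in sample size only halves the standard error, which is the cost-versus-accuracy trade-off from point 5 in numbers.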

Hi rus1bus, thanks for taking such pains to explain things in detail. I now have the following query on your point 2: if I already have the Population SD and Mean, why should I be interested in a sampling distribution? I am under the assumption that, to save cost and time, we cannot afford to study the entire population and settle for a sample, so that from that sample we can find the characteristics of the Population (Mean and SD). Please correct me if I am wrong.

Sasankm, you are correct, and I think you have got your sampling concepts right. On my first reading, I was nowhere close to your level of understanding. To clarify your query, you need to understand 2 basic applications of Sampling Techniques:
1. When the Mean of the Population is known: For example, say a machine produces a part with a Mean length of 2.5 cm (this is the known population mean). A quality manager takes a daily sample of outputs and, based on the Mean of that sample, has to decide whether or not to fix/tune the machine. In this case, you have your sample mean for the day, and you check how many Standard Deviations (of the distribution of sample means) your Sample Mean is away from the Population Mean, for a given confidence level. If it is within that range, you don't have to tune your machine; if not, the machine needs tuning. So this is the type of application where you do know your Population Mean.
2. When the Mean of the Population is Not Known: This is the case where you take a sample, compute its mean and use it as an estimate of the Population Mean. You do this because calculating the Mean for the entire population is either not affordable or not feasible. In this case, you use the sample mean as the Population Mean, but you need to document this in your research (for example, by reporting the standard error or a confidence interval), so that your audience knows the possibility and extent of error in treating the Sample Mean as the Population Mean.
Also, in both of the above cases, the Population SD may or may not be known. If the Population SD is not known, you use the SD of your sample to estimate the SD of the distribution of sample means, and use a t-test instead of a z-test for any further analysis.
Hope this helps.
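The quality-control case (known population mean) can be sketched as a two-sided z-test, with a t statistic for the case where the population SD is unknown. A minimal Python sketch under hypothetical values: the 2.5 cm target comes from the example, but the known SD of 0.1 cm, the daily sample size of 36, the observed means, the 5% significance level and the five-part sample are all assumptions for illustration. The function names are illustrative, not from any library.

```python
import math
import statistics
from statistics import NormalDist

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Number of standard errors between the day's sample mean and the target
    mean, when the population SD is known."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

def needs_tuning(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-sided z-test: flag the machine when |z| exceeds the critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z_statistic(sample_mean, pop_mean, pop_sd, n)) > critical

def t_statistic(sample, pop_mean):
    """Same idea with the population SD unknown: the standard error is estimated
    as s / sqrt(n). Compare the result against a t-distribution with
    len(sample) - 1 degrees of freedom, not the normal critical value."""
    n = len(sample)
    return (statistics.mean(sample) - pop_mean) / (statistics.stdev(sample) / math.sqrt(n))

# Hypothetical: target 2.5 cm, assumed known SD 0.1 cm, daily samples of 36 parts.
print(z_statistic(2.53, 2.5, 0.1, 36))   # about 1.8
print(needs_tuning(2.53, 2.5, 0.1, 36))  # False: within +/-1.96 standard errors
print(needs_tuning(2.54, 2.5, 0.1, 36))  # True: z is about 2.4

# Hypothetical small sample with the population SD unknown.
print(t_statistic([2.52, 2.55, 2.49, 2.56, 2.53], 2.5))  # about 2.449
```

With 4 degrees of freedom, the two-sided 5% critical value from a t table is about 2.776, so a t of about 2.449 would not trigger tuning even though the same value would exceed 1.96 under a z-test. This wider threshold is why the t-test is used when the SD has to be estimated from a small sample.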

Thanks rus1bus, it's much clearer now. I will get back if I have any query on this later. Thanks for your patience in bearing with a rookie like me.