Central Limit Theorem - not getting it

So this is how I understand the theorem:

  1. Regardless of the shape/skew of the population distribution, the distribution of the sample means will be normal.

  2. As you add more data points, the sample starts to resemble a normal bell curve more and more.

I am not getting how this is true at all. Let’s say you have a skewed population of 300 college students, and the thing you’re measuring is test scores. This should be negatively skewed (very few students with extremely low scores, median higher than mean). If we take a sample of 25 test scores, I get my first data point. Since I’m dealing with a negative skew, I should be more likely to get sample means that are on the higher end, right?

Let’s assume Little Johnny has a 50 while the other 299 students in the class have 80. In this case, the entire class average is 79.9%. Even in a sample of 10, if I manage to catch LJ, the sample mean will be 77; at n=25 and catching LJ, the sample mean is 78.8. As I increase the sample size, the sample mean will still have an expected value of 79.9.
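Here’s a quick sanity check of that arithmetic (just the hypothetical 50/80 class above):

```python
# Just redoing the arithmetic from the paragraph above.
scores = [50] + [80] * 299      # LJ plus the rest of the class

pop_mean = sum(scores) / len(scores)
print(pop_mean)                  # 79.9

sample10 = [50] + [80] * 9       # a size-10 sample that catches LJ
print(sum(sample10) / 10)        # 77.0

sample25 = [50] + [80] * 24      # a size-25 sample that catches LJ
print(sum(sample25) / 25)        # 78.8
```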

I even dug out an oooooold mathematical stats text: the CLT assumes drawing samples from an infinite population, but if we change that to a finite population, the expected value of the sample mean is still the population mean. However, a finite population correction factor is applied to the variance of the sample mean.
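To convince myself, here’s a tiny sketch that checks the finite population correction exactly by enumerating every possible sample from a toy population (the population [1..5] and n = 2 are arbitrary choices of mine):

```python
# Exact check of the FPC for sampling without replacement:
# Var(xbar) = (sigma^2 / n) * (N - n) / (N - 1).
from itertools import combinations

pop = [1, 2, 3, 4, 5]          # arbitrary toy population
N, n = len(pop), 2

mu = sum(pop) / N
sigma2 = sum((x - mu) ** 2 for x in pop) / N   # population variance

# Every possible size-n sample, and the variance of their means.
means = [sum(s) / n for s in combinations(pop, n)]
m = sum(means) / len(means)
var_means = sum((x - m) ** 2 for x in means) / len(means)

fpc_formula = (sigma2 / n) * (N - n) / (N - 1)
print(var_means, fpc_formula)  # both 0.75
```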


Your point 2 should be clearer: as you increase the sample size used to calculate the sample mean, the distribution of sample means (at that fixed sample size) becomes approximately normal. I.e. if you have, say, a uniform distribution, take a random sample of size 10, calculate the sample mean, and repeat this many, many times (approaching infinitely many), the distribution of those sample means will look less like a normal distribution than if you did the same with samples of size 40 (and 41, 42, … 100 … 324 …).
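A rough simulation of this point, assuming a Uniform(0, 1) population (the seed, repetition count, and `sample_means` helper are my own arbitrary choices):

```python
import random
import statistics

random.seed(42)  # arbitrary seed for reproducibility

def sample_means(n, reps=5000):
    """Means of `reps` random samples, each of size n."""
    return [statistics.fmean(random.random() for _ in range(n))
            for _ in range(reps)]

for n in (10, 40):
    means = sample_means(n)
    # Center stays at 0.5; spread shrinks like sigma / sqrt(n),
    # where sigma = sqrt(1/12) ~ 0.2887 for Uniform(0, 1).
    print(n, round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3))
```

A histogram of the size-40 means would look noticeably tighter and more bell-shaped than the size-10 one.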

There are a few exceptions to this, such as a Cauchy RV (a special case of Student’s t, with one degree of freedom).
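A sketch of why the Cauchy misbehaves – the mean of n iid standard Cauchy draws is itself standard Cauchy, so the sample mean never settles down (the inverse-CDF sampler and seed are my own choices):

```python
# The Cauchy has no finite mean or variance, so the CLT's
# assumptions fail for it.
import math
import random

random.seed(1)  # arbitrary seed

def cauchy():
    # Standard Cauchy via inverse-CDF sampling: tan(pi * (U - 1/2)).
    return math.tan(math.pi * (random.random() - 0.5))

for n in (10, 1_000, 100_000):
    xs = [cauchy() for _ in range(n)]
    # Unlike a finite-variance case, these running means do not
    # concentrate around any value as n grows.
    print(n, sum(xs) / n)
```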

A quick point: the direction of skew does not always guarantee the relationship between the mean and the median that is taught in books. Negative skew does not guarantee that mean < median < mode, and similarly, positive skew doesn’t guarantee that mode < median < mean.

For your example, the underlying distribution doesn’t matter. You may be more likely to get higher values for the individual measurements, but this has nothing to do with the overall shape of the distribution of sample means of size n – these are separate things.

The general idea of the CLT is that the sum of a bunch of little things (which could then be scaled into a mean, but really, it is the sum that matters) is normally distributed. This is why the normal distribution has such a prominent place in life. You may have phenomena for which the underlying distribution is asymmetric or skewed, but when you take a random sample from it and calculate a sum, then repeat that, those sums (again, at a fixed sample size) behave very nicely.
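To illustrate with a skewed example (exponential draws, my own choice – a sum of n iid draws has its skewness shrink like 1/sqrt(n)):

```python
# Sums of skewed draws become symmetric: exponential(1) has skewness 2,
# but a sum of 50 iid draws has skewness 2 / sqrt(50) ~ 0.28.
import random
import statistics

random.seed(7)  # arbitrary seed

def skewness(xs):
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

singles = [random.expovariate(1.0) for _ in range(20_000)]
sums = [sum(random.expovariate(1.0) for _ in range(50))
        for _ in range(20_000)]

print(skewness(singles))  # near 2: heavily right-skewed
print(skewness(sums))     # much closer to 0: nearly symmetric
```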


An excellent example.

If I recall correctly, the Cauchy distribution does not have a finite variance. That probably has something to do with its failure here.

Interesting story about my experience with the Cauchy distribution:

When I was working on my Master’s in mathematics, I needed one more class one semester, and the only class available was an upper division statistics class. I hate these classes – three pages of manipulations to evaluate some stupid integral – but I liked the professor, so I signed up.

We had our first midterm exam on a Friday at noon. I had a project at work that was due that day, and I was working 16-hour days to get it finished; literally four hours of sleep a night Monday through Thursday. I finished it on Thursday, and tried to study for the exam, but kept falling asleep.

On Friday morning I went in to work to finish off the project and hand it off to my boss, and was running late for class. I couldn’t find a parking place, so I parked in the 30-minute spaces at the front of campus.

The exam comprised 5 questions: 20 points apiece. The first was something about a Cauchy distribution (I don’t recall what), and I answered that one immediately.

I read the second question and said to myself, “I know how to do this,” and started to write. After a few minutes I reread what I had written: complete gibberish! Nothing I wrote made sense. I decided to move on to the third question.

The rest of the exam followed those same lines: I would read the question, say to myself, “I know how to do this”, I would start to write, then I would read what I wrote: unintelligible nonsense!

When I returned to my car, I got the final insult: a parking ticket.

I got 20 / 100 on the exam: the only exam I’ve ever failed (before or since).

The Cauchy has an undefined mean, variance, skewness… a weird distribution. So I have to ask: did you nail the Cauchy question and completely whiff the others?

I did.

I got a solid 20%.