 # Question on variance and sample variance, population vs sample?

A portfolio has exhibited the following characteristics over a 3-year period:

| Year | 1 | 2 | 3 |
|---|---|---|---|
| Return | 5% | 10% | -3% |

The mean is 4. Is the variance built from (5 - 4)² + (10 - 4)² + (-3 - 4)²? The sum is 86, and the variance is 86/3, right? So the variance is 28.67. However, I just watched a Stalla lecture, and they say the variance is 43, which would imply 86 / (n - 1). Wouldn't 43 be the SAMPLE VARIANCE because it uses (n - 1)? I guess the question is: are the returns over a three-year period a population or a sample? If these are a sample, wouldn't 43 be a SAMPLE VARIANCE?

NOTE: THIS IS THE ACTUAL QUESTION FROM STALLA

A portfolio has exhibited the following characteristics over a 3-year period:

| Year | 1 | 2 | 3 |
|---|---|---|---|
| Return | 5% | 10% | -3% |

Variance: 43

It seems like I answer this question a lot these days. Just about anything you calculate from data is a sample statistic unless you observe all possible outcomes (which means it's almost certainly not a problem for statistics). When you calculate the variance of three years' worth of portfolio returns, you are trying to make some inference, say about the risk going forward. That means you are trying to estimate something. If you are trying to estimate something, you need to use the sample variance if you want to be unbiased. The terms "sample variance" and "variance" are used interchangeably when it is clear that you are calculating something from a sample. Anyway, Stalla is right here - use (n-1).
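To check the arithmetic in the question against the answer above, here is a minimal sketch using Python's standard `statistics` module. The variable names are just for illustration; `pvariance` divides by n and `variance` divides by (n-1).

```python
import statistics

# Annual returns from the question, in percent.
returns = [5, 10, -3]

mean = statistics.mean(returns)
# Sum of squared deviations from the mean: 1 + 36 + 49.
sum_sq = sum((r - mean) ** 2 for r in returns)

population_variance = statistics.pvariance(returns)  # sum_sq / n
sample_variance = statistics.variance(returns)       # sum_sq / (n - 1)

print("mean:", mean)
print("sum of squared deviations:", sum_sq)
print("divide by n:", round(population_variance, 2))
print("divide by n - 1:", sample_variance)
```

The divide-by-n figure is the 28.67 from the question, and the divide-by-(n-1) figure is Stalla's 43.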

Thank you very much for explaining this question very articulately, and promptly.

yw

Thanks Joey

If someone makes me God of the Universe tomorrow, I'm doing away with this n vs. (n-1) thing. The problem is that to really get it you need to learn far more math than is practical for someone who just needs to calculate summary statistics (e.g., "Excuse me sir, but I am wondering if I should assume that my population comes from the filtration upon which the portfolio Brownian motion process is measurable or just the filtration generated by the Brownian motion itself." - "Look, moron, just calculate the f-ing sample variance for me").

However, you use n instead of (n-1) when the population mean is known, which only happens when the omniscient question writer gives it to you or you observe the entire population. Since omniscient question writers don't exist outside of stupid tests, this causes all kinds of confusion. Further, in any real problem the difference between dividing by n and (n-1) should be trivial compared to all the other errors in the problem (the biggest of which here is saying that portfolio returns are all drawn from the same population, which is never true except with test questions). In the real world, if someone asks you to make inferences based on 3 data points, you should coyly smile and say "Not much I can do with 3 data points - the mean is somewhere around 4% plus or minus a billion".

Suppose that you absolutely know that all of these observations are not only drawn from the same population, but that the population distribution is normal. Statisticians ought to completely agree on how to estimate the variance, yes?
Check out: http://www.isds.duke.edu/research/conferences/valencia/2005Varanasi.pdf (With my sincerest apologies for posting anything related to Duke University)

The first line of the abstract: "Point estimation of the normal variance is surely one of the oldest non-trivial problems in mathematical statistics and yet, there is certainly no consensus about its more appropriate solution."

From the first line of the paper: "Point estimation of the normal variance has a long, fascinating history which is far from settled. As mentioned by Maata and Casella (1990) in their lucid discussion of the frequentist decision-theoretic approach to this problem, the list of contributors to the twin problems of point estimation of the normal mean and point estimation of the normal variance reads like a Who's Who in modern 20th century statistics."

If statisticians can't decide what the right way to do it is, why is it that CFAI (and many others, including the American Psychological Association, the Society of Actuaries, and I'm sure others) test people on what they say is the right way to do it? Honestly, I think that it's because "unbiased" is such a politically correct sounding property that any good estimator must have it. From the great statistician Charles Stein: "I find it hard to take the problem of estimating σ² with quadratic loss very seriously", and yet every non-statistician in the world gets tested on it on every freaking professional licensing exam in the world.
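To make the "unbiased vs. quadratic loss" point concrete, here is a rough sketch under a strong assumption: the observations are i.i.d. normal, so the sum of squared deviations S divided by σ² follows a chi-square distribution with n-1 degrees of freedom. The function name `bias_and_mse` and the choice σ² = 1 are hypothetical, used only for illustration; the code compares dividing S by (n-1), n, and (n+1).

```python
def bias_and_mse(n, divisor, sigma2=1.0):
    """Bias and mean squared error of S / divisor as an estimator of sigma2.

    Assumes S is the sum of squared deviations from the sample mean of n
    i.i.d. normal observations, so S / sigma2 is chi-square with n - 1
    degrees of freedom (mean n - 1, variance 2 * (n - 1)).
    """
    df = n - 1
    expected = df * sigma2 / divisor              # E[S / divisor]
    variance = 2 * df * sigma2**2 / divisor**2    # Var[S / divisor]
    bias = expected - sigma2
    return bias, variance + bias**2               # MSE = variance + bias^2


n = 3  # three years of returns, as in the question
results = {d: bias_and_mse(n, d) for d in (n - 1, n, n + 1)}
for d, (bias, mse) in results.items():
    print(f"divide by {d}: bias = {bias:+.3f}, MSE = {mse:.3f}")
```

Under these assumptions, dividing by (n-1) is the only unbiased choice of the three, but it has the largest mean squared error; dividing by (n+1) has the smallest. Which one counts as "right" depends on the criterion you pick, which is the disagreement the paper describes.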