I have read a lot of your comments and you are great, so here is my question. I am confused about when to use n and when to use n-1 when calculating covariance. I know you said that if they give us the mean, we use n, and that if we have to calculate the mean, we use n-1. That matches CFAI, specifically Reading 53, problem 1c: we had to calculate the mean, so when calculating covariance we used n-1. However, I am using Q-Bank for practice questions and it does not seem to follow this convention. For example, given a two-asset portfolio with some returns, we still have to calculate the mean, yet Q-Bank uses n. It is getting on my nerves because I keep getting the questions wrong according to Q-Bank, but according to CFAI I am right. Is there a more definitive way to decide? I seem to be spending 20 seconds per question just thinking about whether to use n or n-1.

http://en.wikipedia.org/wiki/Sample_mean_and_covariance If the population mean is known, then use n. If the population mean is unknown and you have to calculate the sample mean, then use n-1.

^ that’s all correct. The only caveat is that if you have the entire population then x-bar = mu and you would use n.
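A quick numeric sketch of the rule above, using Python's standard `statistics` module. The returns and "known" population means below are made-up numbers for illustration only:

```python
import statistics

# Hypothetical monthly returns for two assets (made-up numbers).
x = [0.02, -0.01, 0.03, 0.00, 0.01]
y = [0.01, -0.02, 0.04, 0.00, 0.02]
n = len(x)

# Case 1: the mean must be estimated from the same data -> divide by n - 1.
x_bar = sum(x) / n
y_bar = sum(y) / n
sample_cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Case 2: the question hands you the true population means -> divide by n.
mu_x, mu_y = 0.0, 0.0  # hypothetical "known" population means
population_cov = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n

# The standard library makes the same distinction for variance:
s2 = statistics.variance(x)       # sample variance, divisor n - 1
sigma2 = statistics.pvariance(x)  # population variance, divisor n
```

Note that `statistics.variance` vs. `statistics.pvariance` encodes exactly the n-1 vs. n choice discussed in this thread.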

When you say "if the population mean is known," you mean it is given in the question, right?

Right - it’s not real world. The only time the population mean is really known is if you have the entire population (in which case inferential statistics are not especially useful) or you have some omniscient question writer telling you “the mean is known to be 5”.

ymc Wrote:
-------------------------------------------------------
> If the population mean is known, then use n. If the population mean is unknown and you have to calculate the sample mean, then use n-1.

Could you explain why we use n in the case of the population and n-1 with a sample? I posted a question about degrees of freedom earlier because this was confusing me.

It's about estimating sigma^2. Suppose you know mu; then your best guess at sigma^2 is sum((X(i) - mu)^2)/n, i.e., the average of the squared deviations. But if you don't have mu, you compute sum((X(i) - X-bar)^2)/n. However, X-bar contains some information about each of the X(i)'s, so in some sense the sum of the squared deviations about X-bar is expected to be smaller than the sum about mu. It turns out that you can exactly account for that by dividing by n-1 instead of n.

Long story short: sum((X_i - X_bar)^2)/(n-1) is an unbiased estimator of sigma^2, i.e., its expectation is exactly sigma^2.

Proof:
E(sum((X_i - X_bar)^2)/(n-1))
= sum(E(X_i^2) - 2*E(X_i*X_bar) + E(X_bar^2))/(n-1)
= sum(sigma^2 + mu^2 - 2*(sigma^2/n + mu^2) + sigma^2/n + mu^2)/(n-1)
= (n-1)*sigma^2/(n-1)
= sigma^2

Exercises for you:
1. Prove E(X_i^2) = sigma^2 + mu^2
2. Prove E(X_i*X_bar) = sigma^2/n + mu^2
3. Prove E(X_bar^2) = sigma^2/n + mu^2

kochunni69 Wrote:
-------------------------------------------------------
> ymc Wrote:
> > If the population mean is known, then use n. If the population mean is unknown and you have to calculate the sample mean, then use n-1.
>
> Could you explain why we use n in the case of the population and n-1 with a sample? I posted a question about degrees of freedom earlier because this was confusing me.
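For anyone who wants to sanity-check the three exercise identities numerically rather than prove them, here is a Monte Carlo sketch (mu, sigma, and n below are assumed values chosen for illustration):

```python
import random

# Numeric check of:
#   E(X_i^2)       = sigma^2 + mu^2
#   E(X_i * X_bar) = sigma^2/n + mu^2
#   E(X_bar^2)     = sigma^2/n + mu^2
# Assumed: normal data with mu = 1, sigma = 2, sample size n = 4.
random.seed(1)
mu, sigma, n, trials = 1.0, 2.0, 4, 50000

e_xi2 = e_xixbar = e_xbar2 = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    e_xi2 += xs[0] ** 2        # estimates E(X_i^2)
    e_xixbar += xs[0] * x_bar  # estimates E(X_i * X_bar)
    e_xbar2 += x_bar ** 2      # estimates E(X_bar^2)
e_xi2 /= trials
e_xixbar /= trials
e_xbar2 /= trials

# Theory: E(X_i^2) = 4 + 1 = 5,
#         E(X_i*X_bar) = 4/4 + 1 = 2,
#         E(X_bar^2)   = 4/4 + 1 = 2.
```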

Do you really think that was likely to be helpful?

JoeyDVivre Wrote:
-------------------------------------------------------
> Do you really think that was likely to be helpful?

But this does illustrate that the n-1 in the sample variance formula has nothing to do with degrees of freedom.

Couldn’t disagree more - what is a degree of freedom? Check out the Wiki entry on degrees of freedom for a good explanation

The following link is to an academic work about why we divide by n-1 http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/23/be/0e.pdf

kochunni69 Wrote:
-------------------------------------------------------
> The following link is to an academic work about why we divide by n-1
>
> http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/23/be/0e.pdf

Well, this confirms my view that it has more to do with unbiasedness of the estimator. The distribution of this estimator (scaled by a factor of (n-1)/sigma^2) is a chi-squared distribution with n-1 degrees of freedom. This is where the degrees of freedom come into play.
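The distributional fact mentioned above can also be checked by simulation: for normal data, (n-1)*s^2/sigma^2 follows a chi-squared distribution with n-1 degrees of freedom, whose mean is n-1 and variance is 2(n-1). A standard-library sketch (mu, sigma, and n are assumed values):

```python
import random

# Simulate (n-1) * s^2 / sigma^2 for normal data and check that its mean
# is near n-1 and its variance near 2(n-1), as a chi-squared(n-1)
# distribution requires. Assumed: mu = 0, sigma = 3, n = 6.
random.seed(2)
mu, sigma, n, trials = 0.0, 3.0, 6, 40000

stats = []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # sample variance
    stats.append((n - 1) * s2 / sigma ** 2)

mean_stat = sum(stats) / trials
var_stat = sum((t - mean_stat) ** 2 for t in stats) / trials
# mean_stat should land near n - 1 = 5; var_stat near 2(n-1) = 10.
```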

Oh well. Just have to turn in my Ph.D. and ten years as a Stats professor because I don’t know what a df is. Or maybe you need to think about it a little more.

Then why don't you go ahead and explain why you need to know degrees of freedom to explain the (n-1) here? I understand that the estimator has n-1 degrees of freedom, but do you need to know this to explain the (n-1)?

Do you need to? Probably not. But it’s a very general principle that makes things much easier. I can do those expectation calculations in my sleep, but what’s better is to be able to figure out the dimensionality (e.g. degrees of freedom) just from looking at what’s being calculated. If you get this, you can fix MLE’s, do regression problems, give test statistic distributions, etc. without doing any of those calculations. Does that proof above give any insight into what is happening or lead you to any other results? I would say not. Obviously, this is all way beyond any CFA material.