# Correlation Matrix in Excel

I just noticed this in Excel when doing covariance/correlation matrices. I have 3 securities’ returns: Stock A, Stock B, Stock C down columns in excel and their respective dates in rows like this: Stock A 5/15/08 -.02 5/14/08 .0125 5/13/08 -.01 5/12/08 .008 Why do I have to use (.0125/-.02)-1 and not LN(-.02/.0125) in the column with all the return data?

Are you asking why you can’t take the log of a negative number?

I should rephrase. The output is a percentage change over a one day period. My question relates to the way that percentage (the return that is being used to construct the matrix) is calculated. example: on 5/13/08 the closing price of Stock A was 100, and on 5/12/08 it was 99. If I use 99/100-1 I get -.01. If I use ln(100/99), I get .01005. In order to get the right correl matrix, I have to use the former. Why?

draw a graph of ln(x), and then contemplate the meaning of correlation.

You can’t take a log of a negative real number and get a real number (though you might get an imaginary number, FWIW). If you do log returns, then you need to do LN((% change/ 100%)+1)

in your example you are taking 99/100-1 = -.01, while when you are doing the LN you are doing ln(100/99), you should be doing ln(99/100). This will get you a comparable number.

Ok so if it is supposed to be linear, why do we have to do (t/(t+1))-1 instead of doing ((t+1) - t )/t ?

I think you have it mixed up there. Assuming that by t and (t+1) you mean Value(t) and Value(t+1), where t represents a time period index, then you should be doing [(Value(t+1)/Value(t))-1] (you had the t+1 and t indices reversed in your post). This is algebraically equivalent to what you are asking, about. It’s not hard to show that this is equal to: (Value(t+1)-Value(t)) / Value(t) Now, for small values of x (i.e. x close to zero), LN(1+x) is approximately equal to x. So if X is a percentage change (or more accurately a proportional change), then LN(1+change) ~ (change). This is why, for most “normal” (in the non-statistical sense) returns, using LN(1+change) is as good as using (change) for correlation and other calculations. A big reason why Log returns are theoretically better to use, is that something like a stock can appreciate infinitely, but the most you can lose (if unleveraged) is 100% or when change=-1. However, a random normal variable can take any value between negative infinity and positive infinity. So to map these random variations to a variable that can vary between negative 100% and infinity, you can use a lognormal distribution, which is just LN(x), where x is a normally distributed random variable. When x->positive infinity, LN(x)->positive infinity; When x->negative infinity, LN(x)->0. So Value*LN(normally distributed random variable) can go anywhere between 0 and infinity, although large changes are substantially less likely. And that’s why you are usually asked to do log returns. Another nice feature of LN(x) is that if you take LN((Value(t+1)/Value(t) - 1) + 1), you get LN(Value(t+1)/Value(t)) = LN(Value(t+1)) - LN(Value(t)). This makes return series easier to manipulate in excel, since the percent change from time t to time t+s is approximately the difference between the logartithms at two time periods, as long as the change isn’t too large. The only problem is that you need a random variable of negative infinity to produce a stock value of 0 in this model, and clearly bankruptcies happen more often than once in an infinite number of years; and certainly more often than stocks go to infinity. However, we have people writing books about that, which get cooked up as Swan a le noir or something like that.

Having written all this and reviewing the original post, I think that someone is instructing you to do it incorrectly, or they may simply be thinking that the data series is in chronological order as opposed to reverse chronological order. As for correlation, I think the correlation of log returns are likely to give you better diversification measures than raw returns. This is because rising 10% and then dropping 10% gets you a net value of 99% of your original value. On the other hand, exp(10%)*exp(-10%) gets you exactly 100% of your original value. If the typical returns are relatively small, there shouldn’t be a great deal of difference between the correlations of raw returns and log returns, and people have their own tastes on how to do it. However, if large or outlier moves are more common, then it starts to make a difference.

Thanks for the clarification guys