Multiple Regression - Natural Logs

Why do they take natural logs of certain variables, such as the number of market makers, market size, etc.?

Because when they look at the relationships between the variables, the relationship between x and y doesn’t appear to be linear, but the relationship between x and ln(y) does appear to be linear.

There’s nothing magical about taking natural logs. If the data made it appear that x² and y have a linear relationship, then they’d take √y instead of ln(y).
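To make the square-root case concrete, here is a rough sketch in Python. The data are made up for illustration (y = 4x², chosen so that √y = 2x), and the `pearson` helper is a hand-written correlation function, not something from the curriculum. It only shows that x lines up better with √y than with y itself.

```python
import math

def pearson(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up data (assumption): y is exactly linear in x squared.
xs = list(range(1, 11))
ys = [4 * x ** 2 for x in xs]

r_raw = pearson(xs, ys)                            # x against y
r_sqrt = pearson(xs, [math.sqrt(y) for y in ys])   # x against sqrt(y)

print("correlation of x with y:      ", round(r_raw, 4))
print("correlation of x with sqrt(y):", round(r_sqrt, 4))
```

With real data the second correlation would not be perfect, but the same comparison tells you which transformation straightens out the relationship.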

Hi, anyone else take a stab at this? I am lost with this…

I don’t feel I am particularly good with explanations, but let’s see… First, get the fundamentals down, just in case you are unclear:

Here is what a linear relationship looks like (here y = 2x):

http://www.wolframalpha.com/input/?i=2x

Here is what an exponential relationship looks like (here y = e^x):

http://www.wolframalpha.com/input/?i=e^x

In the first case, y grows by the same amount (here, 2) for every one-unit increase in x, which is why it looks like a straight line. In the second case, however, you can see that y barely moves in the beginning and then it “explodes”: small changes in x further along result in big changes in y. This happens because the slope is not constant; it gets steeper as x increases.
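If you prefer numbers to plots, the short Python sketch below tabulates the same two functions. The x values are arbitrary picks for illustration. For y = 2x, each step in x adds the same amount to y; for y = e^x, the steps keep growing, but each step multiplies y by the same factor (e, about 2.718).

```python
import math

xs = [0, 1, 2, 3, 4, 5]                 # arbitrary illustrative x values
linear = [2 * x for x in xs]            # y = 2x
expo = [math.exp(x) for x in xs]        # y = e^x

# Change in y for each one-unit step in x
linear_steps = [b - a for a, b in zip(linear, linear[1:])]
expo_steps = [b - a for a, b in zip(expo, expo[1:])]

# Ratio of consecutive y values for the exponential
expo_ratios = [b / a for a, b in zip(expo, expo[1:])]

print("steps for y = 2x:  ", linear_steps)
print("steps for y = e^x: ", [round(s, 2) for s in expo_steps])
print("ratios for y = e^x:", [round(r, 3) for r in expo_ratios])
```

The constant ratio is the reason a log helps: taking ln turns "multiply by the same factor" into "add the same amount", which is a straight line again.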

Two variables that are obviously linearly related (for example, feet and meters) can be adequately described by the linear regression model, because that is exactly what it is trying to do: it is trying to come up with the line that most closely fits the data you have fed it.

Now what if you knew beforehand, or even suspected, that the relationship between two variables is in fact exponential?

Then the regression could look like this: y = e^(b0 + b1X + ε). Since we cannot work with this directly (it is not linear, as x is now in an exponent), we have to transform it. If you take the natural logarithm of both sides, your new equation is: ln(y) = b0 + b1X + ε. Now this is a linear relationship, which is what the model is built for.

Most analysts and economists build an intuition over time about variables that are probably related in an exponential manner, so they know they will get a better fit by transforming their data beforehand, but I seriously doubt you will have to do so on the CFA exam. So if you think you are struggling too much with this concept, it might be strategically wise to just learn what S2000magician said by heart: if the relationship between two variables does not appear to be linear, use logs. That’s the most you can be asked to know in a multiple-choice exam, and there is CERTAINLY much more important stuff to spend time on!
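For anyone who wants to see the log transform do its job, here is a minimal sketch in plain Python. Everything in it is assumed for illustration: the "true" coefficients (0.5 and 0.3), the noise level, and the simulated data. The `ols` helper is a hand-rolled one-variable least-squares fit, not a library routine. The idea is to generate y = e^(b0 + b1X + ε), regress ln(y) on x, and check that the fitted line comes back close to the coefficients used to build the data.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "true" coefficients, chosen only for this illustration
b0_true, b1_true = 0.5, 0.3

# Simulate y = e^(b0 + b1*x + error) with small normally distributed errors
xs = [i / 10 for i in range(1, 101)]
ys = [math.exp(b0_true + b1_true * x + random.gauss(0, 0.1)) for x in xs]

def ols(xs, ys):
    """One-variable least squares: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Transform y, then fit the linear model ln(y) = b0 + b1*x + error
log_ys = [math.log(y) for y in ys]
b0_hat, b1_hat = ols(xs, log_ys)

print("estimated b0:", round(b0_hat, 3), "(used", b0_true, "to simulate)")
print("estimated b1:", round(b1_hat, 3), "(used", b1_true, "to simulate)")
```

Note that the fitted b1 is read on the log scale: a one-unit increase in x changes ln(y) by about b1, which means y is multiplied by roughly e^b1.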

Great. Thanks guys!