misrepresentation of correlation // data mining

Hi, can someone explain to me the difference between data mining and misrepresentation of correlation? Seriously, I don't get it! Thx, maeva

** it’s misinterpretation of correlation (not misrepresentation)

Let me try.

Data mining: you have a specific idea in mind, e.g. "Are returns related to GDP?" But bad data affects your CME (capital market expectations) and leads you to the false conclusion that returns are related to GDP, even though they actually are not.

Misinterpretation of correlation: you have no idea in mind, but you just run a correlation analysis on historical data and find that "wow, low food prices are highly correlated with bad weather", and you tell yourself that the next time food prices are low, there will be bad weather.

Hope I am correct, but these are really bad examples, sorry…

Data mining: I think Stock A is affected by banana prices, so I search through the data until I find evidence that it supposedly is. In other words, it's almost like drawing a conclusion first and then building a model to fit it.

Misinterpretation of correlation: Stock A has a 0.9 correlation with banana prices, so I assume that an increase in banana prices will cause an increase in Stock A. In fact, this correlation is spurious: there is a third variable, C, to which both are highly correlated. So Stock A isn't actually driven by banana prices; both are driven by the third variable, which is, I don't know, gas prices. When gas prices go up, banana prices go up because of transportation costs, and Stock A goes up because it's an energy-related company.

OK… I think I get it now! Many thanks.