data snooping bias

The reference is p.655 in the FSA book. Can someone give a concrete example/explanation of data snooping bias related to filters in equity screening?

Data snooping doesn’t relate to filters in equity screening. It is a bias that occurs when the same data is used for many “experiments” such as testing the correlation between some variable and stock returns on a set of data. Avoid data snooping by using a fresh sample universe every time… Note: I hope I am using the right terms here when I say “Universe”… Quant was too far back to remember… uh oh…

“Data snooping doesn’t relate to filters in equity screening.” What if you’re back-testing a filter using historical stock data? I’m confused, since the book mentioned data snooping bias as a drawback of equity filters.

According to the CAIA Level 1 Schweser book:

"Data dredging, also known as _data snooping_, refers to the practice of overusing statistical tests (e.g., running hundreds of tests) to identify significant relationships with little regard for underlying economic rationale. The main problem with data dredging relates to the failure to take the number of tests into account when examining the results (i.e., placing too much confidence in the results)." (Burkett 243)
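
To put rough numbers on "failing to take the number of tests into account", here is a back-of-the-envelope calculation. It assumes $m$ independent tests, each run at significance level $\alpha$, on data with no true relationships. The symbols and the independence assumption are mine, not from the Schweser text.

$$
E[\text{false positives}] = m\,\alpha, \qquad P(\text{at least one false positive}) = 1 - (1 - \alpha)^m
$$

With $m = 100$ and $\alpha = 0.05$, that gives $100 \times 0.05 = 5$ expected false positives and $1 - 0.95^{100} \approx 0.994$, so finding "something significant" is almost guaranteed. One simple (and conservative) way to account for the number of tests is a Bonferroni-style adjustment, which requires each test to pass at $\alpha / m = 0.05 / 100 = 0.0005$ instead of $0.05$.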

My understanding of data snooping is that you analyse the same set of data over and over until some result turns up that looks useful, for example for predicting the future. You’re essentially searching for patterns at random, and when one appears you assume it has value. However, the pattern may well have arisen by pure chance and have no meaningful explanation behind it.
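
Since the original question was about filters, here is a minimal sketch of how that plays out in a back-test. Everything in it is made up for illustration: returns are pure random noise, and a "filter" is just an arbitrary subset of 50 stocks (the function name `backtest_random_filters` and all parameters are hypothetical). It tries many filters on one period, keeps the best, and then checks that same filter on a fresh period.

```python
import random

def backtest_random_filters(n_filters=200, n_stocks=500, seed=1):
    """Pick the best of many arbitrary filters in-sample, then re-test it."""
    rng = random.Random(seed)
    # Two periods of purely random returns: any filter "edge" is luck.
    in_sample = [rng.gauss(0, 0.2) for _ in range(n_stocks)]
    out_sample = [rng.gauss(0, 0.2) for _ in range(n_stocks)]

    best_edge, best_picks = float("-inf"), None
    for _ in range(n_filters):
        # Hypothetical "filter": an arbitrary basket of 50 stocks.
        picks = rng.sample(range(n_stocks), 50)
        edge = sum(in_sample[i] for i in picks) / len(picks)
        if edge > best_edge:
            best_edge, best_picks = edge, picks

    # Evaluate the winning filter on data it was not selected on.
    out_edge = sum(out_sample[i] for i in best_picks) / len(best_picks)
    return best_edge, out_edge

results = [backtest_random_filters(seed=s) for s in range(20)]
avg_in = sum(r[0] for r in results) / len(results)
avg_out = sum(r[1] for r in results) / len(results)
print(f"average in-sample return of best filter:  {avg_in:.3f}")
print(f"average out-of-sample return of that filter: {avg_out:.3f}")
```

The in-sample number looks attractive only because it is the maximum of many noisy tries. On the fresh period the same filter should average close to zero, which is the drawback the book is presumably pointing at.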

Here’s one I tell my students. Let’s assume for the sake of argument that ALL price multiples are pure noise, with no real relationship to subsequent returns. If you test the correlations of a large number of them (let’s say 30 or 40) with subsequent returns, you’ll likely find a couple with correlations that are significantly different from zero at the 5% level of significance, since roughly 1 test in 20 comes out significant by chance alone. You might then (erroneously) conclude that these multiples are good predictors of returns.
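
A rough simulation of this thought experiment, as a sketch rather than anything rigorous. It assumes 40 made-up "multiples" and returns that are all independent standard normal noise across 200 stocks, and it uses the large-sample cutoff |r| > 1.96/sqrt(n) as an approximate 5% two-sided test. The names `pearson` and `count_spurious` are mine.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def count_spurious(n_multiples=40, n_stocks=200, seed=0):
    """Count noise 'multiples' that look significant at roughly the 5% level."""
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_stocks)]
    # Approximate two-sided 5% critical value for a correlation (assumption).
    critical = 1.96 / math.sqrt(n_stocks)
    hits = 0
    for _ in range(n_multiples):
        multiple = [rng.gauss(0, 1) for _ in range(n_stocks)]  # pure noise
        if abs(pearson(multiple, returns)) > critical:
            hits += 1
    return hits

# Repeat the whole experiment to see the typical number of false "predictors".
counts = [count_spurious(seed=s) for s in range(100)]
average_hits = sum(counts) / len(counts)
share_with_any = sum(c > 0 for c in counts) / len(counts)
print(f"average significant multiples per run: {average_hits:.2f}")
print(f"share of runs with at least one:       {share_with_any:.2f}")
```

With 40 multiples and a 5% test you would expect about 40 × 0.05 = 2 "significant" multiples per run even though none of them predicts anything, and most runs should turn up at least one.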

Data snooping (or data dredging) is a particularly important issue in financial research because we have so many potential factors to choose from, and using computers makes it easy to data snoop and test until we find something that looks significant.