Sample Sizes?

How do you define an appropriate sample size?

I have been playing around with some commodity returns ('83-'12) in a spreadsheet and began wondering whether this was too much.

I’m far from experienced, but I’d think that for returns the big thing to consider is macro variables. That range has a few recessions that you’ll need to account for.

As your sample date range increases, the estimates look more precise, but the likelihood of incorporating a regime change (which makes your model either more complex or nonstationary, or both) increases. People often use 5-8 years of data because that range should cover approximately one business cycle. They will then try to get weekly or daily data to increase N and improve the precision of estimates. There is a point of diminishing returns. It probably does not make sense to take 5 years of minute-by-minute data, unless your model specifically incorporates something that requires that precision.
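To put a rough shape on the diminishing returns: if the observations are independent and share one standard deviation (a simplifying assumption that real return series will not fully satisfy), the standard error of a sample mean falls with the square root of N. The sketch below uses a made-up standard deviation of 0.20 purely for illustration; `standard_error` is a hypothetical helper, not something from the thread.

```python
import math


def standard_error(stddev, n):
    """Standard error of a sample mean from n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return stddev / math.sqrt(n)


# Hypothetical per-observation standard deviation, for illustration only.
stddev = 0.20

# Each fourfold increase in N only halves the standard error.
for n in (5, 20, 80, 320):
    print(n, round(standard_error(stddev, n), 4))
```

Going from 5 to 20 observations buys as much precision as going from 80 to 320, which is why piling on more data eventually stops paying off.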

To some extent, it depends on what you are trying to measure.

What are you using the data for? The world has changed a lot since 1983! A lot of things might not be applicable any more.

The sample you use has to incorporate major political and economic events that have taken place. For example, the period before and after currencies were allowed to float.

Commodities are especially tricky given their sensitivity to monetary policy. I’d argue it doesn’t make much sense to look before 2008.

Depending on what you want to do, I would go back to 1900 if I could.

I don’t think it’s too much, but you should identify the underlying regimes to break it up into samples. Obviously that’s really hard if not impossible…
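If you do settle on some break points, the mechanics of splitting the sample are simple; choosing the breaks is the hard part. A minimal sketch, assuming you already have annual returns and a list of break years. The function name, the years, and the return figures below are all made up for illustration.

```python
from bisect import bisect_right
from statistics import mean, stdev


def split_by_breaks(years, returns, break_years):
    """Group returns into sub-samples; each break year starts a new regime."""
    breaks = sorted(break_years)
    groups = {}
    for year, ret in zip(years, returns):
        # Index of the regime this year falls into (0 = before the first break).
        regime = bisect_right(breaks, year)
        groups.setdefault(regime, []).append(ret)
    return [groups[k] for k in sorted(groups)]


# Made-up annual returns, for illustration only.
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008]
returns = [0.04, -0.02, 0.06, 0.03, 0.15, -0.20, 0.25, -0.30]

for sample in split_by_breaks(years, returns, break_years=[2005]):
    print(len(sample), round(mean(sample), 4), round(stdev(sample), 4))
```

Comparing the mean and standard deviation across sub-samples gives a quick sense of whether treating the whole range as one sample is reasonable.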

The sample size depends on how noisy the data are. You’re probably going to want your estimate to be at least 1.645 standard errors away from zero for one-tailed tests and 1.96 for two-tailed tests (both at the 5% level). Commodity prices are more volatile than US equities, so you’re going to need a sh!t ton of data for anything conclusive. To solve for the number of years of data needed, use:

1.645 or 1.96 = (whatever you’re trying to test) / (stddev / SQRT(N years))

Solve for N years (you can rearrange this, I’m just too lazy).
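For what it's worth, rearranging the formula above gives N = (z * stddev / effect)^2, where z is 1.645 or 1.96. A quick sketch, assuming "stddev" is the annual standard deviation and "effect" is the annual figure you are testing against zero. The function name and the 5% / 20% inputs are hypothetical, chosen only to show the arithmetic.

```python
def years_needed(effect, stddev, z=1.96):
    """Years of data needed for `effect` to sit z standard errors from zero.

    Rearranged from z = effect / (stddev / sqrt(N)).
    """
    if effect == 0:
        raise ValueError("effect must be non-zero")
    return (z * stddev / effect) ** 2


# Hypothetical inputs: a 5% annual effect with 20% annual standard deviation.
print(round(years_needed(0.05, 0.20, z=1.96), 1))   # two-tailed
print(round(years_needed(0.05, 0.20, z=1.645), 1))  # one-tailed
```

Note that N grows with the square of stddev / effect, so doubling the volatility (or halving the effect) quadruples the years of data required.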

What exactly are you trying to solve? Maybe we can help.