Portfolio standard deviation

Hello guys, this is Ahmed from Egypt. I work as an investment analyst in the asset management field. Today, while calculating the standard deviation, covariance, and VaR to update my efficient frontier, I found something that I can't understand or adjust within the model, and I became skeptical of the results it produces.

Simply put, I have around 80 stocks in this portfolio, and the problem is that two of them don't cover the same time span as the others. In the Excel spreadsheet I have return data for stocks (X, Y, Z, A, B, C) for the last year, but for stocks (O, M) I have returns for just six months (they weren't listed and outstanding for the whole year). I put zeros in the cells where I don't have data, but now I feel these zeros aren't appropriate because they distort the resulting standard deviation. Please let me know if you have any ideas on this.

My suggestion is to abandon the efficient frontier concept; it's pretty much where it belongs: in the finance books.

Ahmed, inserting zeros for past returns will certainly dampen the calculated volatility of the stocks with missing history. FYI, Excel's standard deviation functions (STDEV, STDEV.P, etc.) ignore blank cells, but if you physically enter zeros, they are treated as actual observations (a series of zeros is a very stable series, so it will lower your vol estimates). But I don't really think this is what you're asking…
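To make the effect concrete, here's a minimal NumPy sketch (Python rather than Excel, and the return figures are just random draws, not real data) of how zero-padding a short return series deflates the vol estimate:

```python
# Minimal sketch showing how padding a short return series with zeros
# deflates the sample volatility estimate. The numbers are made up.
import numpy as np

rng = np.random.default_rng(42)
actual = rng.normal(0.01, 0.08, size=6)          # 6 months of real returns
padded = np.concatenate([np.zeros(6), actual])   # zeros for the unlisted months

print(np.std(actual, ddof=1))   # sample vol from the genuine 6-month history
print(np.std(padded, ddof=1))   # typically much lower once zeros count as data
```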

Not knowing anything about the sophistication of any estimation tools you may have access to: if you are asking about the philosophical aspect of what you are doing, the short answer is that you have two options:

  1. Truncate the history of all the stocks to the longest common window of returns, or

  2. Calculate the volatility of the shorter series and assume those figures would have had the same parametric characteristics projected into the past. (A rough sketch of both options follows below.)
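A rough pandas sketch of both options, on a stylized eight-month example (the returns are made up; note the missing months are NaN, not zero, which is how Excel blanks behave):

```python
# Two ways to handle a short-history stock alongside a long-history one.
import numpy as np
import pandas as pd

returns = pd.DataFrame({
    "X": [0.02, -0.01, 0.03, 0.01, -0.02, 0.04, 0.00, 0.01],         # full history
    "O": [np.nan, np.nan, np.nan, np.nan, 0.05, -0.03, 0.02, 0.01],  # listed late
})

# Option 1: truncate everything to the longest common history.
common = returns.dropna()        # keeps only the rows where every stock traded
print(common.std())              # vols on the common window
print(common.cov())              # covariance on the common window

# Option 2: estimate each series on whatever history it has and assume
# the same parameters would have held over the full period.
print(returns.std())             # per-column std, NaN-aware
print(returns.cov())             # pairwise-complete covariance
```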

Mr. Cebu, come and tell that to my manager :smiley: Seriously, do you have any other alternative?

Dear Mr. Destroyer of Worlds, thanks for taking the time to reply to my question. Yes, these zeros do lower the absolute risk figure (standard deviation), and yes, I am asking how I should get around this problem. I will try shortening the periods to match the shortest one and let you know how it goes. Thanks, sir.

I agree with Cebu here, especially considering the number of individual stocks you're using in the optimization. In my opinion the frontier allocations are going to be driven by noise, but I'm curious what others think. I think an efficient frontier is much more interesting when applied at the asset-class level…

That said, I would be careful about shortening the periods to match the shortest one if that forces the timeframe to less than a full business cycle. You want the inputs to reflect your forward-looking expectations, and using less than a full market cycle will introduce a bias. At the asset-class level I'd recommend using a proxy benchmark to fill out the return stream so you can do this (see the sketch below); I'm not sure what the best practice would be at the individual-security level.
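For what it's worth, here's a hedged sketch of that proxy-backfill idea in Python. Everything in it (the data, the plain OLS fit) is an illustrative assumption, not a best-practice recipe:

```python
# Proxy backfill: regress the short series on a benchmark over the overlap,
# then extend the return stream with fitted values. All numbers are made up.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
proxy = pd.Series(rng.normal(0.005, 0.04, 24))           # 24 months of benchmark returns
short = 0.002 + 1.2 * proxy + rng.normal(0, 0.01, 24)    # asset that tracks the proxy
short.iloc[:12] = np.nan                                 # first 12 months: not yet listed

overlap = short.notna()
beta, alpha = np.polyfit(proxy[overlap], short[overlap], 1)  # OLS over the overlap

backfilled = short.where(overlap, alpha + beta * proxy)  # fill the gap with fitted values
print(short.std(), backfilled.std())
```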

There's actually a pretty big literature on how to handle missing data, and a number of different approaches work pretty well. I'm a fan of EM algorithms, multiple imputation, and full information maximum likelihood (FIML).

Most of the more sophisticated approaches don't lend themselves to Excel. Something along the lines of FIML might be helpful here. FIML can operate similarly to what Katalepsis recommends: it's about coming up with the mean vector and covariance matrix. You estimate them for the series with the longest history, then regress the shorter-history series against the longer ones in steps until you complete the covariance matrix. Unlike Katalepsis's suggestion, this generates the missing moments from the regression results, so you also incorporate whether securities are high- or low-beta and whether they have high or low idiosyncratic variance.

You could do this with the six stocks against each other, but I'd probably recommend throwing a broad market index in there as well.
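A rough sketch of that stepwise completion on synthetic data (the series, the single regression step, and all the figures are illustrative assumptions): take full-sample moments from the long-history series, regress the short series on them over the overlap, and derive the short series' mean, covariances, and variance from the betas plus residual risk.

```python
# Stepwise FIML-style completion of the mean vector and covariance matrix
# for a short-history stock, using synthetic data.
import numpy as np

rng = np.random.default_rng(1)
T_long, T_short = 24, 12
L = rng.normal(0.01, 0.05, size=(T_long, 3))               # 3 long-history return series
S = 0.5 * L[-T_short:, 0] + rng.normal(0, 0.02, T_short)   # short-history stock

mu_L = L.mean(axis=0)
cov_L = np.cov(L, rowvar=False)                            # full-sample long moments

# OLS of the short series on the long ones over the overlap window.
X = np.column_stack([np.ones(T_short), L[-T_short:]])
coef, *_ = np.linalg.lstsq(X, S, rcond=None)
alpha, beta = coef[0], coef[1:]
resid = S - X @ coef
sigma2_eps = resid.var(ddof=X.shape[1])                    # idiosyncratic variance

# Implied full-history moments for the short series.
mu_S = alpha + beta @ mu_L                                 # implied mean
cov_SL = beta @ cov_L                                      # covariances vs. long stocks
var_S = beta @ cov_L @ beta + sigma2_eps                   # beta risk + idio risk
print(mu_S, var_S)
```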

^ This method swaps one source of bias (sampling error) for another (omitted-variable bias). Unless the data really are iid Gaussian, the generating process will water down any non-iid correlation information that exists between the shorter and longer series (by assuming the joint distribution is normal, with no significant third- or fourth-moment effects or ordering problems). This is especially true for individual stocks tied to idiosyncratic company- and sector-level factors.

In addition to the inherent problem of unstable correlation matrices between the data series across the missing gap, there are of course still the parameter-simplification issues baked into a standard mean-variance optimization approach (which Katalepsis is trying to communicate); but, alas, those potential errors are being ignored because the OP's boss is demanding an output.

My guess is that the OP's time and resource constraints will keep him from implementing a FIML approach or any other such "backfilling" technique, never mind properly tallying, across the entire model, what the total potential estimation error is likely to be.

True, but what's the alternative for someone who has to do this in Excel? More data means better estimates. For instance, suppose it's March 2009 and you're looking at U.S. financial stocks plus Visa (which IPOed in early 2008). If you limit yourself to the window where all the data are available (the year or so ending in March 2009), you'll get wildly wrong estimates.

This is true in some cases, but not all. Inventing a data stream with given distribution parameters, with no historical way of knowing whether the asset actually would have behaved that way in the past, introduces a completely different host of errors into the model. One has to ask whether the potential errors from generating filler data are smaller than the potential errors from relying on an inadequate sample size.

Which asset management firm do you work for?

Try to get some grey-market prices from before the IPO; even if you can only find or estimate a few data points, it's better than a random backfill.

Apparently I lost my password and couldn't get back to you.

The point is: do you have any idea how to handle the EM algorithms or multiple imputation?