Design Patterns for Portfolio Backtesting

I’m wondering if others on this board have some sources/insights into design patterns that are useful for doing portfolio construction and backtesting in R or MATLAB (or even in general). Most of what I do is tactical asset allocation, although potentially I’d want to do more screening, which would involve a more dynamic selection of assets. I’m able to code stuff to do this work and get the results, but I can’t help feeling that I am organizing data inefficiently, and that my data objects are either not flexible enough or, alternatively, are scattered all over my digital workspace. For example, a portfolio history has, at a minimum, a sequence of asset exposures over time. If we have a sequence of asset returns as well, we can tell the object to spit out its exposures, or just the total portfolio returns. But should the asset returns be their own object, or should I just hold them in the portfolio object too? In Java, I might use the strategy pattern for this, but I’m not sure how to implement it in R or MATLAB. Does anyone know of a reference or source that can help with good coding/process design specifically oriented to these types of problems?

bchadwick Wrote:
-------------------------------------------------------
> are scattered all over my digital workspace.
What does that mean? Why the mambo jumbo? If you want answers, you have to write in English that non-natives can understand. Can you rephrase your question in two short sentences?

Does anyone have some good reference material (books, short papers, websites, etc.) that addresses good coding practices specifically for handling problems in portfolio construction, backtesting, and simulation? Not things like “this is how you calculate a return,” but things like “if you organize your dataset and calculations this way, it’s easier to update your system or change your formulas without having to rewrite everything or make an entirely new dataset.”

Why R/MATLAB? I would recommend Q over R/MATLAB, though if you’ve never programmed in an array or functional paradigm, you’ll have a steep learning curve. Perhaps just use Java. I’m not sure what “flexible data” means. Data should just be in flat files or a database. Organization and normalization are important, as is metadata like schemas. Flexibility can come in the tools that manipulate data/metadata, but these should be separate from the data itself. Traditionally this area of CS is called the “Data Model”, but there is no one “good” solution. Translation between objects and relational databases is a classic problem in computer science. (Google “object-relational impedance mismatch”.) Many reasonable people differ here, but perhaps a good place to start is to look at an ORM solution. For Java, this might be Hibernate. Asset returns should be separate from portfolio returns.

Thanks, justin88, this is a help. I just find myself using R the most, and Matlab is fairly similar. I’ll see what comes up under “object-relational impedance mismatch.” As for “flexible data,” I meant “flexible data objects.” So if I have a portfolio history, which is basically a table of asset weights by date, I might want the object that holds this to be able to produce an array of portfolio returns, or the portfolio return at time T, or maybe just the asset exposures at time T, or the exposure to asset X at time T. I know more or less how to do it in Java, but I don’t know how to do it in R or Matlab. I figured that this couldn’t be the first time that anyone had ever encountered issues like the best way to organize your data for portfolio analysis (outside of Excel), and thought that I’d rather see how other people do it before trying to reinvent the wheel.
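For what it’s worth, here is a minimal sketch of the kind of “flexible data object” described above. It is written in Python only because it is compact. The class name `PortfolioHistory`, its method names, and the sample numbers are all hypothetical, and it assumes the weight at date t is applied to the asset return at date t. Asset returns are kept outside the object and passed in when needed.

```python
from dataclasses import dataclass


@dataclass
class PortfolioHistory:
    """Asset weights by date; asset returns live in a separate table."""
    dates: list      # e.g. ["2020-01", "2020-02"]
    assets: list     # e.g. ["stocks", "bonds"]
    weights: list    # weights[t][i] = weight of asset i at date t

    def exposures_at(self, date):
        """All asset exposures at one date, keyed by asset name."""
        row = self.weights[self.dates.index(date)]
        return dict(zip(self.assets, row))

    def exposure(self, asset, date):
        """Exposure to a single asset at one date."""
        return self.exposures_at(date)[asset]

    def returns(self, asset_returns):
        """Portfolio return per date: sum of weight * asset return.

        asset_returns[t][i] must use the same date and asset order.
        """
        return [sum(w * r for w, r in zip(w_row, r_row))
                for w_row, r_row in zip(self.weights, asset_returns)]


history = PortfolioHistory(
    dates=["2020-01", "2020-02"],
    assets=["stocks", "bonds"],
    weights=[[0.6, 0.4], [0.5, 0.5]],
)
asset_returns = [[0.02, 0.01], [-0.01, 0.03]]   # made-up numbers
portfolio_returns = history.returns(asset_returns)
```

The same shape should carry over to an R list of closures or a MATLAB class, since the object only stores weights and derives everything else on request.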

Well, are you using R or MATLAB? No reason to buy two hammers if you only have one nail. You seem somewhat familiar with object-oriented software engineering (Java) but not so familiar with array/matrix programming. (Nothing to be ashamed of; these are very different skills.) There may be a ramp-up period where you are learning not only the language but a different programming paradigm. While programming in R or MATLAB can be convenient for simple tasks, nontrivial software engineering (like a portfolio backtesting engine) is significantly more difficult in them.

This sounds like that TD Ameritrade commercial… “Backtesting” “Backtesting” “Backtesting” lame.

I use R mostly, but occasionally I have clients who want me to use MATLAB. You bring up a good point. I guess I’m using OOP ideas when I really need to be working in a different programming paradigm. So maybe the array/matrix paradigm is what I need to bone up on. Basically, I started prototyping this system in Excel. That works fine for quick-and-dirty and exploratory stuff, but it will clearly be an error-prone implementation nightmare if it is going to be executed in Excel. If I want to add assets to the model, it is also a nightmare to add the right columns everywhere in the spreadsheet. It would be very easy to forget to update one sheet and mess everything else up. True, I could write a VBA macro, but even that would not really be all that robust. So I moved to R as a language that could be more automated than Excel but requires less “building everything from scratch” than Java. It’s also nice that you can use matrix multiplication rather than loops to turn asset exposures into portfolio returns (it’s a tactical asset allocation strategy, so we have relatively few assets compared to a stock-picking algorithm, although I have another potential client that would want me to do stock selection and portfolio construction, and I’d like to be prepared for that too).
So far, I’ve done OK with R scripts that I have in a txt file library, but when I do that, I create a whole bunch of data objects in R’s workspace with matrices of asset exposures, asset returns, portfolio returns for the strategy, portfolio returns for comparison, benchmarks like 60/40, equal-weight, etc. When I type ls() to see what they are and which ones I want to plot to generate a graphic, I think “what a mess… there’s got to be a more sensible way to organize this data and still get the outputs I need.” Nothing so far has made it impossible to do what I need to do… I just keep thinking that I can’t be the first person ever to encounter this issue, and maybe I can learn something from how others approach these kinds of tasks.
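One possible way to tame the workspace, sketched in Python: keep everything belonging to a backtest run in a single named container, and store derived results next to their inputs. The container layout, the key names, and the numbers are assumptions for illustration. In R the analogue would probably be a named list or an environment. The helper `weighted_returns` is a plain-loop stand-in for the row-wise weights-times-returns product mentioned above.

```python
def weighted_returns(weights, asset_returns):
    """Row-wise dot product: one portfolio return per date."""
    return [sum(w * r for w, r in zip(w_row, r_row))
            for w_row, r_row in zip(weights, asset_returns)]


# One container per backtest run instead of loose workspace objects.
run = {
    "dates": ["2020-01", "2020-02", "2020-03"],
    "assets": ["stocks", "bonds"],
    "asset_returns": [[0.02, 0.01], [-0.04, 0.02], [0.03, 0.00]],  # made up
    "weights": {
        "60/40": [[0.6, 0.4]] * 3,
        "equal_weight": [[0.5, 0.5]] * 3,
    },
}

# Derived results are stored next to their inputs, keyed by portfolio name.
run["returns"] = {
    name: weighted_returns(w, run["asset_returns"])
    for name, w in run["weights"].items()
}

print(sorted(run["returns"]))   # lists the portfolios available to plot
```

With this layout, the equivalent of ls() shows one object per run, and the list of things that can be plotted is just the keys of `run["returns"]`.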

builders Wrote:
-------------------------------------------------------
> “Backtesting”
Backswing! Err… backtesting.

I’ll keep it short and skip the mambo jumbo. You need two tables, one for stock (and the like) data and the second for portfolio composition data. Matlab reads the portfolio composition first, then gets the historical prices for the desired time frame from the stock table.

I’ve been doing a lot of stuff in R lately. I’ve been using the xts package for organizing my time series data. The PerformanceAnalytics package uses xts, which is why I went with that (even though I don’t find myself using the stuff from PerformanceAnalytics that much). This makes it pretty straightforward to keep track of a portfolio so you can get the exposures at a given time or over the whole period you have data. I find the most annoying thing to be accounting for NAs. Another thing to try would be something like if you have two variables (say prices and returns), you could combine them into data.prices and data.returns. I’m not sure how much this would help, but I suppose there could be advantages in keeping track of stuff. If you plan on using Matlab, there’s a feature there that I didn’t know about until recently called a dataset array that is incredibly helpful. For instance, if you have prices and P/Es for a bunch of companies, you can incorporate them into one dataset array and just pull out what you need. One thing I do in Matlab when testing different portfolios is to use a multidimensional array so that the first element in the third dimension might be the benchmark weights through time, the second will be an equal-weighted portfolio, and the third will be my strategy. I haven’t tried this in R yet, but I imagine something similar is doable. Let me know if you have any more specific questions.

mo34 Wrote:
-------------------------------------------------------
> I’ll keep it short and skip the mambo jumbo. You need two tables, one for stock (and the like) data and the second for portfolio composition data. Matlab reads the portfolio composition first, then gets the historical prices for the desired time frame from the stock table.
Ah, I got it. So there are some numbers and you multiply them. Thanks, that helped a lot. The issue is that generating the portfolio weights is a multistep process that generates its own intermediate tables of data. Most of the time, I can ignore those, but sometimes I need to look at them for a variety of reasons. Sorry to mambo jambo you, but I had wanted to encapsulate those, or maybe find a way to derive them on the fly.

jmh530 Wrote:
-------------------------------------------------------
> I’ve been doing a lot of stuff in R lately. I’ve been using the xts package for organizing my time series data. The PerformanceAnalytics package uses xts, which is why I went with that (even though I don’t find myself using the stuff from PerformanceAnalytics that much). This makes it pretty straightforward to keep track of a portfolio so you can get the exposures at a given time or over the whole period you have data. I find the most annoying thing to be accounting for NAs. Another thing to try would be something like if you have two variables (say prices and returns), you could combine them into data.prices and data.returns. I’m not sure how much this would help, but I suppose there could be advantages in keeping track of stuff.
>
> If you plan on using Matlab, there’s a feature there that I didn’t know about until recently called a dataset array that is incredibly helpful. For instance, if you have prices and P/Es for a bunch of companies, you can incorporate them into one dataset array and just pull out what you need.
>
> One thing I do in Matlab when testing different portfolios is to use a multidimensional array so that the first element in the third dimension might be the benchmark weights through time, the second will be an equal-weighted portfolio, and the third will be my strategy. I haven’t tried this in R yet, but I imagine something similar is doable.
>
> Let me know if you have any more specific questions.
This is really helpful. Also, I really like the idea of the three-dimensional array, because I can have different strategies, cash, and a benchmark all in the same array and would be able to compare them by having each as a separate layer of time/exposures.
Now, it would be nice to have a way to match a portfolio construction algorithm to the strategy, so layer 1 of the 3D array would be all cash, layer 2 would be, say, a rebalanced 60/40 portfolio, and layers 3+ would be dedicated to different strategies, with each layer representing a table of dates and allocations.

Doing it with a different set of dates for each layer will be tricky, but you can pull out, say, weekly instead of daily data. If you really wanted datasets with different arrays of dates, you would have to use cell arrays, which are slower.

Yeah, mine is a low-frequency strategy, so weekly and monthly are fine.

bchadwick Wrote:
-------------------------------------------------------
> mo34 Wrote:
> -------------------------------------------------------
> > I’ll keep it short and skip the mambo jumbo. You need two tables, one for stock (and the like) data and the second for portfolio composition data. Matlab reads the portfolio composition first, then gets the historical prices for the desired time frame from the stock table.
>
> Ah, I got it. So there are some numbers and you multiply them. Thanks, that helped a lot.
>
> The issue is that generating the portfolio weights is a multistep process that generates its own intermediate tables of data. Most of the time, I can ignore those, but sometimes I need to look at them for a variety of reasons. Sorry to mambo jambo you, but I had wanted to encapsulate those, or maybe find a way to derive them on the fly.
Regarding the output table where Matlab will store the results: in that table you can add columns describing your portfolio. For example, you can add a column “stage” and store a value (1, 2, 3, …) depending on the stage of the portfolio construction. I personally also store the date generated in a separate column; you can add as many columns as needed to describe your portfolios. This is much easier than using data constructs in Matlab.
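A rough sketch of the “stage” column idea, in Python rather than Matlab. The column names, the `store` and `select` helpers, and the cap applied at stage 2 are hypothetical, and a real setup would more likely keep this table in a database than in memory.

```python
import datetime

# Long-format output table: one row per (portfolio, stage, date, asset).
output_table = []


def store(portfolio, stage, date, weights, generated=None):
    """Append one row per asset, tagged with descriptive columns."""
    generated = generated or datetime.date.today().isoformat()
    for asset, weight in weights.items():
        output_table.append({
            "portfolio": portfolio, "stage": stage, "date": date,
            "asset": asset, "weight": weight, "generated": generated,
        })


def select(**criteria):
    """Rows whose columns match every keyword given."""
    return [row for row in output_table
            if all(row[k] == v for k, v in criteria.items())]


# Stage 1: raw signal weights; stage 2: after a (hypothetical) cap is applied.
store("tactical", 1, "2020-01", {"stocks": 0.8, "bonds": 0.2})
store("tactical", 2, "2020-01", {"stocks": 0.7, "bonds": 0.3})

final = select(portfolio="tactical", stage=2)
```

The intermediate tables stay available for inspection (`select(stage=1)`) but can be ignored the rest of the time.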

bchadwick Wrote:
-------------------------------------------------------
> This is really helpful. Also, I really like the idea of the three-dimensional array, because I can have different strategies, cash, and a benchmark all in the same array and would be able to compare them by having each as a separate layer of time/exposures.
Why a 3D array? It seems like you should just create new 2D arrays as you see fit, making sure that they are indexed consistently. That way you can add and delete “layers” dynamically without much of a performance hit.
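A small sketch of the “separate 2D arrays, indexed consistently” approach, in Python. The `Layers` class and its method names are made up for illustration. The shape check on `add` is one possible way to enforce the consistent indexing, not the only one.

```python
class Layers:
    """Named 2D weight tables that share one date index and asset list."""

    def __init__(self, dates, assets):
        self.dates = list(dates)
        self.assets = list(assets)
        self.tables = {}

    def add(self, name, table):
        """Add a layer, refusing tables whose shape does not match."""
        if len(table) != len(self.dates):
            raise ValueError(f"{name}: expected {len(self.dates)} rows")
        if any(len(row) != len(self.assets) for row in table):
            raise ValueError(f"{name}: expected {len(self.assets)} columns")
        self.tables[name] = [list(row) for row in table]

    def drop(self, name):
        """Delete a layer without touching the others."""
        del self.tables[name]


layers = Layers(dates=["2020-01", "2020-02"], assets=["stocks", "bonds"])
layers.add("cash", [[0.0, 0.0], [0.0, 0.0]])
layers.add("60/40", [[0.6, 0.4], [0.6, 0.4]])
layers.drop("cash")
```

Adding or dropping a layer only touches one dictionary entry, so nothing has to be reallocated the way a 3D array would be.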

^ Awesome! These are the kinds of things I was hoping to hear.

The best reason I have to use multidimensional arrays when doing some of this stuff is that it simplifies the code. For instance, if I have a function that creates portfolio returns and performance statistics, it’s less code to do everything in terms of a multidimensional array than it is to call the function for every portfolio.

justin88 Wrote:
-------------------------------------------------------
> bchadwick Wrote:
> -------------------------------------------------------
> > This is really helpful. Also, I really like the idea of the three-dimensional array, because I can have different strategies, cash, and a benchmark all in the same array and would be able to compare them by having each as a separate layer of time/exposures.
>
> Why a 3D array? It seems like you should just create new 2D arrays as you see fit, making sure that they are indexed consistently. That way you can add and delete “layers” dynamically without much of a performance hit.
Dim 1 = time index
Dim 2 = asset exposure
Dim 3 = portfolio/strategy
You could then set up an array of functions that generate portfolio weights for each of the strategies on dimension 3. Then use a list structure (R) or cell array (Matlab) to turn that into a single object. Neat; we’re getting close to what I was hoping for in this thread. :)
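A sketch of that last idea in Python, for anyone who wants something concrete: one weight-generating function per strategy, collected in a single object and applied over a shared date index. The function names, the `(date, assets)` signature, and the fixed 60/40 split are assumptions for illustration. An R version would presumably be a named list of functions run through lapply.

```python
def all_cash(date, assets):
    """Hold nothing in risky assets."""
    return [0.0] * len(assets)


def sixty_forty(date, assets):
    """Rebalance to 60/40 every period (assumes two assets)."""
    return [0.6, 0.4]


def equal_weight(date, assets):
    """Split evenly across whatever assets are supplied."""
    return [1.0 / len(assets)] * len(assets)


# Dim 3: one weight-generating function per portfolio/strategy.
strategies = {"cash": all_cash, "60/40": sixty_forty, "equal": equal_weight}

dates = ["2020-01", "2020-02", "2020-03"]      # Dim 1: time index
assets = ["stocks", "bonds"]                    # Dim 2: asset exposure

# cube[name][t][i] = weight of asset i at date t under strategy `name`.
cube = {name: [rule(d, assets) for d in dates]
        for name, rule in strategies.items()}
```

This is essentially the strategy pattern from the original question: adding a new strategy means adding one function and one dictionary entry, and the code that consumes `cube` does not change.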