Anyone here who is pretty familiar with principal component analysis (PCA) or interest rate/credit models might be able to help me out with some questions.
1) Say you have log excess returns for a bunch of stocks, plus the log risk-free rate of return, in a matrix you perform PCA on. Since you choose factors in PCA based on how much variance they explain, does that mean the log risk-free return would normally not be one of the more important factors? If that’s the case, what’s the normal procedure in practice? Model the risk-free rate separately?
2) Let’s say you have the above, plus a bunch of bond yields (say, changes in the government curve plus changes in YTMs for a bunch of corporate bonds). I would guess this PCA might pull out factors correlated with the market, two important government yields, and a corporate bond risk factor. There might be some non-linearity when trying to explain corporate bond returns, but PCA assumes linearity. If I shouldn’t include these in the PCA, any idea what I should do instead?
3) In Matlab, the coefficients from the pca function do not sum to 1. This means that the factors you produce might have high correlations with an individual security, but they probably have much higher variance. So for instance, the interest rate factor might be highly correlated with one of the yields, but have quite a different variance. Is there any advantage to making the coefficients sum to 1 so that the factors more closely reflect the underlying securities they are highly correlated with?

1 - Depends what you’re trying to do with the result of the PCA. If you’re trying to cluster your securities based on their exposures to the different principal components, then you don’t need to worry about the risk-free rate at all. If it’s important, it will come out as one of the dominant eigenvectors.
2 - If the linearity in the PCA is troubling you, then use ICA (independent component analysis). That’s not a function in Matlab, so you’ll have to program it, but it’s pretty easy to program (a day at most). ICA does not rank components by explained variance the way PCA does; instead it looks for components that are as statistically independent as possible, typically using kurtosis (a measure of non-Gaussianity) as a proxy for independence.
3 - There is no value in having the coefficients sum to 1, assuming the PCA is not your end point (as I said earlier) and you’re using it as a way to cluster your universe. I’m not sure what you’re doing, but this whole PCA thing has been beaten to death since the mid-80s, so unless you have access to some new data, don’t waste too much time on it.
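For anyone who wants to check points 1 and 3 numerically, here is a rough Python/NumPy sketch (not Matlab, and not anyone's production method). The data are entirely made up: five hypothetical stocks driven by one market factor, plus a risk-free series with a much smaller variance. It shows that the loadings are unit-length eigenvectors, so they generally do not sum to 1, and that a low-variance series such as the risk-free rate lands in the last component when PCA is run on the raw covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_stocks = 1000, 5

# Hypothetical data: one common "market" factor plus idiosyncratic noise.
market = rng.normal(0.0, 0.01, size=n_obs)
betas = np.linspace(0.8, 1.2, n_stocks)
stocks = market[:, None] * betas + rng.normal(0.0, 0.005, size=(n_obs, n_stocks))

# Assumed risk-free series with a far smaller variance than the stocks.
risk_free = rng.normal(0.0, 0.0001, size=(n_obs, 1))
X = np.hstack([stocks, risk_free])  # last column is the risk-free rate

# PCA on the covariance matrix via an eigendecomposition.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # share of variance per component
col_norms = np.linalg.norm(eigvecs, axis=0)  # each loading vector has length 1
col_sums = eigvecs.sum(axis=0)               # ...but its entries need not sum to 1

rf_loading_on_pc1 = abs(eigvecs[-1, 0])    # risk-free weight in the first PC
rf_loading_on_last = abs(eigvecs[-1, -1])  # risk-free weight in the last PC

print("explained variance:", np.round(explained, 4))
print("loading norms:", np.round(col_norms, 4))
print("loading sums:", np.round(col_sums, 4))
print("risk-free loading on PC1 / last PC:", rf_loading_on_pc1, rf_loading_on_last)
```

If the risk-free rate needs to show up among the leading factors, one option is to standardize the columns first (PCA on the correlation matrix); whether that is appropriate depends on what the factors are used for.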

Thanks for the help. I was playing around with dividing the coefficients last night and I wasn’t getting the right answers anyway. I’ve heard good things about ICA in some of the papers I was looking at. Just googled and found an implementation here: http://research.ics.tkk.fi/ica/fastica/
I’m not doing it because I think it will necessarily add value. I’m doing it more because it can produce a covariance matrix for a bunch of stocks, and also supports much more complicated models, without much work. Since estimating an ARMAX-GARCH model for 3000 stocks (using the equity market as the X) would be slow, it seems easier to simplify the problem with PCA and just model the factors really well. Also, I figured out last night how to clean the factors and put the returns back together for an adjusted cleaned series. Much faster than when doing it with a crap load of securities.
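A minimal sketch of the "simplify with PCA, then model the factors" idea, in Python/NumPy. This is an assumption about the general approach, not the exact procedure used above: the function name `pca_factor_covariance`, the simulated returns, and the choice of two factors are all hypothetical. The covariance is rebuilt as loadings × factor covariance × loadings' plus a diagonal matrix of residual variances. Here the factor covariance is just the sample covariance of the factors; that is the piece a richer model (for example a GARCH-type forecast) would replace.

```python
import numpy as np

def pca_factor_covariance(returns, n_factors):
    """Approximate the asset covariance with a PCA factor structure.

    returns: array of shape (n_obs, n_assets).
    Returns the rebuilt covariance, the loadings, and the factor series.
    """
    Xc = returns - returns.mean(axis=0)
    sample_cov = np.cov(Xc, rowvar=False)

    eigvals, eigvecs = np.linalg.eigh(sample_cov)   # ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]   # keep the largest ones
    loadings = eigvecs[:, order]                    # (n_assets, n_factors)

    factors = Xc @ loadings                         # (n_obs, n_factors)
    # Sample factor covariance; a time-series model could be swapped in here.
    factor_cov = np.atleast_2d(np.cov(factors, rowvar=False))

    # Whatever the factors do not explain is treated as asset-specific.
    residuals = Xc - factors @ loadings.T
    specific_var = residuals.var(axis=0, ddof=1)

    cov = loadings @ factor_cov @ loadings.T + np.diag(specific_var)
    return cov, loadings, factors

# Hypothetical example: 20 assets driven by 2 latent factors.
rng = np.random.default_rng(1)
true_factors = rng.normal(0.0, 0.01, size=(500, 2))
true_loadings = rng.normal(1.0, 0.3, size=(2, 20))
returns = true_factors @ true_loadings + rng.normal(0.0, 0.005, size=(500, 20))

cov, loadings, factors = pca_factor_covariance(returns, n_factors=2)
print(cov.shape, np.linalg.eigvalsh(cov).min())
```

With this construction only a handful of factor series need a time-series model instead of thousands of individual stocks, and the rebuilt matrix stays positive definite as long as every residual variance is positive.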