Principal Component Analysis

Is anyone here familiar with conducting principal component analysis? I’m trying to run this on the US yield curve to come up with fewer, uncorrelated variables. I understand that the first component produced is a shift, the second is a twist and the third is a butterfly, and that these first three components usually capture most of the variability in my original variables. I’m just wondering how to interpret the results. I now have three new variables that are linear combinations of the original variables, weighted by the components, but I don’t know how this translates into a parallel shift of the yield curve for factor 1, a twist for factor 2 and so on. Any ideas? Thanks!

Just know that changes in the shape of the yield curve, however they occur, explain something like 95% of a bond’s return.

The principal components basically sort the directions of change in descending order of variance. In other words, the first component represents a kind of “common factor” explaining changes in all variables simultaneously. The second component is the direction that explains the largest amount of the remaining variability (after taking out the contribution of the first). The third does the same thing after taking out all variability that’s explained by the first two, and so on. So the idea is that a parallel shift in the yield curve should affect all maturities more or less the same way, so it probably explains the most variability in yields, and therefore the first factor will most likely represent parallel shifts. After this, the curve most likely steepens and/or flattens, so the second factor probably explains this. After that, the butterfly or bulging of the curve is the most common event, so it’s assumed (though you can check by looking at the factor loadings) that this would be factor three. If you have N points on the curve, you will potentially have N factors come out of your PCA. Usually, if you have more than 3 factors, the next ones represent higher-order harmonics on the yield curve. But beware, because funky things can happen with the data, so don’t assume that factors 2, 3, etc. mean what I just said: check the factor loadings. It is pretty safe to assume that the first factor represents parallel shifts, but to be sure, check that all of its loadings have the same sign and similar size (the overall sign of a component is arbitrary, so all-negative is just as good as all-positive).
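To make the "descending order of variance" idea concrete, here is a minimal sketch in Python with NumPy. The data are synthetic and purely an assumption for illustration: daily yield changes (in bp) are built from a level shock, a slope shock, a curvature shock and a little noise, with made-up volatilities. The names `dy`, `bend` and `explained` are hypothetical, not from any package mentioned in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_mats = 2000, 9
m = np.linspace(-1.0, 1.0, n_mats)   # maturities rescaled to [-1, 1] (assumption)
bend = m**2 - np.mean(m**2)          # a simple curvature shape (assumption)

# Synthetic yield changes in bp: level + slope + curvature + noise.
dy = (
    rng.normal(0.0, 6.0, (n_days, 1)) * np.ones(n_mats)
    + rng.normal(0.0, 2.0, (n_days, 1)) * m
    + rng.normal(0.0, 1.0, (n_days, 1)) * bend
    + rng.normal(0.0, 0.2, (n_days, n_mats))
)

# PCA = eigen-decomposition of the covariance matrix of the changes.
cov = np.cov(dy, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The sign of each eigenvector is arbitrary; here the longest-maturity
# loading is made positive so the shapes are easier to compare.
eigvecs = eigvecs * np.sign(eigvecs[-1, :])

explained = eigvals / eigvals.sum()
print(np.round(100 * explained[:3], 2))  # % of variance for the first three
print(np.round(eigvecs[:, :3], 2))       # loadings: flat, tilted, bowed
```

On data like this, the first column of loadings is roughly flat (level), the second changes sign from short to long maturities (slope), and the third has the ends opposite to the middle (curvature). Real data can behave differently, which is why checking the loadings matters.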

You are correct in identifying the 3 factors as below.
Factor 1: Changes in the level of interest rates (parallel shifts in the yield curve)
Factor 2: Changes in the slope of the yield curve (twists)
Factor 3: Changes in the curvature of the yield curve (butterfly shifts)
As per the regression analysis by Litterman and Scheinkman, Factor 1 explains almost 90% of the observed variation in the total returns for all maturity levels. Factor 2 explains about 8.5% of the variation and Factor 3 explains about 1.5%. HTH

I’ve read about PCA, but what computer program (Excel?) or guides do you use to do it?

I use a statistical software add-in for Excel called XLSTAT-Pro to run PCA. A tutorial on PCA is available on the XLSTAT website on the following page: www.xlstat.com/demo-pca.htm

I’ve also used Excel add-ins. It’s also common to use SAS, S-Plus and so on. We solved this very problem in a class I took last term. Hope you can see this. See how Factor 1 (f1) is pretty much the same value for all of y1 … y9; that’s how you interpret this as a parallel shift. Copy these into Excel and plot f1, f2 and f3 against yields and you’ll see why f1 appears almost parallel (an upward shift of about 44 bp) and explains 95.22%, factor f2 is a slope change, and so on.
Principal Components
                  f1     f2     f3     f4     f5     f6     f7     f8     f9
y1             36.47  13.65   4.67   4.40   1.31   0.30    .02   0.01   0.00
y2             40.62   7.75   2.52   4.08   0.70   1.76   0.09   0.03   0.00
y3             42.99   5.37   1.15   1.97   1.49   0.17   0.09   0.06   0.00
y4             44.09   3.48   3.45   0.35   1.21   1.10    .09   0.02   0.00
y5             44.21   1.33   3.90   0.64   0.14   1.35   0.05   0.04   0.00
y6             43.32   2.92   2.34   1.29   2.12   0.35   0.20   0.03   0.00
y7             42.58   6.87   1.11    .66   2.29   1.24   0.20   0.01   0.00
y8             41.31  10.07   0.10   4.79   1.61   1.57   0.08   0.00   0.00
y9             37.69  11.17   6.75   1.79   0.50   1.66   0.02   0.00   0.00
% explained    95.22   3.50   0.67   0.42   0.11   0.08   0.00   0.00   0.00
Cumulative %   95.22  98.72  99.38  99.81  99.92  100.0  100.0  100.0  100.0
All values in basis points (bp).
First factor: (nearly) parallel shift
Second factor: slope
Third factor: curvature
The first factor explains 95.2%, while an exact parallel shift factor explained 94.8%.
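If you’d rather check the "% explained" row than take it on faith, here is a small sketch in Python with NumPy. It assumes the posted loadings are eigenvectors scaled by the square root of their eigenvalues (so each column’s sum of squares is that factor’s variance in bp squared); that is an assumption on my part, though it does match the posted percentages. The numbers are typed in from the table, with entries like ".02" read as 0.02. Squaring removes any sign, so the check works regardless of the loadings’ signs.

```python
import numpy as np

# Loadings from the table above: rows y1..y9, columns f1..f9 (in bp).
loadings = np.array([
    [36.47, 13.65, 4.67, 4.40, 1.31, 0.30, 0.02, 0.01, 0.00],
    [40.62,  7.75, 2.52, 4.08, 0.70, 1.76, 0.09, 0.03, 0.00],
    [42.99,  5.37, 1.15, 1.97, 1.49, 0.17, 0.09, 0.06, 0.00],
    [44.09,  3.48, 3.45, 0.35, 1.21, 1.10, 0.09, 0.02, 0.00],
    [44.21,  1.33, 3.90, 0.64, 0.14, 1.35, 0.05, 0.04, 0.00],
    [43.32,  2.92, 2.34, 1.29, 2.12, 0.35, 0.20, 0.03, 0.00],
    [42.58,  6.87, 1.11, 0.66, 2.29, 1.24, 0.20, 0.01, 0.00],
    [41.31, 10.07, 0.10, 4.79, 1.61, 1.57, 0.08, 0.00, 0.00],
    [37.69, 11.17, 6.75, 1.79, 0.50, 1.66, 0.02, 0.00, 0.00],
])

# Variance attributed to each factor = sum of squared loadings in its column.
factor_var = (loadings ** 2).sum(axis=0)

pct_explained = 100.0 * factor_var / factor_var.sum()
cumulative = np.cumsum(pct_explained)

print(np.round(pct_explained, 2))
print(np.round(cumulative, 2))
```

The printed rows should line up with the "% explained" and "Cumulative %" rows of the table to within rounding.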

Thanks guys, much appreciated! I guess I’m a little confused as to the difference between the components and the factors. When I run PCA, I get a component matrix that shows the value of each component for each point on the yield curve (just like in DoubleDip’s post). I also get three new factor variables in my data set that have a value for each historical date. I was thinking that the components are used to come up with the factors, and that we then somehow have to compare the factors with the original historical yield curves and see some sort of parallel shift for factor 1, and a twist, butterfly, etc. for the other two factors. If I’m looking at the components for parallel shifts of the yield curve, don’t I somehow have to compare these values with the actual observed yield curve values? Any ideas? Thanks.

The original yield curve data were used to compute the changes in yields, delta y1 (for 1 year) etc., and these were used to generate the factors. You don’t compare to the original yields once you’ve done the analysis, but the analysis helps decompose the change in yields over a particular time period into various factors. In this case, 95.22% of the variability in yield changes (across all maturities) was due to a parallel shift. After the parallel shift is explained, that leaves 100% - 95.22%, or 4.78%. Of this, 3.5% was due to slope change, and so on.
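Here is a rough sketch in Python with NumPy of how the component matrix and the per-date factor variables relate. It assumes the software’s factor variables are the yield changes projected onto the eigenvectors; some packages also rescale them to unit variance, so check your documentation. The data are synthetic (level + slope + noise, with made-up volatilities) and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_mats = 500, 9
m = np.linspace(-1.0, 1.0, n_mats)   # maturities rescaled to [-1, 1] (assumption)

# Synthetic yield changes in bp: level shock + slope shock + noise.
dy = (rng.normal(0.0, 6.0, (n_days, 1))
      + rng.normal(0.0, 2.0, (n_days, 1)) * m
      + rng.normal(0.0, 0.3, (n_days, n_mats)))
dy_centered = dy - dy.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(dy_centered, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Components: one column of eigvecs per factor, one row per maturity.
# Factor variables (scores): one value per date per factor.
scores = dy_centered @ eigvecs

# Scores are mutually uncorrelated, with variances equal to the eigenvalues.
score_cov = np.cov(scores, rowvar=False)

# Each day's yield change is the sum of (score on that day) x (component).
# Keeping only the first k factors gives an approximation.
k = 2
approx = scores[:, :k] @ eigvecs[:, :k].T
residual_share = ((dy_centered - approx) ** 2).sum() / (dy_centered ** 2).sum()
print(round(100 * residual_share, 3), "% of variability left after", k, "factors")
```

So the components tell you the shape of each move across maturities, and the factor variables tell you how much of each shape occurred on each date. No comparison with yield levels is needed.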

Sorry guys, just one more follow-up: are we supposed to run PCA on the original yield-change data or on the correlation matrix of this data? I’m reading about PCA being done on the correlation and/or covariance matrices instead, and I’m just wondering what the difference is. Thanks!