QuantVolk, question about the Augmented Dickey-Fuller Test

It’s a little embarrassing to ask this, given that I do a fair amount of econometrics, but I’m puzzled by something. I tend to do more cross-sectional analysis than time series analysis, so some of my knowledge here is rusty.

I am analyzing the historical performance of the Case-Shiller housing index, using data from Robert Shiller’s website:

http://www.econ.yale.edu/~shiller/data.htm

Specifically from this file here:

http://www.econ.yale.edu/~shiller/data/Fig2-1.xls

I’m using R and the “tseries” library. The goal is to create some kind of model for forecasting the CS index over the long term (10+ years). It seems that there really isn’t much with explanatory power for past changes at this timeframe, other than inflation, which is a little surprising, but not super-surprising.

When I’ve used more recent Case-Shiller data, such as that from the S&P website, I find that the composite CS-10 index since 1987 (when the series starts) is integrated of order 2. I have to difference twice before the ADF test supports a conclusion that the series is stationary. That’s problematic, because it means there will be a ton of noise by the time one integrates back to the nominal Case-Shiller for forecasting, but of course the world doesn’t have to be simplifiable just because we want it to.

However, when I run adf.test() on the nominal CS index from Shiller’s much longer-term data (Column i in the Excel link above, starting in 1916, because of other data limitations), I get a p-value of 0.065. This says that we can’t conclude that the series is stationary at the 95% confidence level, but it is pretty darned close (it would pass at the 93% confidence level). This was a surprise, because I had been expecting to find results similar to those I had with the post-1987 data.
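
For reference, the checks I’m running are roughly along these lines (a minimal sketch; cs is just a placeholder for whichever index series has been loaded):

library(tseries)

adf.test(cs)                          # levels: the null hypothesis is a unit root
adf.test(diff(cs))                    # after first differencing
adf.test(diff(cs, differences = 2))   # after second differencing (the I(2) case)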

But here’s the stranger thing. If you actually plot the value of the nominal CS index over time, it doesn’t look even remotely stationary. There’s a huge trend upwards as 100 years of inflation (among other things) has its effects. I’m very surprised that the ADF test came so close to concluding stationarity, and I’m trying to understand why.

I understand that if something LOOKS stationary, it might not be (that’s part of what the ADF test is for), but I didn’t think that something which - on inspection - looks highly trending would be able to pass it. This is strange to me.

Can any other quanthead explain what is happening? How is it that a highly trending series ends up potentially passing a stationarity test without having been detrended?

tests are not perfect. AFAIR the DF statistic follows its own non-standard distribution rather than a chi-squared, and the test tends to suffer with large data samples. it is always good to look at the data. try using the ACF/PACF functions to see if you still have significant ARIMA structure. if so, run the appropriate arima; if not, try demeaning the data. after demeaning run another DF test and compare the p-values.

in both cases check all diagnostic plots (especially QQ-plots) and maybe eliminate obs with large Cook’s distance and leverage.
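
roughly what I mean, as a sketch (cs standing in for your series, tseries for the DF test):

library(tseries)

acf(cs)                  # look for slowly decaying autocorrelation
pacf(cs)                 # check for remaining AR structure
cs_dm <- cs - mean(cs)   # demeaned series
adf.test(cs_dm)          # compare this p-value with the one from the raw series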

Thanks for the reminders. As you implied, these things are always more complex with real data. The raw-data ACF looks like it has very long-memory effects (which makes sense for an illiquid asset), and the memory looks even longer in an ACF of the logged values (the ADF test comes nowhere near rejecting non-stationarity for the logged CS index before any differencing is done).

After one differencing (of logged values), the series looks like it drops off quickly in both the ACF (lag of 3) and PACF (lag of 1). So maybe it is integrated of order 1 after all, just not detectable in the post-1987 data.
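
For what it’s worth, what I’m looking at for the log-differenced series is roughly this (again, cs is just a stand-in for the loaded data):

library(tseries)

dl_cs <- diff(log(cs))   # one difference of the logged index
acf(dl_cs)               # autocorrelation drops off around lag 3
pacf(dl_cs)              # partial autocorrelation drops off around lag 1
adf.test(dl_cs)          # re-run the test on the differenced series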

Regime shifts in the interim are likely to make a mess (postwar GI Bill, changes in lending practices, etc.), so maybe that is what is happening in later-term data, whereas the long-term series still shows up as quasi-stationary.

I’m still trying to understand intuitively how there could be such a dramatic difference between the test results and the visual inspection. I understand that spurious correlation/regression is a possibility for a non-stationary series to read as stationary, but I would think that in a spurious regression case, the data would at least look plausibly nontrending.

you should look into the diagnostic plots for that, sometimes a few outliers can mess up the whole model. run a jackknife procedure for your parameters and determine which obs throw them off. here’s the code I use, I guess you know how to adapt it:

# change in each fitted coefficient when the i-th observation is dropped (jackknife influence)
par(mar = c(5, 5, 0.5, 0.5), cex.lab = 1.5, cex.axis = 1.5, mfrow = c(2, 1))
plot(lm.influence(lm1)$coefficients[, 1], xlab = "Index of omitted observation", ylab = "Intercept")
plot(lm.influence(lm1)$coefficients[, 2], xlab = "Index of omitted observation", ylab = "Slope")
par(mfrow = c(1, 1))

what estimation method are you using by the way?

Right now, just OLS and log-linear models. I’m still exploring the data.

My model is that real Case-Shiller price changes are basically linked to median family income (though the only data I can get on that is real wage data, as a proxy) and to interest rates, which I use by taking the 10y Treasury rate and calculating the monthly cost of a 30y mortgage on $100,000 at that rate (also a crude measure, and it should probably include a risk premium too, but still decent as a first-order approximation).
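
For what it’s worth, the mortgage-cost proxy is just the standard annuity payment formula; a minimal sketch (the function name and defaults here are my own placeholders):

# monthly payment on a fixed-rate mortgage; annual_rate is the crude proxy
# taken from the 10y Treasury yield (no risk premium added here)
monthly_cost <- function(annual_rate, principal = 100000, years = 30) {
  r <- annual_rate / 12   # monthly rate
  n <- years * 12         # number of payments
  principal * r / (1 - (1 + r)^(-n))
}

monthly_cost(0.04)   # roughly $477 per month at a 4% rate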

Although some coastal cities will have limited area for housing unit expansion, I expect the inland areas to take up the slack, so new unit construction will increase to accommodate demographic expansion. Thus, I don’t expect population demographics (other than perhaps age/earning cohorts) to have much effect on a nation-wide housing price index, even if they do have important effects in certain localities.

So basically I try to predict real home price changes via changes in real income and changes in the cost of owning $100k worth of home. Then I add inflation expectations to get the nominal forecast, which will probably come from TIPS-Treasury spreads when applied in a forward-looking context. Ultimately, there may be ARMA effects in there too, but I’m hoping that after 10 years they may wash out and not be so important.
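
In sketch form, the structure is something like this (every variable name here is an invented placeholder, not an actual series I’ve built):

# real CS changes regressed on real income changes and ownership-cost changes
fit <- lm(d_real_cs ~ d_real_income + d_own_cost)

# add inflation expectations back on to get the nominal forecast
nominal_change <- predict(fit, newdata = future_changes) + expected_inflation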

If there is a deterministic time trend, the model should include a slope and intercept. In most stats packages this is an input when you specify the parameters for the ADF test; it should be part of this library as well.

the more I think about it, the more I am convinced that TS is not the right approach. have you ever considered a random slope model?

I assumed that unless I set a deterministic time trend, the default would be set to zero. However, it’s possible that the function’s default behavior automatically tests for a trend and removes it. That would explain what happened.

I’ve checked and the documentation says: “The general regression equation which incorporates a constant and a linear trend is used and the t-statistic for a first order autoregressive coefficient equals one is computed. The number of lags used in the regression is k. The default value of trunc((length(x)-1)^(1/3)) corresponds to the suggested upper bound on the rate at which the number of lags, k, should be made to grow with the sample size for the general ARMA(p,q) setup. Note that for k equals zero the standard Dickey-Fuller test is computed. The p-values are interpolated from Table 4.2, p. 103 of Banerjee et al. (1993). If the computed statistic is outside the table of critical values, then a warning message is generated.”

So that seems to explain it. However, there seems to be no way to set the deterministic time trend to zero with this function. I may need to load a different implementation in order to test it, or go ahead and do it manually.
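
One candidate is ur.df() in the urca package, which lets you choose the deterministic terms explicitly; a minimal sketch, assuming urca is installed and cs is the loaded series:

library(urca)

summary(ur.df(cs, type = "none", lags = 4))    # no constant, no trend
summary(ur.df(cs, type = "drift", lags = 4))   # constant only
summary(ur.df(cs, type = "trend", lags = 4))   # constant plus linear trend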

I am not familiar with random slope models, but am looking them up now. At first blush, it seems that they are not that different from I(2) models, except that they presumably allow for random intercepts as well.

I appreciate the reference, though. Thanks.

What is it that makes you think TS is not the right approach? I’m curious what you are sensing here that perhaps I’ve missed.

you noted that you would have one non-stationary dependent and at least two non-stationary independent vars, so you would need a double cointegration for your model to work properly in a time series context. if you treat the data more like a stochastic panel - and therefore use a random slope model - cointegration is not needed. however, you would need a GLS or a FGLS estimation model instead, and their predictive capabilities are rather dismal.

in the end it comes down to how well you believe this long-term forecast will work. what I am suggesting are techniques used in short-term macroeconomic data analysis, and those are not fit for long-term forecasts. maybe you are trying too hard to find a model for a forecasting horizon that is beyond any statistically meaningful forecast horizon. long story short, you might just be fine with a simple first-differenced OLS because those forecasts are wild guesses anyway (maybe run a GARCH procedure with the OLS model as the mean model).
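
as a rough sketch of what I mean (d_cs, d_income, d_cost are placeholders for your first-differenced series; a joint fit via something like rugarch would be cleaner than this two-step version):

library(tseries)

ols_fit <- lm(d_cs ~ d_income + d_cost)                   # first-differenced OLS mean model
garch_fit <- garch(residuals(ols_fit), order = c(1, 1))   # GARCH(1,1) on the residuals
summary(garch_fit)                                        # check the ARCH/GARCH coefficients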

Quite honestly, I am leaning toward a model that says housing prices track inflation and have some short-term ARMA behavior, and that’s about it. From a total-returns perspective, you do get real investment returns from the rent values, but, aside from the recent property bubble, you don’t see much real appreciation going on over the long term.

The anomalies to this seem to be a housing price depression during the interwar period and then the most recent bubble. I seem to detect some effects of real income and interest rates, but these are substantively just tiny corrections compared to inflation, and a more parsimonious model may be far more practical without sacrificing much in terms of predictive or explanatory power. ARCH or GARCH may be called for, though, as there is definitely volatility clustering.

I’m being asked to do this by a group who wants to investigate LEAP-like contracts on Case-Shiller indexes, collateralized by real homes. However, since the underlying assets are basically illiquid, a replicating portfolio that is dynamically traded is not possible. This basically leaves discounted expected value as a valuation mechanism, but we still have to come up with an expected value and dispersion estimates at different long-term time frames, and then identify a sensible discount rate.
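
In the crudest possible sketch, the valuation piece reduces to something like this (every input below is a placeholder; simulating the payoffs is the hard part):

horizon_years <- 10
discount_rate <- 0.06                                # placeholder discount rate
payoffs <- pmax(sim_index_at_maturity - strike, 0)   # call-style payoff per simulated path

price_estimate <- mean(payoffs) / (1 + discount_rate)^horizon_years
dispersion <- sd(payoffs)                            # feeds the error bars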

It’s a fascinating problem, but it’s no fun to come back to them and tell them I can offer them an estimate, but with solar-system sized error bars, particularly since larger bars are going to mean they have to pay more to construct the LEAPS.

Look into using a VAR model or a VECM. Either will work, depending on whether you find the variables to be cointegrated. I believe you can use the VAR model if there is no cointegration (you may need to first-difference or log-transform the data, or both). VECM is similar to VAR except that it includes an error-correction term, so it is the natural choice when the variables are cointegrated. VECM is also the more powerful tool when attempting to predict long-term trends (especially compared to a VAR on first-differenced/log-transformed data).

You can use the AIC criterion to determine the optimal lag for fitting the model and the Johansen test to determine the cointegration rank.

If you are not looking to quantify the results, but rather to establish a directional influence, you can use a Granger-causality test.
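
Roughly, in R, that might look like this (a sketch assuming the vars and urca packages, with Y standing in for a matrix of the level series and "income" for one of its columns):

library(vars)
library(urca)

VARselect(Y, lag.max = 8, type = "const")$selection     # lag choice by AIC and friends
jo <- ca.jo(Y, type = "trace", ecdet = "const", K = 2)  # Johansen trace test
summary(jo)                                             # read off the cointegration rank

var_fit <- VAR(diff(Y), p = 2, type = "const")          # VAR on differenced data if no cointegration
causality(var_fit, cause = "income")                    # Granger causality of the "income" column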

Someone please correct me if I’m off base, I am rusty with econometrics.

I generally try to avoid VAR and the like because they have always struck me as just throwing a bunch of variables into an equation without necessarily having a theory, and thus highly prone to data mining, but in this case it may be worth a shot. Also, affordability ratios and rental yields might make sense in an error-correction model (though I don’t think I can find rental yield data going back very far).

Lots of useful stuff to think about here. Thanks.

It’s your duty as a charterholder to point out the massive uncertainty entailed in such calculations. Overfitting the model just to push the price of the resulting product is irresponsible. Given that your underlying IVs will have linear forecasts beyond some point, your whole model will forecast a straight line beyond that point. In light of the last financial meltdown, such a simplistic and overfitted model should not be used in asset pricing beyond 2 years, especially not in real estate.

I know my responsibilities on that score, and agree, but thank you for backing me up on them. This is why I think projecting a real appreciation rate of close to 0%, plus some short-term ARMA wobbling around that figure, is not such a bad model to run with.

It’s just a more pleasant conversation when I can come back with a more useful answer than “I wouldn’t trust my own model beyond 2 years, even though I know what you wanted from me was more like 10 years.” There’s only so much blood one can squeeze from noisy data. Try to squeeze more, and the blood gets all over your P&L.

So sometimes I like to imagine I’m just doing it all wrong and that there may be some better answer out there, using approaches that were developed or better disseminated after I was in school (and there are more and more of those appearing with each passing year). Part of my inquiry here is to double-check on that.