DoubleDip

Hey, you sent out some great stuff on time series y’day - do you have a complete summary? I don’t have enough time to cover it tonight - am finding it tough to digest Schweser/CFAI explanations at this stage. I found your guide to be much simpler to follow. any help would be appreciated. thanks

Hi, so glad it helped. What specifically are you stuck on? Let's just go over the problem areas in time series.

First, remember that the mean of an AR(1) is b0/(1 - b1). To check for a unit root, I usually just eyeball b1: if it's close to 1, I get suspicious. The proper check, though, is a t-test on (b1 - 1)/s. If the series does have a unit root, it's not the end of the world, but you can't use the model as-is: you have to first-difference it, i.e. model the change x(t) - x(t-1) instead of the level x(t).

Also remember how to forecast. Say you are forecasting quarterly sales and your model is something like SALES(t) = a0 + a1 SALES(t-1) + a2 SALES(t-4). It is currently Q4 2003 and you are asked to forecast Q1 2004. I actually write the dates in for the lags, so I'd write SALES(Q1 2004) = a0 + a1 SALES(Q4 2003) + a2 SALES(Q1 2003). That way I can't really make a mistake. If you are then asked to forecast Q2 2004, do the same thing: SALES(Q2 2004) = a0 + a1 SALES(Q1 2004) + a2 SALES(Q2 2003), where SALES(Q1 2004) is the forecast you computed in the previous step, not an actual figure. When I get tired and can't think, writing out what I need to do really helps.

Another issue with time series is serial correlation. You know about regular correlation in regular regression; serial correlation is similar, but here the series is correlated with itself at a lag. An example would be forecasting S&P levels from prior S&P levels. This probably makes sense, since a high level yesterday tends to mean a high level today (you wouldn't jump from 1500 to 200, say; the level is more likely to stay within a narrow range of where it is now).

To test for serial correlation of the errors you use the Durbin-Watson statistic, which is approximately 2(1 - r), where r is the correlation between consecutive errors. Know that when r = 0, DW = 2. When r = 1, DW = 0, so a DW between 0 and 2 might be cause to suspect positive serial correlation.
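To make the chain-forecasting concrete, here's a minimal Python sketch of the two-step forecast described above. The coefficients a0, a1, a2 and the sales history are all made up for illustration; only the chaining logic matters.

```python
# Hypothetical coefficients for SALES(t) = a0 + a1*SALES(t-1) + a2*SALES(t-4)
a0, a1, a2 = 10.0, 0.6, 0.3

# Made-up quarterly sales history through Q4 2003
sales = {
    "Q1 2003": 100.0,
    "Q2 2003": 110.0,
    "Q3 2003": 105.0,
    "Q4 2003": 120.0,
}

# One step ahead: Q1 2004 uses actual Q4 2003 (lag 1) and actual Q1 2003 (lag 4)
sales["Q1 2004"] = a0 + a1 * sales["Q4 2003"] + a2 * sales["Q1 2003"]

# Two steps ahead: Q2 2004 must use the *forecast* for Q1 2004 (lag 1)
# and the actual Q2 2003 (lag 4)
sales["Q2 2004"] = a0 + a1 * sales["Q1 2004"] + a2 * sales["Q2 2003"]

print(round(sales["Q1 2004"], 2))  # 112.0
print(round(sales["Q2 2004"], 2))  # 110.2
```

Writing the dates into the dictionary keys forces you to notice when a lag you need is itself a forecast, which is exactly the mistake the write-it-out habit prevents.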
When r = -1, you get DW = 4, so a DW between 2 and 4 might make you think there is negative serial correlation. But we have to check a table of DW critical values to know for sure.

Here is an example. I am looking at S&P levels over time (t is the month; I made up the data). I ran a trend regression, the level at time t against t itself, and got S&P(t) = 1996.93 + 3.776 t. Here are my calculations:

t    S&P Level   Predicted   Error e(t)   (e(t)-e(t-1))^2   e(t)^2
1    2000        2000.709      -0.709                          0.503
2    2010        2004.485       5.515        38.741           30.417
3    2005        2008.261      -3.261        77.014           10.632
4    2015        2012.036       2.964        38.741            8.783
5    2016        2015.812       0.188         7.705            0.035
6    2009        2019.588     -10.588       116.117          112.103
7    2025        2023.364       1.636       149.432            2.678
8    2022        2027.139      -5.139        45.911           26.413
9    2045        2030.915      14.085       369.571          198.383
10   2030        2034.691      -4.691       352.529           22.005
Sum                                        1195.762          411.952

DW = 1195.762 / 411.952 = 2.90

I calculated the DW statistic, but you won't have to on the test (they say). It's about 2.9, so I suspect negative serial correlation, but I can't know for sure until I look it up in a table. For 10 data points and one independent variable I found dL = 0.88 and dU = 1.32. For negative correlation you use the mirror-image bounds 4 - dU = 2.68 and 4 - dL = 3.12: if DW lies between them, the test is inconclusive (which is the case here, since 2.68 < 2.9 < 3.12); only if DW is above 4 - dL do you have evidence of negative serial correlation.

0-------dLP------dUP------2------dLN------dUN-------4

The line above shows all the values the DW statistic can take, with dLP = dL, dUP = dU, dLN = 4 - dU and dUN = 4 - dL. To interpret it: between 0 and dLP, evidence of positive serial correlation. Between dLP and dUP, inconclusive. Between dUP and dLN, no evidence of serial correlation. Between dLN and dUN, inconclusive. And between dUN and 4, evidence of negative serial correlation. So first you detect the problem using the appropriate test, then you correct it if need be.
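If you want to check the arithmetic, here's a small Python sketch that refits an ordinary least-squares trend line to the made-up levels above (which reproduces the predicted values in the table) and recomputes the DW statistic using the standard definition, summed squared residual changes over summed squared residuals:

```python
# Made-up S&P levels from the example above, t = 1..10
levels = [2000, 2010, 2005, 2015, 2016, 2009, 2025, 2022, 2045, 2030]
n = len(levels)
t = list(range(1, n + 1))

# OLS slope and intercept by hand (numpy.polyfit would also work)
t_bar = sum(t) / n
y_bar = sum(levels) / n
b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, levels))
      / sum((ti - t_bar) ** 2 for ti in t))
b0 = y_bar - b1 * t_bar   # roughly S&P(t) = 1996.93 + 3.776*t

# Residuals from the fitted trend line
errors = [yi - (b0 + b1 * ti) for ti, yi in zip(t, levels)]

# Durbin-Watson: sum of squared changes in residuals / sum of squared residuals
dw = (sum((errors[i] - errors[i - 1]) ** 2 for i in range(1, n))
      / sum(e ** 2 for e in errors))
print(round(dw, 2))  # 2.9
```

This comes out to about 2.90, which lands in the inconclusive zone between 4 - dU = 2.68 and 4 - dL = 3.12.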
One caveat: for AR models, you are not supposed to use the DW statistic to test whether the errors are serially correlated; there you test the residual autocorrelations directly with a t-test. Either way, it's a real problem if the errors are correlated with each other, since that means there is structure you have not fully captured in the regression. Positive serial correlation leads to understated standard errors, so your t-statistics come out too large and you may conclude something is significant when it isn't.

I am trying to think of what else could be on the exam... just be careful with forecasting, as I can imagine that will be on there.

One more thing, and this isn't just for time series but for any regression: a high R^2 does NOT mean the right-hand-side variables "cause" the left-hand side. You need to explore further. Remember also that multicollinearity can produce a high R^2, and so can "data mining," where you throw in everything but the kitchen sink to try to explain some behavior. Once in an interview someone asked me, "A trader comes to you with a new model that he says predicts S&P 500 returns with an R^2 of 0.99. Are you going to trade based on this?" The thing to remember is "CORRELATION DOES NOT IMPLY CAUSATION." If the model is misspecified, you might think one thing explains another when really you are just seeing misspecification, spurious correlation and so on.

For example: suppose it happened to rain every weekend and you ran the regression % chance of rain = b0 + b1 X, where X is a dummy variable equal to 1 if it's a weekend and 0 if not. Say you get the result % chance of rain = 5 + 20 X, which would mean a 5% chance of rain on a weekday but 25% on a weekend, along with a high R^2, and you think your model is good. But what if rain and weekends are both related to some other variable completely? Or what if rain CAUSES weekends and not the other way around (weekends cause rain)?
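Here's a tiny Python sketch of how a dummy-variable regression like the rain example reads. The coefficients are made up (chosen to match a 5% weekday / 25% weekend story); the point is that the intercept is the X = 0 case and intercept plus slope is the X = 1 case, and a good fit still says nothing about causation.

```python
# Hypothetical fitted coefficients for % chance of rain = b0 + b1 * X,
# where X = 1 on a weekend and 0 otherwise
b0, b1 = 5.0, 20.0

def chance_of_rain(is_weekend):
    """Fitted % chance of rain from the dummy-variable regression."""
    return b0 + b1 * (1 if is_weekend else 0)

print(chance_of_rain(False))  # 5.0  -> weekday: intercept only
print(chance_of_rain(True))   # 25.0 -> weekend: intercept + slope
# A high R^2 here still would not show that weekends *cause* rain.
```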
I know this particular example is ridiculous, but it's another illustration of the problems that occur when people assume the right-hand side causes the left-hand side. Good luck!!!

I love this link for identifying the orders of ARMA series, but it's something to reserve for the second week in June (a Schweserism I like): http://www.duke.edu/~rnau/411arim3.htm

Here's another link: http://www.quantcandy.com/blog/wp-content/uploads/2008/01/time-series-review-final.pdf

Also remember the six assumptions of linear regression.

wowww good timing i just started to review quant for the final time

Some things I would remember:
- the t-test (b - b0)/s(b), and that standard deviation is the square root of variance
- how to compute covariance
- the F-test, and what goes in the numerator and denominator (df = k and n - k - 1); be careful with the lookup tables (the F-test is one-tailed, the t-test might be two-tailed)
- RSS and SSE, and what to divide by to get the mean squares: the number of predictor variables k is the df for RSS, and n - (k + 1) for the unexplained variance; F = MSR/MSE
- correlation = the square root of R^2 (in a one-variable regression)
- adjusted R^2: R^2 always increases when you add new explanatory variables, even ones with no power to explain, but adjusted R^2 includes a penalty for each new variable, so it may or may not increase
- how to construct an ANOVA table (I would do that first)
- how to compute a confidence interval for the predicted variable y

Also know that if the t-value you calculate is greater than the critical value you look up, that is strong evidence of significance. You can quickly get the t-value by taking each coefficient and dividing by its standard error, then comparing to the critical t. If I look up a critical t of 2 and I have calculated t-statistics of 1.9, 2.5, and 4, the first one is statistically INSIGNIFICANT and the last two are significant. And the best estimate for the expected value of a variable is its mean.
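The ANOVA bookkeeping in the list above can be sketched in a few lines of Python. The sums of squares and sample sizes here are made up; the formulas (R^2, MSR, MSE, F, adjusted R^2) are the standard ones being memorized.

```python
# Hypothetical inputs: 30 observations, 3 predictor variables,
# made-up sums of squares
n, k = 30, 3
RSS = 120.0   # regression (explained) sum of squares, df = k
SSE = 60.0    # residual (unexplained) sum of squares, df = n - k - 1

SST = RSS + SSE                       # total sum of squares
R2 = RSS / SST                        # explained / total variation
MSR = RSS / k                         # mean square regression
MSE = SSE / (n - k - 1)               # mean square error
F = MSR / MSE                         # F-statistic (one-tailed test)
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)  # penalizes extra variables

print(round(R2, 3), round(F, 2), round(adj_R2, 3))
```

Note that adj_R2 comes out below R2, which is the whole point of the penalty.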

wait im confused - earlier you say you can use the DW to test for serial correlation of the error terms, and then later you say you are not supposed to use DW to test for serial correlation in an AR model. My quant is very weak, though I don't recall that distinction being within the scope of the LII exam. Could you please explain?

DoubleDip, I am not 100% clear about first differencing… it is still a little confusing for me. For example, which one below do you consider first differencing? Suppose the exam asks you to do first differencing, which one do you choose? My understanding is that the first one is for testing for a unit root (it uses the first difference, but only to detect the unit root), and the second one is what you build as the corrected model after you have detected a unit root. Which is correct for first differencing:

(1) x(t) - x(t-1) = b0 + (b1 - 1) x(t-1) + e(t)
(2) x(t) - x(t-1) = b0 + b1 (x(t-1) - x(t-2)) + e(t)

As stated above, just remember that you can't use DW for autoregressive models. You must use a t-test on each residual autocorrelation in that situation: t = r / (1/T^0.5), i.e. r times the square root of T.
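A quick Python sketch of that t-test. The sample size and the residual autocorrelations are made up; the test statistic is just each autocorrelation divided by its approximate standard error of 1/sqrt(T).

```python
import math

T = 100                    # hypothetical number of observations
r = [0.25, -0.05, 0.10]    # made-up residual autocorrelations at lags 1-3

# Standard error of each residual autocorrelation is approximately 1/sqrt(T),
# so the t-statistic is r_k * sqrt(T)
t_stats = [rk * math.sqrt(T) for rk in r]

# Compare to a critical value of about 2 (5% level, two-tailed)
significant = [abs(ts) > 2 for ts in t_stats]
print(t_stats)     # [2.5, -0.5, 1.0] -> only the lag-1 autocorrelation matters
```

Here only the lag-1 residual autocorrelation is significant, which would suggest the AR model is not correctly specified yet.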

ahh. Okay. DW for time series…

remember folks that for multiple regressions, a positive serial correlation gives you type I errors. I’m not sure if we have to change the model if there is negative serial correlation. Anyone know? That was a great post DoubleDip

Excellent timing doubledip…time series was kicking my a$$ and we know we’ll see at least something on it tomorrow.

PhillyBanker Wrote: ------------------------------------------------------- > ahh. Okay. DW for time series… Careful here - I think you can still use DW for trend models, but you can't use it for AR time series models.