# Autocorrelation

Quick question: In the Autocorrelations of the Residual section of a model, why are there four lags (pg 369 CFA text)? Does each lag represent a different observation? For example, if there were 359 observations (as there are in this example), are there 359 possible lags (autocorrelations)? I read in the reading that they would typically select four to simulate a quarterly model and 12 for a monthly, but what happens when the observations occur over several years? Are these four (12) somehow taking an average autocorrelation of all quarterly (monthly) observations for that time period?

I’m horrible at statistics… but my understanding is that with 359 observations, you have 355 lags possible (if you’re lagging 4) i.e. observation #5 corresponds to #1, #6 to #2, and so forth. Therefore, you can only have 355 points of comparison. If I’m way wrong, someone please shed light on it after you’re through pointing and laughing.

Whoa. If you have 359 observations, you could have 358 lags (or maybe 357 for model fit reasons). That is, you could fit the model Y(t) = a1*Y(t-1) + a2*Y(t-2) + … + a357*Y(t-357). Since you only observed two pairs of points that are 357 periods apart, you are going to get a really terrible fit on a357 (among other reasons). Further, I would never believe an AR(357) model at all. It will always be an overfit model. Now if we fit four lags, we would have at least 355 observations to fit each parameter estimate, which I would believe a lot more.

So for the 359 example with four lags you could have 355 residuals? And each residual would simply be the correlation run under a different set of observations? For example: Residual 1 (lag 1) = the error term captured when running the regression with observations 1 & 2. and Residual 2 (lag 2) = the error term captured when running the regression with observations 2 & 3. Residual 3 (lag 3) = the error term captured when running the regression with observations 3 & 4. …

TJR Wrote:

> So for the 359 example with four lags you could have 355 residuals?

Yes.

> And each residual would simply be the correlation run under a different set of observations?
>
> For example:
>
> Residual 1 (lag 1) = the error term captured when running the regression with observations 1 & 2. and Residual 2 (lag 2) = the error term captured when running the regression with observations 2 & 3. Residual 3 (lag 3) = the error term captured when running the regression with observations 3 & 4. …

I think you’re missing the big picture here. When you run a regression you use all the data. All the data is used to estimate the parameters, and then the residual of each individual observation relies on the model fit from all the data.
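To make the "you use all the data" point concrete, here is a minimal sketch in plain Python. It assumes an AR(1) model fitted by ordinary least squares on a simulated series. The function name `fit_ar1`, the coefficient 0.5, and the simulated data are illustrative assumptions, not anything taken from the CFA example.

```python
import random

def fit_ar1(x):
    """OLS fit of x[t] = b0 + b1 * x[t-1], using every consecutive pair at once."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    b1 = sxy / sxx
    b0 = mean_curr - b1 * mean_prev
    # One residual per fitted observation, all from the same b0 and b1.
    residuals = [c - (b0 + b1 * p) for p, c in zip(prev, curr)]
    return b0, b1, residuals

# Hypothetical series of 359 observations (simulated, not the textbook data).
random.seed(0)
series = [0.0]
for _ in range(358):
    series.append(0.5 * series[-1] + random.gauss(0, 1))

b0, b1, residuals = fit_ar1(series)
print(len(series), len(residuals))
```

With one lag, the first observation has no predecessor, so 359 observations give 358 residuals. By the same counting, a model with four lags would leave 355.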

An AR(1) model is used in the problem on page 369. In any regression model, the residuals can be examined. If the residuals are e1 … e359, then the autocorrelation with lag 1 = correl(e1 … e358, e2 … e359) and the autocorrelation with lag 2 = correl(e1 … e357, e3 … e359) - obviously, the longer the lag, the shorter the vectors used in the autocorrelation calculations. Typically autocorrelation declines as the lag increases. Therefore, it’s important to look at autocorrelations with small lags. 4 was used just as an example in the problem. The number could’ve been 3 or 5 or 10.
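The correl(...) description above can be sketched in a few lines of plain Python. The helper names (`pearson`, `lagged_autocorrelation`, `pairs_at_lag`) are made up for illustration, and the toy residual vector is not from the text. Note also that statistical packages usually compute the sample autocorrelation with the full-sample mean and variance, so their numbers can differ slightly from this pairwise version.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_autocorrelation(e, k):
    """correl(e1 ... e(n-k), e(k+1) ... en): the series against itself shifted by k."""
    return pearson(e[:-k], e[k:])

def pairs_at_lag(n, k):
    """Number of (e[t-k], e[t]) pairs available with n residuals."""
    return n - k

# Toy residuals that flip sign every period (illustrative only).
e = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(lagged_autocorrelation(e, 1))   # perfectly negative at lag 1
print(lagged_autocorrelation(e, 2))   # perfectly positive at lag 2
print(pairs_at_lag(359, 1), pairs_at_lag(359, 4), pairs_at_lag(359, 357))
```

The last line shows why long lags are unreliable: with 359 values there are 358 pairs at lag 1 but only 2 pairs at lag 357.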

TJR Wrote:

> Quick question: In the Autocorrelations of the Residual section of a model why are there four lags (pg 369 CFA text)? Does each lag represent a different observation? For example, if there were 359 observations (as there are in this example) is there 359 possible lags (autocorrelations)?

It sounds like the original question was referring to why there were only four lags listed in the “Autocorrelations of the Residual” section of Table 5, Example 6. It notes in the example that they are only looking at the autocorrelations of the first four lagged variables. If we saw the entire model, we would see 358 lags as Joey noted above, which would take up an additional 10+ pages in the book for this example.

> I read in the reading that they would typically select four to simulate a quarterly model and 12 for a monthly, but what happens when the observations occur over several years? Are these four (12) somehow taking an average autocorrelation of all quarterly (monthly) observations for that time period?

I don’t really understand this question… It sounds like you are comparing the AR(1) model itself with testing for seasonality in an AR(1) model. In this case, the 4th lag autocorrelation (for quarterly data) and the 12th lag autocorrelation (for monthly data) are each used to test for seasonality. See the second paragraph under “Seasonality in Time-Series Models” on p.389.

As for why we would only test the first four autocorrelations in any AR(1) model, regardless of the number of observations, to ensure that a model is correctly specified, I could not find this in the reading. It does state that any given autocorrelation shows the correlation of the variable in one period to its occurrence in the previous period. Therefore, if each autocorrelation is dependent on the previous autocorrelation, we should be able to detect serial correlation within the first four autocorrelations for large samples? If anyone can provide clarification on this, that would be great.

The reason it’s not in the reading is that it is a bit more involved than that. Just having an autocorrelation doesn’t mean that we should include it in the model. For example, if the structure really is AR(1), then X(t) is correlated with X(t-1), which is correlated with X(t-2), so we expect X(t) to be correlated with X(t-2) even though the real structure is AR(1). That’s a pretty easily solvable problem, but it just gets beyond the scope pretty quickly.
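A small simulation can illustrate this. For a stationary AR(1) process with coefficient phi, the theoretical autocorrelation at lag k is phi^k, so the lag-2 correlation is nonzero even though only one lag is in the true model. The sketch below assumes phi = 0.8 and Gaussian noise. Those choices, and the helper name `sample_autocorrelation`, are illustrative.

```python
import random

def sample_autocorrelation(x, k):
    """Sample autocorrelation at lag k, using the full-sample mean and variance."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

# Simulate a pure AR(1) series: X(t) = phi * X(t-1) + noise (assumed phi = 0.8).
random.seed(1)
phi = 0.8
x = [0.0]
for _ in range(20000):
    x.append(phi * x[-1] + random.gauss(0, 1))

r1 = sample_autocorrelation(x, 1)
r2 = sample_autocorrelation(x, 2)
# r2 should land near r1 squared, i.e. near phi ** 2 = 0.64.
print(round(r1, 3), round(r2, 3), round(r1 ** 2, 3))
```

So a sizeable lag-2 autocorrelation in the series itself does not mean a second lag belongs in the model. The autocorrelations of the residuals are what indicate whether something was left out.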

Joey, you are right about autocorrelations of the dependent variable. The reading is talking about autocorrelations of the residuals. They use AR(1) and then test whether the assumption of no serial correlation in the residuals is violated or not. There are two questions worth discussing: how to specify a model (whether it’s AR(1), AR(2) or any other kind of regression), and then how to test whether it’s specified properly or misspecified. In the example on page 369 they discuss the second question.
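As a rough sketch of what that second step can look like: a common large-sample approximation treats each residual autocorrelation as having a standard error of about 1/sqrt(T), where T is the number of observations, and compares the resulting t-statistic with a critical value of roughly 2 at the 5% level. The autocorrelation values below are hypothetical placeholders, not the numbers from the table on page 369, and the function name is made up.

```python
import math

def autocorrelation_t_stats(autocorrelations, n_obs):
    """Return (lag, autocorrelation, t-stat) using the approximation SE = 1/sqrt(T)."""
    standard_error = 1.0 / math.sqrt(n_obs)
    return [(lag, r, r / standard_error)
            for lag, r in enumerate(autocorrelations, start=1)]

# Hypothetical residual autocorrelations for lags 1..4 (illustrative values only).
hypothetical_autocorrelations = [0.05, -0.02, 0.08, -0.01]
results = autocorrelation_t_stats(hypothetical_autocorrelations, n_obs=359)

for lag, r, t in results:
    # Rough 5% two-sided cutoff of 2; an exact critical value depends on degrees of freedom.
    verdict = "significant" if abs(t) > 2.0 else "not significant"
    print(f"lag {lag}: autocorrelation {r:+.2f}, t = {t:+.2f} ({verdict})")
```

If none of the t-statistics is significant, there is no evidence of residual serial correlation at those lags. That is weaker than proving the model is correctly specified.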

Oops. Got my threads confused.

It sounds like I need to state my question correctly: How are we able to determine/assume that a model is properly specified and there is no serial correlation by only testing the first four autocorrelations of the residual in a model with a large number of observations? Is this case the same for a model with a small number of observations?

Lisa Marie Wrote:

> It sounds like I need to correctly state my question
>
> How are we able to determine/assume that a model is properly specified and there is no serial correlation by only testing the first four autocorrelations of the residual in a model with a large number of observations? Is this case the same for a model with a small number of observations?

I don’t disagree with you, Lisa Marie. Four autocorrelations of the residuals are not always enough, especially when a model has a large number of observations.

I don’t know about that - if it’s not seasonal and it’s not Markov, what is it?

I believe that the four autocorrelations of the residual represent the covariance of the error term of the four most recent observations. That being said, if you have 300 observations of quarterly data, the four most recent observations should be sufficient. However, if your data consist of monthly observations, you would miss eight observations which occurred during the most recent year. Therefore, the number of observations would be insufficient.

There’s this really odd disconnect going on here. Forget about the difference between regression residuals and fitting time series models - if the errors follow a time-series process, the two are the same, with the extra complication that one of them also involves fitting a regression model. Everywhere in finance (with a few exceptions), we live with Markov models, i.e., where something is going depends only on where it is, not where it’s been. Then we get to time series methods and everyone wants to do regression models having twelve lags. I’ve seen lots and lots of regression models in my life and I have never seen one with 12 anythings that I thought was worth the computer ink to print it, much less an AR model.

I know the CFA curriculum does this weird thing with seasonal models through autoregressive models. It’s quirky because it’s not really an autoregressive model, it’s a seasonal model, but that would probably require them to teach analysis of covariance (ANCOVA), which is not in the syllabus. Somehow it leads people to think that fitting AR(12) models is the way to go. I can’t think of any real-world process that I would say is a stationary AR(T) time series where T is some number greater than about 2 or 3, and 99% of the time T = 1.