So I’m certainly capable of memorizing the formula, but why does this give us the mean-reverting level of a time series?
formula = intercept / (1 - beta)
thanks!
sorry… just to clarify-
I’m trying to wrap my head around the big-picture issues in regression & time series analysis.
-Heteroskedasticity I think I understand: if the independent variable’s value affects the variance of the error term, then that’s bad… in fact I think I get the concept that anything that affects the error terms makes the analysis less reliable.
-I don’t understand why a unit root is an issue when talking about a time series, and I don’t understand the calculation of the finite mean-reverting level as stated in my question above.
-And also, just real broadly, I think I DO understand the idea behind checking for seasonality: start w/ an AR time series model, check the autocorrelations of the residuals, and if there is a huge correlation at, say, t-4, then Sales are likely correlated w/ their value 4 quarters ago, so we have to correct for this by adding a new term… BUT I do NOT understand why, if we start w/ a time series model & suspect nonstationarity, we first-difference it & model the first-differenced time series as an autoregressive model. What is the reason for doing this, and can someone try to give a quick practical example of this last question?
Thanks!
A covariance stationary series:
Y(t) = b0 + b1Y(t-1) … has a constant mean over time, so at the long-run (mean-reverting) level we can set:
Y(t) = Y(t-1)
Substituting above:
Y(t) = b0 + b1Y(t)
Y(t) - b1Y(t) = b0
Y(t)*[1 - b1] = b0
Y(t) = b0 / (1 - b1) … this value is the mean-reverting level (I would also call it the “long-term average”)
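Here is a minimal sketch of that result, assuming Python with numpy (the b0 = 0.2 and b1 = 0.25 values are just for illustration, the same ones used further below): simulate an AR(1) with |b1| < 1 and compare the sample average to b0 / (1 - b1).

```python
import numpy as np

# AR(1): Y(t) = b0 + b1*Y(t-1) + e(t); stationary because |b1| < 1
b0, b1 = 0.2, 0.25
rng = np.random.default_rng(42)

y = np.empty(10_000)
y[0] = 2.0                         # start far from the long-run level
for t in range(1, len(y)):
    y[t] = b0 + b1 * y[t - 1] + rng.normal(scale=0.05)

print("sample mean of simulated series:", y[100:].mean())   # close to 0.2667
print("mean-reverting level b0/(1-b1): ", b0 / (1 - b1))    # 0.2667
```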
Yup, that’s the core of knowing the impact of heteroskedasticity, multicollinearity, misspecification, etc., because all those problems affect the errors and prevent them from being normally distributed, which means the parameters could also be affected and give erroneous (biased) results.
The calculation is above. It shouldn’t be much of a problem to understand it, though.
A unit root in a time series means that the series’ behavior is erratic (it wanders with no fixed mean to return to). The intention of regression analysis is to discover significant patterns that turn into useful information for predicting the future. A model can’t handle a divergent time series unless you have cointegrated time series (a pair of divergent time series that move together in the same direction).
Correct.
As said above, if you graph a stationary time series you will see it stays roughly flat over time. A non-stationary one will behave like an erratic wave. You can’t explain a non-stationary time series using a stationary one, and vice versa. A common solution is first-differencing the non-stationary series; however, the interpretation of the results changes with it. You can also take a 2nd difference, or a 3rd, or a 10th (at that level you arrive at the quantum universe, so it’s not useful for normal life anymore). The highest order of differencing I have seen in real life was the 2nd; after that level the interpretation of the results turns to cheese.
A quick example (it may not apply to all currencies): the exchange rate of a currency is a non-stationary time series; however, when you take the 1st difference in day-over-day terms, ER(t) - ER(t-1) = ΔER, it turns into a stationary time series. The economic explanation is that in any single day the ER has a bounded variation (presumably because the economy is in a stable scenario). Now, if you want to predict GDP using the ER, you know you can use ΔER in your model.
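A minimal sketch of that idea, assuming Python with numpy and statsmodels (the “exchange rate” below is a simulated random walk, not real data, and the augmented Dickey-Fuller test is just one common way to check for a unit root):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
# Simulated "exchange rate": a random walk, so the level series has a unit root
er = 3.50 + np.cumsum(rng.normal(scale=0.01, size=1000))
d_er = np.diff(er)                    # first difference: ER(t) - ER(t-1)

print("ADF p-value, levels:     ", adfuller(er)[1])    # large -> cannot reject a unit root
print("ADF p-value, differences:", adfuller(d_er)[1])  # tiny  -> looks stationary
```

In a model for GDP you would then use the stationary ΔER series, as described above.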
Hope this helps!
I’ll admit, I didn’t read your whole post, as it was lengthy! However, I do think something should be clarified about this particular part. Multicollinearity does not, in any shape or form, bias the parameter estimates (for the betas) or other aspects of the model fit (predicted Y, R-squared, MSE…).
absurdly helpful…yes- seriously thank you!
So just to clarify this part, which is the part that helped me the most, I think:
“A unit root in a time series means that the series’ behavior is erratic (it wanders with no fixed mean to return to). The intention of regression analysis is to discover significant patterns that turn into useful information for predicting the future. A model can’t handle a divergent time series unless you have cointegrated time series (a pair of divergent time series that move together in the same direction).”
Essentially, if the coefficient on the Y(t-1) term is 1 (a unit root), that is what causes problems, because it makes the mean non-constant (since our best guess of Y(t) = Y(t-1)), and that throws off the variance as well? Whereas if the coefficient in front of Y(t-1) is something like 0.5, that basically means we’re able to interpret the AR time series model?
Thanks again! Super helpful…
Are you really, really sure? When you add an “independent” variable highly correlated with another independent variable in the model, you won’t get an increase in valuable information, but efficiency is affected. By efficiency I mean higher volatility of the data than there should be, hence higher variance of the errors. The CFA books state that multicollinearity does affect R2, making it high, and makes t-stats low because of the increased variance of the errors. They also state that the best way is to drop the less useful variable; however, I do know this method is not always the best, because sometimes you need to increase multicollinearity in order to get a better fit.
I don’t have my software (EViews) on hand right now, but I will try to make an elaborate simulation of the impact of multicollinearity on a regression model tonight and see what happens.
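For what it’s worth, here is a minimal sketch of that kind of simulation in Python with numpy and statsmodels rather than EViews (all data-generating values are made up for illustration): the coefficient estimates stay close to their true values in both cases, but their standard errors blow up when X1 and X2 are highly collinear.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)

# Two versions of X2: one unrelated to X1, one almost a copy of X1
x2_indep = rng.normal(size=n)
x2_coll = x1 + rng.normal(scale=0.1, size=n)

for label, x2 in [("low collinearity ", x2_indep), ("high collinearity", x2_coll)]:
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)   # true betas are 2.0
    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()
    print(label,
          "| b1-hat:", round(fit.params[1], 2), "SE:", round(fit.bse[1], 2),
          "| b2-hat:", round(fit.params[2], 2), "SE:", round(fit.bse[2], 2))
```

In the collinear run the standard errors come out many times larger, even though the estimates are still centered on the true value of 2.0.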
The math can say a lot. Suppose a time series can be described using this AR model: Y(t) = 0.2 + 1.0Y(t-1)
If you plug those numbers into the mean-reverting level formula, b0 / (1 - b1),
you will be dividing by zero, so the result is undefined. That time series is non-stationary (it has no mean-reverting level or long-term average).
Suppose now the following AR model: Y(t) = 0.2 + 0.25Y(t-1)
Replacing, you will get a mean-reverting level of 0.2667.
Suppose that Y(1) = 2, so Y(2) = 0.2 + 0.25 * 2 = 0.7
Y(3) = 0.2 + 0.25 * 0.7 = 0.375
Y(4) = 0.2938
Y(5) = 0.2734
Y(6) = 0.2684
Y(7) = 0.2671
Y(8) = 0.2668
Y(9) = 0.2667 <<<<<< Mean-Reverting Level achieved.
As we see, a stationary time series will always tend toward its long-term average. Note that the first number I used, Y(1) = 2, was really far from the mean-reverting level. You can use any starting number (85, for example) and it will still revert to 0.2667 eventually; it will just take longer to reach the mean. Also, if you graph those numbers (try a simulation of 100 calculations), you will see a roughly flat line. And if you shock the series with an exogenous value like 13 or -24, it will revert to 0.2667 eventually.
Now try a simulation with the non-stationary AR model above: it never settles down, because it has no long-term average.
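A minimal sketch of both simulations, assuming Python (iterate_ar1 is just a hypothetical helper name): the stationary recursion reproduces the numbers above and settles at 0.2667, while the unit-root recursion keeps drifting by 0.2 each step and never settles.

```python
def iterate_ar1(b0, b1, y1, n):
    """Deterministically iterate Y(t) = b0 + b1*Y(t-1), starting at Y(1) = y1."""
    y = [y1]
    for _ in range(n - 1):
        y.append(b0 + b1 * y[-1])
    return y

stationary = iterate_ar1(0.2, 0.25, 2.0, 9)   # Y(t) = 0.2 + 0.25*Y(t-1)
unit_root  = iterate_ar1(0.2, 1.00, 2.0, 9)   # Y(t) = 0.2 + 1.0*Y(t-1)

print([round(v, 4) for v in stationary])  # 2.0, 0.7, 0.375, ... -> 0.2667
print([round(v, 4) for v in unit_root])   # 2.0, 2.2, 2.4, ...   -> keeps growing
```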
Hope this helps!
Are you really, really sure?
I am 1000% sure.
When you add an “independent” variable highly correlated with another independent variable in the model, you won’t get an increase in valuable information, but efficiency is affected.
Efficiency is affected only for the standard errors of the estimated coefficients that suffer from multicollinearity. It does not bias the model standard deviation of the error term (nor the F-statistic, nor R-squared).
By efficiency I mean higher volatility of the data than there should be, hence higher variance of the errors.
Efficiency is only lost with respect to the variables suffering from multicollinearity and is only related to their respective estimated coefficients. This is easily seen by going through a derivation of the variances for OLS beta coefficients (in MLR). The multicollinearity is factored into the standard error (or variance) of a specific coefficient. “Higher volatility” doesn’t really apply to the data as a whole; it applies to the sampling distribution of the estimated coefficient. Think about it: if we have less unique information about a particular beta, this uncertainty is reflected in a larger standard error for its sampling distribution.
The CFA books state that multicollinearity does affect R2, making it high, and makes t-stats low because of the increased variance of the errors. They also state that the best way is to drop the less useful variable; however, I do know this method is not always the best, because sometimes you need to increase multicollinearity in order to get a better fit.
I will say that although the CFA curriculum is pretty trustworthy for finance material, it’s far from a statistics textbook. They are misleading in the way that they present that material: they don’t explain it well, and they make it seem very black and white. There is a trade-off between efficiency and unbiasedness when you omit a truly important variable because of collinearity. If you have a large enough sample size, it may be wiser to keep both in the model, since the large sample size will mitigate the efficiency issues while the bias from omitting the variable won’t be mitigated (not accounting for asymptotic unbiasedness due to consistency…). If you don’t want to interpret coefficients and only want to make predictions with the model, you’re probably better off leaving both variables in if they’re both actually important. If you want to interpret estimated coefficients, you might want to drop one of the predictors, but there are many more options available (partialling out, ridge regression, principal components analysis, variable transformations, etc.). There really are tons of options; it just depends what you’re doing and what’s appropriate.
Multicollinearity isn’t causing R-squared to be high. What they’re trying to say is that this is one way you can detect possible issues of multicollinearity. In other words, if you get a model with a significant F-test and a large R-squared, this looks good (it seems the group of predictors is helpful)! However, low individual t-stats would seem weird in this case, possibly indicating multicollinearity. I’ve written many times on here that a high R-squared, a significant F-test, and non-significant t-tests only appear paradoxical (maybe you can locate an old post of mine, or I can later when I have time). These results aren’t actually paradoxical when you understand what each of these things tells us (a group of variables vs. adding/removing a single variable after accounting for the others). Again, it’s a pattern that might be helpful to detect multicollinearity. Do note, though, that the t-stats deflate (are lowered in magnitude) because of multicollinearity (this is one thing that is caused by MC, via inflation of the variance for the estimated coefficient).
I don’t have my software (EViews) on hand right now, but I will try to make an elaborate simulation of the impact of multicollinearity on a regression model tonight and see what happens.
I would say to take a stab at it for the sake of learning. If there is a bias, you can calculate it (i.e., show that the expected value of beta-hat is not beta but, in fact, differs by some factor/quantity; this would be an example of calculating the bias in the estimate). Nearly any regression text you can pick up will tell you that OLS is still BLUE (best linear unbiased estimator) in the presence of multicollinearity (barring perfect MC). Others will tell you the model fit (predictions, R-squared) is unaffected by multicollinearity. These are both ways of saying the same thing.
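A minimal Monte Carlo sketch of that bias check, assuming Python with numpy and statsmodels (the data-generating values are made up for illustration): average beta-hat over many simulated samples with highly collinear predictors and compare it to the true beta.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, reps, true_b1 = 200, 2000, 2.0
b1_hats = []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.1, size=n)        # highly collinear with x1
    y = 1.0 + true_b1 * x1 + 2.0 * x2 + rng.normal(size=n)
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    b1_hats.append(fit.params[1])

print("average beta1-hat across simulations:", round(np.mean(b1_hats), 3))  # ~2.0
print("true beta1:", true_b1)                                               # no bias
```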
Now that you’ve sat through my discussion (admittedly not as thorough as some of my past posts), I’ll direct you to outside sources.
1) Jim Frost works for Minitab, a statistical software/consulting company (among other things, I believe):
You’ve struck me as interested in stats since you started posting on here (you were willing to help people), so I think you’d enjoy this source. If you don’t have time to read, here is a direct quote from a later paragraph in his article after he fixed the MC (induced through an interaction term) by standardization, and he compares the two models as proof for the reader. He says (direct excerpt):
"Compare the Summary of Model statistics between the two models and you’ll notice that S, R-squared, adjusted R-squared, and the others are all identical. Multicollinearity doesn’t affect how well the model fits. In fact, if you want to use the model to make predictions, both models produce identical results for fitted values and prediction intervals! "
I encourage you to read his post, and compare the original and fixed output yourself (he has it on the page). Note that model based statistics are the same between the two models (R-squared, F-test, SD of the regression (SER/RMSE)).
2) Paul Allison is a Fellow of the American Statistical Association (ASA) and has a statistical consulting company (also a professor or maybe a former professor of sociology). He’ll go a little into the issue that it’s not cut and dry how and when to handle MC, but he keeps this discussion pretty short and does mention that model fit and R-squared are unaffected.
http://statisticalhorizons.com/multicollinearity
I’ll leave you with those two, since they’re more accessible than a textbook, but if you’d like, I can dig through a few books to get a page reference for you. Let me know what you think!
if you weren’t convinced up until now, tickersu’s favorite subject is math and regression is his (her) forte… Do not poke the bear!
As odd as it may sound to some who don’t like mathematics (S2000 just took an antacid at that statement), I really wish I had a more formal background in math (this, of course, would allow me to move to higher levels of statistics more easily). But feel free to poke me; there’s a lot I don’t know (and I believe I’m open about that when it comes up). Specifics regarding some time series stuff (modeling AR(n), ARCH, GARCH…), for example, haven’t really interested me, but that could be because I haven’t looked at it too much.
Efficiency is affected only for the standard errors of the estimated coefficients that suffer from multicollinearity. It does not bias the model standard deviation of the error term (nor the F-statistic, nor R-squared).
Ok, makes sense. The error of the model should not be biased because it is just the portion not explained by the independent variables; it does not matter whether they are overlapping or not.
Efficiency is only lost with respect to the variables suffering from multicollinearity and is only related to their respective estimated coefficients. This is easily seen by going through a derivation of the variances for OLS beta coefficients (in MLR). The multicollinearity is factored into the standard error (or variance) of a specific coefficient. “Higher volatility” doesn’t really apply to the data as a whole; it applies to the sampling distribution of the estimated coefficient. Think about it: if we have less unique information about a particular beta, this uncertainty is reflected in a larger standard error for its sampling distribution.
I knew that; perhaps it was my fault for not specifying that by “data” I meant the portion of the data with the issue, in this case the highly correlated variables: when they are together in the model, the error variance of the coefficients (of those variables) will increase.
I will say that although the CFA curriculum is pretty trustworthy for finance material, it’s far from a statistics textbook. They are misleading in the way that they present that material: they don’t explain it well, and they make it seem very black and white. There is a trade-off between efficiency and unbiasedness when you omit a truly important variable because of collinearity. If you have a large enough sample size, it may be wiser to keep both in the model, since the large sample size will mitigate the efficiency issues while the bias from omitting the variable won’t be mitigated (not accounting for asymptotic unbiasedness due to consistency…). If you don’t want to interpret coefficients and only want to make predictions with the model, you’re probably better off leaving both variables in if they’re both actually important. If you want to interpret estimated coefficients, you might want to drop one of the predictors, but there are many more options available (partialling out, ridge regression, principal components analysis, variable transformations, etc.). There really are tons of options; it just depends what you’re doing and what’s appropriate.
Yup, exactly that; there are ways to handle MC, and it will depend on the model’s final use. Good info about the options available, btw. “Ridge regression” sounds interesting… lol.
Multicollinearity isn’t causing R-squared to be high. What they’re trying to say is that this is one way you can detect possible issues of multicollinearity. In other words, if you get a model with a significant F-test and a large R-squared, this looks good (it seems the group of predictors is helpful)! However, low individual t-stats would seem weird in this case, possibly indicating multicollinearity. I’ve written many times on here that a high R-squared, a significant F-test, and non-significant t-tests only appear paradoxical (maybe you can locate an old post of mine, or I can later when I have time). These results aren’t actually paradoxical when you understand what each of these things tells us (a group of variables vs. adding/removing a single variable after accounting for the others). Again, it’s a pattern that might be helpful to detect multicollinearity. Do note, though, that the t-stats deflate (are lowered in magnitude) because of multicollinearity (this is one thing that is caused by MC, via inflation of the variance for the estimated coefficient).
Yes, that is what the CFA book says: those patterns could reveal MC. High R2 because the variables as a group seem to explain well, but low t-stats because of the (artificially) increased variance of the coefficient errors.
Now that you’ve sat through my discussion (admittedly not as thorough as some of my past posts), I’ll direct you to outside sources.
1) Jim Frost works for Minitab, a statistical software/consulting company (among other things, I believe):
You’ve struck me as interested in stats since you started posting on here (you were willing to help people), so I think you’d enjoy this source. If you don’t have time to read, here is a direct quote from a later paragraph in his article after he fixed the MC (induced through an interaction term) by standardization, and he compares the two models as proof for the reader. He says (direct excerpt):
"Compare the Summary of Model statistics between the two models and you’ll notice that S, R-squared, adjusted R-squared, and the others are all identical. Multicollinearity doesn’t affect how well the model fits. In fact, if you want to use the model to make predictions, both models produce identical results for fitted values and prediction intervals! "
Nice info! Good to know. However, if I had to guess a real-life scenario, it would be a little bit weird to present a model with a high R2 and incredibly low t-stats (implying none or almost none of the independent variables are good explanatory variables) to your chief or client, but tell them the model is great for prediction! Lol.
I encourage you to read his post, and compare the original and fixed output yourself (he has it on the page). Note that model based statistics are the same between the two models (R-squared, F-test, SD of the regression (SER/RMSE)).
2) Paul Allison is a Fellow of the American Statistical Association (ASA) and has a statistical consulting company (also a professor or maybe a former professor of sociology). He’ll go a little into the issue that it’s not cut and dry how and when to handle MC, but he keeps this discussion pretty short and does mention that model fit and R-squared are unaffected.
http://statisticalhorizons.com/multicollinearity
I’ll leave you with those two, since they’re more accessible than a textbook, but if you’d like, I can dig through a few books to get a page reference for you. Let me know what you think!
Will surely check this one; sounds nice. Indeed, that was the attempt I wanted to make.
Thanks, Tickersu, for your patience in writing this post.
Nice info! Good to know. However, if I had to guess a real-life scenario, it would be a little bit weird to present a model with a high R2 and incredibly low t-stats (implying none or almost none of the independent variables are good explanatory variables) to your chief or client, but tell them the model is great for prediction! Lol.
This happens in real life scenarios. The explanation is simple if you understand a bit of basic statistics. A high R-squared and significant F-test do not contradict low t-stats. The reason is because these statistics are looking at different things. (Remember, looking at all t-tests is not the same as looking at the F-test when you have more than one independent variable; t-tests are for a single coefficient, assuming everything else remains in the model; F-tests are comparing a model with and without a specified group of variables.) I’ll explain further, hopefully for optimal clarity.
R-squared and the F-test allow you to answer questions regarding the independent variables as a group. High R-squared says that the group is good at explaining variation in Y. The significant F-test says that the group is significant for predicting Y (an alternative is to say that at least one of the independent variables in the group is significant for predicting Y).
The t-test answers the question: if we use all the other variables currently in the model, is this specific one significant in predicting Y?
Take a simple case where X1 and X2 are used to predict Y and X1 and X2 are highly collinear. Additionally, the group does a good job at explaining variation in Y (as evidenced by a high r-squared). A significant F-test says that the group of X1 and X2 is significant for predicting Y (or similarly, at least one variable from the group of X1 & X2 is significant).
However, suppose the t-tests are both nonsignificant. This doesn’t contradict the high R-squared and the large F-stat. The nonsignificant t-test for X1 says: if we have X2 in the model, then X1 is not significant for predicting Y. Similarly, the nonsignificant t-test for X2 says: if we have X1 in the model, then X2 is not significant.
This should be clear now, why the results are not actually contradictory when you dig deeper. It might look contradictory, but it isn’t. It happens in real world research and consulting, and if someone really doesn’t believe you that it’s not a problem, prove it to them by reducing MC and showing that nothing changes (or just show them Jim Frost’s Minitab article so he can prove it for you)!
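A minimal sketch of that X1/X2 example, assuming Python with numpy and statsmodels (the data are simulated purely for illustration): with two almost perfectly collinear predictors you typically get a high R-squared and a significant F-test alongside nonsignificant individual t-tests.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.02, size=n)      # X1 and X2 are almost perfectly collinear
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print("R-squared:      ", round(fit.rsquared, 3))  # high
print("F-test p-value: ", fit.f_pvalue)            # tiny -> the group is significant
print("t-test p-values:", fit.pvalues[1:])         # typically both nonsignificant
```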
By the way, you could have multicollinearity with a low R-squared. This brings me back to my point that, assuming they did make this claim, the CFA Institute is wrong to say that multicollinearity (in itself) increases either R-squared or the F-test.
Will surely check this one; sounds nice. Indeed, that was the attempt I wanted to make.
Thanks, Tickersu, for your patience in writing this post.
Glad to have the discussion. I actually had a similar discussion with the CFA Institute a while ago because some of their online questions contained the same inaccuracies. The question writer insisted I was wrong for some time, without acknowledging my sources or reasoning, while not providing any of his own sources or reasoning. Eventually, the QM curriculum author was involved, and he agreed with my stance (because it’s clear as day either once you get some examples that aren’t very technical, or when you have the theoretical background).