Regression question

naturallight · May 14, 2010, 7:21pm

I’m trying to calc the RSQ between two variables, but they have both risen over time, so I’m worried that my RSQ is artificially high. Do I need to adjust for this? If so, how? PS: Neither variable is a price, so I can’t back out inflation.

jbaldyga · May 14, 2010, 7:27pm

regress the growth rates instead.
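Here is a minimal sketch of why this helps. It uses two made-up monthly series (all numbers hypothetical) that share nothing except an upward trend. Regressing one level on the other gives a high R^2, but regressing their period-over-period growth rates does not.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple OLS regression of y on x (with intercept) = squared correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2

def growth(s):
    """Period-over-period growth rate: s_t / s_{t-1} - 1."""
    return s[1:] / s[:-1] - 1

rng = np.random.default_rng(0)
t = np.arange(120)  # 120 hypothetical months

# Two unrelated series that both trend upward (illustrative numbers only)
y = 100 + 2.0 * t + rng.normal(0, 5, t.size)
x = 50 + 1.0 * t + rng.normal(0, 3, t.size)

r2_levels = r_squared(x, y)
r2_growth = r_squared(growth(x), growth(y))
print(f"levels R^2: {r2_levels:.3f}, growth-rate R^2: {r2_growth:.3f}")
```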

naturallight · May 14, 2010, 8:09pm

Yeah, so y/o/y growth rates? I tried that. One of my variables is highly seasonal, so I used the y/o/y growth rate of the 12-mo moving average. Does this sound ok? The new way takes my RSQ from 0.9ish to 0.2ish. Don’t like that.
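For reference, here is one way the transformation described here (y/o/y growth of a trailing 12-month moving average) might look in Python with NumPy. The function names and the synthetic series are hypothetical, and the data are assumed to be monthly and in chronological order.

```python
import numpy as np

def trailing_ma(s, window=12):
    """Trailing moving average: entry i is the mean of s[i : i + window]."""
    return np.convolve(s, np.ones(window) / window, mode="valid")

def yoy_growth(s, lag=12):
    """Year-over-year growth for monthly data: s_t / s_{t-12} - 1."""
    return s[lag:] / s[:-lag] - 1

# Hypothetical monthly series: 5% annual growth times a fixed seasonal pattern
months = np.arange(60)
season = np.tile([0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.1, 1.0, 0.9, 0.9, 0.9], 5)
y = 100 * 1.05 ** (months / 12) * season

smoothed_growth = yoy_growth(trailing_ma(y))
print(smoothed_growth.round(4))
```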

DoubleDip · May 14, 2010, 8:17pm

Maybe you should incorporate seasonality effects using indicator variables or lagged differences. Using a 12-month moving average will just smear the seasonal effect, which could explain your low explanatory power.
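Here is a minimal sketch of the indicator-variable idea. It assumes monthly data that start in January and fits ordinary least squares with NumPy. The data are simulated, and the names (month_dummies, the true slope of 0.5) are hypothetical.

```python
import numpy as np

def month_dummies(n_obs, start_month=0):
    """n_obs x 11 matrix of 0/1 month indicators; month 0 (January) is the omitted base."""
    months = (start_month + np.arange(n_obs)) % 12
    return (months[:, None] == np.arange(1, 12)).astype(float)

rng = np.random.default_rng(1)
n = 120
x = rng.normal(0, 1, n)                                      # non-seasonal regressor
seasonal = 3 * np.sin(2 * np.pi * (np.arange(n) % 12) / 12)  # seasonal swing in y
y = 0.5 * x + seasonal + rng.normal(0, 0.2, n)

# Design matrix: intercept, x, and 11 month indicators
X = np.column_stack([np.ones(n), x, month_dummies(n)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated slope on x: {beta[1]:.3f}")  # should land near the true 0.5
```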

jbaldyga · May 14, 2010, 8:19pm

y/o/y growth rates shouldn’t be affected by seasonality, I don’t think, as long as each comparison is a year apart (i.e. December over December compares the same season). So I don’t think you need a moving average, unless there’s other noise in the data you’re using. A lower RSQ is expected when regressing growth rates. As long as X is significant and uncorrelated with the error term, it’s not a problem. But you won’t be able to predict Y very well, just the marginal relationship between X and Y. If you want to predict Y better, you need more significant variables. Econometrics is all BS anyway, so…
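The point about same-season comparisons can be illustrated with a short sketch. It assumes the seasonal pattern is multiplicative and identical every year, which is a strong, hypothetical assumption. Under it, the month-over-month growth rate swings with the season, while the year-over-year growth rate recovers the underlying trend exactly.

```python
import numpy as np

months = np.arange(48)
# Hypothetical seasonal factors, repeated unchanged for 4 years
seasonal_factor = np.tile([0.8, 0.85, 0.9, 1.0, 1.05, 1.1, 1.1, 1.05, 1.0, 0.95, 0.95, 1.25], 4)
trend = 100 * 1.03 ** (months / 12)  # 3% per year underlying growth (hypothetical)
sales = trend * seasonal_factor

mom = sales[1:] / sales[:-1] - 1    # month over month: dominated by the seasonal swing
yoy = sales[12:] / sales[:-12] - 1  # same month a year apart: the seasonal factor cancels
print(f"m/o/m range: {mom.min():.3f} to {mom.max():.3f}; y/o/y: {yoy.min():.3f} to {yoy.max():.3f}")
```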

DarienHacker · May 14, 2010, 8:26pm

As JB says, correlating time series requires the series to be stationary: constant mean and variance over time (and an autocovariance that depends only on the lag). Trending series need to be adjusted, for example by turning them into growth rates; sometimes simple first differencing will suffice. Another general approach is to normalize the series: e.g. if it’s sales, can you look at sales per store instead?
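The adjustments mentioned here can be sketched in a few lines. The sales and store-count figures below are hypothetical.

```python
import numpy as np

sales = np.array([100.0, 110.0, 120.0, 130.0])  # hypothetical annual sales
stores = np.array([10, 11, 12, 13])             # hypothetical store counts

first_diff = np.diff(sales)          # s_t - s_{t-1}
growth = sales[1:] / sales[:-1] - 1  # simple growth rate
log_diff = np.diff(np.log(sales))    # log first difference, close to growth for small changes
per_store = sales / stores           # normalized level: sales per store

print(first_diff, growth.round(4), log_diff.round(4), per_store)
```

In this made-up example, the growth in total sales comes entirely from new stores, so sales per store comes out flat.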

bchad · May 14, 2010, 8:40pm

You could also add a trend variable if you are just doing a quick-and-dirty analysis. Generally first differencing (or log-first-differencing) is best, but just adding a variable as a time index can help control for common-but-unrelated effects. Your R^2 will still be high, but your estimates of the effects of your independent variables will be improved.
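Here is a minimal sketch of the trend-variable approach. It assumes simulated data in which both x and y drift upward over time and the true effect of x on y is 0.3 (all values hypothetical). Regressing y on x alone lets the shared trend inflate the slope. Adding the time index t as an extra regressor brings it back toward 0.3.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
t = np.arange(n, dtype=float)                     # trend variable: the time index
x = 20 + 0.8 * t + rng.normal(0, 4, n)            # trending regressor
y = 10 + 0.5 * t + 0.3 * x + rng.normal(0, 1, n)  # true effect of x is 0.3

ones = np.ones(n)
b_no_trend, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
b_trend, *_ = np.linalg.lstsq(np.column_stack([ones, t, x]), y, rcond=None)

print(f"slope on x without trend: {b_no_trend[1]:.2f}")  # picks up the common trend
print(f"slope on x with trend:    {b_trend[2]:.2f}")     # close to the true 0.3
```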

Mobius_Strip · May 14, 2010, 9:00pm

naturallight wrote:
> I’m trying to calc the RSQ between two variables,
> but they have both risen over time

[STOP!] 99.999% R-squared for a non-stationary time series ain’t worth $hit…

naturallight · May 17, 2010, 7:49pm

Ok, thanks for the comments. Very helpful. Regarding the moving average, I think my explanatory variables are better used on a month-to-month basis (because they don’t exhibit seasonality). So I wanted to apply the 12-mo moving average to my predicted variable, so that I could use the m/o/m % change (not y/o/y) for both the explanatory and predicted variables. Do you think that’s ok? Is a trend variable the same as a dummy variable?
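Before going this route, it is worth checking what the smoothed series actually measures. For a trailing 12-month moving average, MA_t - MA_{t-1} = (y_t - y_{t-12}) / 12. Its m/o/m % change is therefore a year-over-year difference of y, scaled by the previous 12-month total. Here is a quick numerical check of that identity on a hypothetical random monthly series:

```python
import numpy as np

def trailing_ma(y, window=12):
    """Trailing moving average: entry i is the mean of y[i : i + window]."""
    return np.convolve(y, np.ones(window) / window, mode="valid")

rng = np.random.default_rng(2)
y = 100 + rng.normal(0, 10, 48)  # hypothetical monthly series
ma = trailing_ma(y)              # ma[i] covers months i .. i + 11
mom_of_ma = ma[1:] / ma[:-1] - 1

# Same quantity written as (y_m - y_{m-12}) / (sum of y over months m-12 .. m-1),
# where m is the last month covered by each entry of ma[1:]
m = np.arange(12, y.size)
identity = (y[m] - y[m - 12]) / np.array([y[s - 12:s].sum() for s in m])
print(np.allclose(mom_of_ma, identity))
```

So the m/o/m change of the smoothed dependent variable is driven by y_t versus y_{t-12}, not by the latest month-to-month movement. That timing may not line up with m/o/m changes in the explanatory variables.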

Muddahudda · May 18, 2010, 9:35am

What is your job, naturallight? I never get into this level of nitty-gritty in mine. I wonder who would need to use this.

bchad · May 18, 2010, 1:53pm

A trend variable is just a variable proportional to time, e.g. t = 1, 2, 3, … The idea is that including it will soak up any (linear) trend in the data, leaving other changes to be explained by the other variables. It does not reduce your R^2, because the trend variable counts as part of your model. It is more of a control variable than an explanatory variable, but it has its uses (particularly if the trend is linear or can be linearized). If your explanatory variable is not seasonal and you want to use month-on-month changes to explain your dependent variable, then you do have to include seasonal variables (e.g. monthly dummies) in the model. Otherwise the standard errors of your regression coefficients are going to be artificially high, leading to a higher chance of false-negative conclusions.
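The standard-error point can be sketched with simulated data. The sketch assumes a non-seasonal regressor x, a dependent variable whose m/o/m change has a strong seasonal component, and classical (homoskedastic) OLS standard errors. All numbers and names are hypothetical. Leaving out the month indicators pushes the seasonal swing into the residuals, which inflates the slope's standard error.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients and classical (homoskedastic) standard errors; X includes an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 120
month = np.arange(n) % 12
x = rng.normal(0, 1, n)                          # non-seasonal m/o/m change in X
seasonal = 2.0 * np.sin(2 * np.pi * month / 12)  # seasonal swing in Y's m/o/m change
y = 0.5 * x + seasonal + rng.normal(0, 0.5, n)

intercept = np.ones(n)
dummies = (month[:, None] == np.arange(1, 12)).astype(float)  # 11 month indicators, January omitted

b_no, se_no = ols_with_se(y, np.column_stack([intercept, x]))
b_yes, se_yes = ols_with_se(y, np.column_stack([intercept, x, dummies]))
print(f"slope SE without seasonal dummies: {se_no[1]:.3f}, with: {se_yes[1]:.3f}")
```

How large the gap gets depends on how much of the dependent variable's variation is seasonal.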