 # correlation

“correlation gives an INDICATION of the strength of the linear relationship and does not provide info regarding nonlinear relationships that may exist between the variables.” can someone explain this? sounds contradictory. if there is a linear relationship, how can there be nonlinear relationships…

sounds to me like that text is not suggesting both relationships exist simultaneously, rather it is saying that it gives an indication of linear relationships. however, if the relationship is nonlinear, it will not help.

Y = a + b* x + c*Exp(d*x) has a linear part and a non-linear part

i thought correlation between A and B returns was rhoAB = COV AB/ (std devA * std devB)
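For what it's worth, that formula can be checked by hand. Below is a minimal sketch in plain Python; the two return series are made-up numbers for illustration only, and the helper names (`sample_cov`, `sample_std`, `correlation`) are hypothetical, not from any textbook or library.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_cov(a, b):
    # Sample covariance with the usual n - 1 denominator.
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def sample_std(a):
    # Standard deviation is the square root of the covariance of a with itself.
    return math.sqrt(sample_cov(a, a))

def correlation(a, b):
    # rho_AB = COV_AB / (std dev A * std dev B)
    return sample_cov(a, b) / (sample_std(a) * sample_std(b))

# Made-up return series, purely illustrative.
returns_a = [0.01, 0.03, -0.02, 0.04, 0.00]
returns_b = [0.02, 0.05, -0.01, 0.03, 0.01]

rho_ab = correlation(returns_a, returns_b)
print(round(rho_ab, 3))
```

Dividing by the two standard deviations is what bounds the result between -1 and +1, so it measures how tightly the points hug a line, not how steep that line is.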

Over a limited sample range, variables with a relationship like y = A*e^(bx) may show a high linear correlation, even though the underlying relationship is not linear.
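A quick way to see this is to compute the correlation of x with A*e^(bx) over a narrow range and then over a wide one. This is only a sketch: the values A = 2, b = 0.5 and the two ranges are arbitrary choices for illustration, and it assumes the point above is about the range of x covered by the sample.

```python
import math

def correlation(a, b):
    # Pearson correlation computed from sums of deviations.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def exp_curve_correlation(x_max, points=201, A=2.0, b=0.5):
    # Evenly spaced x on [0, x_max], with y = A * e^(b x) and no noise.
    xs = [x_max * i / (points - 1) for i in range(points)]
    ys = [A * math.exp(b * x) for x in xs]
    return correlation(xs, ys)

narrow = exp_curve_correlation(1.0)   # curve looks almost straight here
wide = exp_curve_correlation(20.0)    # curvature dominates here

print("narrow range:", round(narrow, 4))
print("wide range:  ", round(wide, 4))
```

Over the narrow range the exponential is nearly a straight line, so the correlation comes out very close to 1. Over the wide range the same exact relationship gives a noticeably lower number.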

Another way of looking at it: you have a sample and are wondering whether there is a linear or non-linear relationship between the variables. Sample correlation will tell you if there is a linear relationship and will also suggest its strength. It won’t help you with any inferences about a non-linear relationship that may be there. So a low correlation suggests there is little or no linear relationship, but it does not mean there is no non-linear relationship.

If you have some kind of linear relationship, like Y = aX + b, you can read the correlation coefficient as a measure of how much “noise” there is around the best fit line. A higher magnitude (read: absolute value) of the correlation coefficient means that there is less noise, that is, less stuff other than the mathematical relationship affecting what Y is. It is only an INDICATION of the strength of the relationship, because you can have a high correlation whether a=0.001 or a=1000, and many people would read the value “a” as a measure of strength. I tend to think of it as the CONSISTENCY of the relationship (how consistently X can be used to predict Y), rather than the STRENGTH of the relationship (which I read as the magnitude of “a”).

Now consider the relationship Y = 1 - (X^2). This is also a very definite relationship. It’s an inverted parabola that passes through (-1,0), (0,1), and (1,0). If you put points from this curve into a linear regression or correlation test using any symmetric range -A to +A, you’ll get a correlation coefficient of 0, suggesting that there is NO RELATIONSHIP, even though we have defined a very specific relationship and didn’t even bother to add some noise.

Why did this happen? Because the correlation coefficient only measures how well a straight line describes the relationship between two variables, and this relationship is non-linear. If you plot this out on graph paper and try to fit a best fit line through some range, it should be pretty obvious why this would happen. Any nonlinearities are going to be modeled as “noise” unless you do something specific to filter out the nonlinearities (which is sometimes but not always possible).

Now, if you try to regress or correlate Y on (X^2), you’ll discover that that relationship has a coefficient of 1, because you’ve transformed X in a way that “undoes” the nonlinearity.

(oops, it would have a correlation coefficient of -1, because the parabola is inverted)
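The parabola example is easy to verify numerically. Here is a rough sketch in plain Python; the grid of integer X values from -10 to 10 is an arbitrary symmetric range picked for illustration.

```python
import math

def correlation(a, b):
    # Pearson correlation computed from sums of deviations.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Symmetric range -A..+A with A = 10 (arbitrary choice).
xs = list(range(-10, 11))
ys = [1 - x ** 2 for x in xs]        # Y = 1 - X^2, no noise added
xs_squared = [x ** 2 for x in xs]    # the transformed predictor

r_linear = correlation(xs, ys)               # should be essentially 0
r_transformed = correlation(xs_squared, ys)  # should be essentially -1

print("corr(X, Y):  ", round(r_linear, 6))
print("corr(X^2, Y):", round(r_transformed, 6))
```

The first number is zero only because the range is symmetric around 0, so the upward and downward halves of the parabola cancel. On a one-sided range such as 0 to A the correlation with X would be strongly negative rather than zero.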

good explanations above. adding a bigger picture view in case it helps further the understanding. please correct me if i am wrong/unclear anywhere.

two variables (RVs) can have a deterministic relationship (exact formula) or a probabilistic relationship (exact formula unknown, but a joint sample from the joint population distribution can be used to estimate the best fit formula). we focus on the probabilistic case, where we want a summary measure of co-movement to describe the relationship. the relationship could be linear, non-linear, or a combo (as joey’s example shows).

for a linear relationship, pearson’s correlation (commonly referred to as just ‘correlation’) will suffice. for a non-linear relationship, pearson’s correlation will give only partial info on co-movement, and other measures of non-linear or ordinal (rank) correlation will be needed to give a fuller picture. for a complete view of co-movement for two RVs, you need to look at copulas, which also address conditional co-movement (how co-movement may change when variables are above/below certain values).

another perspective: for one-factor linear regression, the sqrt of r^2 is |r|, the magnitude of the correlation (measure of linear co-movement with that one independent variable). for multiple linear regression, the sqrt of R^2 is R, the ‘multiple’ correlation (measure of linear co-movement with the predicted value, i.e. the beta-weighted combination of independent variables). the ‘partial’ correlations (ceteris paribus linear co-movement measure) with each independent variable can be extracted from the betas.

this is probably way more than you asked for, but i find it useful to view how it all fits together, to make sense of it all intuitively.
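To illustrate the point about ordinal measures, here is a small sketch comparing pearson’s correlation with a rank (spearman-style) correlation on a monotonic but non-linear relationship. The data (y = e^x on a grid from 0 to 10) is made up for illustration, and the simple `ranks` helper is a hypothetical shortcut that assumes there are no tied values.

```python
import math

def correlation(a, b):
    # Pearson correlation computed from sums of deviations.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def ranks(values):
    # Rank 1 for the smallest value; assumes no ties (simplification).
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_correlation(a, b):
    # Spearman-style measure: Pearson correlation applied to the ranks.
    return correlation(ranks(a), ranks(b))

xs = [0.5 * i for i in range(21)]    # 0.0, 0.5, ..., 10.0
ys = [math.exp(x) for x in xs]       # monotonic, strongly non-linear

pearson = correlation(xs, ys)
spearman = rank_correlation(xs, ys)

print("pearson: ", round(pearson, 4))
print("spearman:", round(spearman, 4))
```

Because y always rises when x rises, the rank correlation is 1 even though the pearson figure is well below 1. A rank measure still only captures monotonic co-movement, so it would not rescue something like the inverted parabola discussed earlier in the thread.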

grover33 Wrote:
> sounds to me like that text is not suggesting both
> relationships exist simultaneously, rather it is
> saying that it gives an indication of linear
> relationships. however, if the relationship is
> nonlinear, it will not help.

thanks. you were correct. they clarify this in the chapter summary.