Correlation and Independence

allalongthewatchtowe · July 18, 2014, 8:59pm

Hello All,

Let’s say that I take a blood sample for blood sugar concentration at different times from men on Mars and women on Venus. Here is the array for Men:

2.5 5.5 6.5 3.5 3 3.5 6 5 4 4.5 5 5.5 3.5 6 6.5 3 8 6.5 8 6 6 3 7 8 4 3 2.5 8 4.5 5.5 7.5 6 9 6.5

And here’s the one for Women:

7 3 6 4.5 3.5 4 3 3 3.5 4.5 7 5 5 7.5 2.5 5 5.5 5.5 5 4 5 6.5 6.5 7 3.5 5 3.5 9 2.5 8.5 3.5 4.5 3.5 4.5

The samples were provided by 34 men and 34 women, one observation each. (I am pasting these data so that we can analyze them in Excel.) That said, can we assume that the two RVs are independent? Practically, I believe that they are independent, because no individual provided two observations and men on Mars are unrelated to women on Venus. However, if I use the identity to calculate correlation, I find that the correlation is 0.166. Given this, should I consider these data independent for hypothesis testing?

I am a bit confused. I would appreciate any help.

Thanks

S2000magician · July 18, 2014, 9:43pm

If the samples are drawn from normal distributions, the statistic for testing whether the population correlation is (null hypothesis) equal to zero (covered in Level II) is:

t = [r × √(n – 2)] / √(1 – r²)

This statistic follows a t distribution with n – 2 degrees of freedom.

When r = 0.166 and n = 34, t = 0.9523. As this is less than virtually any critical t-value (these are generally around 2; for a two-tailed test at the 5% level with 32 degrees of freedom it is about 2.04), we fail to reject the null hypothesis and conclude that the population correlation could be zero, so the samples could be independent.
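
For anyone who wants to reproduce the arithmetic outside Excel, here is a minimal sketch in Python. The function name `correlation_t_stat` is just an illustrative choice, and r = 0.166 is the rounded value quoted above, so the last digit may differ slightly from a calculation that uses the unrounded correlation.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.166   # sample correlation quoted in the question (rounded)
n = 34      # number of paired observations

t = correlation_t_stat(r, n)
print(f"t = {t:.4f} with {n - 2} degrees of freedom")

# Rule-of-thumb comparison against a critical value of about 2
print("reject H0" if abs(t) > 2 else "fail to reject H0")
```

With these inputs the statistic comes out at roughly 0.95, well below 2, which matches the conclusion above.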

allalongthewatchtowe · July 19, 2014, 7:29am

Thank you so much, S2000magician!

S2000magician · July 19, 2014, 6:02pm

My pleasure.

allalongthewatchtowe · July 22, 2014, 1:41pm

Hello S2000magician,

I have a follow-up question on your approach. I think that if the two random samples are independent, then we know that the correlation is zero. However, correlation = 0 doesn’t establish independence, right? Given this, how do I decide about the independence of the two samples? Should I use a chi-square test? I am a bit confused. I would appreciate your help.

Thanks in advance,

Allalongthewatchtower

S2000magician · July 22, 2014, 2:07pm

That’s an excellent question, to which I haven’t an answer. ρ = 0 is a necessary condition, but I don’t know if it’s a sufficient condition.

Let me pursue that one.

MrSmart · July 22, 2014, 3:02pm

If X and Y are independent, then they are also uncorrelated. However, if X and Y are uncorrelated, then they can still be dependent. To see two extreme examples of this, let X be uniformly distributed on the interval [−1, 1]. If X ≤ 0, then Y = −X, while if X is positive, then Y = X; in other words, Y = |X|. Here Y is completely determined by X, yet the covariance of X and Y is zero. The same is true for Y = X^2 with X on the interval [−1, 1]. A zero covariance implies only that no linear relationship exists between the two variables; it does not mean that they are independent. Correlation is also a statistic that does not imply causation.
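
The two examples can be checked numerically. The following is a rough simulation sketch, not a proof: the sample size, the seed, and the helper name `pearson` are arbitrary choices made for illustration. It draws X uniformly on [−1, 1] and computes the sample correlation of X with |X| and with X^2.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)                      # fixed seed so runs are repeatable
xs = [rng.uniform(-1, 1) for _ in range(100_000)]

r_abs = pearson(xs, [abs(x) for x in xs])   # Y = |X|
r_sq = pearson(xs, [x * x for x in xs])     # Y = X^2

print(f"corr(X, |X|) = {r_abs:.4f}")
print(f"corr(X, X^2) = {r_sq:.4f}")
```

Both sample correlations should land close to zero even though Y is an exact function of X in each case, which is the sense in which zero correlation does not establish independence.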