Correlation and Independence

Hello All,

Let’s say that I take a blood sample for blood sugar concentration at different times from men on Mars and women on Venus. Here is the array for Men:

2.5 5.5 6.5 3.5 3 3.5 6 5 4 4.5 5 5.5 3.5 6 6.5 3 8 6.5 8 6 6 3 7 8 4 3 2.5 8 4.5 5.5 7.5 6 9 6.5

And here’s the one for Women:

7 3 6 4.5 3.5 4 3 3 3.5 4.5 7 5 5 7.5 2.5 5 5.5 5.5 5 4 5 6.5 6.5 7 3.5 5 3.5 9 2.5 8.5 3.5 4.5 3.5 4.5

The 34 samples in each array were provided by 34 different Men and 34 different Women. (I am pasting these data so that we can analyze them in Excel.) This said, can we assume that the two RVs are independent? Practically, I believe that they are independent, because no individual provided two observations and men on Mars are unrelated to women on Venus. However, if I use the identity to calculate correlation, I find that the sample correlation is 0.166. This said, should I consider these data independent for hypothesis testing?

I am a bit confused. I would appreciate any help.

Thanks

If the samples are drawn from normal distributions, the statistic for testing whether the population correlation is (null hypothesis) equal to zero (covered in Level II) is:

t = [r × √(n – 2)] / √(1 – r²)

This statistic follows a t distribution with n – 2 degrees of freedom.

When r = 0.166 and n = 34, t = 0.9523. As this is less than virtually any critical t value (which are generally around 2), we fail to reject the null hypothesis and conclude that the population correlation could be zero, so the samples could be independent.
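
For anyone who wants to check the arithmetic outside Excel, here is a minimal Python sketch of the statistic above. The function name correlation_t_stat and the critical value of about 2.04 (two-tailed, 5% level, 32 degrees of freedom) are my own additions for illustration, not something from the posts; look up the exact critical value in a t table for your chosen significance level.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.166, 34                  # sample correlation and sample size from the thread
t = correlation_t_stat(r, n)

# Assumption: approximate two-tailed 5% critical value for 32 degrees of freedom.
CRITICAL_T_APPROX = 2.04

reject_null = abs(t) > CRITICAL_T_APPROX
print(f"t = {t:.4f}, df = {n - 2}, reject H0: {reject_null}")
```

With these inputs the statistic lands well below the critical value, so the null hypothesis of zero correlation is not rejected.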

Thank you so much, S2000magician!

My pleasure.

Hello S2000magician,

I have a follow-up question on your approach. I think that if the two random samples are independent, then we know that the correlation is zero. However, correlation = 0 doesn’t establish independence, right? Given this, how do I decide about the independence of the two samples? Should I use a chi-square test? I am a bit confused. I would appreciate your help.

Thanks in advance,

Allalongthewatchtower

That’s an excellent question, to which I haven’t an answer. ρ = 0 is a necessary condition, but I don’t know if it’s a sufficient condition.

Let me pursue that one.

If X and Y are independent, then they are also uncorrelated. However, if X and Y are uncorrelated, they can still be dependent. To see two extreme examples of this, let X be uniformly distributed on the interval [−1, 1]. If X ≤ 0, then Y = −X, while if X is positive, then Y = X; in other words, Y = |X|. Here Y is completely determined by X, yet the covariance between them is zero. The same is true for Y = X² with X on the interval [−1, 1]. A zero covariance implies that no linear relationship exists between the two variables, but that does not mean that they are independent. Likewise, correlation is a statistic that does not imply causation.
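
The two examples can be checked numerically with a short Python sketch. As an assumption for illustration, it uses an evenly spaced grid on [−1, 1] as a stand-in for the uniform distribution (so the result is deterministic), and the helper name pearson is my own.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Assumption: a symmetric grid on [-1, 1] approximates X ~ Uniform[-1, 1].
xs = [i / 1000 for i in range(-1000, 1001)]
y_abs = [abs(x) for x in xs]      # Y = |X|
y_sq = [x ** 2 for x in xs]       # Y = X^2

r_abs = pearson(xs, y_abs)
r_sq = pearson(xs, y_sq)
print(f"corr(X, |X|) = {r_abs:.2e}, corr(X, X^2) = {r_sq:.2e}")

# Dependence check: knowing X changes the distribution of Y.
p_y = sum(y > 0.5 for y in y_abs) / len(y_abs)              # P(Y > 0.5)
given = [y for x, y in zip(xs, y_abs) if x > 0.5]
p_y_given_x = sum(y > 0.5 for y in given) / len(given)      # P(Y > 0.5 | X > 0.5)
print(f"P(Y > 0.5) = {p_y:.3f}, P(Y > 0.5 | X > 0.5) = {p_y_given_x:.3f}")
```

Both correlations come out at zero up to floating-point error, while the conditional probability differs sharply from the unconditional one, which is exactly what dependence without correlation looks like.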