Confusion about binomial and hypergeometric - (statistics)

hellomello · March 14, 2012, 4:52am

I have a question. Let’s say only 15 out of 50 job candidates can get the job at a company (30%). Would an individual prefer sampling WITH replacement or sampling WITHOUT replacement if they wanted to sample 5 job candidates from the pool of 50, HOPING that the 5 are among the 15 that got the job? I thought about it and wrote out all the binomial and hypergeometric probabilities, but somewhere in the middle the probabilities conflict, so I am having trouble. *SIGH* Please help!!!

maratikus · March 14, 2012, 5:12pm

Conceptually, you would expect to be better off without replacement: each time you draw a candidate who did not get the job, the share of successful candidates left in the pool goes up. Indeed, the probability of having at least one successful candidate out of 5 with replacement is 1-(7/10)^5=0.8319 (approximately). Without replacement, it would be (C(15,1)*C(35,4)+C(15,2)*C(35,3)+C(15,3)*C(35,2)+C(15,4)*C(35,1)+C(15,5))/C(50,5)=0.8468, where C(n,k)=n!/(k!*(n-k)!). Equivalently, it is 1-C(35,5)/C(50,5).
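A quick way to check both "at least one" calculations is the short Python sketch below. It uses only the standard library (`math.comb`); the variable names `N`, `K`, `n`, `p_with`, and `p_without` are illustrative and do not come from the thread.

```python
from math import comb

N, K, n = 50, 15, 5   # pool size, candidates who got the job, sample size

# With replacement (binomial): every draw succeeds with probability K/N,
# so "at least one" is the complement of five failures in a row.
p_with = 1 - (1 - K / N) ** n

# Without replacement (hypergeometric): complement of choosing all 5
# from the 35 candidates who did not get the job.
p_without = 1 - comb(N - K, n) / comb(N, n)

print(f"with replacement:    {p_with:.4f}")     # 0.8319
print(f"without replacement: {p_without:.4f}")  # 0.8468
```

Using the complement avoids summing the five separate "1 through 5 successes" terms, but both routes give the same number.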

bchad · March 14, 2012, 6:11pm

What would be the point of selecting candidates with replacement?

“I see we’ve got John Smith coming in for an interview, but you can only talk to him for 25 minutes, because we’ve got John Smith coming in after that…”

hellomello · March 14, 2012, 7:33pm

The odds from the hypergeometric (sampling without replacement) go up only when you calculate the probability of having at least one candidate out of 5, and even 2 candidates out of 5; but then it turns the other way and goes down when you reach 3 out of 5, 4 out of 5, and 5 out of 5, compared with the probabilities from the binomial (sampling with replacement). Why does that happen? That’s why I was confused.

aaronhotchner · March 14, 2012, 8:18pm

With replacement:

0 successes: (5 choose 0) * 0.7^5 = 0.16807

1 success: (5 choose 1) * 0.3 * 0.7^4 = 0.36015

2 successes: (5 choose 2) * 0.3^2 * 0.7^3 = 0.30870

3 successes: (5 choose 3) * 0.3^3 * 0.7^2 = 0.13230

4 successes: (5 choose 4) * 0.3^4 * 0.7 = 0.02835

5 successes: (5 choose 5) * 0.3^5 = 0.00243

Without replacement:

0 successes: (15 choose 0) * (35 choose 5) / (50 choose 5) = 0.15322

1 success: (15 choose 1) * (35 choose 4) / (50 choose 5) = 0.37069

2 successes: (15 choose 2) * (35 choose 3) / (50 choose 5) = 0.32435

3 successes: (15 choose 3) * (35 choose 2) / (50 choose 5) = 0.12778

4 successes: (15 choose 4) * (35 choose 1) / (50 choose 5) = 0.02255

5 successes: (15 choose 5) * (35 choose 0) / (50 choose 5) = 0.00142
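The two lists above can be reproduced side by side with a short Python sketch. It assumes nothing beyond the standard library; the helper names `binom_pmf` and `hyper_pmf` are made up for this example.

```python
from math import comb

N, K, n = 50, 15, 5   # pool size, candidates who got the job, sample size
p = K / N             # success probability on each draw with replacement

def binom_pmf(k):
    """P(exactly k successes) when sampling with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hyper_pmf(k):
    """P(exactly k successes) when sampling without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

print(" k  with repl.  without repl.")
for k in range(n + 1):
    print(f"{k:2d}  {binom_pmf(k):10.5f}  {hyper_pmf(k):13.5f}")
```

Each column should sum to 1, which is a handy sanity check when writing the terms out by hand.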

If you want to maximize the probability that “at least one of the 5 selected is part of the 15 accepted,” then you want the distribution with the smallest probability of 0 successes, and that’s hypergeometric.

If you want to maximize the probability of all 5 being part of the 15, you would definitely want “with replacement” because those 15 do not go away as you continue to draw candidates.

It depends on what you’re trying to do, basically.
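One way to see why it "depends" is to compare the tail probabilities P(at least k) and the spread of the two distributions. Both have the same mean, n*K/N = 1.5, but the hypergeometric variance is smaller by the finite population factor (N-n)/(N-1) = 45/49, so it piles more probability near 1 and 2 successes and less on 3, 4, and 5. The Python sketch below is only an illustration under those assumptions; the helper `mean_var` is a made-up name.

```python
from math import comb

N, K, n = 50, 15, 5   # pool size, candidates who got the job, sample size
p = K / N

# Full distributions over k = 0..5 successes.
binom = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
hyper = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]

def mean_var(pmf):
    """Mean and variance of a distribution given as a list indexed by k."""
    m = sum(k * q for k, q in enumerate(pmf))
    v = sum((k - m) ** 2 * q for k, q in enumerate(pmf))
    return m, v

# Tail probabilities: which scheme is better for "at least k" successes?
for k in range(1, n + 1):
    b, h = sum(binom[k:]), sum(hyper[k:])
    better = "without" if h > b else "with"
    print(f"P(at least {k}): with={b:.5f}  without={h:.5f}  -> {better} replacement")

print("mean, variance with replacement:   ", mean_var(binom))
print("mean, variance without replacement:", mean_var(hyper))
```

With these numbers, sampling without replacement wins for "at least 1" and "at least 2", and sampling with replacement wins from "at least 3" upward.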

maratikus · March 14, 2012, 9:22pm

The above result is very intuitive. No formulas are really needed but they are helpful to confirm the result.