Can you build a confidence interval with only binary data?

BlueTrin · June 13, 2015, 1:51pm

I know nothing about a process and have only a series of n True/False results. Can I build a 95% confidence interval for the true probability of a True, under some set of assumptions?

tickersu · June 13, 2015, 5:28pm

Assuming I understood your question: yes, you can construct a confidence interval for the population proportion of “True” events out of the total.

Here’s a general idea to get you started:

https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
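To make the linked idea concrete, here is a minimal Python sketch of two of the intervals described on that page (Wald and Wilson). It assumes the n results are independent draws with the same probability of True, which is exactly the kind of assumption the question asks about. The function names and the sample series are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a probability cannot lie outside that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval; tends to behave better for small n or extreme p."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical series of True/False outcomes, invented for this example.
results = [True, False, True, True, False, True, True, False, True, True]
k, n = sum(results), len(results)
print("Wald:  ", wald_interval(k, n))
print("Wilson:", wilson_interval(k, n))
```

With only 10 observations the two intervals differ noticeably, which is one reason the Wilson form is often suggested for short series.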

Hope this helps!

ink · June 14, 2015, 3:18pm

Yes, you can. For instance, when you run a logistic regression on binary data, exponentiating a coefficient gives you an odds ratio (see more in Level II). You could then say that men are 4 [95% CI: 2-6] times more likely to read Sports magazine relative to women.

- the 4 represents the estimate from the odds ratio

- the 95% CI indicates that the estimate will fall between 2-6 in about 95% of the cases if you used the same number of subjects in order to perform your study
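As a rough sketch of where a figure like "4 [95% CI: 2-6]" can come from: with a single binary predictor, the exponentiated logistic coefficient equals the odds ratio of the 2x2 table, so the example below works from the table directly and builds a Wald interval on the log scale. The counts and the function name are hypothetical, chosen only so that the odds ratio comes out to 4.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: group 1 with the outcome,  b: group 1 without it
    c: group 2 with the outcome,  d: group 2 without it
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) under the usual large-sample approximation.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 50 of 100 men and 20 of 100 women read the magazine.
estimate, lower, upper = odds_ratio_ci(a=50, b=50, c=20, d=80)
print(f"odds ratio = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The interval is not symmetric around the estimate because it is computed on the log-odds scale and then exponentiated.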

tickersu · June 15, 2015, 1:28am

ink:

yes, you can. for instance, when you do a logistic regression for your binary data, you get this thing called odds ratio after exponentiating your coefficient (see more in Level II) ;

**The (exponentiated) slope coefficient in a logistic regression is also interpreted as a multiplicative change in the odds, given a one-unit increase in the IV.**

For instance, you could say that men are 4 [95% CI: 2-6] times more likely to read Sports magazine relative to women.

-the 4 represents the estimate from the odds ratio

the 95% CI: indicates that the estimate will fall between 2-6 in about 95% of the cases if you used the same number of subjects in order to perform your study

**This isn’t a proper interpretation of a confidence interval. The interval doesn’t tell us where the _estimate_ is likely to be; it tells us about the population parameter and its probable location. If you wanted to practically interpret it, you would say that there is a 95% level of confidence that the true odds ratio for (whatever it is you’re measuring) will fall between 2 and 6. The confidence level, 95%, refers to the long-run proportion of intervals that would contain the true odds ratio. In other words, if you drew samples over and over indefinitely and calculated an interval for the parameter of interest from each one, 95% of those intervals would be “correct” in that they actually capture the true value of the parameter (nothing to do with the estimate) somewhere within the upper and lower bounds of the interval.**

An odds ratio isn’t a probability, though. It’s a ratio of odds, and odds are themselves a ratio of probabilities, p / (1 - p). I wasn’t sure if the OP wanted to model the probability; I assumed he had only one variable and wanted a CI for the true proportion of the population with that characteristic.
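A small numeric sketch of that distinction, using a made-up baseline probability: an odds ratio of 4 does not mean the probability is 4 times larger. The 20% baseline and the helper names are assumptions for illustration only.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1 - p)


def prob_from_odds(o):
    """Convert odds back into a probability, o / (1 + o)."""
    return o / (1 + o)


# Hypothetical baseline: 20% of women read the magazine.
p_women = 0.20
odds_ratio = 4.0

# Apply the odds ratio on the odds scale, then convert back to a probability.
p_men = prob_from_odds(odds_ratio * odds(p_women))

print(f"odds for women: {odds(p_women):.2f}")
print(f"implied probability for men: {p_men:.2f}")
print(f"ratio of probabilities: {p_men / p_women:.2f}")
```

Under these assumed numbers the probability for men is about 0.5, so the ratio of probabilities is about 2.5 even though the odds ratio is 4.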

Edit: completely glossed over a few points before, and I mixed up a few points as a result. Now, the bold has my replies, and I fixed my mistakes…the end of a long day.

ink · June 15, 2015, 2:02pm

Your additions about the CI are correct; the 95% CI tells us that we would expect 95% of the interval estimates to include the true population parameter if we carried out infinitely many studies. Or, another way I’ve often seen it formulated: you are 95% confident that the true mean is between the lower and upper bounds of the interval.
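The repeated-studies reading can be checked with a small simulation. This is only a sketch: it assumes a known true probability (0.3), a fixed sample size (50), and uses the Wilson score interval as one possible choice of interval; all names and numbers are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def estimate_coverage(true_p, n, trials, seed=0):
    """Fraction of simulated studies whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One simulated study: n independent True/False outcomes.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wilson_interval(successes, n)
        hits += lower <= true_p <= upper
    return hits / trials


coverage = estimate_coverage(true_p=0.3, n=50, trials=2000)
print(f"fraction of intervals containing the true p: {coverage:.3f}")
```

Each simulated study produces a different interval; the 95% describes how often the procedure captures the fixed true value, not where any single estimate will land.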

tickersu · June 15, 2015, 3:43pm

That’s essentially what I meant by “there is a 95% level of confidence that the true odds ratio for (whatever it is you’re measuring) will fall between 2 and 6.” We are X% confident that the true parameter value falls between the lower and upper bounds (with units of measure, if applicable). Almost identical to yours!