a companion discussion area for blog.codinghorror.com

An Initiate of the Bayesian Conspiracy


It’s been a while since I did statistics, and I was only mediocre at it, but I’m going to give it a try. I didn’t read everything above, it was just too much, but none of the solutions I did read matched what my calculations did. Of course this means there is a higher likelihood that I am wrong, but who gives a smeg. This is the intarweb.

I did use Google, but I knew what I had to do: find P(B|A) while knowing P(A|B), P(B) and P(A); I just didn’t remember the formula. If I had my formula book available, I would have used it, so I don’t consider this cheating.

On to my calculations.

Chance of having breast cancer:
P(C) = 1%
Chance of receiving a positive result, given breast cancer:
P(T|C) = 80% (read as T given C)
Chance of receiving a false positive:
P(T|^C) = 9.6%
Chance of receiving a positive result, both having and not having cancer:
P(T) = P(T|C) + P(T|^C) = 89.6%

Question stated: A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

So what is probability of cancer, given a positive result? : P(C|T)

Bayes’ theorem says P(B|A) = P(A|B)(P(B)/P(A)), so in my notation I’m using P(C|T) = P(T|C)(P(C)/P(T)), which gives me

P(C|T) = 80%*(1%/89.6%) = 0.89%

Yes. Having been given a positive mammogram result, there is a 0.89% chance that one actually does have breast cancer.

It sounds unintuitive to me, but my statistics book said so at the time, and Jeff says so himself in this article.


That’s what I get for posting seconds before I have to run out the door, without the time to think. Of course it cannot be 0.89%, less than the stated base rate of having cancer! I retract my solution and am officially embarrassed.


The easy way to remember Bayes is P(B|A) = P(A|B)P(B)/P(A): work out the overall probability of the evidence first, then scale.




Probability that any randomly selected woman will test positive:
P(+) = P(+|bc)P(bc)+P(+|~bc)P(~bc) = .8*.01+.096*.99 = .10304

Now apply Bayes Rule:
P(bc|+) = P(+|bc)P(bc)/P(+) = .8*.01/.10304 = .07764
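The two steps above can be sketched in a few lines of Python, with the numbers taken straight from the problem statement:

```python
p_bc = 0.01          # P(bc): prior probability of breast cancer
p_pos_bc = 0.8       # P(+|bc): true positive rate of the mammography
p_pos_no_bc = 0.096  # P(+|~bc): false positive rate

# Total probability that a randomly selected woman tests positive:
p_pos = p_pos_bc * p_bc + p_pos_no_bc * (1 - p_bc)

# Bayes' rule:
p_bc_pos = p_pos_bc * p_bc / p_pos

print(round(p_pos, 5))     # 0.10304
print(round(p_bc_pos, 5))  # 0.07764
```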



Don’t get so upset, and practice your reading comprehension. 80% is the probability that she got a positive result, given that she has cancer. We are interested in the reverse: the probability that she has cancer, given that she got a positive result. The two are not the same.


I took a swing at this and emailed my results to a good friend who is a medical doctor and statistician and does research on the statistical effectiveness of test screenings. She kindly sent me back a spreadsheet with a 2x2 grid plugged in with sample numbers. Boy, was I ever far off the mark.
I suggest that anyone here who has trouble with this email a friend who happens to be a medical doctor / statistician / researcher.


Hmm, it seems I’ve done this slightly wrong, but landed in the right ballpark.

1% of the screened women have cancer.
80% of those get a positive test.
So 0.8% of all tested women will have a positive test and have cancer.
On top of this, 9.6% of the women without cancer will get a positive test,
so 0.8 + 9.6 = 10.4% of all tested women will get a positive result.
Of these, 0.8/10.4 = 7.7% will have cancer.
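The counting argument above can be done exactly with a hypothetical screening population; the 9.6% false-positive rate applies only to the 99% of women without cancer, which is why the rough version lands slightly off. A minimal sketch:

```python
women = 100_000                            # hypothetical screening population
with_cancer = women * 0.01                 # 1,000 women have cancer
true_pos = with_cancer * 0.8               # 800 of them test positive
false_pos = (women - with_cancer) * 0.096  # 9,504 positives among the 99,000 without cancer
total_pos = true_pos + false_pos           # 10,304 positive tests in all

print(round(true_pos / total_pos, 4))      # 0.0776
```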


By the way, isn’t this about the Base-Rate Fallacy?



I’m not sure I would give an answer of 7.76 percent to any woman. Saying “You probably don’t have cancer” wouldn’t go over that well. I would recommend more (different) tests or a redo of the mammography.

I think it is interesting that the number shows how reliable the test is. At only 7.76% probability, a mammography alone is not a reliable test for breast cancer in and of itself. You would want a much more reliable test or correlating data before issuing any diagnosis. I believe that this is what Bayes’ Theorem is showing here.


I managed to reach the correct (I think) answer without Google, the comments, or any knowledge of Bayes’ Theorem, using the following method:

0.8% of women get positive mammographies and have breast cancer (80% of the 1% with cancer)

9.504% of women get positive mammographies and don’t have breast cancer (9.6% of the 99% without cancer)

0.8 + 9.504 = 10.304

Therefore, 10.304% of women get positive mammographies

As 0.8% of women get positive mammographies and have breast cancer, we simply need to calculate what percentage 0.8 is of 10.304

Therefore, there is a 7.76% chance that a woman with a positive mammography has breast cancer.


I realize that this is a bit late. I went through the process of applying Bayes Theorem and got .0776 for the true positives, which I believe is the right answer.

However, I want to find the true negatives, and I’ve done it wrong.

P(~cancer) = 1-P(cancer) = .99 (99%)
P(~positive) = 1 - P(true positive) - P(false positive) = 1 - P(positive|Cancer) =  1 - .8  - .096 = .104
P(~positive|~cancer) = P(~cancer)*P(~positive) = .99*.104 = .103

As a check, the four cases are true negative, false positive, true positive, and false negative, or
P(~positive|~cancer) + P(positive|~Cancer) + P(positive|Cancer) + P(~positive|Cancer) = 1.0

P(~positive|Cancer) = 1 - .8 = .2  (From the problem statement, P(positive|Cancer) = .8 )

.103 + .096 + .8 + .2 = 1.199

So clearly I am doing something wrong. I just don’t know what.

Many thanks.
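For what it’s worth, a check like the one above does work if you sum the joint probabilities of the four outcomes (test result AND disease status), rather than conditional probabilities taken under different conditions, which need not sum to 1. A minimal sketch, using the same numbers:

```python
p_c = 0.01       # P(Cancer)
p_pos_c = 0.8    # P(positive|Cancer)
p_pos_nc = 0.096 # P(positive|~Cancer)

# Joint probabilities of the four outcomes:
tp = p_pos_c * p_c               # positive and cancer    = 0.008
fn = (1 - p_pos_c) * p_c         # negative and cancer    = 0.002
fp = p_pos_nc * (1 - p_c)        # positive and no cancer = 0.09504
tn = (1 - p_pos_nc) * (1 - p_c)  # negative and no cancer = 0.89496

# The four joint outcomes partition everything, so they sum to 1:
print(round(tp + fn + fp + tn, 10))  # 1.0
```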



Guess you figured this out already, but I thought I’d post this anyway.

If you want to find the true negative, I believe you have to pose the problem in the following manner:

P(~Cancer| ~positive) = P(~Cancer, ~positive) / (P(~Cancer, ~positive) + P(Cancer, ~positive))

=> P(~Cancer| ~positive) = (P(~positive|~Cancer) * P(~Cancer)) / (P(~positive|~Cancer) * P(~Cancer) + P(~positive|Cancer) * P(Cancer))

=> P(~Cancer| ~positive) = (894.96/990 * 990/1000) / (894.96/990 * 990/1000 + 2/10 * 10/1000)

=> P(~Cancer| ~positive) = 0.89496 / (0.89496 + 0.002) = 0.9978

[assuming the total number of women aged 40 years or above in the universal set is 1000]
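That derivation can be checked with a short Python sketch, applying Bayes’ rule directly to the negative-test case with the same conditional probabilities:

```python
p_c = 0.01            # P(Cancer)
p_neg_c = 1 - 0.8     # P(~positive|Cancer)  = 0.2
p_neg_nc = 1 - 0.096  # P(~positive|~Cancer) = 0.904

# Total probability of a negative test:
p_neg = p_neg_nc * (1 - p_c) + p_neg_c * p_c   # 0.89496 + 0.002 = 0.89696

# Bayes' rule for the true negative:
p_nc_neg = p_neg_nc * (1 - p_c) / p_neg

print(round(p_nc_neg, 4))  # 0.9978
```

So a negative mammography is very reassuring: about 99.78% of women who test negative really don’t have cancer.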