My posts here attract an average of between two and three comments each. By far the most extreme is the posting on the “boy born on Tuesday”, which has received over twenty comments. I am very grateful to everyone who has made a comment.
To recapitulate: My informant has two children. What is the probability that both are boys, given that one is a boy born on Tuesday? The answer is 13/27; when I made this posting, I knew this, and also that, if the phrase “born on Tuesday” were omitted, the answer would be 1/3. This seemed unintuitive to me, and I hoped someone could give me, not a mathematical argument, but a way of thinking about the problem to make it more “obvious”. In the end, I found this for myself (and other people came up with something similar): if one of the children is definitely identified as the “boy born on Tuesday”, say the older one, then the probability is exactly 1/2; the closer the subsidiary information comes to identifying the child concerned, the closer the probability comes to 1/2.
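This pattern can be checked by brute force. Here is a short Python sketch of my own (not part of the original discussion) which makes the length of the “week” a parameter n: n = 1 recovers the plain “one is a boy” puzzle, n = 7 gives the Tuesday version, and large n (information coming ever closer to identifying the child) pushes the answer towards 1/2. The labelling of days is arbitrary.

```python
from fractions import Fraction
from itertools import product

def p_both_boys(n):
    """P(both boys | at least one child is a boy born on 'day 0'),
    for a hypothetical n-day week; each child is a (sex, day) pair."""
    children = [(s, d) for s in "BG" for d in range(n)]
    pairs = list(product(children, repeat=2))     # (older, younger), all equally likely
    B = [p for p in pairs if ("B", 0) in p]       # at least one boy born on day 0
    A_and_B = [p for p in B if p[0][0] == "B" and p[1][0] == "B"]
    return Fraction(len(A_and_B), len(B))

print(p_both_boys(1))    # 1/3: no extra information at all
print(p_both_boys(7))    # 13/27: the "boy born on Tuesday" answer
print(p_both_boys(365))  # 729/1459, already very close to 1/2
```

In general this gives (2n−1)/(4n−1), which tends to 1/2 as n grows.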
But what surprised me quite a lot was the reaction of many correspondents, who more-or-less rejected the calculation giving 13/27 as the answer, and wanted it replaced by a calculation which depends on the way in which the information was elicited.
Although I have a lot of sympathy with this point of view, I think I would like to post here a defence of orthodoxy (perhaps not a normal role for me, but as I said in another posting, I am doing this to practise my writing skills as well as to inform others). So here is a little tutorial on conditional probability. Perhaps this is not quite what you will find in elementary textbooks. After giving the mathematics, the much more difficult task of matching it to the real world will be attempted.
A probability space consists of a set S of outcomes, called the sample space; a collection of subsets of S called measurable sets or events satisfying certain closure properties; and a function P from the set of measurable sets to the real numbers, satisfying three conditions known as Kolmogorov’s axioms:
- P(A) is non-negative for all events A;
- P(S) = 1;
- P is additive over finite or countable disjoint unions.
We read P(A) as “the probability of A”.
Let A and B be events such that B has non-zero probability. The conditional probability of A given B, written P(A|B), is defined by
P(A|B) = P(A & B)/P(B).
Here A & B is the intersection of A and B (read “A and B”). Note that P(A) does not enter into this definition!
Let’s now see where 13/27 comes from. If our informant has two children, then there are 196 combinations of the gender and birthday of the two (14 for each child), which we can arrange in a 14×14 table whose rows correspond to the older child and columns to the younger. We assume that all 196 combinations are equally likely. The table splits into four 7×7 tables according to gender. Now let A be the event that both children are boys, and B the event that at least one is a boy born on Tuesday. Then B consists of one row and one column of the table, containing 27 cells, and so has probability 27/196; A & B consists of the part of B within one 7×7 subsquare, containing 13 cells, and so has probability 13/196. The definition now gives P(A|B) = 13/27.
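The cell-counting above is easy to reproduce mechanically. The following Python sketch builds the 14×14 table exactly as described (taking Tuesday to be day 2 of the week, an arbitrary labelling) and counts the row-plus-column and the boy-boy subsquare:

```python
from fractions import Fraction

# The 14x14 table: each cell is ((sex_older, day_older), (sex_younger, day_younger)).
cells = [((s1, d1), (s2, d2))
         for s1 in "BG" for d1 in range(7)
         for s2 in "BG" for d2 in range(7)]
assert len(cells) == 196

boy_tuesday = ("B", 2)   # Tuesday labelled as day 2 (any labelling would do)
B = [c for c in cells if boy_tuesday in c]                      # one row plus one column
A_and_B = [c for c in B if c[0][0] == "B" and c[1][0] == "B"]   # the part inside the boy-boy subsquare

print(len(B), len(A_and_B), Fraction(len(A_and_B), len(B)))     # 27 13 13/27
```

The overlap of the row and the column (the cell where both children are boys born on Tuesday) is counted once, which is why B has 27 cells rather than 28.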
Try the following exercise: if B & C has non-zero probability, then
P(A & B|C) = P(A|B & C)P(B|C).
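If you would like to check the identity on a concrete example before proving it, here is a small Python sketch using the same two-children sample space; the events A, B, C below are my own choices for illustration, and `cond` is just the definition of conditional probability:

```python
from fractions import Fraction
from itertools import product

# Uniform sample space for the two-children problem: pairs ((sex, day), (sex, day)).
S = list(product([(s, d) for s in "BG" for d in range(7)], repeat=2))

def cond(X, Y):
    """P(X | Y) under the uniform measure on S, straight from the definition."""
    return Fraction(len(X & Y), len(Y))

A = {x for x in S if x[0][0] == "B" and x[1][0] == "B"}  # both boys
B = {x for x in S if ("B", 2) in x}                      # a boy born on Tuesday (day 2, say)
C = {x for x in S if x[0][0] == "B"}                     # the older child is a boy

lhs = cond(A & B, C)
rhs = cond(A, B & C) * cond(B, C)
print(lhs, rhs, lhs == rhs)   # 13/98 13/98 True
```

Of course the proof is a one-line calculation: expand both sides by the definition and cancel P(B & C).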
At this point I set the following challenge to students. Let B be any event with non-zero probability. Define a function Q by the rule that Q(A) = P(A|B). Show that Q is also a probability measure – that is, it satisfies Kolmogorov’s axioms. In fact we can regard it as defined on the original measurable sets, or we can re-define the sample space to be B and just consider events contained in B (since any event A disjoint from B will have Q(A) = 0).
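The challenge, too, can be sanity-checked numerically before being proved. This sketch conditions on the “boy born on Tuesday” event and verifies the three axioms for Q on a couple of illustrative events (the choice of A1 and A2 is mine):

```python
from fractions import Fraction
from itertools import product

S = set(product([(s, d) for s in "BG" for d in range(7)], repeat=2))
B = {x for x in S if ("B", 2) in x}   # condition on "a boy born on Tuesday"

def P(event):
    return Fraction(len(event), len(S))

def Q(event):                          # Q(A) = P(A | B)
    return P(event & B) / P(B)

A1 = {x for x in S if x[0][0] == "B" and x[1][0] == "B"}   # both boys
A2 = {x for x in S if x[0][0] == "G"}                      # older child a girl
assert A1 & A2 == set()                                    # disjoint events

print(Q(S) == 1)                      # normalisation
print(Q(A1) >= 0, Q(A2) >= 0)         # non-negativity
print(Q(A1 | A2) == Q(A1) + Q(A2))    # additivity on disjoint events
```

Note that Q(A1) here is exactly 13/27, as it must be: Q is the posterior measure of the previous section in disguise.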
This leads us to the Bayesian perspective: if P is the prior distribution of probability, then Q is the posterior distribution, given the knowledge that event B has occurred. In fact this is a bit misleading: the terminology suggests some sort of temporal succession, which need not actually be the case; we can calculate conditional probabilities based on events which may occur in the future, or even on counterfactual events. The formula works the same way in any case.
Now let us take an even more extreme Bayesian view. All probability is conditional on what we already know to be the case. If E denotes everything that we know, then we should replace P(A) by P(A|E) everywhere. This has the effect of replacing P(A|B) by P(A & B|E)/P(B|E). According to the exercise (which I hope you did!), this is equal to P(A|E & B). In other words, we have added B to “everything we know” and re-calculated the probability of A accordingly.
This brings us at last to the interpretation of conditional probability: P(A|B) is the probability that we assign to A given event B. As explained earlier, it is quite tricky to say this without assuming some kind of temporal or logical succession; but it is important not to make such an assumption.
So consider the “boy born on Tuesday”. In the Bayesian view, “everything we know” includes the fact that our informant has two children. Then A is the event that both are boys, and B the event that at least one is a boy born on Tuesday. Our prior evaluation of P(A) is 1/4; but once we have the information B, we re-evaluate it as 13/27.
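This prior-to-posterior update can also be seen by simulation. The Python sketch below (my own illustration) generates a million random two-child families; conditioning on B means simply restricting attention to the families in which B occurred, which is precisely the orthodox conditioning described above:

```python
import random

random.seed(1)
trials = 10**6

def child():
    return (random.choice("BG"), random.randrange(7))   # (sex, day of week)

families = [(child(), child()) for _ in range(trials)]

# Prior probability that both children are boys.
prior = sum(f[0][0] == "B" and f[1][0] == "B" for f in families) / trials

# Condition on B: keep only families with at least one boy born on Tuesday (day 2).
B = [f for f in families if ("B", 2) in f]
posterior = sum(f[0][0] == "B" and f[1][0] == "B" for f in B) / len(B)

print(prior)      # close to 1/4
print(posterior)  # close to 13/27, about 0.481
```

A scenario in which the information was elicited differently would correspond to keeping a different (smaller) subset of families, and would generally give a different answer.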
I think (though without complete confidence) that, if the scenarios which have been proposed giving values other than 13/27 were analysed, they would not precisely conform to this picture: maybe the fact that the informant has two children is not part of the prior knowledge, or in ascertaining B we actually ascertain a smaller event, which would give a different conditional probability.
I hope that not everyone agrees with what I have said here. It would be a pity to curtail this very enjoyable conversation!
As a postscript, I propose the exercise of replacing the development of probability given here using Kolmogorov’s axioms by one which deals only with conditional probability. (There is a possible foundational problem here!) Maybe someone already did this …