## Conditional probability

My posts here attract an average of between two and three comments each. By far the most extreme is the posting on the “boy born on Tuesday”, which has received over twenty comments. I am very grateful to everyone who has made a comment.

To recapitulate: My informant has two children. What is the probability that both are boys, given that one is a boy born on Tuesday? The answer is 13/27. When I made that posting I knew this, and I also knew that if the phrase “born on Tuesday” were omitted the answer would be 1/3. This seemed unintuitive to me, and I hoped someone could give me, not a mathematical argument, but a way of thinking about the problem that would make it more “obvious”. In the end, I found this for myself (and other people came up with something similar): if one of the children is definitely identified as the “boy born on Tuesday”, say the older one, then the probability is exactly 1/2; the closer the subsidiary information comes to identifying the child concerned, the closer the probability comes to 1/2.

But what surprised me quite a lot was the reaction of many correspondents, who more-or-less rejected the calculation giving 13/27 as the answer, and wanted it replaced by a calculation which depends on the way in which the information was elicited.

Although I have a lot of sympathy with this point of view, I would like to post here a defence of orthodoxy (perhaps not a normal role for me, but as I said in another posting, I am doing this to practise my writing skills as well as to inform others). So here is a little tutorial on conditional probability. Perhaps this is not quite what you will find in elementary textbooks. After giving the mathematics, I will attempt the much more difficult task of matching it to the real world.

A probability space consists of a set S of outcomes, called the sample space; a collection of subsets of S called measurable sets or events satisfying certain closure properties; and a function P from the set of measurable sets to the real numbers, satisfying three conditions known as Kolmogorov’s axioms:

• P(A) is non-negative for all events A;
• P(S)=1;
• P is additive over finite or countable disjoint unions.

We read P(A) as “the probability of A”.

Let A and B be events such that B has non-zero probability. The conditional probability of A given B, written P(A|B), is defined by

P(A|B) = P(A & B)/P(B).

Here A & B is the intersection of A and B (read “A and B”). Note that P(A) does not enter into this definition!

Let’s now see where 13/27 comes from. If our informant has two children, then there are 196 combinations of the gender and birthday of the two (14 for each child), which we can arrange in a 14×14 table whose rows correspond to the older child and columns to the younger. We assume that all 196 combinations are equally likely. The table splits into four 7×7 tables according to gender. Now let A be the event that both children are boys, and B the event that at least one is a boy born on Tuesday. Then B consists of one row and column of the table, containing 27 cells, and so with probability 27/196; A & B consists of the part of B within one 7×7 subsquare, containing 13 cells, and so with probability 13/196. The definition now gives P(A|B)=13/27.
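The count can be checked by brute force. This is a minimal sketch, not anything from the original post: the helper names `sample_space`, `prob` and `cond` are hypothetical, outcomes are ordered (older, younger) pairs of (sex, day) as in the 14×14 table, and all outcomes are taken to be equally likely. Omitting the day of the week recovers the 1/3 mentioned earlier.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def sample_space(with_day=True):
    """All (older, younger) pairs; each child is (sex, day), or (sex, None) if days are ignored."""
    days = DAYS if with_day else [None]
    child = list(product("BG", days))
    return list(product(child, child))  # rows: older child, columns: younger child

def prob(event, space):
    """Probability of an event (a predicate) under the uniform distribution on a finite space."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def cond(a, b, space):
    """P(A|B) = P(A & B) / P(B), straight from the definition."""
    return prob(lambda w: a(w) and b(w), space) / prob(b, space)

def both_boys(w):
    return all(sex == "B" for sex, _ in w)

def tuesday_boy(w):
    return ("B", "Tue") in w

def some_boy(w):
    return any(sex == "B" for sex, _ in w)

S = sample_space()
print(len(S), sum(map(tuesday_boy, S)))                  # 196 27
print(cond(both_boys, tuesday_boy, S))                   # 13/27
print(cond(both_boys, some_boy, sample_space(False)))    # 1/3
```

Using `Fraction` keeps the arithmetic exact, so the output can be compared with 13/27 directly rather than as a rounded decimal.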

Try the following exercise: if B & C has non-zero probability, then

P(A & B|C) = P(A|B & C)P(B|C).
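A numerical spot-check of the identity on the 196-cell table (a sanity check, not a proof). The events in `EVENTS` are arbitrary illustrative choices of mine; the identity is only claimed when P(B & C) > 0, so triples failing that condition are skipped.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILD = list(product("BG", DAYS))
S = list(product(CHILD, CHILD))  # (older, younger) cells, all 196 equally likely

def prob(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

def cond(a, b):
    """P(a | b) = P(a & b) / P(b), straight from the definition."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

def meet(*events):
    """Intersection of events represented as predicates on outcomes."""
    return lambda w: all(e(w) for e in events)

# A few arbitrary events, chosen only for illustration.
EVENTS = {
    "both boys": lambda w: w[0][0] == "B" and w[1][0] == "B",
    "a Tuesday boy": lambda w: ("B", "Tue") in w,
    "older born Mon-Wed": lambda w: w[0][1] in ("Mon", "Tue", "Wed"),
    "younger is a girl": lambda w: w[1][0] == "G",
    "at least one boy": lambda w: "B" in (w[0][0], w[1][0]),
}

def identity_holds(a, b, c):
    """Check P(A & B | C) == P(A | B & C) P(B | C); requires P(B & C) > 0."""
    return cond(meet(a, b), c) == cond(a, meet(b, c)) * cond(b, c)

checked = 0
for a, b, c in product(EVENTS.values(), repeat=3):
    if prob(meet(b, c)) > 0:
        assert identity_holds(a, b, c)
        checked += 1
print(checked, "triples checked")
```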

At this point I set the following challenge to students. Let B be any event with non-zero probability. Define a function Q by the rule that Q(A) = P(A|B). Show that Q is also a probability measure – that is, it satisfies Kolmogorov’s axioms. In fact we can regard it as defined on the original measurable sets, or we can re-define the sample space to be B and just consider events contained in B (since any event A disjoint from B will have Q(A) = 0).
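For a finite space the challenge can at least be spot-checked by machine (again a check, not a proof). The sketch below uses the four-outcome space BB, BG, GB, GG with B taken to be “at least one boy”, an illustrative choice of mine. On a finite space countable additivity reduces to finite additivity, and by induction it is enough to check pairs of disjoint events.

```python
from fractions import Fraction
from itertools import chain, combinations

S = ["BB", "BG", "GB", "GG"]           # older child first, all equally likely
P = {w: Fraction(1, 4) for w in S}

def measure(event):
    return sum((P[w] for w in event), Fraction(0))

B = frozenset(w for w in S if "B" in w)  # the conditioning event: at least one boy

def Q(event):
    """Q(A) = P(A | B) = P(A & B) / P(B)."""
    return measure(event & B) / measure(B)

# Every subset of S; on a finite space all of them can be taken as events.
events = [frozenset(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]

assert all(Q(A) >= 0 for A in events)                  # non-negativity
assert Q(frozenset(S)) == 1                            # Q(S) = 1
assert all(Q(A | C) == Q(A) + Q(C)                     # additivity over disjoint pairs
           for A in events for C in events if not A & C)
assert all(Q(A) == 0 for A in events if not A & B)     # events disjoint from B get Q = 0
print(Q(frozenset({"BB"})))                            # 1/3
```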

This leads us to the Bayesian perspective: if P is the prior distribution of probability, then Q is the posterior distribution, given the knowledge that event B has occurred. In fact this is a bit misleading: the terminology implies some sort of temporal succession which needn’t be the case in fact; we can calculate conditional probabilities based on events which may occur in the future, or even on counterfactual events. The formula works the same way in any case.

Now let us take an even more extreme Bayesian view. All probability is conditional on what we already know to be the case. If E denotes everything that we know, then we should replace P(A) by P(A|E) everywhere. This has the effect of replacing P(A|B) by P(A & B|E)/P(B|E). According to the exercise (which I hope you did!), this is equal to P(A|E & B). In other words, we have added B to “everything we know” and re-calculated the probability of A accordingly.
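The identity P(A & B|E)/P(B|E) = P(A|E & B) can be seen numerically on the 196-cell table. As an illustrative choice of mine, take E to be “at least one child is a boy”: conditioning on E alone gives 1/3, and then adding B to what we know gives the same 13/27 as conditioning on E & B in one step.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILD = list(product("BG", DAYS))
S = list(product(CHILD, CHILD))  # all 196 (older, younger) cells, equally likely

def P(event, given=lambda w: True):
    """P(event | given) by counting cells; `given` defaults to the whole space."""
    cells = [w for w in S if given(w)]
    return Fraction(sum(1 for w in cells if event(w)), len(cells))

def A(w):
    return w[0][0] == "B" and w[1][0] == "B"   # both boys

def B(w):
    return ("B", "Tue") in w                    # some boy born on Tuesday

def E(w):
    return w[0][0] == "B" or w[1][0] == "B"     # illustrative "everything we know": at least one boy

def A_and_B(w):
    return A(w) and B(w)

def E_and_B(w):
    return E(w) and B(w)

updated = P(A_and_B, given=E) / P(B, given=E)   # condition on E, then on B
direct = P(A, given=E_and_B)                    # condition on E & B at once
print(P(A, given=E), updated, direct)           # 1/3 13/27 13/27
```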

This brings us at last to the interpretation of conditional probability: P(A|B) is the probability that we assign to A given event B. As explained earlier, it is quite tricky to say this without assuming some kind of temporal or logical succession; but it is important not to make such an assumption.

So consider the “boy born on Tuesday”. In the Bayesian view, “everything we know” includes the fact that our informant has two children. Then A is the event that both are boys, and B the event that at least one is a boy born on Tuesday. Our prior evaluation of P(A) is 1/4; but once we have the information B, we re-evaluate it as 13/27.

I think (though without complete confidence) that, if the proposed scenarios giving values other than 13/27 were analysed, they would turn out not to conform precisely to this picture: perhaps the fact that the informant has two children is not part of the prior knowledge, or in ascertaining B we actually ascertain a smaller event, which would give a different conditional probability.

I hope that not everyone agrees with what I have said here. It would be a pity to curtail this very enjoyable conversation!

As a postscript, I propose the exercise of replacing the development of probability given here using Kolmogorov’s axioms by one which deals only with conditional probability. (There is a possible foundational problem here!) Maybe someone already did this …

This entry was posted in exposition.

### 15 Responses to Conditional probability

1. Sean Carmody says:

Peter,

While conditional probability is a convenient framework for talking about this problem, it is a bit of a red herring when it comes to the debate as to the “correct” approach to the Tuesday’s Child problem. It is not about a Kolmogorov versus a conditional formalism for probability. The problem is how to apply probability to the scenario of the puzzle.

The real issue is the extent to which you are prepared to expand your set of outcomes, S. Imagine a student was initially asked the simpler problem where our informant simply says ‘I have two children, at least one of whom is a boy’. The student then decides S consists of BB, GB, BG, GG (with equal probability), takes the information to exclude GG and concludes that the probability of two boys is 1/3. So far, so good (in the traditional sort of way). Now the informant adds ‘and that boy is born on Tuesday’. The student still considers S to consist of the four B/G combinations and insists that the probability of two boys is still 1/3. Oh no, you would protest, we need to expand S to include day of birth: B-Mon/B-Mon, B-Mon/G-Mon, etc. Proceeding in that way gets you 13/27 as you describe. But why would you insist on expanding S to allow for days of the week but be happy not to expand S to allow for the possibility that the informant could equally well have said ‘I have a girl born on Wednesday’? That is the issue, and it arises even if you never perform any Bayesian calculations.

Another useful perspective was suggested to me by a reader of my own post on Tuesday’s Child. He modified the set-up only very slightly. Your informant arrives with a boy and says ‘This is one of my two children and he was born on Tuesday’. Now what is the probability that the man has two boys? At face value, the answer would seem to be the same as in your original scenario. However, this time the path to an answer other than 13/27 is a little easier because it is more obvious how to expand S (it doubles in size to 392). We assume that the man was equally likely to bring either of his children, so now S includes entries that include whether the child came along or not: B-Mon-Yes/B-Mon-No, B-Mon-No/B-Mon-Yes, B-Mon-Yes/G-Mon-No, B-Mon-No/G-Mon-Yes. We now focus on all of those pairs which include B-Tues-Yes (there are 28) and look at the proportion of those which consist of two boys and we get 1/2! No Bayesian tricks up my sleeve there: just simple Kolmogorovian counting.
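Sean’s 392-cell count can be reproduced directly. This sketch follows his assumption that the man is equally likely to bring either child; encoding the child who came along as a third coordinate (0 for the older, 1 for the younger) is my choice, not his.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILD = list(product("BG", DAYS))

# Each outcome: (older, younger, index of the child who came along); 14 * 14 * 2 = 392 cells.
S = list(product(CHILD, CHILD, (0, 1)))

# Keep the outcomes in which the child who came along is a boy born on Tuesday.
observed = [w for w in S if w[w[2]] == ("B", "Tue")]
two_boys = [w for w in observed if w[0][0] == "B" and w[1][0] == "B"]

print(len(S), len(observed), Fraction(len(two_boys), len(observed)))  # 392 28 1/2
```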

2. Bob Walters says:

I agree with Sean’s comment, in particular about Wednesday’s girl. I would like to add to the conversation but I have visitors and a conference coming up, so it will have to wait.

3. Ted Jones says:

I agree the issue is not about the definition of conditional probability, but instead about what the problem means when translated into mathematics. There is no question that there is a certain conditional probability, P(A|B), whose numerical value is 13/27. We all agree on this. The point of disagreement is whether the only possible interpretation of the question is that it is asking for the value of P(A|B).

In your original post, you said: “I don’t have any way of making this answer seem obvious, or even plausible.” Today, in your perhaps halfhearted role of defender of orthodoxy, it seems that not only is 13/27 plausible, any other answer is inconceivable. The paradox (what does Tuesday have to do with anything?) is not resolved, merely denied. You said you were originally “flummoxed” by the problem. Why was that? The orthodox answer seems to be pure ignorance, with perhaps a hint that you might need a little remedial work on conditional probability. I don’t find this orthodox non-resolution of the paradox very satisfactory.

For that matter, what exactly is the question? In the original post, it was:

[somebody says to you] “I have two children, and at least one of them is a boy born on Tuesday.”

In this one it is:

My informant has two children. What is the probability that both are boys, given that one is a boy born on Tuesday?

These are, trivially, two different English sentences. Is it entirely obvious that they should both result in the same mathematical formulation?

The boys and girls are perhaps part of the problem. How about this? Suppose there is a genetic locus on chromosome I with two alleles, A and a. A is dominant, with a phenotype that can be determined by a simple lab test (something like the Rh factor). Assume equal frequencies for A and a, Hardy-Weinberg equilibrium, random mating, etc., or in other words statistics like those of the boys and girls. Three questions:

1. I have two copies of chromosome I. One of them has the allele A. What is the probability the other also has A?

2. A lab test reveals that I have the A phenotype. What is the probability I have the AA genotype?

3. A laboratory procedure isolates one copy of chromosome I from one of my cells, with either of the two copies being equally likely to be selected. Further testing reveals that the isolated chromosome has allele A. What is the probability that I have the AA genotype?

(For greater precision, replace “I” with “a randomly selected individual”). Before any lab test is performed, I know I have two copies of chromosome I. For both questions 2 and 3, I have gained additional information which might well be described in words as discovering that I have at least one A allele. Yet questions 2 and 3 have different answers. Is the precise meaning of question 1 really obvious?
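Ted’s questions 2 and 3 can be computed exactly under his stated assumptions (each copy is A or a with probability 1/2, independently). In this sketch an outcome is (genotype, index of the copy the lab isolates); the encoding and function names are mine. Both observations “reveal at least one A”, yet they give different answers.

```python
from fractions import Fraction
from itertools import product

# Genotype = (copy 1, copy 2), each allele A or a with probability 1/2, independently.
# Outcome = (genotype, index of the isolated copy); all 8 outcomes equally likely.
S = list(product(product("Aa", repeat=2), (0, 1)))

def p(event, given):
    """P(event | given) by counting equally likely outcomes."""
    cells = [w for w in S if given(w)]
    return Fraction(sum(1 for w in cells if event(w)), len(cells))

def is_AA(w):
    return w[0] == ("A", "A")

def phenotype_A(w):
    return "A" in w[0]            # question 2: the dominant phenotype is detected

def isolated_is_A(w):
    return w[0][w[1]] == "A"      # question 3: the isolated copy carries A

print(p(is_AA, phenotype_A), p(is_AA, isolated_is_A))  # 1/3 1/2
```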

4. Sean Carmody says:

Ted’s example highlights the problem well. Everyone agrees that if we are very precise about the specification process (as in Ted’s questions 2 and 3) we can calculate probabilities in an uncontroversial way. The challenge comes when we are presented with a situation in which the specification process is not made clear. Faced with such a problem, there would appear to be two forms of response: (a) throw your hands up and say that without a precise specification the problem cannot be solved, or (b) make the best of the situation and make some “reasonable” assumption about the specification process. The orthodoxy makes one such assumption (that there was no scope for any statement other than ‘at least one of them is a boy born on Tuesday’ being made); the heterodoxy makes alternative assumptions, usually that any gender or day of the week could just as easily have been used in the statement, restricted only by the actual genders and birthdays of our informant’s children. Of course, different assumptions will result in different probabilities. While we can prove various results in probability, we cannot prove the reasonableness of these various assumptions. Needless to say, I have come around to the view that the most common heterodox assumption is better than the orthodoxy.

5. Ted Jones says:

Another thought crossed my mind while reading Sean’s last comment. What do you think of the “prisoner” paradox, which shares the setup of the Monty Hall problem: there are three prisoners, A, B, and C, and each of them knows that two of them are to be executed the next day, but not which two. Prisoner A asks a guard to tell him if he will be executed. The guard replies that, although he knows who will be executed, he is not allowed to answer that question. However, he is moved by pity to say, “I guess I can tell you this: B will be executed.”

A can reason that, before the guard’s statement, his probability of being spared was 1/3, but afterward, it has increased to 1/2, so his situation is improved. I don’t think very many people would accept this conclusion; and in fact, translating the Monty Hall reasoning to this formulation leads to the conclusion that A’s probability has not changed for the better (but C’s has).

Assuming correctness of the standard Monty Hall answer (which I accept, although via a more complicated route than “orthodoxy”), A’s computation based on {A,B,C} –> {A,C} must be incorrect. The question is: how do you distinguish A’s reasoning from the orthodox boy/girl computation based on {BB,BG,GB,GG} –> {BB,BG,GB}? What principle tells us when it’s valid to drop an excluded point from the sample space and continue in the obvious way, and when it is not?
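Under the usual Monty Hall assumptions (stated here as mine: each prisoner is equally likely to be the one spared, and if A is spared the guard names B or C at random), Bayes’ rule leaves A’s probability at 1/3 and raises C’s to 2/3. The naive step {A,B,C} → {A,C} ignores that the guard’s choice of name is itself part of the outcome.

```python
from fractions import Fraction

def guard_says(spared):
    """Return {name: probability the guard names that prisoner} (hypothetical guard model)."""
    if spared == "A":
        return {"B": Fraction(1, 2), "C": Fraction(1, 2)}  # both B and C die: pick at random
    return {"C": Fraction(1)} if spared == "B" else {"B": Fraction(1)}

# Joint distribution of (who is spared, whom the guard names).
joint = {}
for spared in "ABC":
    for named, q in guard_says(spared).items():
        joint[spared, named] = Fraction(1, 3) * q

p_says_B = sum(q for (s, n), q in joint.items() if n == "B")
posterior = {s: joint.get((s, "B"), Fraction(0)) / p_says_B for s in "ABC"}
print({s: str(q) for s, q in posterior.items()})  # {'A': '1/3', 'B': '0', 'C': '2/3'}
```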

6. Sam McCandlish says:

Phrase the problem like this: A person comes up to you and says, “I have two children. At least one of them is a boy born on Tuesday. What is the probability that the other is a boy?”

To choose the answer to give, ask them, “Would you have tried to confuse me with a question like this whether or not you had a boy born on Tuesday?”

If they would have, then the answer is just 1/2. If they wouldn’t have, then the answer is 13/27.

(Actually, maybe their decision of whether to ask you a question was based only on whether they had a child born on Tuesday, or only on whether they had a boy. Then you would get a different probability.)
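Sam’s two cases can be made concrete as two hypothetical ways the statement might be generated (the protocols and names are my illustration, not his): (i) the person is asked “is one of your children a boy born on Tuesday?” and speaks only if the answer is yes; (ii) the person describes one child chosen at random. Weighting each of the 196 families by the chance that it produces the statement “a boy born on Tuesday” and applying Bayes’ rule:

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILD = list(product("BG", DAYS))
FAMILIES = list(product(CHILD, CHILD))   # 196 equally likely (older, younger) pairs

def asked(family):
    """Protocol (i): the statement is made exactly when it is true."""
    return Fraction(1) if ("B", "Tue") in family else Fraction(0)

def random_child(family):
    """Protocol (ii): a random child is described, so each matching child contributes 1/2."""
    return Fraction(sum(child == ("B", "Tue") for child in family), 2)

def p_two_boys(likelihood):
    """Bayes: weight each family by the chance that it produces the statement."""
    total = sum(likelihood(f) for f in FAMILIES)
    boys = sum(likelihood(f) for f in FAMILIES if all(s == "B" for s, _ in f))
    return boys / total

print(p_two_boys(asked), p_two_boys(random_child))  # 13/27 1/2
```

Under protocol (ii) the family with two Tuesday boys gets weight 1 while every other matching family gets weight 1/2, which is the same “count it twice” effect that Bob Walters describes further down.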

7. John Faben says:

>So consider the “boy born on Tuesday”. In the Bayesian view, “everything we know” includes the fact that our informant has two children. Then A is the event that both are boys, and B the event that at least one is a boy born on Tuesday. Our prior evaluation of P(A) is 1/4; but once we have the information B, we re-evaluate it as 13/27.

I think this is where we disagree. I would count ‘everything we know’ as, well... everything we know, including the norms of conversation, and why people say the things they say. Then A is “our informant has two boys”, and B is the event “the informant walks up to me and says ‘I have two children, one of whom is a boy born on a Tuesday’”.

Then, based on the fact that people are no more likely to make bizarre statements like this about boys born on Tuesdays than about girls born on Mondays, I would get the answer 1/2.

Note that, as several commentators have pointed out, the 13/27 answer is unambiguously correct if you instead pose the question as follows: you walk up to a randomly selected member of the population and ask them the question “Is it the case that you have exactly two children and one of them is a boy born on a Tuesday?” and they answer “yes” – what is the probability that both of their children are boys?
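John’s “randomly selected member of the population” version can also be simulated. The sketch below assumes, as the post does, that sex and weekday of birth are independent and uniform; the function name and parameters are illustrative. With the fixed seed the estimate should land near 13/27 ≈ 0.481, up to sampling noise of a few thousandths.

```python
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def random_child(rng):
    return (rng.choice("BG"), rng.choice(DAYS))

def simulate(n_families=200_000, seed=1):
    """Ask n random two-child families 'is one of your children a boy born on Tuesday?'
    and return the fraction of 'yes' families that have two boys."""
    rng = random.Random(seed)
    yes = two_boys = 0
    for _ in range(n_families):
        family = (random_child(rng), random_child(rng))
        if ("B", "Tue") in family:
            yes += 1
            two_boys += family[0][0] == "B" and family[1][0] == "B"
    return two_boys / yes

print(simulate(), 13 / 27)  # the estimate, then the exact value 13/27 as a decimal
```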

Intuitively there’s a big difference between these scenarios, and I think it’s relatively easy to formalise why this is the case. Sean sums up the issue pretty well; I don’t think there’s really any substantive disagreement. If we knew how the statement was generated, we would all agree on the correct probability. The only issue is what to do if we don’t.

Incidentally, I would reiterate one point from the previous discussion: I think the main reason that we find the 13/27 answer for the Tuesday problem counter-intuitive is precisely because of the issues that we’ve been discussing here. Real people just don’t decide what to say by generating statements at random and saying them if they’re true (essentially the assumption that the ‘orthodox’ method makes).

The ‘one is a boy born on a Tuesday’ feels irrelevant because a person could make a similar statement no matter what the genders and birthdays of their children… and we intuitively assume that they would have.

8. Jonathan Kirby says:

It seems to me that the debate is a common one when the probability of real-world events, or “possibly real-world” events is discussed. In this case, your informant either has two boys or he does not, so perhaps the most correct answer is that the probability he has two boys is either 0 or 1, but we don’t know which!

However, this is clearly a useless answer, and not the point of the question. To make sense of the question we have to understand that real-world events do not intrinsically have meaningful probabilities. Rather, the probability as a number comes from a mathematical model of the event. The model will depend on what information we have. Another answer to the question might be that the probability depends on what we know!

Peter’s answer of 13/27 is the correct answer in his mathematical model. And indeed it seems to me that his is the only reasonable mathematical model given only the information he put forward.

One could argue that it is unrealistic to have exactly this information and not also the information about how the information was acquired. One could also point to some statistical evidence that the four combinations BB, BG, GB, GG are not all equally likely. With these or other extra pieces of information you could build another model for which the correct answer would be different. But if one starts with the premise that the question was a mathematical problem with a correct answer, Peter’s seems to be the only correct answer.

It seems counterintuitive that the answer is 13/27 and not 1/3 surely (at least for me) because we first process the problem without the Tuesday clause, for which the answer is indeed 1/3, and then do not see immediately why this extra information modifies the answer. Perhaps we do assume that the information was given just as the answer to the question “which day of the week was your eldest son born on?” and do not immediately see that this is a different question.

9. Apologies, everyone, for not joining the discussion before. I was away for a week, and then tied up with exam boards for two weeks; I had no time for anything other than deleting spam on the blog during this time.

Also, I am sorry that this is rather long.

Probability is perhaps unique among 19th and 20th century axiomatic mathematical theories in its combination of simplicity of the axioms and applicability to the real world. Using this theory, actuaries stay in business; with a little help from complex numbers and quantum theory, physicists calculate the magnetic moment of the electron, agreeing with experiment to ten decimal places. As I’ve often said, I have no idea why it is so applicable. But I am sure that getting the applications right requires clear thinking, rather than new theories.

The main purpose of my post on conditional probability was exposition. The talking heads in this discussion may not be typical of people reading the post; in fact, my other expository posts attract many more readers than comments.

My experience is that this kind of presentation can help students think clearly. The two commonest mistakes in applying probability to the real world are

• (a) assuming that ignorance of the probabilities of outcomes means that they are equally likely (I have quoted Einstein on this), or that ignorance of the relationship between events means that they are independent;
• (b) not carefully specifying the event considered, or in the case of conditional probability, the pair of events.

It is the second of these which is relevant here.

A fair coin is tossed. The probability that it shows heads is 1/2.

In the absence of other information, this statement is accurate. But one should make two caveats:

• (a) As Diaconis, Holmes and Montgomery showed, a coin has a small bias towards landing the same way up as it started. So if we knew that it had been heads up before the toss, we might want to revise the estimate to 0.51 or so.
• (b) If the person tossing the coin is a skilled magician, and she is tossing to decide which of us pays for dinner, I might revise the probability upwards considerably.

But there is a much more serious objection, which I believe is relevant to many suggested approaches to the Tuesday boy problem. I will characterise it in extreme form first.

You toss a fair coin 100 times, and tell me the results. I choose at random a position from the set of those where the coin showed heads. What is the probability that the coin showed heads on that toss?

Of course this is silly; but what has gone wrong here is that some a posteriori information has got mixed up in the event considered.

A less absurd version is:

You toss a fair coin 100 times, recording the results; you tell me that the coin showed heads 56 times. I choose a position at random from the entire set. What is the probability that the coin showed heads on that toss?

In both examples, extra information is available. Unlike the preceding ones, it is not a lack of fairness in the coin, but the failure to include all the available information, which makes the answer 1/2 incorrect.
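For the second coin example: given that exactly 56 of the 100 tosses were heads, every arrangement of those heads among the 100 positions is equally likely (for independent tosses this holds whatever the coin’s bias), so a fixed position, or one chosen at random independently of the tosses, shows heads with probability 56/100. A one-line exact check (my illustration; `math.comb` needs Python 3.8 or later):

```python
from fractions import Fraction
from math import comb

n, k = 100, 56
# Sequences with k heads in which a fixed position shows heads:
# choose the other k - 1 heads among the remaining n - 1 positions.
p_heads_at_position = Fraction(comb(n - 1, k - 1), comb(n, k))
print(p_heads_at_position)  # 14/25, i.e. 56/100
```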

Essentially, it is a different problem. In the case of the Monty Hall problem, the answer would be different if, instead of the standard assumption, we were told that Monty had no information and had made a random guess which just happened to be correct. Similarly here.
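The Monty Hall contrast can be checked by enumeration. In this sketch (the function name is hypothetical) the contestant picks door 0; the knowing host always opens a goat door, choosing at random when he has a choice, while the ignorant host opens door 1 or 2 at random and we condition on his happening to reveal a goat.

```python
from fractions import Fraction

def p_switch_wins(monty_knows):
    """Contestant picks door 0; the car is behind door 0, 1 or 2 with probability 1/3 each.
    Returns P(switching wins | Monty opened a door showing a goat)."""
    win = reveal_goat = Fraction(0)
    for car in range(3):
        others = [1, 2]
        if monty_knows:
            choices = [d for d in others if d != car]   # he always avoids the car
        else:
            choices = others                            # he guesses blindly
        for opened in choices:
            p = Fraction(1, 3) / len(choices)
            if opened == car:
                continue                                # car revealed: not the situation observed
            reveal_goat += p
            if car != 0:
                win += p                                # the other closed door hides the car
    return win / reveal_goat

print(p_switch_wins(True), p_switch_wins(False))  # 2/3 1/2
```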

If someone gives me information with the intention of misleading (as happens not infrequently), and I suspect that it is so, I should put this knowledge into my calculation of the probabilities. But a flat statement does not contain intention, it is just a statement.

Einstein also said,

Subtle is the Lord, but malicious He is not.

Whether he would have liked it or not, this is what we assume when we apply probability theory to the real world.

10. seancarmody says:

Peter,
Where does this leave your thinking on conditional probability and Tuesday’s child? Do you still think that the arguments for an answer other than 13/27 relate to a Bayesian approach, or do you think they are tied up in an error of one of the types you mention in your comment?
Sean.

• Sean: Your blog name is well chosen; you are not going to let me get off the spot!

My reason for the original post was that the answer 13/27 felt wrong to me somehow. I think I have resolved that (at least to my own satisfaction).

But I do owe you more of an answer than that. Once I am no longer struggling just to keep up with stuff coming in (I am acting head of department this week, for example), I will try to write out a considered opinion about this. I did print out all the comments as a preliminary to doing this, but then things hotted up again… sorry!

11. seancarmody says:

Peter, you’re not the first to experience my stubbornness! I can also be patient, though, and I understand the time constraints you are under, so I will happily wait for further discussion. It is, after all, a fascinating topic.

12. Bob Walters says:

I am back from the conference and have a chance to comment.
I suppose I am repeating what others have said but perhaps in different words. Like Stubborn Mule I would like a reasoned refutation of my remarks if I am wrong.

I think the answer 13/27 to the question:
Somebody says to you “I have two children, and at least one of them is a boy born on Tuesday.” What is the probability that both children are boys?
is just plain wrong.

I think similarly that the answer 1/3 to the question:
Somebody says to you “I have two children, and at least one of them is a boy.” What is the probability that both children are boys?
is also wrong.

In the Tuesday’s Child problem the mistaken answer comes from attributing the same weight to one case, namely the case BoyTuesdayBoyTuesday, as to the other cases BoyTuesdayBoyWednesday, GirlTuesdayBoyTuesday, etc. But suppose 100 families have BoyTuesdayBoyTuesday: all will say BoyTuesday, whereas for all other cases 50 will say BoyTuesday and 50 will make a different statement.
So BoyTuesdayBoyTuesday has twice the weight and must be counted twice, giving (13+1)/(27+1)=1/2 instead of 13/27.

The same argument applies for the second simpler problem. Again BoyBoy needs to be counted twice giving (1+1)/(3+1)=1/2 instead of 1/3.

• Jonathan Kirby says:

Bob,

You are assuming something like: the information we have was given as a (truthful) answer to the request: tell me the sex of one of your children (and the day of the week on which he/she was born). Under this assumption, the answer 1/2 is correct, and 13/27 and 1/3 are wrong. But what is your justification for your assumption?

Maybe the information came another way, for example as an answer to the question: is one of your children a boy born on a Tuesday? Then the correct answer is 13/27, and 1/2 is wrong.

There are other ways the information could have come. For example: Was either of your children a boy born on a weekday? Yes. Which day? Tuesday. Here we get a different correct answer again.
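The weekday version is under-specified as stated: if both children are boys born on weekdays, which day gets named? Assuming, purely for illustration, that the parent then names one of the two boys’ days at random, an exact count gives yet another answer, 9/23 (the function name is hypothetical).

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
WEEKDAYS = DAYS[:5]
CHILD = list(product("BG", DAYS))
FAMILIES = list(product(CHILD, CHILD))   # 196 equally likely (older, younger) pairs

def p_names_tuesday(family):
    """Chance the answers are 'Yes' then 'Tuesday', assuming (hypothetically) that a parent
    with two weekday boys names one of the two boys' days uniformly at random."""
    weekday_boys = [day for sex, day in family if sex == "B" and day in WEEKDAYS]
    if not weekday_boys:
        return Fraction(0)                          # would have answered 'No'
    return Fraction(weekday_boys.count("Tue"), len(weekday_boys))

total = sum(p_names_tuesday(f) for f in FAMILIES)
boys = sum(p_names_tuesday(f) for f in FAMILIES if all(s == "B" for s, _ in f))
print(boys / total)  # 9/23
```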

So the answer changes depending on how the information was obtained. This is important. But there is also a sense in which Peter’s 13/27 is the correct answer, namely that it makes, in some sense, a minimal assumption about how the information was obtained. The assumption may seem improbable, but it is not quantifiably improbable, i.e. you cannot meaningfully say that it has any given numerical probability.

• seancarmody says:

Jonathan,

You are certainly correct that different assumptions about the protocol for generating the statement will result in different probabilities. The point of difference between your conclusion and Bob’s is the most “natural” or “minimal” assumption to make if we are told nothing about the protocol. The thing that seems unnatural to me about the assumption required to get 13/27 is that it implies that P(X | Boy-Girl) is the same as P(X | Boy-Boy), where X is the event that Mr Smith makes the statement he does. In the absence of any other information, the first should be lower than the second. At least, that’s the argument I made in my blog post. While I suspect that this puzzle is one of applied probability rather than a limitation of the theory itself, it would be nice to have a theoretical framework to describe the minimalism or naturalism of different assumptions.