Marking exams numbs the brain, so not sure whether the following makes sense … but here goes.

The simplest question in all of probability theory may be, “What is meant by a fair coin?” But the answer is not likely to be simple. On the practical side, a widely reported study by Persi Diaconis, Susan Holmes and Richard Montgomery showed that there is bias in a regular coin toss: the coin is slightly more likely to come down with the same side up as before it was tossed (the bias being of order of magnitude 1%). But on the theoretical side, this goes to the heart of the question “What is probability?” As I have already demonstrated, I don’t know how to answer that!

So instead I will talk about what you can do with a fair coin, assuming that such a thing exists. The analysis will raise some questions which are certainly not difficult but possibly of some interest. I want to come back later to talk about the situation where the coin can be tossed infinitely often, and this is partly a prelude.

A fair coin is a mythical gadget which has a probability exactly 1/2 of showing heads and 1/2 of showing tails each time it is tossed, and in any sequence of tosses, the results are mutually independent.

Most obviously, if I want to simulate an event with probability 1/2, I simply toss the coin: heads means the event occurs, tails means it doesn’t. If the desired probability is 1/4, I can toss the coin twice, and only “two heads” corresponds to occurrence of the event.

What if the required probability is 1/3? (If you know the answer to this question, please skip the next bit.)

I can toss the coin twice. The four outcomes each have probability 1/4. I assign one of them (say, heads twice) to the event occurring, and two (say, heads and tails, and tails and heads) to the event not occurring; in the remaining case, I reach no conclusion, and have to toss again.

Note, incidentally, that this protocol will also choose among three mutually exclusive events each with probability 1/3. Assign one result to each of the three events; the fourth outcome means “toss again”.
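As a sanity check, here is a minimal Python sketch of this protocol (the function name and the use of `random.randint` as a stand-in for the fair coin are my own):

```python
import random

def choose_one_of_three():
    """Select one of three equally likely events with a fair coin.

    Toss twice; three of the four equally likely outcomes each name an
    event, and the fourth ("tails, tails", encoded (0, 0) here) means
    "toss twice again".
    """
    while True:
        pair = (random.randint(0, 1), random.randint(0, 1))
        if pair != (0, 0):
            return {(0, 1): 0, (1, 0): 1, (1, 1): 2}[pair]

# Each event should occur about a third of the time.
counts = [0, 0, 0]
for _ in range(30000):
    counts[choose_one_of_three()] += 1
```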

The number of times the experiment must be done before reaching a conclusion is a geometric random variable with success probability 3/4; so the expected number of experiments is 4/3, that is to say, 8/3 coin tosses.

Can we be more efficient? It might seem at first that, since 15 is “closer” to 16 than 3 is to 4, it would be better to toss the coin four times, assigning five outcomes to the event and ten to its complement, the 16th meaning toss again. The average number of trials is 16/15, or 64/15 coin tosses: much worse!

The way to maximum efficiency is to reassign the tosses. Suppose that “tails, heads” means the event occurs and “tails, tails” that we try again. If the first toss is heads, then the event does not occur, and we can stop. A little calculation shows that the expected number of coin tosses is just 2.
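The “little calculation” can also be checked empirically; a Python sketch (heads encoded as 1):

```python
import random

def simulate():
    """One run of the improved protocol for probability 1/3.

    Heads on the first toss: the event does not occur, stop.
    Tails then heads: the event occurs.  Tails, tails: start over.
    Returns (event_occurred, tosses_used).
    """
    tosses = 0
    while True:
        tosses += 1
        if random.randint(0, 1) == 1:      # heads: stop immediately
            return False, tosses
        tosses += 1
        if random.randint(0, 1) == 1:      # tails then heads: event occurs
            return True, tosses
        # tails, tails: try again

trials = 100000
hits = total_tosses = 0
for _ in range(trials):
    occurred, n = simulate()
    hits += occurred
    total_tosses += n
# hits / trials should be near 1/3, total_tosses / trials near 2
```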

More systematically, write the required probability as a binary “decimal”: 1/3=0.010101… From now on we represent the two outcomes of the coin toss as 1 and 0, instead of heads and tails. We can regard a sequence of coin tosses as generating binary digits of a real number; the event occurs if the number generated is less than 1/3, not if it is greater. So the protocol is as follows:

Toss the coin repeatedly, comparing each outcome with the corresponding binary digit of the required probability; while they agree, proceed to the next toss. At the first disagreement, stop: if the toss is smaller than the digit, the event occurs; if larger, it doesn’t.
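Here is this protocol as a short Python sketch; I use exact `Fraction` arithmetic (my choice, not the post’s) to read off successive binary digits of the required probability:

```python
from fractions import Fraction
import random

def bernoulli(p):
    """Simulate an event of probability p (a Fraction in (0, 1)):
    compare fair-coin tosses with the binary digits of p, stopping
    at the first disagreement."""
    while True:
        p *= 2
        digit = 1 if p >= 1 else 0     # next binary digit of p
        if digit:
            p -= 1
        toss = random.randint(0, 1)
        if toss != digit:
            return toss < digit        # toss smaller than digit: x < p

# Empirical check for p = 1/3 = 0.010101... in binary.
hits = sum(bernoulli(Fraction(1, 3)) for _ in range(100000))
```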

Clearly this method works for any real number between 0 and 1, rational or irrational. The only small problems are that the experiment might never terminate, and that there are numbers with two binary decimal expansions. These together only affect a null set of possible outcomes.

Now the chance of successful conclusion at each stage is 1/2; so the expected number of coin tosses is 2, irrespective of the required probability.

A couple of questions arise. Can we do better than 2 for the expected number of tosses when the probability is a terminating binary decimal (a rational with 2-power denominator)? If so, by how much? After all, for probability 1/2, only one toss is required. This is an easy exercise: if the probability is a terminating binary decimal, we simply truncate the sequence of coin tosses, leaving just a finite series to sum. I don’t know whether an entirely different protocol could do even better, for arbitrary probabilities.
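For the record, here is that exercise worked through in Python (the example p = 3/8, with binary expansion 0.011, and the closed form 2 − 2^(1−m) are my own additions, not the post’s):

```python
import random

def tosses_to_decide(digits):
    """Tosses used by the truncated protocol when the probability has
    the finite list of binary digits `digits` (3/8 -> [0, 1, 1]).

    Stop at the first disagreement, or after toss m at the latest:
    once all m digits have matched, x exceeds p almost surely."""
    for j, d in enumerate(digits, start=1):
        if random.randint(0, 1) != d or j == len(digits):
            return j

m = 3
digits = [0, 1, 1]                       # p = 3/8
runs = 200000
empirical = sum(tosses_to_decide(digits) for _ in range(runs)) / runs
exact = 2 - 2 ** (1 - m)                 # summing the finite series
```

For an m-digit probability the sum comes to 2 − 2^(1−m): equal to 1 at m = 1, as it should be, and approaching 2 as m grows.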

How can we use our fair coin to select among an arbitrary finite number of possibilities with given probabilities? For example we saw that we can select one of three equally likely events with 8/3 coin tosses on average. Once this problem is solved, we can simulate random walks, Markov chains, etc., with a fair coin.

In general, this could be done by first deciding whether the first event occurs; if it doesn’t, decide whether the second event occurs, using the conditional probability; and so on. This appears to require a number of tosses growing linearly with the number *n* of events: for example, for *n* equally likely events, the expected number is about *n*. On the other hand, simple ideas from information theory show that we need at least a logarithmic number of tosses. How can we get close to this lower bound?

Let *p*_{1},…,*p*_{n} be the probabilities, and let *q*_{k} be the sum of the first *k* of these numbers. We assume for simplicity that none of the *q*_{k} are terminating binary decimals. Use the coin to generate the binary decimal expansion of a random number *x* in the unit interval; event *k* occurs if the real number chosen by the coin lies between *q*_{k-1} and *q*_{k}. Of course, we stop as soon as the interval into which the number must fall lies unambiguously in one of the intervals (*q*_{k-1}, *q*_{k}).

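As a sketch of this procedure in Python (exact fractions again, my choice, so that the interval comparisons are precise):

```python
from fractions import Fraction
import random

def choose(probs):
    """Select index k with probability probs[k], using fair-coin tosses
    to narrow the interval containing the random number x until it
    fits inside a single (q_{k-1}, q_k)."""
    q = [Fraction(0)]
    for p in probs:
        q.append(q[-1] + Fraction(p))
    lo, width = Fraction(0), Fraction(1)
    while True:
        width /= 2                       # one more binary digit of x
        if random.randint(0, 1):
            lo += width
        # Is (lo, lo + width) now inside a single (q_k, q_{k+1})?
        for k in range(len(probs)):
            if q[k] <= lo and lo + width <= q[k + 1]:
                return k

# Three equally likely events, as in the example above.
counts = [0, 0, 0]
third = Fraction(1, 3)
for _ in range(30000):
    counts[choose([third, third, third])] += 1
```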
How many tosses does this require? Let *f*(*n*) be the worst-case expected number of tosses for *n* events. We have *f*(1)=0 and *f*(2)=2.

The first coin toss locates *x* in one of the intervals (0,1/2) and (1/2,1). So after rescaling by doubling the length, it reduces the problem of choosing one of *n* events to choosing either one of *k* or one of *n*–*k*+1, for some *k* (one more than the number of *q*s in the appropriate interval). So

*f*(*n*) = 1+max_{k}{(*f*(*k*)+*f*(*n*–*k*+1))/2}.

Now a philosophical remark is necessary. This is not a recurrence relation for *f*(*n*), since this quantity occurs on the right-hand side! In order to legitimize our position, we must stop and observe that the maximum cannot occur for *k*=1 or *k*=*n*. If we then remove these cases, it is a genuine recurrence.

Now it can be solved. We find that *f*(2^{a}+1)=*a*+2, and that *f* increases linearly between these values. To prove this, we note that the target function is concave. So in a proof by induction, we know by the inductive hypothesis that *f* is concave below the point we are looking at, so the maximum occurs in the middle, when *k*=(*n*+1)/2 if *n* is odd, or *n*/2 or (*n*+2)/2 if *n* is even. Then the induction step is straightforward.
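The claimed values are easy to check numerically; a Python sketch of the (legitimized) recurrence, with k restricted to 2 ≤ k ≤ n−1:

```python
def f(n):
    """Worst-case expected toss count from the recurrence
    f(n) = 1 + max_{2 <= k <= n-1} (f(k) + f(n-k+1)) / 2,
    with f(1) = 0 and f(2) = 2."""
    vals = [None, 0.0, 2.0]              # vals[m] holds f(m)
    for m in range(3, n + 1):
        vals.append(1 + max((vals[k] + vals[m - k + 1]) / 2
                            for k in range(2, m)))
    return vals[n]

# Check f(2^a + 1) = a + 2 for the first few a.
checks = [f(2 ** a + 1) for a in range(4)]   # f(2), f(3), f(5), f(9)
```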

So we do indeed have logarithmic growth. But *f*(3)=3, which is still worse than our 8/3 for three equally likely events. So probably there is more to be said!

A curious problem remains. The concavity of *f* shows that, at each stage, the worst case occurs when the numbers of *q*s within each of the intervals (0,1/2) and (1/2,1) are nearly equal. This raises the possibility that we could make the algorithm more efficient by allowing ourselves to rearrange the orders of the events to make the two subintervals more unbalanced. Of course, this is not always possible: with an odd number of equally likely events, the intervals are balanced, and rearrangement changes nothing. (However, allowing this at each stage might permit some small gain.)

Let us ask the opposite question: can we always rearrange to make things as bad as possible?

We can formulate the problem as follows:

We are given *n* intervals which exactly fill the unit interval (ignoring endpoints). We can certainly pack all but one of them into two intervals of length 1/2 (simply discard the interval which crosses the midpoint in some packing). Can we do this in such a way that the number of intervals packed into each of the two pieces are nearly equal (differ by at most one)?

This is not a difficult problem, and has a pretty solution, which I leave you to discover!


I see these n intervals as line segments which, when put together, make up the unit interval.

The first thing to do is to order them by magnitude. So L1 would be the largest segment, with length l1, L2 the second largest, with length l2, and finally Ln the smallest, with length ln.

We first discard L1, the largest one, then split the rest into two groups G1 and G2, and concatenate the segments in each group to form two larger segments S1 and S2, one for each group. As it turns out, both of these have length no greater than 1/2.

Assuming that n is even, G1 = {L2, L4, …, Ln} while G2 = {L3, L5, …, L(n-1)}. We add up the segments in G1 and G2 respectively to form S1 and S2.

The lengths of S1, S2 are going to be:

len(S1) = l2 + l4 + … + l(n-2) + ln

len(S2) = l3 + l5 + … +l(n-1)

Since, by definition, l2 >= l3, l4 >= l5, …, l(n-2) >= l(n-1), and S1 also has an extra term, it follows that len(S1) > len(S2).

If we put together S1, S2 and L1 we get the initial interval, so len(S1) + len(S2) + l1 = 1.

Let’s assume that len(S1) > 1/2. Then len(S1) > 1 − len(S1) = l1 + len(S2). However,

len(S1) = l2 + l4 + … + ln

l1 + len(S2) = l1 + l3 + … + l(n-1)

Again, by definition, l1 >= l2, l3 >= l4, …, l(n-1) >= ln, so len(S1) <= l1 + len(S2), a contradiction. Hence len(S1) <= 1/2, and since len(S1) > len(S2), also len(S2) < 1/2.

Reasoning this way, we have constructed two segments, each of length no greater than 1/2, which between them contain all but one of the original intervals.

For an illustration, for n = 6 (above, the intervals in their original order; below, rearranged as S1, L1, S2; note that the midpoint 1/2 then necessarily falls within L1):

|---L3---|----L2----|-L5-|--L4--|------L1------|L6|

|----L2----|--L4--|L6|------L1------|-L5-|---L3---|
|<-------S1------->|<-----L1----->|<-----S2---->|

A similar argument holds for n odd.
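The commenter’s construction fits in a few lines of Python (my own illustration, with an arbitrary six-interval example):

```python
def pack(lengths):
    """Sort the interval lengths into decreasing order l1 >= ... >= ln,
    set aside the largest, and deal the rest alternately into the two
    groups G1 = {l2, l4, ...} and G2 = {l3, l5, ...}."""
    s = sorted(lengths, reverse=True)
    return s[1::2], s[2::2]

# Six intervals exactly filling the unit interval.
g1, g2 = pack([0.3, 0.25, 0.2, 0.1, 0.1, 0.05])
# Both groups fit into a half, and their sizes differ by at most one.
```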
