Probability is a mathematical subject where the application of the results to the real world remains controversial, and different schools of thought flourish.
Littlewood, in his Miscellany, discusses this, and comes firmly to the conclusion that probability theory can say nothing about the real world, and that a pure mathematician should “wash his hands of applications”, or if talking to someone who wants them, should say something along the lines of “try this; it is not my business to justify [it]”. However, later in the book, in the section on large numbers, he is happy to estimate the odds against a celluloid mouse surviving in Hell for a week. (Probably he would have argued that this is not the “real world”.)
That said, almost all mathematicians regard Kolmogorov’s axioms as the basis for probability theory. Littlewood says that “the aspiring reader finds he is expected to know the theory of the Lebesgue integral. He is sometimes shocked at this, but it is entirely natural.”
I had a project student this year who wrote about developing probability theory for non-classical logics such as intuitionistic or paraconsistent logic. I certainly learned a great deal about non-classical logic from supervising the project, for which I am very grateful to her; I might say more about this later. What I have to say below has nothing to do with her project, but the idea was sparked by a section in one of the papers she unearthed in her literature search. This is, perhaps, my small contribution to the foundations of probability theory.
Kolmogorov’s axioms say, in brief, that probability theory is measure theory on a space with total measure 1. In more detail, the ingredients are (Ω,F,p), where
- Ω is a set, which probabilists might call the sample space, but I rather like the philosophers’ term for this, the set of possible worlds;
- F is a σ-algebra of subsets of Ω; that is, F is non-empty, contains Ω, and is closed under complementation and countable unions;
- p is a function from F to the real numbers satisfying
- p(S) is non-negative for all S in F;
- p(Ω) = 1;
- p is additive over countable disjoint unions; that is, if the sets S_{i} are pairwise disjoint for i∈N, then p(∪S_{i}) = Σp(S_{i}).
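As a toy illustration of the three conditions on p (a finite model, where countable additivity reduces to finite additivity; the die-roll space and all names below are my own choices, not standard notation):

```python
from itertools import chain, combinations
from fractions import Fraction

# A finite model of (Omega, F, p): Omega = outcomes of one die roll,
# F = the full power set (trivially a sigma-algebra),
# p = the uniform counting measure.
omega = frozenset(range(1, 7))

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

F = powerset(omega)
p = lambda S: Fraction(len(S), len(omega))

# The three conditions:
assert all(p(S) >= 0 for S in F)          # non-negativity
assert p(omega) == 1                       # total measure 1
A, B = frozenset({1, 2}), frozenset({5, 6})
assert p(A | B) == p(A) + p(B)             # additivity on disjoint sets
```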
The paper in question is “Kolmogorov’s axiomatization and its discontents”, by Aidan Lyon, in the Oxford Handbook of Probability and Philosophy. Section 4 of the paper takes Kolmogorov’s axioms to task because they cannot model a lottery with a countably infinite number of tickets, which is fair in the sense that each ticket has the same chance of winning.
Right from the start, this struck me as odd. One of the themes of mathematical philosophy is that mathematicians are not careful enough in their work, and accept arguments without sufficient scrutiny. The proportion of people who support some form of constructivism is probably much higher among philosophers of mathematics than among mathematicians. (Typically, they reject proof by contradiction: a proof of a theorem must be a construction which demonstrates the theorem, it is not enough to show that assuming that the theorem is false leads to a contradiction.)
But here, it seems to be the other way around. I cannot imagine any construction which would actually implement a fair lottery with a countably infinite number of tickets. To point up the contrast, Kolmogorov’s axioms have no difficulty in handling the result of tossing a fair coin countably many times. This generates the set of all infinite 0-1 sequences, interpreting 1 as “heads” and 0 as “tails”. Now the measure on this set is the standard product measure induced from the measure on {0,1} where each of the sets {0} and {1} has probability 1/2.
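A quick sanity check of the product measure is easy to simulate: the cylinder set fixing the first k tosses should have probability 2^{−k}. (A sketch only; the pattern, trial count and seed below are arbitrary choices of mine.)

```python
import random

rng = random.Random(1)

def tosses(k):
    """First k coin tosses of an infinite fair sequence (1 = heads)."""
    return tuple(rng.randrange(2) for _ in range(k))

# The cylinder set fixing the first 3 tosses to (1, 0, 1) should have
# product measure 2**-3 = 1/8; check empirically.
trials = 200_000
hits = sum(tosses(3) == (1, 0, 1) for _ in range(trials))
assert abs(hits / trials - 1/8) < 0.01
```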
Here is the argument that a fair countable lottery is not possible. The sample space is the countable set of tickets. (Said otherwise, there is a possible world in which any ticket is the winning ticket.) We want all these outcomes to have the same probability p. Now the sample space Ω is the union of the singleton sets {T_{i}} for i∈N, which are pairwise disjoint; so, by countable additivity, if we add up p a countable number of times, the result should be 1. But this is not possible, since if p = 0, then the sum is 0, whereas if p > 0, then the result is infinite.
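The dichotomy can be seen directly in the partial sums (a trivial sketch in exact rational arithmetic):

```python
from fractions import Fraction

def partial_sum(p, n):
    """Sum of the probabilities of the first n tickets, each equal to p."""
    return n * p

# If p = 0, every partial sum is 0, so the total can never reach 1.
assert all(partial_sum(Fraction(0), n) == 0 for n in (10, 10**3, 10**6))

# If p > 0, the partial sums grow without bound: beyond n = 1/p they
# already exceed 1, so the countable sum diverges.
p = Fraction(1, 1000)
assert partial_sum(p, 1001) > 1
```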
(The first part of this argument generalises to the fact that a countable union of null sets (sets with probability 0) is a null set.)
The difference between the two situations, it seems to me, is that I can imagine a mechanism that allows a coin to be tossed infinitely often. If we are going to have infinite processes in mathematics, this seems a fairly harmless one. But I am unable to visualise a fair method of picking one lottery ticket from a countable set. One could try the following: For each ticket, toss a fair coin infinitely often to generate the base 2 expansion of a real number in the unit interval; then the winner is the ticket whose number is the largest. Trouble is, there may not be a largest; the set of chosen numbers has a supremum but possibly no maximum.
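For any finite number of tickets the largest-draw method is perfectly fair, by symmetry; the trouble appears only in the countable limit. A quick simulation of the finite case (ticket count, trial count and seed are arbitrary choices of mine):

```python
import random
from collections import Counter

rng = random.Random(0)

def finite_lottery(n_tickets):
    """Each ticket draws a uniform real in [0,1) (a finite stand-in for
    an infinite binary expansion from coin tosses); the largest draw
    wins.  By symmetry every ticket wins with probability 1/n."""
    draws = [rng.random() for _ in range(n_tickets)]
    return max(range(n_tickets), key=draws.__getitem__)

trials = 100_000
wins = Counter(finite_lottery(5) for _ in range(trials))
# Each of the 5 tickets wins roughly 20% of the time.
assert all(abs(wins[i] / trials - 0.2) < 0.02 for i in range(5))
```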
Another way of describing the same process is like this. Take a fair coin, and toss it once for each ticket. Any ticket which gets tails is eliminated; those which get heads remain. Repeat this process until a single ticket remains, continuing into the transfinite if necessary (but there is no guarantee that this will happen!).
Littlewood gives an argument which shows that, if we really want a countable fair lottery, we have to give up either finite additivity or the property that the probability of the whole space Ω is 1. Take the following countable pack of cards: each card has consecutive natural numbers written on its two sides, and the number of cards bearing the numbers n and n+1 is 10^{n}. There are two players, Alice and Bob. A card is chosen randomly from the pack and held up between them, so that each can see one side, and they are invited to bet at evens that the number on their side of the card is smaller than the number on the other side. If Alice sees the positive integer n, it is ten times as likely that the number on the other side is n+1 as that it is n−1. So she should bet. Bob reasons in the same way. So each has the expectation of winning.
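One way to see where the money goes is to truncate the pack at a top number N and compute the conditional probabilities exactly. The model below (a card drawn with probability proportional to its multiplicity, then a uniformly random side shown) is my reading of the setup, not anything Littlewood specifies:

```python
from fractions import Fraction

def p_other_side_larger(k, N):
    """Truncated Littlewood pack: cards (n, n+1) for n = 0..N-1, with
    10**n copies of card (n, n+1).  A card is drawn with probability
    proportional to its multiplicity and a uniform random side shown.
    Return P(hidden side > k | shown side = k)."""
    w_lower = Fraction(10 ** (k - 1)) if k >= 1 else Fraction(0)   # card (k-1, k)
    w_upper = Fraction(10 ** k) if k <= N - 1 else Fraction(0)     # card (k, k+1)
    return w_upper / (w_lower + w_upper)

N = 6
# Seeing any interior number makes the bet favourable at odds 10 : 1...
assert all(p_other_side_larger(k, N) == Fraction(10, 11) for k in range(1, N))
# ...but whoever sees the top number N loses for certain.
assert p_other_side_larger(N, N) == 0
```

Every interior number gives its viewer a 10/11 chance of winning, but the player who sees the top number loses for certain, and that rare catastrophic loss balances the books; in the infinite pack there is no top number left to absorb the loss.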
Littlewood describes this as a “paradox”; I think it is rather more than that.
Lyon claims that it is not necessary to be able to imagine a mechanism in order for the objection to Kolmogorov’s axioms to be valid. I don’t think I agree.
Anyway, Lyon describes a possible solution of using non-standard real numbers as probabilities, so that the probability of picking a ticket in the lottery is a suitable infinitesimal with the property that adding it countably many times gives 1. But he rightly remarks that there are problems with this solution: lack of uniqueness of non-standard models and inability to name the elements.
When I read this, it struck me that there was a potentially nicer answer to the question: use John Conway’s surreal numbers as probabilities. These numbers are explicitly defined and named; moreover, there is a unique surreal number 1/ω with the property that adding it to itself ω times gives 1.
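For concreteness (Conway’s construction as I recall it from On Numbers and Games, so treat the details with care: each surreal number is a pair of sets of previously created numbers, written {left options | right options}):

```latex
\omega = \{\, 0,\ 1,\ 2,\ \dots \mid \;\}, \qquad
\frac{1}{\omega} = \Bigl\{\, 0 \;\Bigm|\; 1,\ \tfrac12,\ \tfrac14,\ \tfrac18,\ \dots \Bigr\}.
```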
Conway’s number system comes out of his general framework for game theory, described in his book On Numbers and Games. The first account of it was the mathematical novel Surreal Numbers by Don Knuth; he describes two backpackers who unearth an ancient tablet bearing the signature “J.H.W.H. Conway” and giving the rules for the construction of Conway’s number system, and their elation and frustration as they decipher the tablet and try to prove the assertions on it.
Has anybody seen a treatment of probability using Conway’s surreal numbers? It seems to me that in order to have such a treatment, one would presumably need to modify the notion of “null set”, to address the fact that the union of countably many “null sets”, such as the lottery tickets with probability 1/ω, may not be null; there may be difficulties lurking here. The problem of inventing a mechanism for the lottery still remains, but maybe closer inspection of the way the surreal numbers are generated would suggest a method.
Actually, the sting in the tail of this story is that there are some absolutely classical mathematical results which assign “probabilities” to sets of natural numbers. For two famous examples, which incidentally lead to the same value,
- the probability that a natural number is squarefree is 6/π^{2};
- the probability that two natural numbers are coprime is 6/π^{2}.
This uses a completely different notion of probability. The statements in question really assert that the proportion of natural numbers up to n which are squarefree (or the proportion of pairs which are coprime) tends to the limit 6/π^{2} as n→∞.
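Both limits are easy to check numerically (a sketch only; the cutoffs below are arbitrary choices of mine, and convergence is slow, so the tolerances are generous):

```python
from math import gcd, isqrt, pi

def squarefree(n):
    """True if n is divisible by no square greater than 1."""
    return all(n % (d * d) for d in range(2, isqrt(n) + 1))

# Proportion of squarefree numbers up to N.
N = 10_000
sf_density = sum(squarefree(n) for n in range(1, N + 1)) / N

# Proportion of coprime pairs (a, b) with 1 <= a, b <= M.
M = 300
coprime_density = sum(gcd(a, b) == 1
                      for a in range(1, M + 1)
                      for b in range(1, M + 1)) / M**2

target = 6 / pi**2          # ≈ 0.6079
assert abs(sf_density - target) < 0.02
assert abs(coprime_density - target) < 0.02
```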