## Almost all

This is the first in a series of posts about the area where descriptive set theory, theory of relations, and infinite permutation groups meet. We begin with the question: how to prove that transcendental numbers (those which satisfy no polynomial equation with integer coefficients) exist?

The nineteenth century saw three completely different proofs of the existence of transcendental numbers.

First was Liouville, who proved a theorem about approximation of irrationals by rationals:

If a is the root of a polynomial of degree n with integer coefficients, then a cannot be approximated to any order higher than n.

(This means that, for any m>n and any positive constant c, there are only finitely many rational numbers p/q such that |ap/q|<c/qm.) Now consider Liouville’s number, whose decimal expansion has ones in positions 1!, 2!, 3!, …, and zeros in all other positions. Any rational nummber obtained by truncating after the one in position m!, for any m>n, is an approximation to order greater than n; so Liouville’s number is transcendental.

Liouville’s number is transcendental, but probably its only claim to fame is that it was the first number shown to be transcendental. More interesting to mathematicians were the proofs by Hermite and Lindemann that the familiar numbers e and π are transcendental. The proofs are not straightforward; and, indeed, transcendence theory is one of the most difficult parts of modern number theory. (Lindemann’s result put paid to any hope of squaring the circle with ruler and compass – there’s another story!)

Finally, along came Cantor, and announced,

Almost all real numbers are transcendental.

The purpose of this post is to look at the meaning of the phrase “almost all”, and consider its use in existence proofs of this sort.

Warning to constructivists: Please stop reading now! You are not going to like this!

Cantor observed that the number of algebraic numbers is the same as the number of natural numbers; that is, the algebraic numbers can be written in an infinite sequence (a1a2,…) whose terms are indexed by natural numbers. To see this, we define the height of a polynomial over the integers to be the sum of its degree and the absolute values of its coefficients. There are only finitely many polynomials of given height, having only finitely many roots; so our list consists of all roots of polynomials of height 1 (there are none), 2 (just zero), and so on; if an algebraic number has occurred earlier in the list, we do not repeat it.

On the other hand, the real or complex numbers cannot be so enumerated. From this it is easy to see, that if we remove a denumerable sequence (such as the algebraic numbers), what is left still cannot be enumerated, so in particular is non-empty. That is, Cantor has proved that transcendental numbers exist, even though he has not exhibited a single one.

The general set-up would go something like this. We have a family of subsets of a given set X, the so-called “small sets”, with the properties

• the union of a countable collection of small sets is small;
• X is not small.

We regard the complements of small sets as “large”. Now, if we have a countable collection of properties, such that the elements satisfying any one of them form a small set, it follows that there is an element satisfying none of the given properties.

Cantor takes X to be an uncountable set such as the real numbers, and the “small” sets to be the finite or countable subsets. Clearly both conditions hold. There are two other convenient families of small sets, both of which allow some small sets to be uncountable; these give more powerful existence proofs, as we will see.

Remaining with the real numbers for the time being, the two familes of small sets are

• the null sets with respect to Lebesgue measure (those sets which can be covered by a countable union of intervals with total length at most ε, for any ε>0);
• the meagre sets in the topology, those which are contained in a countable union of closed sets with empty interior.

In each case, the two required properties of the small sets follow from basic properties of the real numbers found in the early twentieth century, part of “descriptive set theory”. Both classes include the countable sets, so the methods generalise Cantor’s.

Curiously, these two families, though very different to a practising mathematician, are abstractly the same. Sierpiński showed that there is a permutation of the real numbers which maps null sets to meagre sets; Erdős showed that we can take this permutation to be an involution interchanging the two families. So, as families of sets, they are isomorphic. This subject is discussed in Oxtoby’s book Measure and Category. (Here “measure” refers to Lebesgue measure, “category” to Baire category, commemorating two celebrated French descriptive set theorists.)

These notions of small sets can be widely generalised.

In any measure space, in particular in any probability space, the null sets form a family of small sets in our earlier sense. Indeed, the second property, that the union of countably many null sets is null, is a simple consequence of Kolmogorov’s axioms for probability.

On the other hand, in any metric space, the interior of a set A is the set of points having a neighbourhood contained in A; a set is open if it is equal to its interior; a set is closed if its complement is open; and a set is meagre if it is contained in a countable union of closed sets with empty interior. It is clear that a countable union of meagre sets is meagre; and the Baire category theorem states:

A complete metric space is not meagre.

So the meagre sets in any complete metric space satisfy our requirements for a family of small sets.

It will be convenient to re-phrase this in terms of the complements, or “large” sets, the so-called co-meagre sets. A set is co-meagre if and only if it contains a countable intersection of open dense sets (where a set is dense if every neighbourhood of every point in the space meets it).

I am now going to concentrate on a specific example. Let X be the set of all infinite sequences of zeros and ones. We define the appropriate structures on X as follows:

• as a probability space, we regard elements of X as recording infinitely many tosses of a fair coin;
• as a metric space, two sequences are close if they agree on a long initial segment (for example, if the first disagreement is in position n, we take the distance to be 1/n).

It is a simple exercise to show that the metric space is complete. In fact, it satisfies a strengthening of the triangle inequality known as the hypermetric inequality:

d(x,z) ≤ max{d(x,y),d(y,z)}.

Now a subset A of X is open if and only if it is finitely determined, that is, every sequence in A has a finite prefix all of whose continuations belong to A; and a set A is dense if and only if it is always reachable, that is, every finite sequence has a continuation belonging to A. Thus, any countable intersection of sets which are finitely determined and always reachable is non-empty.

Note that all the topology has disappeared; the conditions are purely combinatorial! It is an exercise (not very difficult) to prove the Baire category theorem for this sequence space.

For measure, we have in fact nothing new. Regarding an infinite binary sequence as the base 2 “decimal” expansion of a real number, we have a bijection between our sequence space X and the unit interval, apart from a null set (the rationals with 2-power denominators have two represerntations); the bijection transforms the coin-tossing measure to Lebesgue measure on the interval. However, the topology of the two metric spaces is quite different. We can’t expect to make infinitely many cuts in the unit interval without changing its topology drastically!

Time for an example. We regard a property which holds for a large set as being a “typical” or “generic” property of elements of X, that is, almost all elements of X have this property. Here is a case where measure and category give different views of what the typical element looks like:

• The Law of Large Numbers asserts that amost all binary sequences (in the sense of measure, that is, the complement of a null set) have limiting density 1/2.
• On the other hand, almost all sequences (in the sense of category) have the property that they do not have a density; indeed, the lim inf of the density of the prefixes is 0 and the lim sup is 1. (The proof is an exercise: show that, for any n, the set of sequences having a prefix of length at least n with density less than 1/n is obviously finitely determined, and is always reachable by simply appending enough zeros to the given finite sequence.)

But there are many cases where they agree. For example, let us call a sequence universal if it contains every finite sequence of zeros and ones as a consecutive subsequence. It is easy to see that universal sequences exist (simply concatenate all finite sequences); but almost as easy to show that universality is a “large” property, so that almost every sequence (in either sense) is universal.

Let’s run through the argument for category to see how easy it is. (Typically, in this game, category is easier than measure.) Since there are only countably many finite sequences, it suffices to show that the set of infinite sequences containing a given finite sequence s is finitely determined and always reachable. The first is clear; if s does occur in a given sequence, then just choose a prefix containing s. The second is even simpler: given any prefix, append s to it, and every continuation will contain s.

Here is another example, which I will be returning to. Take a countably infinite set, say the natural numbers, and write down all the 2-element subsets in a sequence. Then there is a natural bijective correspondence between graphs on this vertex set, and our sequence space X: the nth pair of vertices is joined by an edge if the nth term of the sequence is 1, and not joined it it is 0. In the probabilistic model, we are choosing a random graph, by tossing a fair coin once for each pair of vertices to decide whether to put an edge or not. What do almost all graphs look like?

It turns out that there is a graph R, called the Rado graph or countable random graph, so that almost all graphs are isomorphic to R.

As promised, more on R later!

I will end this post with a generalisation of the binary sequence space.

Let T be a countable rooted tree. Its nodes come on levels indexed by the natural numbers (including zero). The root is the unique node on level 0; each node other than the root has a unique predecessor on the preceding level, and at least one but at most countably many successors on the following level. A branch of T is an infunite path, starting at the root and containing one node on each level. (So if a branch contains a node, it must contain its predecessor). Let X(T) be the set of all branches of T.

If each node has two successors, we have the infinite binary tree; labelling them by 0 and 1, we have a bijection between the set of branches and the set of infinite binary sequences.

If the number of successors of any node is finite, then we may define a probability measure on X in a simple way. For any node v, the measure of the set of branches containing v is the reciprocal of the product of the numbers of successors of nodes on the path from the root to v. We then extend to all measurable sets in the standard way. More intuitively, we take a random walk where, arriving at a node, we then proceed to one of its successors, each being equally likely.

(There are other measures which could be used; we will look at some of these later.)

The category model is simpler, and does not require that each node has only finitely many successors. We define the distance between two branches to be 1/n if they first diverge at level n; this makes X a complete metric space, so the Baire category theorem applies. Just as in the sequence case, it is purely combinatorial.

A set A of branches is open if and only if it is finitely determined, that is, for any branch x in A, there is a node v on x such that all branches containing v lie in A. And A is dense if and only if it is finitely determined, that is, any node lies on a branch in A. So our large sets are again those which contain countable intersections of finitely determined, always reachable sets.

We will see examples later in the series. 