## Graph limits and random graphs

These ideas are quite new to me, and maybe my exposition of them is a bit incoherent; but I think there are some interesting questions here. An old problem of mine on random graphs with forbidden subgraphs may be illuminated by work involving graph limits and further work about exchangeable measures begun by De Finetti in the 1930s.

Some time ago, I spent a while thinking about random infinite graphs with some restriction; for a simple example, random triangle-free graphs.

There are many models for such a graph. For example, we could add vertices one at a time, and for each new vertex, choose its neighbour set randomly among all the possibilities which don’t violate the restriction. (In the example, we pick a random independent set in the existing graph to be the neighbour set of the new vertex.) Provided the restriction is such that we never get stuck, this defines a probability measure on the set of all graphs on a given countable vertex set, with the property that the restriction holds with probability 1.

Alternatively, list the 2-element subsets of the countable vertex set in a sequence. Now go through the list one pair at a time. Perhaps the restriction forces the pair considered to be a non-edge (or an edge); if not, toss a fair coin to decide. Again, provided we never get stuck, this defines a probability measure as required. Unfortunately, it may define many different measures; there is no obvious reason why different orderings of the pairs should give the same measure. Two obvious test cases are lexicographic and reverse lexicographic order of the pairs.

There are many variants. To my mind, they all have the drawback that they depend on some ordering of the vertex set (or the set of pairs of vertices, or something). I was interested to get a measure which didn’t require a choice of an ordering.

To motivate this, let’s ask: what should be the probability that two vertices are joined by an edge in a random triangle-free graph? In the ordering models, this depends on the choice of ordering. In the model where the ordering of pairs begins {1,2}, {1,3}, {2,3}, …, the probability that {1,2} is an edge is clearly 1/2, and the same for {1,3}; but the probability that {2,3} is an edge is only 3/8 (since if we chose both {1,2} amd {1,3}, then {2,3} would be forbidden).

The slogan “probability is limiting frequency” suggests that the probability that {1,2} (or indeed any other pair) is an edge should be the limiting edge-density in large triangle-free graphs. Here it is convenient for the “large triangle-free graphs” to be labelled, that is, to have vertex set {1,2,…,N}; then the required edge-density is equal to the proportion of these graphs in which {1,2} is an edge. More generally, we define a measure by saying that, if H is a triangle-free graph (with vertex set {1,2,…,n}), then the “frequency of H in N-vertex graphs” should be the proportion of triangle-free graphs on the vertex set {1,2,,…,N} which induce H on {1,2,…,n}. Now suppose that this frequency tends to a limit as N → ∞. Then, for any vertices v1,…,vn in the infinite set, we let the event that these vertices induce H be this limiting frequency; we have defined the required measure.

In this case, it happens that there is a theorem of Kleitman and Rothschild which says that “almost all finite triangle-free graphs are bipartite”, in the sense that the proportion of triangle-free graphs which are bipartite tends to 1 as the number of vertices tends to ∞. This implies that the probability that any odd number n of vertices in the random graph induce a cycle is zero; so the random infinite triangle-free graph is almost surely (i.e. with probability 1) bipartite.

To generalise the context, let us say that an age is a class of finite graphs (closed under isomorphism and induced subgraphs) having the joint embedding property, that is, any two graphs in the class can be simultaneously embedded in a graph in the class. Equivalently, for any (finite or infinite) graph X, the age of X is the class of finite graphs embeddable in X as induced subgraphs.

Many different infinite graphs may have the same age; but there may be one particularly nice one which we would like to get. So for example, the class of all finite graphs is an age; the “nicest” infinite graph of which it is the age is the random graph, which I have discussed here. For the finite triangle-free graphs, there is a nice infinite graph, Henson’s graph, having this age. So our earlier remarks show that Henson’s graph is not “the random triangle-free graph”. [However, it is the typical triangle-free graph in the sense of Baire category.]

Now take A be a fixed age of finite graphs. (In our example it would be the class of all finite triangle-free graphs.) The following question seems interesting; I have been unable to answer it.

Let H be a fixed graph on the vertex set {1,…,n}. For Nn, let pN(H) be the proportion of graphs in A (on the vertex set {1,…,N}) which induce H on the first n vertices. Is it true that pN(H) tends to a limit p(H) as N → ∞?

If this is true, then we define a probability measure on graphs on a given countable vertex set by the rule that the event that a given n-tuple of vertices induces H should be p(H). We can then ask a second question:

What does the random graph in this model look like?

(Clearly, with probability 1, its age is contained in A; the containment may be proper, as in the case of the random triangle-free graph.)

In two remarkable talks by Balazs Szegedy and Tim Austin in the first of this year’s two London Combinatorics Colloquia this week, I learned something which may throw some light on the second of these questions. (Perhaps on the first also, though I don’t see how.)

Let me take them in reverse order. Tim Austin, after claiming to be an analyst rather than a combinatorialist, told us that some recent developments in infinitary limit objects for sequences of combinatorial objects are related to probabilistic results about exchangeable measures. The simplest case is that of measures on zero-one sequences, which we can regard as functions from the natural numbers to {0,1}. Such a measure is called “exchangeable” if it is unaffected by arbitrary permutations of the natural numbers.

Exchangeability means that the probability measure is invariant under permutations, not any particular random object. The simplest example of an exchangeable probability measure is given by tossing a biased coin countably often. A slightly more complicated measure is defined as follows: take two biased coins with different probabilities of heads; choose one of the two according to some probability measure; then generate the random sequence using this coin.

This concept was introduced by De Finetti who worked on the foundations of probability in the 1930s. He proved a lovely representation theorem for exchangeable measures. Any such measure is defined by a function f from [0,1]×[0,1] to [0,1] by the following rule. Choose u uniformly from [0,1]; for each natural number n, choose un uniformly from [0,1], with all choices independently; then let the nth term of the random binary sequence be chosen to be 1 with probability f(u,un), independently for all n. It is an exercise for the reader to find the function f representing each of the two examples above.

In the 1980s, this result was extended by Hoover and further by Aldous. We now define a probability measure on graphs on the vertex set N (natural numbers) to be exchangeable if it is invariant under permutations of the natural numbers. Obviously the measure giving the “random graph with age contained in A” described earlier is exchangeable, so these results could have something to say about my two problems.

Aldous and Hoover gave a representation theorem for exchangeable random graph measures similar to De Finetti’s: the function replacing f has three variables rather than two.

However, in an important special case, which I will come to in a moment, an arguent using ergodicity shows that the function is independent of the first parameter, so is really a symmetric function from [0,1]2 to [0,1]. As in De Finetti’s case, the values of the function give the probabilities of seeing a given induced subgraph on a given set of vertices, just as in my earlier set-up.

In fact, as Austin remarked, what we have here is precisely a graphon, a limit of a sequence of finite graphs which was introduced by Lovász, Szegedy and others, and had been beautifully described (in a different way) by Szegedy in the preceding talk!

Briefly: let (Gn) be a sequence of finite graphs. From the graph Gn, we define a probability distribution μn on graphs with at most k vertices, for any given k, by choosing k random vertices of Gn and taking the induced subgraph. (The words “at most” allow for the event that the same vertex is picked more than once.) They say that the sequence (Gn) converges if, for every fixed k, the probability measures μn converge to a limit μ. (This can also be expressed, using inclusion-exclusion, in terms of graph homomorphisms; the condition for convergence is that, for any fixed finite graph H, the probability t(H,Gn) that a random map from H to Gn converges to a limit t(H,G).)

A graphon is the “limit” of a convergent sequence of finite graphs. But what is one? It can be described by a function W from [0,1]2 to [0,1], which is symmetric and measureable. The measure μ is defined from the graphon as follows: pick any real numbers x1,…,xk in the unit interval; restricted to these, W defines a symmetric matrix of order k. Now join i to j with probability W(xi,xj).

There is much more to the rich theory of graphons than I have begun to hint at here. For example, there are connections with the flag algebras defined by Razborov, who used them to solve a case of the clique density problem. (The general solution to this problem was given at the Colloquium by Christian Reiher.) But since I don’t know what flag algebras are, I will say no more about this.

But there seems to be an obvious connection with my question about random graphs with given age. Just now I don’t have enough time to think what it might be. Any ideas?

However, something is still missing. As I have explained in earlier posts on the countable random graph, this is a beautiful object with many striking properties. Yet the corresponding graphon is the function W which takes the constant value 1/2.

Can the remarkable properties of the random graph be extracted from the constant function 1/2? If so, how?

Also, changing 1/2 to any other number between zero and one presumably gives a different graphon, but gives the same countable random graph! 