Yesterday I gave a talk on a theorem of Terence Chan, which he spoke about at a network coding meeting at Queen Mary in late 2012. He sketched a proof on the common room table after the talk, and I decided it was interesting enough to tell people about. Here is a brief summary; you can find slides of my talk here.

I am writing about it here because it raises an issue which is potentially a stumbling-block for the extreme Platonic view that mathematics is a pre-existing landscape which is simply discovered by intrepid explorers. The problem arises because some mathematical facts are regarded as more valuable than others, but I will show that this value is dependent on where you are standing to look at the fact.

Entropy is associated with a random variable, which is a function on a probability space Ω; but it does not depend at all on the values of the function, only on the kernel partition of Ω (in which two points are equivalent if they have the same image). The entropy of the random variable *f*, or the partition *F*, is −∑*p _{i}* log

*p*, where

_{i}*p*is the probability of the

_{i}*i*th part of the partition. (To keep things simple, assume that Ω is finite.) The entropy of

*f*is denoted by

*H*(

*f*).

Note that the base of the logarithms is not specified; it may be 2, or e, or the size of the alphabet in a coding application. Changing the base just multiplies the entropy by a constant. So I will be relaxed about such re-scaling.

The joint random variable associated with a collection of random variable maps Ω to the cartesian product of the ranges of the random variables in the collection, and is defined coordinatewise in the obvious way. Two points of Ω are equivalent in the kernel partition of the joint random variable if and only if they are equivalent in the kernel partitions of each of the random variables separately. So the corresponding partition is the meet (in the partition lattice) of the partitions associated with the individual random variables. It is denoted by *H*(*f*_{1},*f*_{2},…).

Information theorists are interested in the following question: given a collection of *n* random variables, describe the point in (2^{n}−1)-dimensional real space whose coordinates are the entropies of the joint random variables associated with all subsets of the given set of random variables. (The −1 is because the empty set of random variables has zero entropy and can be ignored.)

Now we can get examples from a group *G* with a family of subgroups *K*_{1}…*K _{n}*. Choose an element uniformly at random from

*G*. With each subgroup is associated a random variable whose kernel partition is the right coset partition associated with the subgroup. Taking joint random variables corresponds to taking the intersection of the corresponding subgroups. Now the entropy of the random variable associated with a subgroup

*K*is simply the logarithm of the index of

*K*in

*G*.

We call this a *group family* of random variables.

Terence Chan’s theorem asserts the following.

The entropy point associated with any family of random variables can be approximated arbitrarily closely (up to scaling) by the point associated with a family of subgroups of a finite group.

As an immediate corollary,

Any linear inequality satisfied by the entropies of a group family of random variables is also satisfied by the entropies of an arbitrary family of random variables.

This suggests a possible two-way flow of information between group theory and information theory. An “additive” inequality for entropies becomes an “additive” inequality for the logarithms of subgroup indices, and hence a “multiplicative” inequality between the indices themselves; and by Chan’s theorem, the implication holds in the other direction as well.

For example, *Shannon’s inequality*, a classical result, states that *H*(*f*_{1},*f*_{2}) ≤ *H*(*f*_{1})+*H*(*f*_{2}). Translated into the language of groups and subgroups, this becomes the well-known fact that |*G*:*K*_{1}∩*K*_{2}| ≤ |*G*:*K*_{1}|·|*G*:*K*_{2}|, or translating to a statement about orders, |*G*|·|*K*_{1}∩*K*_{2}| ≥ |*K*_{1}|·|*K*_{2}|. So this simple inequality is equivalent to Shannon’s.

The first “Non-Shannon information inequality” was proved by Zhang and Yeung in 1998, and is shown below. Since then, many more such inequalities have been established.

*H*(*f*_{1},*f*_{2})+*H*(*f*_{1},*f*_{3})+3((*H*(*f*_{2},*f*_{3},)+*H*(*f*_{2},*f*_{4})+*H*(*f*_{3},*f*_{4}))

≥ 2*H*(*f*_{2})+2*H*(*f*_{3})+*H*(*f*_{4})+*H*(*f*_{1},*f*_{4})+*H*(*f*_{1},*f*_{2},*f*_{3})+4*H*(*f*_{2},*f*_{3},*f*_{4})

When translated, we get an inequality for indices of subgroups of a finite group which, according to Chan’s theorem, is precisely equivalent:

|*K*_{1}∩*K*_{2}|·|*K*_{1}∩*K*_{3}|·(|*K*_{2}∩*K*_{3}|·|*K*_{2}∩*K*_{4}|·|*K*_{3}∩*K*_{4}|)^{3}

≤ (|*K*_{2}|·|*K*_{3}|)^{2}·|*K*_{4}|·|*K*_{1}∩*K*_{4}|·|*K*_{1}∩*K*_{2}∩*K*_{3}|·|*K*_{2}∩*K*_{3}∩*K*_{4}|^{4}.

Now the philosophical question arises. The paper of Zhang and Yeung was regarded as a very important breakthrough in information theory. Had some group theorist discovered the bizarre inequality above for subgroups of a finite group, is there any chance that it would have been published? Why is one result of so much less perceived value than the other, although they are precisely equivalent?

My friend João Araújo emailed me a comment which, with his permission, I am posting here.

—

Dear Peter,

Your post post is extremely interesting.

So let me wave my hands a little.

When I was in my first semester, undergraduate, I came to the conviction that “for every theorem there is a notation in which the proof of the theorem is trivial”.

If a Roman says “every natural number can be written as a linear combination of powers of 10”, that would be a great achievment, while for us the theorem is self-evident.

Probably what happens is this:

1. every topic has its own notation, tools and language.

2. All these combine to prompt natural problems in that setting.

3. The problems in that setting are those not easily attainable with the notation and tools of that area of mathematics; because the easily attainable are not considered relevant.

4. On the other hand, that theory might be “equivalent” to another one; and when such a thing happens, we can have many situations:

A. One difficult problem stated on the first setting is equivalent to a trivial problem in the second setting. Thus a breakthrough theorem in a topic might be trivial in the equivalent form. (Think on a divisibility criteria for 3 in Roman numerals and Arabic numerals).

B. one difficult problem stated on the first setting is equivalent to an equally difficult problem in the second setting. (There are many theorems that only do this; for example, the many equivalent formulations of the Riemann Hypothesis).

C. one difficult problem in the first setting is equivalent to a problem that does not look similar to the usual problems prompted by the notation and tools available for the second setting. (It seems this is the situation you describe today in your blog).

D. The natural problems in the first topic translate into problems immediately found as being also very natural in the second setting, but quite surprisingly nobody ever before thought about them! This is the golden standard for these type of considerations; and gold turns into the diamond standard when the impossible natural problems of the first turn into natural and solvable problems in the second, something fully attained by (k,k+1)-homogeneity, k-utp, k-etp, etc. ;-))))

And after all this, I remain a Platonic😉

Best wishes,

João

—

The notions of k-utp, etc., arose in our work on semigroups and groups, and have not had as much discussion here as I would like. One day maybe …

I’m not sure his list is exhaustive. The most dramatic instance that happened to me was the observation (by Jean-Marie Goethals, Jaap Seidel and me together) that the problem of classifying graphs with least eigenvalue −2 is solved by the classification of spherical root systems with all roots of equal length ((the famous ADE classification, which I have discussed at some length). Arguably this is case D, but it’s not quite a translation.