Yesterday I gave a talk on a theorem of Terence Chan, which he spoke about at a network coding meeting at Queen Mary in late 2012. He sketched a proof on the common room table after the talk, and I decided it was interesting enough to tell people about. Here is a brief summary; you can find slides of my talk here.
I am writing about it here because it raises an issue which is potentially a stumbling-block for the extreme Platonic view that mathematics is a pre-existing landscape which is simply discovered by intrepid explorers. The problem arises because some mathematical facts are regarded as more valuable than others, but I will show that this value is dependent on where you are standing to look at the fact.
Entropy is associated with a random variable, which is a function on a probability space Ω; but it does not depend at all on the values of the function, only on the kernel partition of Ω (in which two points are equivalent if they have the same image). The entropy of the random variable f, or the partition F, is −∑pi log pi, where pi is the probability of the ith part of the partition. (To keep things simple, assume that Ω is finite.) The entropy of f is denoted by H(f).
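As a quick illustration (a toy Python sketch of my own, not part of the theory), here is this definition in code; note that two functions with the same kernel partition get the same entropy, even though their values differ.

```python
from collections import Counter
from math import log2

def entropy(f, omega, p):
    """Entropy of the random variable f on a finite probability space.

    Only the kernel partition matters: collect the probability mass of
    each fibre of f, then apply -sum(q * log q).  Base-2 logs here;
    changing the base only rescales the answer.
    """
    mass = Counter()
    for x in omega:
        mass[f(x)] += p[x]
    return -sum(q * log2(q) for q in mass.values() if q > 0)

omega = [0, 1, 2, 3]
p = {x: 0.25 for x in omega}                      # uniform measure
f = lambda x: x % 2                               # partition {0,2} | {1,3}
g = lambda x: "even" if x % 2 == 0 else "odd"     # same partition, other values
assert entropy(f, omega, p) == entropy(g, omega, p) == 1.0
```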
Note that the base of the logarithms is not specified; it may be 2, or e, or the size of the alphabet in a coding application. Changing the base just multiplies the entropy by a constant. So I will be relaxed about such re-scaling.
The joint random variable associated with a collection of random variables maps Ω to the cartesian product of the ranges of the random variables in the collection, and is defined coordinatewise in the obvious way. Two points of Ω are equivalent in the kernel partition of the joint random variable if and only if they are equivalent in the kernel partitions of each of the random variables separately. So the corresponding partition is the meet (in the partition lattice) of the partitions associated with the individual random variables. Its entropy is denoted by H(f1,f2,…).
Information theorists are interested in the following question: given a collection of n random variables, describe the point in (2^n−1)-dimensional real space whose coordinates are the entropies of the joint random variables associated with all subsets of the given set of random variables. (The −1 is because the empty set of random variables has zero entropy and can be ignored.)
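Here is a small sketch of this entropy point, again in Python with names of my own choosing: one joint entropy per non-empty subset of the family, computed from the joint values taken coordinatewise.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy_point(fs, omega, p):
    """The (2**n - 1)-coordinate entropy point of a family of n random
    variables: one joint entropy for each non-empty subset."""
    def H(sub):
        mass = Counter()
        for x in omega:
            mass[tuple(f(x) for f in sub)] += p[x]   # joint value, coordinatewise
        return -sum(q * log2(q) for q in mass.values() if q > 0)
    return {idx: H([fs[i] for i in idx])
            for r in range(1, len(fs) + 1)
            for idx in combinations(range(len(fs)), r)}

omega = [0, 1, 2, 3]
p = {x: 0.25 for x in omega}
f1, f2 = (lambda x: x % 2), (lambda x: x // 2)
pt = entropy_point([f1, f2], omega, p)
assert len(pt) == 2**2 - 1        # one coordinate per non-empty subset
assert pt[(0, 1)] == 2.0          # here the joint partition is into singletons
```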
Now we can get examples from a group G with a family of subgroups K1, …, Kn. Choose an element uniformly at random from G. With each subgroup is associated a random variable whose kernel partition is the right coset partition associated with the subgroup. Taking joint random variables corresponds to taking the intersection of the corresponding subgroups. Now the entropy of the random variable associated with a subgroup K is simply the logarithm of the index of K in G, since each coset of K carries probability |K|/|G|.
We call this a group family of random variables.
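A toy example of such a group family (the group Z_12 and the particular subgroups are my choice, purely for illustration): the entropy of each coset random variable comes out to the log of the subgroup's index.

```python
from math import isclose, log2

# G = Z_12 under addition mod 12, with the subgroups of multiples of 2, 3, 4.
G = list(range(12))

def subgroup(d):
    """The subgroup of Z_12 consisting of the multiples of d."""
    return {x for x in G if x % d == 0}

def coset_entropy(K):
    """Entropy of the random variable x -> (coset of K containing x),
    where x is uniform on G.  In an abelian group the right cosets are
    just the translates K + x."""
    cosets = {frozenset((k + x) % 12 for k in K) for x in G}
    return -sum((len(C) / len(G)) * log2(len(C) / len(G)) for C in cosets)

# The entropy equals log |G : K|, the log of the index, as claimed:
for d in (2, 3, 4):
    K = subgroup(d)
    assert isclose(coset_entropy(K), log2(len(G) // len(K)))
```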
Terence Chan’s theorem asserts the following.
The entropy point associated with any family of random variables can be approximated arbitrarily closely (up to scaling) by the point associated with a family of subgroups of a finite group.
As an immediate corollary,
Any linear inequality satisfied by the entropies of a group family of random variables is also satisfied by the entropies of an arbitrary family of random variables.
This suggests a possible two-way flow of information between group theory and information theory. An “additive” inequality for entropies becomes an “additive” inequality for the logarithms of subgroup indices, and hence a “multiplicative” inequality between the indices themselves; and by Chan’s theorem, the implication holds in the other direction as well.
For example, Shannon’s inequality, a classical result, states that H(f1,f2) ≤ H(f1)+H(f2). Translated into the language of groups and subgroups, this becomes the well-known fact that |G:K1∩K2| ≤ |G:K1|·|G:K2|, or translating to a statement about orders, |G|·|K1∩K2| ≥ |K1|·|K2|. So this simple inequality is equivalent to Shannon’s.
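The translation can be checked by brute force on a small example. Here is a sketch (the cyclic group Z_24 is an arbitrary choice of mine) verifying both the multiplicative form for the subgroup orders and the additive form for the logs of the indices, over every pair of subgroups.

```python
from math import log2

# G = Z_24 under addition; its subgroups are the multiples of each divisor d of 24.
n = 24
divisors = [d for d in range(1, n + 1) if n % d == 0]
subgroups = {d: {x for x in range(n) if x % d == 0} for d in divisors}

for a in divisors:
    for b in divisors:
        K1, K2 = subgroups[a], subgroups[b]
        meet = K1 & K2                      # K1 ∩ K2, the joint variable's subgroup
        # multiplicative (group) form of Shannon's inequality:
        assert n * len(meet) >= len(K1) * len(K2)
        # additive (entropy) form, with logs of the indices:
        assert log2(n // len(meet)) <= log2(n // len(K1)) + log2(n // len(K2)) + 1e-9
```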
The first “non-Shannon information inequality”, involving four random variables, was proved by Zhang and Yeung in 1998. Since then, many more such inequalities have been established.
When translated, it yields an inequality for the indices of subgroups of a finite group which, according to Chan’s theorem, is precisely equivalent to the original.
Now the philosophical question arises. The paper of Zhang and Yeung was regarded as a very important breakthrough in information theory. Had some group theorist discovered the bizarre inequality above for subgroups of a finite group, is there any chance that it would have been published? Why is one result of so much less perceived value than the other, although they are precisely equivalent?