Here is a cautionary tale, to show that not everything that looks like an instance of the ADE classification actually is one.
When I first learned about optimal design in statistics, I was very excited to find that there are three important flavours of optimality, called A, D and E. Indeed, I was encouraged by Ching-Shui Cheng, one of the leaders in the field, to attack a particular problem, using the results on smallest eigenvalues of graphs based on the ADE root systems.
In fact, the resemblance is just coincidence (or should I say, in view of the paper I wrote, synchronicity?).
In this post I will give an outline of the types of optimality, and say something rather brief about what I did.
Optimal designs
I have explained before that experimental design can help save resources, and even save lives. The point of statistics is that it can give us information about which of several proposed treatments is the most effective, and even estimate a measure of how effective the treatment is. The most important tool in increasing the precision of these estimates is decreasing the variance of the estimators (which, because of uncertainties in the measurements, are random variables).
Imagine two treatments with different effects. Because of random noise, the estimates of the two treatment effects will follow two normal curves with the true effects as means, assuming our estimators are unbiased. If the variance of the estimators is large, the two curves will overlap substantially, and it is quite likely that our experiment will fail to distinguish which treatment is better. However, if the variance is small, so that the two curves are tightly concentrated about their means, then it is much more likely that we will detect the difference.
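To make this concrete, here is a minimal simulation (with made-up numbers: true effects 0 and 1, and two choices of estimator standard deviation) showing how often a large variance makes us rank the treatments wrongly:

```python
import numpy as np

# Two unbiased estimators with true means 0 and 1 (hypothetical values).
# How often does noise reverse the apparent ranking of the treatments?
rng = np.random.default_rng(1)
for sd in (0.2, 2.0):                  # small versus large estimator variance
    est0 = rng.normal(0.0, sd, 100_000)
    est1 = rng.normal(1.0, sd, 100_000)
    print(sd, (est1 < est0).mean())    # proportion of wrong rankings
```

With the small standard deviation the ranking is almost never wrong; with the large one it is wrong more than a third of the time.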
So our general principle is:
- Good design reduces variance of estimators.
We are going to consider the situation where a number v of treatments are being compared in an experiment. There are b experimental units (or plots) available for the experiment.
If the plots are all very similar and don’t interact, our best strategy is to use each treatment the same number of times, as nearly as possible (that is, b/v times, rounded up or down if necessary). But things are not always that simple. Often there will be some structure on the set of plots, which may result in non-zero correlations between the results obtained from these plots, or (worse) a systematic effect whereby the results on some plots will tend to be larger than those on others.
The simplest case (all I shall consider) is that where the plots are divided into blocks each containing k plots. For example, we may be testing fertiliser on different farms, with k plots available on each farm; or a medical intervention may be tested on the two arms of a number of patients. In this case, the design is more subtle; we must choose which treatments to apply to the plots in each block. This is a block design. Optimality applies in much more general situations; but this will do for illustration.
The simplest model is that the response of each plot is the sum of three terms: one is a treatment effect (this is what we are usually interested in); one is a block effect (say a difference in fertility between the farms); and one is random variation or noise, which we cannot avoid. Let t_i be the effect of treatment i. We are interested, not in the values of these effects, but in their differences. (Adding a constant to all treatment effects and subtracting it from all block effects would change nothing.) More generally, we are interested in treatment contrasts (linear combinations of treatment effects whose coefficients sum to zero). We make one further technical assumption, that of connectedness: there should be a path from any treatment to any other in which consecutive treatments occur together in a block. (If the design is not connected, then some treatment contrasts cannot be estimated.)
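As a concrete illustration of this model (a sketch with made-up effect sizes, using a complete-block design in which each block receives every treatment once), note how the block effects cancel out of within-block treatment differences:

```python
import numpy as np

rng = np.random.default_rng(0)
v, n_blocks = 3, 200                       # treatments; blocks of size k = v here
t = np.array([0.0, 1.0, 2.5])              # treatment effects (only differences matter)
beta = rng.normal(0.0, 2.0, size=n_blocks) # block effects (say, farm fertility)
noise = rng.normal(0.0, 1.0, size=(n_blocks, v))
y = t[None, :] + beta[:, None] + noise     # response = treatment + block + noise
# the block effect cancels out of a difference taken within a block:
print((y[:, 1] - y[:, 0]).mean())          # estimates the contrast t_1 - t_0 = 1
```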
If there are only two treatments, we would simply want to estimate their difference with the smallest possible variance; a design which did this would be optimal. However, if there are more than two, then some decision is required. Do we want to minimise the average variance of the v(v−1)/2 estimated treatment differences? Or do we want to minimise the worst case, the maximum variance of a “normalised” treatment contrast (one in which the sum of the squares of the coefficients is 1)? Or do we wish to minimise the volume of a “confidence ellipsoid” for the treatment differences (the multidimensional analogue of a “confidence interval” on the line)?
The three types of optimality just described are given the letters A (for “average”), E (for “extreme”), and D (for “determinant”), respectively. The reason for the last term is that the volume of a confidence region is proportional to the value of a certain determinant.
There are other types of optimality, but these are the most widely used. Certain communities seem to prefer one or another for no discernible reason (for example, statisticians in agriculture use A, while those in industry prefer D). My limited observation suggests that the situation is a bit like different cults of a religion, where there are no agreed criteria for resolving disputes.
In general there is no unique “best” design, and indeed the three criteria may give entirely different answers to the question “which is best?” But there is one case where they all agree. If there is a balanced incomplete-block design, that is, a design in which any two treatments occur together in a block the same number of times, then it is A-, D- and E-optimal. So, for example, if we are testing seven different fertilisers, and we have three plots available on each of seven farms, then we should use the Fano plane!
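Here is a quick check of that balance property, with the Fano plane written out in one standard labelling (my labelling; any isomorphic copy serves):

```python
from itertools import combinations

# seven treatments, seven blocks of size three: the lines of the Fano plane
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
        (1, 4, 6), (2, 3, 6), (2, 4, 5)]
# balance: every pair of treatments occurs together in exactly one block
for pair in combinations(range(7), 2):
    assert sum(set(pair) <= set(blk) for blk in fano) == 1
print("every pair of treatments concurs exactly once")
```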
Note that there is absolutely no reason why two plots in the same block cannot have the same treatment applied to them. This may come as a surprise to you if you are familiar with combinatorial designs, where a block is identified with a set of points, or treatments. In general, if we take this point of view, a block is a multiset of treatments.
The concurrence graph
The problem of deciding which block design is optimal in any sense can be translated into graph theory. The concurrence graph of the design is the multigraph whose vertices are the treatments; each occurrence of two treatments in a block gives an edge between the corresponding vertices. (So if one treatment occurs twice in a block and another three times, this block contributes six edges joining the two vertices. There are no loops.) The connectedness of this graph is equivalent to the connectedness of the design, which we assume.
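A sketch of this construction (blocks given as sequences, so a repeated treatment is simply listed more than once):

```python
from collections import Counter

def concurrence_counts(blocks, v):
    """Edge multiplicities of the concurrence multigraph: entry (i, j), for
    i != j, counts the edges joining treatments i and j; blocks may repeat
    treatments (multisets), and there are no loops."""
    N = [[0] * v for _ in range(v)]
    for blk in blocks:
        c = Counter(blk)
        for i in c:
            for j in c:
                if i != j:
                    N[i][j] += c[i] * c[j]  # every pair of occurrences is an edge
    return N

# the example from the text: one treatment twice, another three times
print(concurrence_counts([(0, 0, 1, 1, 1)], 2))  # [[0, 6], [6, 0]]: six edges
```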
The Laplacian matrix of a loopless multigraph is the symmetric matrix whose (i,i) entry is the valency of vertex i, and whose (i,j) entry for i≠j is minus the number of edges joining vertices i and j. This matrix has row sums zero, so it has an eigenvalue 0; connectedness implies that this eigenvalue is simple. We call the other eigenvalues non-trivial. Now we can describe optimality in terms of the non-trivial eigenvalues, as follows (a small computational sketch follows the list). Maximisation is over all designs in the relevant class (here block designs with v treatments and b plots in blocks of size k).
- A design is A-optimal if and only if it maximises the harmonic mean of its non-trivial Laplacian eigenvalues.
- A design is D-optimal if and only if it maximises the geometric mean of its non-trivial Laplacian eigenvalues.
- A design is E-optimal if and only if it maximises the minimum non-trivial Laplacian eigenvalue.
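Putting these together, here is a minimal computation of the three criteria from the edge counts of the concurrence graph (assuming the design is connected, so that all non-trivial eigenvalues are positive):

```python
import numpy as np

def ade_criteria(N):
    """A-, D- and E-criteria from a matrix N of concurrence-graph edge counts
    (larger is better for each criterion, per the list above)."""
    N = np.asarray(N, dtype=float)
    L = np.diag(N.sum(axis=1)) - N            # Laplacian: valencies minus edge counts
    eig = np.sort(np.linalg.eigvalsh(L))[1:]  # discard the trivial zero eigenvalue
    A = len(eig) / np.sum(1.0 / eig)          # harmonic mean
    D = float(np.exp(np.mean(np.log(eig))))   # geometric mean
    E = float(eig[0])                         # smallest non-trivial eigenvalue
    return A, D, E
```

For the Fano plane design above, every non-trivial eigenvalue equals 7, so all three criteria take the value 7: the balanced design really does win on all counts.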
These three properties are very natural in graph-theoretic terms. A-optimality corresponds to minimising the sum of effective resistances between all pairs of vertices, when the graph is regarded as an electrical network (with each edge a one-ohm resistor); this is also related to the hitting time of a random walk on the graph. D-optimality corresponds to maximising the number of spanning trees of the graph. E-optimality doesn’t have a simple translation, but is bounded above and below by functions of the isoperimetric number of the graph.
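Both of the first two translations are easy to compute (a sketch, using the matrix-tree theorem for the spanning trees and the Laplacian pseudoinverse for the effective resistances):

```python
import numpy as np

def spanning_trees(L):
    # matrix-tree theorem: any cofactor of the Laplacian counts spanning trees
    L = np.asarray(L, dtype=float)
    return round(np.linalg.det(L[1:, 1:]))    # delete one row and column

def total_resistance(L):
    # effective resistance between i and j, with each edge a one-ohm resistor,
    # is Lp[i,i] + Lp[j,j] - 2*Lp[i,j], where Lp is the pseudoinverse of L
    Lp = np.linalg.pinv(np.asarray(L, dtype=float))
    n = Lp.shape[0]
    return sum(Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]
               for i in range(n) for j in range(i + 1, n))
```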
Note that these are all “global” measures of the connectivity of the graph, unlike the usual graph-theoretic connectivity (the smallest number of vertices which must be deleted to disconnect the graph), which is a very “local” condition (it is not greater than the minimum valency).
For more details, see my paper with R. A. Bailey in the invited talks volume of the 2009 British Combinatorial Conference.
Root systems and optimal designs
I will not go into much detail about the paper, which is in the volume of the Michigan Mathematical Journal dedicated to Donald Higman, who was one of my DPhil examiners, as well as co-discoverer of the Higman–Sims group.
In brief: as we have seen, a balanced block design is optimal if it exists; but usually it doesn’t. Cheng’s idea was that we could tinker with the design so that we increase some concurrences by 1 and decrease others by 1, without decreasing the eigenvalues too much. In other words, add a matrix with entries +1, 0 and −1 whose smallest eigenvalue is not too small (say, for example, greater than −2). But this is just the matrix of inner products of vectors in a root system under the ADE classification!
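To illustrate the mechanism (a toy example of my own, not taken from the paper, and ignoring the constant-row-sum condition discussed below): if a symmetric matrix M with entries +1, 0 and −1 has smallest eigenvalue at least −2, then M + 2I is positive semidefinite, hence the matrix of inner products of a set of vectors, and in the bounded cases those vectors lie in an ADE root system.

```python
import numpy as np

# a small symmetric perturbation matrix with entries +1, 0, -1 (toy example)
M = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0,  1.0],
              [-1.0,  1.0,  0.0]])
print(np.linalg.eigvalsh(M).min())      # -2.0: not below -2
G = M + 2.0 * np.eye(3)                 # positive semidefinite, so a Gram matrix
w, U = np.linalg.eigh(G)
V = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T   # symmetric square root
print(np.allclose(V @ V, G))            # True: columns of V realise the inner products
```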
In the paper, I give a description of all such symmetric matrices with constant row sums, and determine all those in small root systems (including all those in types E_6, E_7 or E_8).
There may be optimal designs lurking in some of these examples; I did not examine this.