## Algebraic statistics

Henry Wynn and Hugo Maruri ran a two-day workshop at the Isaac Newton Institute on algebraic statistics. There are more connections between algebra and statistics than I will describe here, but the one day of the workshop that I was able to attend working around my teaching schedule at least allows me to speak with more clarity than I could have managed before about a particular connection between commutative algebra and statistics. I am grateful to Henry, Hugo, and the other speakers for their insights. (But maybe I have completely misunderstood!)

### Term order and column selection

Suppose that we are investigating a chemical engineering process, where there are several parameters under our control: perhaps temperature, concentration of various reagents, process time, flow speed, etc. Each run of the experiment can be described by a point in the multi-dimensional parameter space, and the result of the run is the measurement of the process yield. So the experimental design is a finite subset S of Rd.

Our aim is to fit a function, as “simple” as possible, on Rd, which agrees with the observations at the points of S. This function can then be used to predict where to look for optimal or near-optimal yield. Note that there is no mention of experimental error or blocking here.

What does “simple” mean?

• We will use polynomial functions. Any function on S can be interpolated by a polynomial, in many different ways; in principle, we can determine as many coefficients as there are points in S, say n. So our model is a set of n monomials, or more precisely, the set of all linear combinations of these monomials. As suggested, we require that any function on S can be uniquely interpolated by such a linear combination. We say the model is uniquely identifiable.
• We assume that, if a given monomial is in the model, then so are all monomials which divide it. Such a model is called hierarchical, or satisfies the marginality principle. This condition is statistically desirable, though I can’t say precisely why. The set of all models which are uniquely identifiable and hierarchical is called the statistical fan of S.

Each monomial in the d variables x1,…,xd is specified by giving the exponents of the variables, a d-tuple α of non-negative integers. We let xα denote the monomial with exponents α1,…,αd.

The statistical fan is still too large, in cases of even moderate dimension and size. Algebraic statistics provides a further selection principle.

A term order is an ordering of monomials such that

• the minimal element is the monomial 1;
• if xα<xβ, then xα+γ<xβ+γ for any d-tuple γ of non-negative integers.

Given a term order, a model is chosen by the following column selection algorithm. Form a matrix whose rows are indexed by points of S and columns by monomials in term order, with the entry in row p and column xα being the value of the monomial xα at p. Now the model consists of those monomials for which the corresponding column of the matrix is linearly independent of the preceding columns. Clearly this model is uniquely identifiable.

The set of models arising from different term orders is called the algebraic fan of S.

Several questions arise:

• Is an algebraic model hierarchical? Yes, this is a simple argument which I leave to the reader.
• Is every hierarchical, uniquely identifiable model algebraic? No, here is an example. Let S be the set {(0,0),(1,0),(0,1),(−1,1),(1,−1)}. Every coordinate is equal to its cube, so we do not need to consider powers higher than the second. There are two term orders, defined by x1<x2 and the reverse; they give rise to the models {1,x1,x2,x12,x1x2} and {1,x2,x1,x22,x1x2}. The model {1,x1,x12,x2,x22} is not algebraic. (For example, if x1<x2, then x12<x1x2<x22.)
• Does the algebraic method select the interesting models? Of this I am not convinced. It seems to me that, if the effects of the two variables x1 and x2 are additive, then the excluded model might be the best.

The claimed advantage is that, for even moderate values of d and/or numbers of points in the design, the number of models in the statistical fan may be vast; the algebraic fan contains substantially fewer (though still perhaps a large number).

### Monomial ideals and Gröbner bases

We begin as traditional in commutative algebra. Associated with the finite set S is the ideal I(S) in the polynomial ring R(X) consisting of all polynomials which vanish on S. We can recover S as the set of common zeros of these polynomials. (This is not quite Hilbert’s Nullstellensatz, since R is not algebraically closed.) We use, not this ideal, but a different one which is easier to work with.

Given a term order, the Gröbner basis algorithm finds a generating set for the ideal I(S). We now let J be the ideal generated by the leading terms (with respect to the term order) of the elements of the Gröbner basis. These leading terms are the monomials which are minimal (with respect to divisibility) subject to not lying in the set found by the column selection algorithm. So there is a vector space isomorphism between R(x)/J and the space of real functions on S. Since J is a monomial ideal (one generated by monomials), it is easier to deal with, both theoretically and by computational tools, than I(S).

The monomials in J are precisely those not selected by the column selection algorithm. Clearly the crucial combinatorial data is the specification of the boundary between the two sets of monomials. This boundary is defined by the Betti numbers of the quotient ring (the ranks of the modules occurring in a minimal free resolution).

### Other applications

Two other applications have nothing to do with the original problem, but use similar ideas.

#### Reliability

The prototype here is a network where each edge may be open or closed; the network will operate if there is a path from source to sink consisting of open edges. More generally, there is a set S of components of a system, each of which may fail; there is a collection of subsets, closed upwards, such that the system will operate if the components in one of these sets have not failed. We want to calculate the probability that the system operates.

This can be calculated using inclusion-exclusion if we know the probabilities of the minimum sets in the collection and all of their intersections.

It may be possible to simplify the formula. We can encode the subsets of S for which the system works as monomials in variables indexed by S. Now the inclusion-exclusion formula can be simplified to one involving the Betti numbers of the quotient of the polynomial ring by this monomial ideal. The resulting formula may have many fewer than the 2|S| terms involved in the PIE formula.

There is another advantage. The PIE formula can be truncated to give upper and lower bounds for the true value (the Bonferroni inequalities). The same is true of the formula involving the Betti numbers.

#### Markov bases

This comes from the work of Diaconis and Sturmfels on contingency tables. A contingency table is simply a matrix of non-negative integers with prescribed row and column sums. It is difficult to count contingency tables, even approximately, but it is important to be able to choose one at random. One possible tool for this is to create a Markov chain, in which a sequence of simple moves takes us to a random table. One simple move is: reduce the (i,k) and (j,l) entries by 1, and increase the (i,l) and (j,k) entries by 1.

More generally, given a fixed matrix A with non-negative integer entries, we want a Markov chain on the set of solutions of Ax = b, for given b. We can add to x any element of the kernel of A without changing b; so we need to take a set of generators of Ker(A) as a module over the integers. So we need to test whether given elements generate this kernel. This can be done by translating into commutative algebra. Any integer vector y in Ker(A) can be written as y+y, where the two vectors have non-negative integer coefficients.

A non-negative integer vector z can be translated into a monomial m(z) by using its entries as exponents. Now the vectors y+y generate the kernel if and only if the binomials m(y+)−m(y) generate the corresponding ideal. Now binomial ideals are not as simple as monomial ideals, but are still simpler (theoretically and computationally) than arbitrary ideals; so this is a reasonable way to proceed.

However this is not the final answer, since whatever moves we make in the Markov chain on solutions of Ax=b for fixed b must not take us outside the set of non-negative integer vectors, so some moves will have to be disallowed. There is a “local” form of the Diaconis–Sturmfels Theorem which covers this.

For a small example, consider the equation x+y=b. The “fibre” consists of the integer points on the line segment from (0,b) to (b,0). The moves ±(1,−1), restricted so as not to leave the fibre, clearly reach every point. If instead we use the moves ±(2,−2) and ±(3,−3), the fibres with b≥3 are connected, but the fibres with b=1 and b=2 each fall into two components. The corresponding binomials x2y2 and x3y3 generate an ideal of codimension 1 in the ideal generated by xy.