Yesterday my colleague Collin Bleak sent me an email containing the number 1247. I will explain why this made me so pleased.

#### Automata

Our problem concerns automata. If you know Collin, you may suspect that it has something to do with the Thompson groups; indeed it does, but I won’t explain this now.

An *automaton* with *N* states over an alphabet *A* is a machine whose action consists of reading a symbol from *A* and as a result undergoing a transition to a possibly different state. It does this repeatedly. No question of start and end states or languages accepted.

So an automaton can be represented by a directed graph with *N* vertices (corresponding to the states), edges labelled by *A*, and with the property that there is exactly one edge with any given label *leaving* any vertex. If it is in state *v* and reads a symbol *a* then it follows the unique directed edge with label *a* leaving *v*, and changes to the state corresponding to the terminus of this edge.

An automaton is said to be *k*–*determined* if, whenever it reads a word *w* = *a*_{1}…*a _{k}* of length

*k*, its final state depends only on the word

*w*and not on the initial state. In other words, all information about the initial state is lost after

*k*steps.

Our basic question is: How many *k*-determined automata are there over an alphabet with *n* symbols? (To avoid trivialities we assume every vertex can be reached by reading some sequence.) The number 1247 is the answer for *k* = 4 and *n* = 2. The sequence for *n* = 2 and increasing *k* runs 2, 5, 30, 1247, …. This sequence is not (yet!) in the OEIS.

### De Bruijn graphs

De Bruijn showed that given a positive integer *m* and a finite alphabet *A*, there is a circular sequence of length |*A*|^{m} which contains every *m*-tuple exactly once as a consecutive subsequence.

He proved this by constructing a directed graph as follows. The vertices are all the (*m*−1)-tuples over *A*. Each *m*-tuple *a*_{1}…*a _{m}* labels an edge which joins its initial (

*m*−1)-tuple

*a*

_{1}…

*a*

_{m−1}to its final (

*m*−1)-tuple

*a*

_{2}…

*a*. The digraph is connected and has in-degree and out-degree both equal to |

_{m}*A*|. By Euler’s theorem, it has a closed trail passing once along each edge. Reading the symbols on this trail gives the required

*de Bruijn sequence*.

If we instead label the edge just by the last symbol in its *m*-tuple, then the graph becomes an automaton. Moreover, it is (*m*−1)-determined, since if we read a sequence of *m*−1 symbols we arrive at the vertex with that sequence as label. Even more, the de Bruijn graph is the “universal” (*m*−1)-determined automaton, since different sequences lead to different vertices.

To avoid confusion, I will now use *k* rather than *m*−1.

The picture shows the de Bruijn graph with *k* = 3 and alphabet {0,1}, drawn as an automaton.

### Foldings of de Bruijn graphs

Any *k*-determined automaton gives an equivalence relation on the vertex set of the de Bruijn graph: two vertices are equivalent if their labelling strings put the automaton into the same state. This equivalence relation has the further property that, if two vertices with labels *u* and *w* are equivalent, then for any letter *a* in the alphabet, the vertices with labels *v ^{–}a* and

*w*are also equivalent, where

^{–}a*v*is obtained by dropping the first symbol of

^{–}*v*.

Such an equivalence relation is called a *folding* of the graph.

Conversely, given a folding, there is a unique automaton which gives rise to it: we can take the vertices to be the equivalence classes, and define the transitions in the obvious way.

So to count the *k*-determined automata, we only have to count foldings of the de Bruijn graph whose vertices are labelled by *k*-tuples.

This is what we have done for a variety of small values, including *k* = 4 and |*A*| = 2, as noted above.

The number 1247 (and indeed most of the others) has been calculated twice. Collin used the strategy of taking the set of all strings, making an identification, working out its consequences, making another, and so on, keeping track of what has already been found to avoid duplications; he programmed in C++ and the calculation of the number 1247 took less than a minute.

I worked from the other end, iterating over all partitions of the set of strings and checking the “closure” condition in the definition of a folding, so that no check for duplicates was necessary. I programmed it in GAP, and my program took about ten minutes to calculate the number. A reasonable difference, I think.

Incidentally, GAP contains iterators for some combinatorial objects, but not for set partitions, so I had to write my own. In my Combinatorics textbook, written in the mid-1990s, I emphasised the value of being able to iterate over a large set of objects, rather than generating them all at the start, so it was a good opportunity to practise my own philosophy.

I hope to say more about the relevance of this to Thompson groups sometime in the future. There is an interesting story there too.

Reblogged this on MaTHsperger Thoughts.