(Picture by New York Zoological Society, from Wikimedia Commons)

Ford was wildly excited.

“Arthur!” he said, “this is fantastic! We’ve been picked up by a ship powered by the Infinite Improbability Drive! This is incredible! I heard rumours about it before! They were all officially denied, but they must have done it! They’ve built the Improbability Drive! Arthur, this is … Arthur? What’s happening?”

Arthur had jammed himself against the door to the cubicle, trying to hold it closed, but it was ill fitting. Tiny furry little hands were squeezing themselves through the cracks, their fingers were inkstained; tiny voices chattered insanely.

Arthur looked up.

“Ford!” he said, “there’s an infinite number of monkeys outside who want to talk to us about this script for *Hamlet* they’ve worked out.”

Douglas Adams, *The Hitch-Hiker’s Guide to the Galaxy*

The idea that an infinite number of monkeys, given an infinite amount of time, will produce the complete works of Shakespeare is no doubt one of the most compelling images of infinity, and it is widely used. It provided the title sequence for the *Horizon* programme “To Infinity and Beyond” on which I appeared last year.

I don’t know who invented it. I have seen it credited to Aristotle, but I don’t believe he used a typewriter. Michael Harris, in an article entitled “Do androids prove theorems in their sleep?” in *Circles Disturbed: the Interplay of Mathematics and Narrative* (a recent collection edited by Apostolos Doxiadis and Barry Mazur), attributes the “infinite monkey theorem” to Émile Borel, on the basis of a citation in Wikipedia.

Infinite monkey theorem?

The *Horizon* programme featured David Spiegelhalter, who had programmed a computer in Cambridge to emulate a monkey on a typewriter. After several weeks’ typing, it had produced a couple of short stretches of characters which were recognisable English words.

According to David, he explained to the producer that, like all theorems, the infinite monkey theorem has hypotheses. It is true if we require that the monkey chooses keypresses independently and that each key is equally likely to be chosen. (In fact, less stringent conditions suffice. “Approximate independence” will do, and it is enough if all the key probabilities are constant and non-zero; they may even vary, and tend to zero, provided they do so sufficiently slowly.) The theorem says that, if these conditions hold, then with probability 1, a single monkey will indeed type Shakespeare’s works, given infinite time: some substring of consecutive characters in the output string will be a perfect match of the Complete Works. Unfortunately, these assumptions were not made very clear in the finished programme.
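The finite-time flavour of the theorem is easy to see in a small simulation. Here is a minimal sketch, with a hypothetical two-letter target standing in for the Complete Works: the estimated probability that the target has appeared climbs towards 1 as the monkey types more characters.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "   # 27 "keys", as in the text

def monkey_types(target, n_chars, rng):
    """Make n_chars independent, uniformly random keypresses and report
    whether the target occurs as a consecutive substring of the output."""
    output = "".join(rng.choice(ALPHABET) for _ in range(n_chars))
    return target in output

def hit_rate(target, n_chars, trials=100, seed=0):
    """Estimate, by repeated runs, the probability that the target has
    appeared by the time the monkey has typed n_chars characters."""
    rng = random.Random(seed)
    return sum(monkey_types(target, n_chars, rng) for _ in range(trials)) / trials

# The estimated probability grows towards 1 as the output lengthens.
for n in (500, 2_000, 10_000):
    print(n, hit_rate("ha", n))
```

Even a two-letter target on a 27-key typewriter needs hundreds of keypresses before it is likely to have appeared; the full Complete Works push this far beyond any feasible simulation.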

I will make a short digression here to discuss “probability 1”, or more conveniently, “probability 0”. A *null set* is one which has probability 0.

We can make sense of arbitrary probabilities if we understand null sets and have a way of repeating an experiment infinitely often (with different runs independent). For to say that a single event (a *Bernoulli trial*) has probability *p* is, according to the Law of Large Numbers, equivalent to the fact that in an infinite sequence of independent trials, the relative frequency of its occurrence is *p* except on a null set. Thus *p* is determined objectively; there can only be one number for which the above statement is true. This is the basis (perhaps not very secure) of the frequentist interpretation of probability.
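The frequentist picture can be illustrated by simulation. A minimal sketch, using uniform pseudorandom numbers as a stand-in for independent trials:

```python
import random

def relative_frequency(p, n_trials, seed=0):
    """Run n_trials independent Bernoulli trials with success probability p
    and return the observed relative frequency of successes."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n_trials)) / n_trials

# By the Law of Large Numbers, the relative frequency settles down to p
# (outside a null set of runs) as the number of trials grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(1/6, n))
```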

The monkey theorem can also be finitised: we can compute the probability that the text of Shakespeare has appeared by the time the monkey has typed *N* characters, as a function of *N*; it tends to 1 as *N* tends to ∞.
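For a target much shorter than Shakespeare, this function is easy to approximate: treating the *N* − *n* + 1 possible starting positions as independent (a good approximation when *k*^{−*n*} is tiny) gives *P*(*N*) ≈ 1 − (1 − *k*^{−*n*})^{*N*−*n*+1}, which tends to 1 as *N* tends to ∞. A sketch:

```python
import math

def prob_appeared(k, n, N):
    """Approximate probability that a fixed length-n target has occurred
    somewhere in N keypresses on a k-key typewriter, treating the
    N - n + 1 starting positions as independent (good when k**(-n) is tiny)."""
    if N < n:
        return 0.0
    log_miss = (N - n + 1) * math.log1p(-k ** (-n))   # log P(no occurrence)
    return -math.expm1(log_miss)

# A five-letter target on a 27-key typewriter: the probability tends to 1.
for N in (10**4, 10**6, 10**8):
    print(N, prob_appeared(27, 5, N))
```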

Suppose that there are *k* characters available on the typewriter keyboard, and that the number of characters in Shakespeare’s complete works is *n*. Then (assuming the characters independent and equally likely) the probability that a given sequence of *n* characters of output coincides exactly with Shakespeare’s works is 1/*k*^{*n*}. The time taken for the works to appear with reasonably high probability will be roughly the reciprocal of this number, about *k*^{*n*} keypresses, which is far, far longer than the age of the universe in nanoseconds (by several orders of magnitude in the exponent).
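The arithmetic can be checked with logarithms. Taking, purely for illustration, *k* = 27 keys and *n* = 5,000,000 characters for the Complete Works (both numbers are assumptions, not measurements):

```python
import math

K = 27                # keys on the typewriter (an assumption)
N_CHARS = 5_000_000   # rough length of the Complete Works (an assumption)

# Expected waiting time is about K**N_CHARS keypresses; compare exponents.
log10_wait = N_CHARS * math.log10(K)
print(f"waiting time ~ 10**{log10_wait:.0f} keypresses")

# Age of the universe (~13.8 billion years) in nanoseconds, for scale.
age_ns = 13.8e9 * 365.25 * 24 * 3600 * 1e9
print(f"age of universe ~ 10**{math.log10(age_ns):.1f} nanoseconds")
```

The exponent of the waiting time comes out around seven million, against about 27 for the age of the universe in nanoseconds.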

Of course, as well as the complete works of Shakespeare, and every other text which has ever been written, there will be many others (as Jorge Luis Borges describes in his story “The Library of Babel”). There will be a copy of Shakespeare identical to the original except that the name Hamlet is replaced by Harris, or Hitler, or Rumpelstiltskin, or anything else you like. There will be alternative versions in which *Hamlet* is a comedy and *Twelfth Night* a tragedy. And so on. How long will it take before the monkey produces something which bears at least a superficial resemblance to Shakespeare?

It is interesting to vary the assumptions in the theorem. One way of doing this was investigated by Claude Shannon in his study of the entropy of natural language. One can analyse large quantities of English text to estimate the relative frequencies of different letters; the monkey’s output using these as probabilities instead of the uniform distribution already looks much more like English. But independence fails in English: Q is never followed by anything except U (except in foreign words), and T is followed more often by H than independence would allow. So one can modify the monkey’s typing to match the frequency of letters following the one last typed. Similarly, one can match the frequency of trigrams (triples of letters).
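Shannon’s digram construction is straightforward to reproduce. Here is a minimal sketch, assuming a 27-character alphabet of letters and space; the trigram case is the same idea with pairs of letters as states.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def clean(text):
    """Reduce a text to 26 letters and space."""
    return "".join(c if c in ALPHABET else " " for c in text.lower())

def digram_monkey(source, length, seed=0):
    """Generate text matching the digram frequencies of the source: each
    character is drawn with the frequency with which it follows the
    previous character in the source text."""
    rng = random.Random(seed)
    source = clean(source)
    follows = defaultdict(Counter)
    for a, b in zip(source, source[1:]):
        follows[a][b] += 1
    out = [rng.choice(source)]
    for _ in range(length - 1):
        counts = follows[out[-1]]
        if not counts:                       # dead end: restart anywhere
            out.append(rng.choice(source))
            continue
        chars, weights = zip(*counts.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

sample = "the quick brown fox jumps over the lazy dog " * 50
print(digram_monkey(sample, 80))
```

Feeding in a long text such as *Alice’s Adventures in Wonderland* instead of this toy sample reproduces the kind of output shown below.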

When I designed a course on cryptography, I repeated some of Shannon’s experiments, but rather than analysing a large body of material, I simply used a long text I had to hand, Lewis Carroll’s *Alice’s Adventures in Wonderland*. Here are the results. I took the alphabet to consist just of 26 letters and a space. You can see how the input text influences the result, especially when trigrams are matched; subwords of “Alice” are unexpectedly common.

Matching letter frequencies:

garyrndtdbleayir hedryeeabeslt tyt watat vnot sooannaheoynoc hhh ndn e n mom scie cehealiiea yneuries u imn h utootpn eomvtet ia ecadehatyba eub e lsrv utl ecnrhmer etwtata nstp thttwttl ht tth dg uyatnpbs toinhpitehttesttthotrehushilwlhtaehyto rovt aget eaeaflrwu gnat asrl eeri luikghreborelephre hhvde egnso nodieiha dcoeothgoa tsabns s cneo ndnhfbtsont ne cpnoed m t old fzl rohuiinirtosthe arrn genialendtr hhntn tsmtr osnol ngohne aiauumnie p hhb te t gtt o araswc tak omlhidtaoi er rlumh ceca tlo acnimal tto sosi ah htoe c sty laaahsouseshi oae oh afasth wnsihnaeoawoi aesnhi yb vresptn gas elplteot or annner en s e dfhat tso nmlr te euhdre ltsnsr f reesd s cchtehavns uhtiwalo tahot lrrnnt

Matching digram frequencies:

tre wherrltau ar a inor hee ly goove aye abinglothased as an nontte fin whike it im yon coveng a per weker ligo d ated ay s red ase ous andldrthi i anory acke owhalist the w an thi tuth abinwaly lyton bofforyilenour t n ns art asod h athostugir telidademifure bing hee hedertliryouricell araks edshe capl asove a asino thaf ar at heldryirry id and aghanorsith anesance age angh oum st athed w waronoubit ir bellea a d a at alle t quceendld hello ag t we mar ncerin avesabout ag thedoed sherkishe ano ai t ithe alkeyorated abomor p rs he ag a itainokittina acerr s abupped iranchendl whecthede awhe athai asus oo i and a s shermfu bar and a thre mer s aig it at a an y b alerd a taryouga shed f aithon iseal anghetheme as put m s n d

Matching trigram frequencies:

i loome four shone shemalice cou at sion to one se al the sped ithe gand nerse shereaverybottly embecon unnoth there pen the droqueelf land gloorger an tol the came in go the could ner so des on a wit ite bee ot the spearep onfor hown aft she is ander han ithe quive cut of ano mut andly wit it wrilice dookinam ther heseen everse ter and owles a saing alice way le jusishe s to its torrock ing teopersed show as dif to happen theirs itte heam whis way vered ant his a sairs handeauteree way murse begs a as sid s yout of ence wo cho and th ord des ned be that speopead the timessizaris ank th all guittelf to holl his and execin hand th t

It looks like some of the spam I get.

For fun, I also turned some of the output into doggerel verse: impossible to recite with a straight face! (Try it with the paragraph above.)

Michael Harris, in his article, is trying to get the monkeys to type out the proof of, say, Fermat’s last theorem in a suitable formal language. So rather than probabilistic restrictions, he imagines a mechanism in the typewriter so that any line which is accepted must be a well-formed formula of the language and must follow from previously typed formulae by valid rules of inference. It would be interesting to estimate how long a monkey would take to do this.

The “monkeys and typewriters” theory was tested experimentally at Paignton Zoo in 2003. (The BBC report is here.) A computer terminal was put in the cage of the Sulawesi crested macaques. After a month, the monkeys’ five pages of output consisted mostly of the letter S, and they had terminated the experiment by defecating on the keyboard. The scientific officer at the zoo said, “The work was interesting but had little scientific value, except to show that the ‘infinite monkey’ theory is flawed.” I would say, rather, that it demonstrates very well what David Spiegelhalter said. (More seriously, the failure of bankers attempting to use mathematics to grasp that the hypotheses of a theorem must be satisfied certainly contributed to the economic meltdown.) As I said earlier, the conclusion requires assumptions along the lines of independence of the keypresses and non-zero probability for each key, and what the experiment demonstrated most clearly is that these assumptions do not hold for real monkeys and typewriters.

*Nature* has a one-page science fiction story in each issue under the heading “Futures”. In a recent issue, the story “Monkeys” was based on the Paignton Zoo affair. I won’t give the plot away, but here is a hint.

Many years ago, NASA launched a space probe which was designed to leave the solar system after exploring the outer planets and head for interstellar space. The probe contained various artefacts, including a line drawing of a man and a woman. *New Scientist* ran a competition inviting readers to draw the alien that the man and woman would meet, and the conversation that would ensue. The winning entry went something like this:

**Man**: I see no intelligent life here.

**Woman**: I see no intelligent life either.

**Alien** (resembling a child’s drawing of a flower): *E* = *mc*^{2}.

I don’t know if I like the ‘infinite monkeys’ explanation for how infinity is special. For instance, if you had a writing team consisting of Graham’s number of monkeys, it wouldn’t take very long for one of them to type out the complete works of Shakespeare. (Granted, it wouldn’t be certain in any specified time, but as soon as you give each monkey long enough to make the required number of key presses, it’s ‘beyond reasonable doubt’.) I see it more as an example of how combinatorics does better than most subjects at getting us to think of big numbers.

Yes, one of the puzzles is how this came to be a metaphor for infinity.

I think it’s an echo of ancient ideas about “eternal return” that trace back to the dim mists of pre-history in both Eastern and Western cultures. The (probably apocryphal) Pythagorean “horror of irrationality”, the Platonic Year, the Chinese Remainder Theorem, and so on.

A two-level formal language consists of a set of “words” and a set of “sentences”, where a word is defined as a finite sequence of symbols from a finite alphabet, and a sentence is defined as a finite sequence of words. The finite-state structure of a two-level formal language can be captured in a couple of finite-state transition trees, one for the words and one for the sentences. Recording data about the frequencies of node traversals allows us to compute relative probabilities and local entropies and many other statistics of interest, and the resulting structures can be used to predict likely completions of word and sentence fragments.
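One way to realise such a structure is a prefix tree carrying node-traversal counts, from which completion probabilities fall out directly. A minimal sketch of the word-level case (all names hypothetical):

```python
from collections import Counter, defaultdict

def prefix_tree(words):
    """Record, for each prefix, how often each next symbol follows it:
    the node-traversal frequencies of a transition tree over words."""
    tree = defaultdict(Counter)
    for w in words:
        for i in range(len(w)):
            tree[w[:i]][w[i]] += 1
        tree[w]["$"] += 1                 # "$" marks end-of-word
    return tree

def likely_completions(tree, fragment, top=3):
    """Rank the most probable next symbols after a given word fragment."""
    counts = tree.get(fragment, Counter())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(c, n / total) for c, n in counts.most_common(top)]

words = ["the", "the", "the", "then", "there", "this", "that"]
tree = prefix_tree(words)
print(likely_completions(tree, "th"))     # "e" is the most frequent next letter
```

The sentence-level tree is the same construction with words in place of letters; local entropies can then be computed from the counts at each node.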

I spent a couple of my parallel lives in the 1980s developing an AI sort of program that combined faculties for data-driven empirical learning with faculties for concept-driven logical modeling. One of the things I learned along the way is that there appears to be a trade-off between these two modes of processing that makes it very difficult to integrate empiricist and rationalist faculties within a single intelligent agent. Explains a lot about the history of thought, I think.

Here is a pointer to what documentation I have on line —

• Theme One Program