To Glasgow on Monday for the 36th Fisher Memorial Lecture, Stephen Senn talking on “And thereby hangs a tail (the strange history of p-values)”.
p-values have taken a bit of a hammering lately. I understand that they are being blamed for the “crisis in irreproducibility” of scientific results; at least one journal has banned them; and even that fine ranter David Colquhoun has weighed in against them. In this debate, R. A. Fisher has been cast as the villain. Robert Matthews wrote in the popular press that “The plain fact is that 70 years ago Ronald Fisher gave scientists a mathematical machine for turning baloney into breakthroughs”. Some biographers delight in pointing out his feet of clay; we are all encouraged to reject his methods and become pure and honest Bayesians.
As you might expect (at least, for me, my prior probability on this would be quite high), things are not so straightforward.
Now I am not a statistician, and the lecture, though a delight as a presentation, did go too fast for me to be able to take notes, and I have not been able to find the PowerPoint on the web. So I apologise in advance if I have got things wrong.
What is the probability that the sun will rise tomorrow? Believe it or not, Bayesian statistics can calculate this. The argument might go back to Laplace. Starting with a prior under which the sun is equally likely to rise or not, if I see the sun rise every day for m days, then I calculate the posterior probability that it will rise tomorrow to be (m+1)/(m+2), a satisfyingly high value.
However, the Cambridge philosopher C. D. Broad pointed out the flaw in this argument as a proof of a scientific theory. A very slight extension of this argument shows that the probability that the sun will rise every day for the next n days is (m+1)/(m+n+1), which is small if n is large, and indeed tends to zero as n goes to infinity. A five-year-old Bayesian (if there is such a thing) will be fairly sure that the sun will fail to rise one day before the end of his life.
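The two formulae above are easy to check with exact rational arithmetic; here is a minimal sketch (the function names are mine, not from the lecture), using the fact that Broad's n-day probability is a telescoping product of the one-day probabilities:

```python
from fractions import Fraction

def prob_next_sunrise(m):
    """Laplace's rule of succession: after m observed sunrises,
    the posterior probability that the sun rises tomorrow."""
    return Fraction(m + 1, m + 2)

def prob_next_n_sunrises(m, n):
    """Broad's extension: the probability that the sun rises on
    each of the next n days, given m observed sunrises.
    Equals the telescoping product (m+1)/(m+2) * (m+2)/(m+3) * ... """
    return Fraction(m + 1, m + n + 1)

# The five-year-old Bayesian: roughly m = 5*365 sunrises seen,
# hoping for another n = 75*365 before the end of his life.
m, n = 5 * 365, 75 * 365
print(prob_next_sunrise(m))       # very close to 1
print(float(prob_next_n_sunrises(m, n)))  # small: about 0.06
```

So the child assigns only about a 6% chance that the sun rises every remaining day of his life, which is Broad's point: the one-step prediction is reassuring, but as a proof of the universal law it fails.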
[As a parenthetical remark, I read Broad’s book The Mind and its Place in Nature when I was much younger; it influenced my thinking, though at this remove I cannot really say how much.]
Anyway, Stephen Senn showed us extracts from the work of W. S. Gosset (“Student”) who clearly regarded the numbers that came out of his calculation as “the probability that the hypothesis is correct”.
Stephen showed us a very simple diagrammatic argument to show that the number that comes out of Gosset’s calculation (under a reasonable assumption on the prior, in Bayesian terms) is identical with Fisher’s p-value. To be slightly more precise, under assumptions of normality, if the prior assumption is that treatment B is better than treatment A, and an experiment shows that A beats B by two standard deviations, the probability that our assumption is correct drops to 5%. [I don’t really know what probability is, and I have absolutely no idea how you can assign probabilities to things like this.]
But Fisher’s interpretation is considerably more nuanced. Rather than talk about these meaningless(?) probabilities, he would say that, if it is true that there is no difference between the treatments, only one time in 20 would random experimental variation give a result as extreme as this. This is a much more realistic thing to say, in my opinion. (Fisher’s method also gives an added safety factor, the possibility of using a two-tailed test.)
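For the record, the tail probabilities in question are easy to compute from the standard normal distribution. A small sketch (my own illustration, not from the lecture; note that “two standard deviations” gives a two-tailed p-value of about 4.6%, and the exact 5% two-tailed threshold is at about 1.96 standard deviations):

```python
import math

def p_one_tailed(z):
    """P(Z >= z) for a standard normal Z: the one-tailed tail probability."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_two_tailed(z):
    """P(|Z| >= z): the two-tailed p-value in Fisher's framing."""
    return math.erfc(z / math.sqrt(2))

print(p_one_tailed(2.0))   # about 0.023
print(p_two_tailed(2.0))   # about 0.046
print(p_two_tailed(1.96))  # about 0.05, the famous "one time in 20"
```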
So where is the problem? Is it that scientists simply don’t understand what Fisher said? It seems to me, contra Matthews, that a computing technique which gives you the probability that a scientific theory is correct is the real baloney machine! But what would I, a mere mathematician, know? (Though you may recall that here I did report a talk by Bollobás in which he computed the probability that his theorem was true; I was not the only audience member a bit worried by this! Perhaps a Fisherian interpretation would be more honest.)
Stephen Senn also remarked that the result of the Bayesian calculation depends crucially on the prior assumption; if a non-zero probability is given to the event that treatments A and B have the same effect, then the answer will be different. But I can’t say I followed the hints about this calculation that he gave us.
The conclusion he drew is that the current fuss is really a turf war between two camps of Bayesians, with Fisher caught in the firing line.
As I said, it was a blistering performance, and finished in good time for us to get to the drinks reception in the wonderful Glasgow City Chambers, where we were two months ago for the BCC drinks reception.