Letter to Stevan Harnad

Dear Dr Harnad,

You were kind enough to visit my blog yesterday and comment on my musings on citations, giving details of some of your papers on validation of metrics. I tried to read your paper on “Validating research performance metrics against peer review rankings”.

Unfortunately I choked on the first sentence of the second paragraph, where you claim that metrics provide objective, and peer review subjective, evaluation of research. (Your italics.)

Of course peer review is subjective (like everything we do, from marking students’ coursework to deciding their class of degree). However, to say that metrics give objective evaluation of research is like saying that phrenology (measuring bumps on the skull) provides objective evaluation of tendencies to criminality or lunacy.

In fact it is worse. Nobody can disagree with the actual measurement of a skull bump. But even your input data is subjective. A paper cannot be cited until it is published, and this depends on the subjective judgment of referees. Maybe it hasn’t happened to you, but most researchers can tell stories of good papers that languished because of bad referees’ reports. I had one which was never published, and the result was rediscovered more than thirty years after I had abandoned hope of getting it into print.

Not only that, but your criteria are not as objective as you make out, either. I urge you to read a paper by Adler, Ewing and Taylor (and several comments) on “Citation statistics” in the journal Statistical Science 24 (2009), 1-28. Professional statisticians have serious concerns which need to be addressed. I feel that you would agree with much that is said there.

One of my pet hates in our current Newspeak is the word “transparent”. Some people think it means “fair”. Others think that if they use it often enough they will persuade the rest of us that it does mean “fair”. Perhaps you should have said “transparent” rather than “objective” in the opening sentence of your paper. This touches on the real problem with metrics, of course: they become targets, as we have seen with school league tables.

Nobody should doubt that the bean-counters want to introduce metrics. We must address the issues now.

Peter Cameron


About Peter Cameron

I count all the things that need to be counted.
This entry was posted in maybe politics.

5 Responses to Letter to Stevan Harnad

  1. Dear Peter (if I may),

    Sorry about the negative associations evoked by the “objective/subjective” distinction. Of course all conscious human behaviour is subjective, and when we “objectively” count either peer evaluations or citations we are merely objectivising the subjective.

    But I didn’t say “metrics give objective evaluation of research.” I said they need to be tested against subjective evaluations of research, to determine how well they correlate with (and hence predict) subjective evaluations of research.

    For surely you agree that if (mirabile dictu) it turned out there was a 99% correlation between a (cheap) metric and (expensive, time-consuming) peer evaluations, it might be a good idea to rely on (or at least consult) the cheap metric now and again, rather than the pricey peer evaluations?

    I don’t believe for a minute that one metric will have a correlation anywhere near as high as that — but a battery of multiple metrics might come closer…

    By the way, the same reasoning would have applied to phrenological bumps (my own speciality) — if they had indeed been cross-validated against neurological evidence about deficits or psychometric evidence about abilities. But of course the bumps would have failed the test miserably. (These days it is the objective data of neural imagery that some hope to stand in for the bumps, though the chances of success are not stellar.)

    I am no great friend of bean-counting either. But if beans are to be on the menu, they should at least be pre-tested for toxicity and nutritional value…

    I append two references to an analogy with Bradley that I’ve used before in this context. Perhaps worth reflecting upon.

    Stevan

    Harnad, S. (2009) Collini on “Impact on humanities” in Times Literary Supplement. 28 November 2009 http://openaccess.eprints.org/index.php?/archives/662-guid.html

    – “One can agree whole-heartedly with Professor Collini that much of the spirit and the letter of the RAE and the REF and their acronymous successors are wrong-headed and wasteful — while still holding that measures (“metrics”) of scholarly/scientific impact are not without some potential redeeming value, even in the Humanities. After all, even expert peer judgment, if expressed rather than merely silently mentalized, is measurable. (Bradley’s observation on the ineluctability of metaphysics applies just as aptly to metrics: ‘Show me someone who wishes to refute metaphysics and I’ll show you a metaphysician with a rival system.’)”

    Harnad, S. (2008) Open Access Book-Impact and “Demotic” Metrics. Open Access Archivangelism. 10 October 2008 http://openaccess.eprints.org/index.php?/archives/467-guid.html

    – “In ‘Appearance and Reality,’ Bradley (1897/2002) wrote (of Ayer) that ‘the man who is ready to prove that metaphysics is wholly impossible … is a brother metaphysician with a rival theory.’

    – “Well, one might say the same of those who are skeptical about metrics: There are only two ways to measure the quality, importance or impact of a piece of work: Subjectively, by asking experts for their judgment (peer review: and then you have a polling metric!) or objectively, by counting objective data of various kinds. But of course counting and then declaring those counts “metrics” for some criterion or other, by fiat, is not enough. Those candidate metrics have to be validated against that criterion, either by showing that they correlate highly with the criterion, or that they correlate highly with an already validated correlate of the criterion. One natural criterion is expert judgment itself: peer review. Objective metrics can then be validated against peer review.”

    Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35. http://www.ariadne.ac.uk/issue35/harnad/

  2. First, I am a theorist, and if there were a 99% correlation between metrics and peer review “scores”, I would want to know why, and if nobody could tell me why, I would assume the data was faked. This kind of correlation just doesn’t happen.

    Second, I believe that spinach is a better diet than beans; if you will permit the analogy, I think a lot of academics share my view, and until you find your mythical 99% correlation they are not going to be impressed.

    Finally, I hate to be the spectre at the feast, but metrics are very useful not just for saving money on research assessments [this is the real reason why the Treasury are pushing for them!] but also for behaviour modification, i.e. “encouraging” researchers to work on something more “useful” than what in their judgment is the best research they can do. I trust you do not think that is a good idea, but that is what your data validation will be used to support. HEFCE have said so clearly in their consultation document for the REF, and I have seen it in other places too.

  3. Pingback: Citation frustration « Since it is not …

    • This is an important point. And it is not unconnected with bean-counting, for two reasons.
      First, these “automatic” references to big papers call into question the fairness of citations.
      Second and maybe worse, even when the papers are read by “experts”, nobody can be expert in everything, and a statement like the one you quote is likely to pass unchallenged as a sign of a “deep” paper whereas a clear explanation makes the whole thing look “trivial”.
      Once, when printing costs were high, journals encouraged that sort of brevity. Now in the electronic age there is no need for us to be slaves to that viewpoint any more. But we have to make a conscious effort to escape!

  4. (1) The correlation between predictive metrics and peer-rankings as criteria would not have to be quite as high (or suspicious) as .99 to be useful, and useable (or economical).

    (2) It is far harder to fudge or manipulate a large and diverse battery of independent metrics than a single metric. That is why it is important that the metric correlations with peer rankings should be jointly validated for a broad spectrum of multiple metrics.

    (3) The crude utilitarian criteria that some politicians have proposed a priori have no face-validity either. Whether as predictive metrics or as criteria, they too need first to be tested and validated, among or against as rich and diverse a battery of metrics as we can muster — and, of course, peer rankings as criteria.

    The point about Bradley and metaphysics was that “spinach” (if you’ll pardon the miscegenated metaphor) will always turn out to be just another bean.
