Friday, July 1, 2016

is the referee process fair?

I'm at the Western Economic Association International annual conference in Portland and just saw a fascinating keynote address by David Card, on his work in progress with Stefano DellaVigna on "What Gets In" top economic journals. The paper is unfortunately not yet available online, so I can't excerpt or show any of the very interesting diagrams that go with the analysis, but in the meantime I can tell you to keep a (skeptical) eye out for it when it does get posted.

The paper aims to confirm or dispel the common belief that the editorial process is unfair because of some combination of three factors: 1) referees aren't good at assessing quality, 2) the process is biased in favor of big name authors, and 3) the editors overweight their own priors relative to referee recommendations. The authors acquired data from QJE, JEEA, REStud, and REStat* and looked at three stages of the editorial process in comparison to ex post citation rates (controlling for journal and time) as the measure of paper quality. The three stages of the editorial process are 1) the decision to desk reject, 2) the decision to send the paper to a particular number of particular referees, and 3) the decision to reject or R&R/accept after receiving the reports.

Bullet points:

  • Referees are good at assessing paper quality in the sense that their ratings (from 1-7; definitely reject, reject, no rating, weak R&R, R&R, strong R&R, accept) line up well with ex post citations.
  • Higher quality referees, measured either by citation counts or publication numbers in 35 top journals in the preceding 5 years (I can't remember which), aren't better at assessing paper quality.
  • Papers that are sent to a larger number of reviewers are cited more, so the number of reviewers is a proxy for the editor's prior belief about the paper.
  • Prolific authors (measured by publication numbers in 35 top journals in the preceding 5 years) get many more citations controlling for reviewer rating.
  • So do papers with more authors.
  • Editors increase the citations of published articles by publishing papers by prolific authors more often, conditional on reviewer rating, but they could go further and do even better.
  • Editors do not seem to take into account the number of authors, and could increase citations by publishing more of these articles.
  • Editors could also increase citations by putting larger weight on their own prior relative to reviewer ratings.

The conclusions David drew are that 1) referees are indeed good at assessing quality, 2) the process contains affirmative action for junior/less prolific authors, and 3) editors are not overconfident. Thus, the myth of unfairness is dispelled.

The assumption this story rests on is glaring and glaringly fragile: that ex post citations are the relevant measure of paper quality when people assess whether papers are fairly treated.

From the perspective of editors, I completely understand why you would focus on citations. That's how your journal gains prominence. But as a scientist, what I want and what I believe is the gold standard for fairness is that papers are published and cited in proportion to their quality. Treating citation rates as quality assumes away half of the problem.

Are citation numbers just the best measure of quality that we're stuck with? Well, I'm sure that was the reason for using them, and I'm sure citations are correlated with quality, but as they show, referee ratings are also correlated with citation numbers. Since the citation process is self-evidently biased in favor of prolific authors** (I'm sure you can prove this to yourself through introspection just as easily as I did), and since referees are among the very few people who thoroughly study any given paper, it seems utterly bizarre that the former, and not the latter, would be treated as the primary proxy measure of quality (if the goal of the paper is in fact to assess fairness rather than to assess journal performance).

If we consider referee ratings the better measure of quality, the conclusions exactly reverse and exactly confirm some of the common suspicions of the editorial process: 1) Citations are a good measure of quality but substantially biased in favor of prolific authors and multi-author papers, 2) editors are biased in favor of prolific authors, but not as much as citations are, and they are not biased in favor of multi-author papers, and 3) editors could reduce their bias by putting less weight on their personal priors.***

I do suspect citations are a better proxy for quality in the sense that they are less noisy (but more biased). I'm sure this noise is why people complain about the competence of referees, in fact. This does mean that saying a particular paper was treated unfairly based on the average of three wildly different referee ratings isn't going to be credible. But when we're looking at data from 30,000 paper submissions, the signal shines through the noise and bias is much more important to worry about.
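To make the noise-versus-bias point concrete, here is a toy simulation. It is only a sketch under assumptions I made up for illustration: none of the numbers (the 30% share of prolific authors, the size of the citation bonus, the noise levels) come from the paper, and `simulate` is a hypothetical helper, not anything the authors ran. Each paper has a true quality; a referee rating is that quality plus a lot of unbiased noise, while citations are that quality plus a little noise plus a bonus for prolific authors.

```python
import random
import statistics


def simulate(n=30_000, bias=0.5, referee_noise=1.5, citation_noise=0.3, seed=0):
    """Toy model: all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    # Per-paper proxy errors (proxy minus true quality), split by author type.
    ref_err = {True: [], False: []}
    cit_err = {True: [], False: []}
    for _ in range(n):
        quality = rng.gauss(0, 1)
        prolific = rng.random() < 0.3  # assumed share of prolific authors
        # Referee rating: unbiased but very noisy.
        referee = quality + rng.gauss(0, referee_noise)
        # Citations: much less noisy, but inflated for prolific authors.
        citations = quality + (bias if prolific else 0.0) + rng.gauss(0, citation_noise)
        ref_err[prolific].append(referee - quality)
        cit_err[prolific].append(citations - quality)
    return {
        # Spread of the error for any single paper (the "noise").
        "referee_sd": statistics.pstdev(ref_err[True] + ref_err[False]),
        "citation_sd": statistics.pstdev(cit_err[True] + cit_err[False]),
        # Average error gap between prolific and other authors (the "bias").
        "referee_gap": statistics.mean(ref_err[True]) - statistics.mean(ref_err[False]),
        "citation_gap": statistics.mean(cit_err[True]) - statistics.mean(cit_err[False]),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the referee rating is the worse guide to any single paper (its per-paper error spread is several times larger), but across 30,000 papers its average gap between prolific and other authors should come out close to zero, while the citation gap stays near the assumed bonus no matter how large the sample gets. Averaging removes noise; it does nothing to bias.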

~~~

*IIRC, a caveat that applies to the entire summary.

**and it certainly makes sense to me that it could be biased in favor of multi-author papers as well, since more authors are necessarily more in contact with potential citers. Then again it also makes sense to me that multi-author papers could be higher quality, since there are more eyes on every step of the process.

***I asked David about this at the end of the talk (and several people immediately thanked me for it), and he readily admitted the alternative interpretation. I appreciate that and don't wish to accuse him of any suspect interpretation of data when I can't even read the paper yet, but it's a point worth discussing even if the paper makes it much more clearly than he did in his talk.

Tuesday, May 17, 2016

a failure of inference

Some idiots put a baby bison in their car in Yellowstone National Park out of "misplaced concern" for his wellbeing. He imprinted on humans and cars so quickly that he could not be persuaded to rejoin his herd, and the herd, including his mother, rejected him as well. The calf was endangering traffic by insisting on returning to the road, and so, for reasons detailed below, park staff were forced to euthanize him.

Cue 13,000 comments on Yellowstone's Facebook page accusing them of being heartless murderers.

There are plenty of fact-based suggestions and objections to be made on both sides, and the NPS has responded to most of these comments with the form response "In order to ship the calf out of the park, it would have had to go through months of quarantine to be monitored for brucellosis. No approved quarantine facilities exist at this time, and we don't have the capacity to care for a calf that's too young to forage on its own. Nor is it the mission of the National Park Service to rescue animals: our goal is to maintain the ecological processes of Yellowstone. Even though humans were involved in this case, it is not uncommon for bison, especially young mothers, to lose or abandon their calves. Those animals typically die of starvation or predation."*

But that's beside the point. I don't have to know any of the facts involved in order to have an opinion on the matter, because of all people, the park service is staffed by the ones most likely to go to the ends of the earth to care for wildlife, especially in this heartwrenching case of a calf rejected by his mother due to human interference. Not only do I know for certain that they are much better informed of the options and issues than I am, I know that they have infallible intentions when it comes to conservation as well. So, I don't even have to "trust" them to make the right decision (since "trust" connotes a leap of faith that the right thing will be done despite conflicting personal incentives); I can infer with high confidence that they will do, and did, the right thing. Because if there were any kinder option, I know the people involved would have wanted to take it.

I sure hope these 13,000 commenters aren't representative of humanity overall, because the signaling models I'm so fond of are doomed if they are. I know people underestimate the intentions of others when they disagree, but in this case everything lines up, including intentions; there is no basis for doubt that the right thing was done.

*They probably could have left off the part about their mission, which is completely reasonable and accurate but doesn't help project a superficial image of compassion (emphasis on superficial).