Friday, July 1, 2016

is the referee process fair?

I'm at the Western Economic Association International annual conference in Portland and just saw a fascinating keynote address by David Card, on his work in progress with Stefano DellaVigna on "What Gets In" top economic journals. The paper is unfortunately not yet available online, so I can't excerpt or show any of the very interesting diagrams that go with the analysis, but in the meantime I can tell you to keep a (skeptical) eye out for it when it does get posted.

The paper aims to confirm or dispel the common belief that the editorial process is unfair because of some combination of three factors: 1) referees aren't good at assessing quality, 2) the process is biased in favor of big-name authors, and 3) editors overweight their own priors relative to referee recommendations. The authors acquired data from QJE, JEEA, ReStud, and ReStat* and looked at three stages of the editorial process in comparison to ex post citation rates (controlling for journal and time) as the measure of paper quality. The three stages are 1) the decision to desk reject, 2) the choice of how many and which referees to use, and 3) the decision to reject or R&R/accept after receiving the reports.

Bullet points:

  • Referees are good at assessing paper quality in the sense that their ratings (from 1-7; definitely reject, reject, no rating, weak R&R, R&R, strong R&R, accept) line up well with ex post citations.
  • Higher quality referees, measured either by citation counts or publication numbers in 35 top journals in the preceding 5 years (I can't remember which), aren't better at assessing paper quality.
  • Papers that are sent to a larger number of reviewers are cited more, so the number of reviewers is a proxy for the editor's prior belief about the paper.
  • Prolific authors (measured by publication numbers in 35 top journals in the preceding 5 years) get many more citations controlling for reviewer rating.
  • So do papers with more authors.
  • Editors increase the citations of published articles by publishing papers by prolific authors more often, conditional on reviewer rating, but they could go further and do even better.
  • Editors do not seem to take into account the number of authors, and could increase citations by publishing more of these articles.
  • Editors could also increase citations by putting larger weight on their own prior relative to reviewer ratings.

The conclusions David drew are that 1) referees are indeed good at assessing quality, 2) the process contains affirmative action for junior/less prolific authors, and 3) editors are not overconfident. Thus, the myth of unfairness is dispelled.

The assumption this story rests on is glaring and glaringly fragile: that ex post citations are the relevant measure of paper quality when people assess whether papers are fairly treated.

From the perspective of editors, I completely understand why you would focus on citations. That's how your journal gains prominence. But as a scientist, what I want and what I believe is the gold standard for fairness is that papers are published and cited in proportion to their quality. Treating citation rates as quality assumes away half of the problem.

Are citation numbers just the best measure of quality we're stuck with? Well, I'm sure that was the rationale for using them, and I'm sure citations are correlated with quality, but as the authors show, referee ratings are also correlated with citation numbers. Since the citation process is self-evidently biased in favor of prolific authors** (I'm sure you can prove this to yourself through introspection just as easily as I did), and since referees are among the very few people who thoroughly study any given paper, it seems utterly bizarre that the former, and not the latter, would be treated as the primary proxy for quality (if the goal of the paper is in fact to assess fairness rather than journal performance).

If we consider referee ratings the better measure of quality, the conclusions exactly reverse and exactly confirm some of the common suspicions of the editorial process: 1) Citations are a good measure of quality but substantially biased in favor of prolific authors and multi-author papers, 2) editors are biased in favor of prolific authors, but not as much as citations are, and they are not biased in favor of multi-author papers, and 3) editors could reduce their bias by putting less weight on their personal priors.***

I do suspect citations are a better proxy for quality in the sense that they are less noisy (though more biased). In fact, I'm sure this noise is why people complain about the competence of referees. It does mean that claiming a particular paper was treated unfairly on the basis of three wildly different referee ratings isn't going to be credible. But when we're looking at data from 30,000 submissions, the signal shines through the noise, and bias becomes the much more important thing to worry about.
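To see why the signal shines through at this scale, here is a minimal simulation of the noise-versus-bias tradeoff. All numbers (noise levels, the size of the prolific-author citation premium, the share of prolific authors) are made up for illustration; this is not the paper's actual model. The point is that with ~30,000 papers, a regression of citations on referee ratings and a prolific-author dummy pins down the author premium precisely, even though any single referee rating is a poor signal of quality.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30_000                           # roughly the number of submissions in the study

quality = rng.normal(0.0, 1.0, n)    # latent paper quality (unobserved)
prolific = rng.random(n) < 0.3       # hypothetical prolific-author indicator

# Referee ratings: unbiased but very noisy readings of quality.
rating = quality + rng.normal(0.0, 1.5, n)

# Citations: much less noisy, but with a premium for prolific authors
# that has nothing to do with quality.
bias = 0.8
citations = quality + bias * prolific + rng.normal(0.0, 0.5, n)

# OLS of citations on a constant, the rating, and the prolific dummy.
# The referee noise averages out across 30,000 papers, so the estimated
# prolific premium lands very close to the true bias we built in.
X = np.column_stack([np.ones(n), rating, prolific.astype(float)])
beta = np.linalg.lstsq(X, citations, rcond=None)[0]
print(f"estimated prolific premium: {beta[2]:.2f} (true bias: {bias})")
```

Under these assumptions, the same data support both readings: if citations are the quality benchmark, the premium looks like editors underrewarding prolific authors; if ratings are the benchmark, it is exactly the citation bias described above.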


*IIRC, which applies to the entire summary.

**and it certainly makes sense to me that it could be biased in favor of multi-author papers as well, since more authors means more contact with potential citers. Then again, it also makes sense to me that multi-author papers could be higher quality, since there are more eyes on every step of the process.

***I asked David about this at the end of the talk (and several people immediately thanked me for it), and he readily admitted the alternative interpretation. I appreciate that and don't wish to accuse him of any suspect interpretation of data when I can't even read the paper yet, but it's a point worth discussing even if the paper makes it much more clearly than he did in his talk.


Anonymous said...

Referees "thoroughly study any given paper"?

One time in four, I would guess.

The only sensible attitude is perpetual skepticism, way past publication, without bitterness. It is what it is.

Anonymous said...

What about incentives? Did economists go from studying economics (=incentives) to studying everything, to studying everything but economics? The less weight editors put on referee reports (and the more they put on external factors, like authors' reputation, affiliation, and number), the more moral hazard we have. Why should authors work hard to improve the paper if the marginal effect on publication is small?

Anonymous said...

Did he control for the number of citations the papers had before submission? It seems to me a huge source of bias. Big names get those, and editors can maximize citation count by letting these in easy.

Vera said...

That's an interesting point! I wish I knew the answer.

Michael E. Rose said...

Thanks a lot for this information - it's good to see studies on the science of science making it onto the big conferences more and more! We really need to think about our profession and the way we do science more often.

You're also absolutely right about the alternative interpretations: when you evaluate science, you always have to make an assumption about how to measure quality. There are citations or a few expert opinions, but the two sometimes contradict each other and offer alternative interpretations. There is even a third possibility, namely judging science based on the opinions of many experts, which is what happens when papers are awarded prizes or in funding decisions... Additionally, there are alt-metrics, i.e. media coverage or impact on social networks and Wikipedia.

Maybe we should stop calling what we want to measure "quality"? Rather, citations quantify scientific impact/influence, while journal reputation proxies relevance, but neither truly speaks to quality.