Academy of Finland, Research Council for Health, FI-00501 Helsinki, Finland.
J Clin Epidemiol. 2012 Jan;65(1):47-52. doi: 10.1016/j.jclinepi.2011.05.001. Epub 2011 Aug 9.
Peer review is the gold standard for evaluating scientific quality. Compared with studies on inter-reviewer variability, research on panel evaluation is scarce. To appraise the reliability of panel evaluations in grant review, we compared scores by two expert panels reviewing the same grant proposals. Our main interest was to evaluate whether panel discussion improves reliability.
Thirty reviewers were randomly allocated to one of the two panels. Sixty-five grant proposals in the fields of clinical medicine and epidemiology were reviewed by both panels. Each reviewer received 5-12 proposals. Each proposal was evaluated by two reviewers, using a six-point scale. The reliability of reviewer and panel scores was evaluated using Cohen's kappa with linear weighting. Reliability was also evaluated for the panel mean scores (the mean of the two reviewer scores was used as the panel score).
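Cohen's kappa with linear weighting, the agreement statistic used above, gives full credit for exact score matches and partial credit that decays linearly with the score gap. A minimal sketch of the computation for a 1-6 scale is shown below; the function name and the example score vectors are illustrative, not data from the study.

```python
from collections import Counter

def linear_weighted_kappa(scores_a, scores_b, k=6):
    """Cohen's kappa with linear weights for two raters scoring 1..k.

    Weight w(i, j) = 1 - |i - j| / (k - 1): 1 for exact agreement,
    shrinking linearly to 0 for the maximal disagreement.
    """
    n = len(scores_a)
    w = lambda i, j: 1 - abs(i - j) / (k - 1)

    # Observed weighted agreement: average weight over paired scores.
    po = sum(w(a, b) for a, b in zip(scores_a, scores_b)) / n

    # Expected weighted agreement under independence of the two raters,
    # computed from each rater's marginal score counts.
    ca, cb = Counter(scores_a), Counter(scores_b)
    pe = sum(w(i, j) * ca[i] * cb[j]
             for i in range(1, k + 1)
             for j in range(1, k + 1)) / n ** 2

    return (po - pe) / (1 - pe)
```

Perfect agreement yields a kappa of 1, while two raters whose scores agree only as often as their marginal distributions predict score near 0, which is why kappa is preferred over raw percent agreement for panel comparisons like the one reported here.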
The proportion of large differences (at least two points) was 40% for reviewers in panel A, 36% for reviewers in panel B, 26% for the panel discussion scores, and 14% when the means of the two reviewer scores were used. The kappa for the panel score after discussion was 0.23 (95% confidence interval: 0.08, 0.39). Using the mean of the reviewer scores, the kappa coefficient was likewise 0.23 (95% confidence interval: 0.00, 0.46).
The reliability between panel scores was higher than between reviewer scores. The similar interpanel reliability, when using the final panel score or the mean value of reviewer scores, indicates that panel discussions per se did not improve the reliability of the evaluation.