Department of Economics and Statistics, University of Siena, Siena, Italy.
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
PLoS One. 2020 Nov 18;15(11):e0242520. doi: 10.1371/journal.pone.0242520. eCollection 2020.
This paper analyzes the concordance between bibliometrics and peer review. It draws evidence from the data of two experiments conducted by the Italian governmental agency for research evaluation. The agency performed the experiments to validate the adoption, in the Italian research assessment exercises, of a dual system of evaluation in which some outputs were evaluated by bibliometrics and others by peer review. The two experiments were based on stratified random samples of journal articles. Each article was scored both by bibliometrics and by peer review, and the degree of concordance between the two evaluations was then computed. The correct setting of the experiments is defined by developing the design-based estimation of Cohen's kappa coefficient and testing procedures for assessing the homogeneity of missing proportions across strata. The results of both experiments show that, for each research area of science, technology, engineering and mathematics, the degree of agreement between bibliometrics and peer review is at most weak at the level of the individual article. Thus, the outcome of the experiments does not validate the use of the dual system of evaluation in the Italian research assessments. More generally, the very weak concordance indicates that metrics should not replace peer review at the level of the individual article. Hence, the use of the dual system in a research assessment might worsen the quality of information compared with the adoption of peer review only or bibliometrics only.
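The agreement measure at the core of the experiments is Cohen's kappa, which compares the observed proportion of identically scored articles with the agreement expected by chance. A minimal sketch of the plain (unweighted) kappa is given below; the paper's actual contribution is a design-based estimator for stratified samples, which this illustration does not implement, and the function name `cohens_kappa` is hypothetical.

```python
import numpy as np

def cohens_kappa(ratings_a, ratings_b, categories):
    """Plain (unweighted) Cohen's kappa between two raters.

    ratings_a, ratings_b: equal-length sequences of category labels,
    e.g. bibliometric and peer-review scores for the same articles.
    """
    a = np.asarray(ratings_a)
    b = np.asarray(ratings_b)
    # Observed agreement: fraction of items scored identically.
    p_o = np.mean(a == b)
    # Chance agreement: sum over categories of the product of each
    # rater's marginal frequency, assuming the raters are independent.
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

Kappa equals 1 for perfect agreement and 0 when agreement is no better than chance; the "weak" agreement reported in the abstract corresponds to values well below conventional adequacy thresholds.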