Max Planck Society, Munich, Germany.
PLoS One. 2010 Dec 14;5(12):e14331. doi: 10.1371/journal.pone.0014331.
This paper presents the first meta-analysis of the inter-rater reliability (IRR) of journal peer reviews. IRR is defined as the extent to which two or more independent reviews of the same scientific document agree.
METHODOLOGY/PRINCIPAL FINDINGS: Altogether, 70 reliability coefficients (Cohen's Kappa, intra-class correlation [ICC], and Pearson product-moment correlation [r]) from 48 studies were included in the meta-analysis. The studies covered a total of 19,443 manuscripts; on average, each study had a sample size of 311 manuscripts (minimum: 28, maximum: 1,983). The results of the meta-analysis confirmed the findings of the narrative literature reviews published to date: the level of IRR (mean ICC/r² = .34, mean Cohen's Kappa = .17) was low. To explain the study-to-study variation of the IRR coefficients, meta-regression analyses were conducted using seven covariates. Two covariates emerged as statistically significant in achieving approximate homogeneity of the intra-class correlations: first, the more manuscripts a study was based on, the smaller the reported IRR coefficients were; second, studies that reported information on the rating system used by reviewers showed smaller IRR coefficients than studies that did not.
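To illustrate one of the agreement statistics pooled in the meta-analysis, the following is a minimal sketch (not the authors' code) of Cohen's Kappa for two reviewers' categorical recommendations; the rating categories and data are invented for illustration:

```python
from collections import Counter

def cohens_kappa(ratings1, ratings2):
    """Cohen's Kappa: chance-corrected agreement between two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected if the
    two raters assigned categories independently at their own rates.
    """
    assert len(ratings1) == len(ratings2), "raters must judge the same items"
    n = len(ratings1)
    # Observed agreement: fraction of items with identical ratings.
    p_o = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Expected agreement: product of each rater's marginal frequencies.
    c1, c2 = Counter(ratings1), Counter(ratings2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical recommendations of two reviewers on six manuscripts.
reviewer_a = ["accept", "reject", "accept", "revise", "reject", "accept"]
reviewer_b = ["accept", "revise", "accept", "revise", "reject", "reject"]
kappa = cohens_kappa(reviewer_a, reviewer_b)  # 0.5 for this toy data
```

A Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the pooled mean of .17 reported above corresponds to only slight agreement among reviewers.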
CONCLUSIONS/SIGNIFICANCE: Studies that report a high level of IRR should therefore be regarded as less credible than those reporting a low level of IRR. According to our meta-analysis, the IRR of peer assessments is quite limited and needs improvement (e.g., through a reader system).