Automating risk of bias assessment in systematic reviews: a real-time mixed methods comparison of human researchers to a machine learning system.

Affiliations

Division for Health Services, Norwegian Institute of Public Health, Postboks 222 Skøyen, 0213, Oslo, Norway.

Facultad de Cultura Física, Deporte y Recreación, Cra. 9 #51-11, Bogotá, Colombia.

Publication information

BMC Med Res Methodol. 2022 Jun 8;22(1):167. doi: 10.1186/s12874-022-01649-y.

DOI: 10.1186/s12874-022-01649-y
PMID: 35676632
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9174024/

Abstract

BACKGROUND

Machine learning and automation are increasingly used to make the evidence synthesis process faster and more responsive to policymakers' needs. In systematic reviews of randomized controlled trials (RCTs), risk of bias assessment is a resource-intensive task that typically requires two trained reviewers. One function of RobotReviewer, an off-the-shelf machine learning system, is automated risk of bias assessment.

METHODS

We assessed the feasibility of adopting RobotReviewer within a national public health institute using a randomized, real-time, user-centered study. The study included 26 RCTs and six reviewers from two projects examining health and social interventions. We randomized these studies to one of two RobotReviewer platforms. We operationalized feasibility as accuracy, time use, and reviewer acceptability. We measured accuracy by the number of corrections made by human reviewers (either to automated assessments or to another human reviewer's assessments). We explored acceptability through group discussions and individual email responses after presenting the quantitative results.
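The accuracy measure described above can be made concrete with a small tally. The sketch below is hypothetical: it assumes each risk of bias judgement is recorded with its source ("robot" for RobotReviewer or "human") and whether it was accepted unchanged or corrected during the consensus process. The record layout, field names, and function name are illustrative assumptions, not the authors' data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One risk of bias judgement (hypothetical record layout)."""
    source: str      # "robot" (RobotReviewer) or "human"
    corrected: bool  # True if a human reviewer changed it during consensus


def tally(judgements):
    """Return {source: (accepted, total)} counts for each judgement source."""
    accepted = Counter()
    total = Counter()
    for j in judgements:
        total[j.source] += 1
        if not j.corrected:
            accepted[j.source] += 1
    # Counter returns 0 for a source that never had an accepted judgement.
    return {s: (accepted[s], total[s]) for s in total}


if __name__ == "__main__":
    # Illustrative records only; these are not data from the study.
    records = [
        Judgement("robot", False),
        Judgement("robot", True),
        Judgement("human", False),
        Judgement("human", False),
    ]
    print(tally(records))  # {'robot': (1, 2), 'human': (2, 2)}
```

The (accepted, total) pairs per source are exactly the inputs a two-group comparison of acceptance proportions needs.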

RESULTS

When acceptance was measured dichotomously, reviewers were equally likely to accept RobotReviewer's judgement as each other's judgement during the consensus process; risk ratio 1.02 (95% CI 0.92 to 1.13; p = 0.33). We were not able to compare time use. Researchers' acceptance of the program was mixed. Less experienced reviewers were generally more positive: they saw more benefits and were able to use the tool more flexibly. Reviewers positioned human input and human-to-human interaction as superior to even partial (semi-)automation of this process.
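For readers less familiar with the statistic, a risk ratio compares two proportions: here, the share of judgements accepted unchanged when they came from RobotReviewer versus from another human reviewer. The sketch below assumes independent observations and the common log-scale (Katz) Wald interval. The abstract does not state which interval method the authors used, it does not report the underlying counts, and this simple formula ignores any clustering of judgements within studies or reviewers, so it cannot reproduce the reported figures. The function name and the counts in the usage example are hypothetical.

```python
import math


def risk_ratio(a, n1, b, n2, z=1.96):
    """Risk ratio of a/n1 versus b/n2 with a log-scale (Katz) Wald CI.

    a, n1: events and total in group 1 (e.g. accepted RobotReviewer judgements)
    b, n2: events and total in group 2 (e.g. accepted human judgements)
    Requires a > 0 and b > 0 so the standard error of log(RR) is defined.
    """
    if not (0 < a <= n1 and 0 < b <= n2):
        raise ValueError("need 0 < events <= total in both groups")
    rr = (a / n1) / (b / n2)
    # Standard error of log(RR) under the usual large-sample approximation.
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi


if __name__ == "__main__":
    # Hypothetical counts: 90 of 100 judgements accepted in each group.
    rr, lo, hi = risk_ratio(90, 100, 90, 100)
    print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # RR 1.00 (95% CI 0.91 to 1.10)
```

A risk ratio near 1 with an interval spanning 1, as in the reported result, indicates no detectable difference in how often the two sources' judgements were accepted.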

CONCLUSION

Despite being presented with evidence of RobotReviewer's equal performance to humans, participating reviewers were not interested in modifying standard procedures to include automation. If further studies confirm equal accuracy and reduced time compared to manual practices, we suggest that the benefits of RobotReviewer may support its future implementation as one of two assessors, despite reviewer ambivalence. Future research should study barriers to adopting automated tools and how highly educated and experienced researchers can adapt to a job market that is increasingly challenged by new technologies.
