Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach.

Author information

Wallace Byron C, Noel-Storr Anna, Marshall Iain J, Cohen Aaron M, Smalheiser Neil R, Thomas James

Affiliations

College of Computer and Information Science, Northeastern University, Boston MA, USA.

Radcliffe Department of Medicine, University of Oxford, Oxford, UK.

Publication information

J Am Med Inform Assoc. 2017 Nov 1;24(6):1165-1168. doi: 10.1093/jamia/ocx053.

DOI:10.1093/jamia/ocx053

PMID:28541493

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5975623/

Abstract

OBJECTIVES

Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML.

METHODS

We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise.
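The routing rule described in METHODS can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `triage`, the `score` callable (standing in for any classifier that returns an estimated probability that a citation describes an RCT), and the threshold value of 0.05 are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def triage(
    citations: Sequence[str],
    score: Callable[[str], float],
    threshold: float = 0.05,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split citations into (auto_excluded, sent_to_crowd).

    A citation whose classifier score falls below `threshold` is treated as
    very unlikely to be an RCT and excluded automatically; every other
    citation is deferred to crowdworkers for manual labeling.
    """
    auto_excluded: List[str] = []
    sent_to_crowd: List[str] = []
    for citation in citations:
        if score(citation) < threshold:
            auto_excluded.append(citation)
        else:
            sent_to_crowd.append(citation)
    return auto_excluded, sent_to_crowd
```

Lowering the threshold sends more citations to the crowd, which would be expected to raise recall at the cost of more manual effort.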

RESULTS

Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone.
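The two quantities reported in RESULTS, recall and reduction in manual effort, can be computed for a given exclusion threshold roughly as shown below. This is a sketch under stated assumptions: the function name `evaluate_threshold` is hypothetical, gold-standard labels are assumed to be available, and crowdworkers are assumed to label every deferred citation correctly, which real crowds will not do perfectly.

```python
from typing import Sequence, Tuple


def evaluate_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (recall, workload_reduction) for one exclusion threshold.

    scores: classifier estimates that each citation is an RCT.
    labels: gold-standard truth (True means the citation is an RCT).
    Assumes a perfectly accurate crowd, so an RCT is missed only when
    the classifier excludes it automatically.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")

    deferred = [s >= threshold for s in scores]
    total_rcts = sum(1 for y in labels if y)
    found_rcts = sum(1 for d, y in zip(deferred, labels) if d and y)

    # Recall: share of true RCTs that survive automatic exclusion.
    recall = found_rcts / total_rcts if total_rcts else float("nan")
    # Workload reduction: share of citations no human has to screen.
    workload_reduction = 1 - sum(deferred) / len(scores)
    return recall, workload_reduction
```

Sweeping `threshold` over a held-out set and plotting the two values against each other shows the trade-off between sensitivity and effort saved.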

CONCLUSIONS

Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d9f/5975623/87b1e93f03a5/ocx053f1.jpg

Similar articles

Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews.

J Clin Epidemiol. 2021 May;133:140-151. doi: 10.1016/j.jclinepi.2020.11.003. Epub 2020 Nov 7.

Citation screening using crowdsourcing and machine learning produced accurate results: Evaluation of Cochrane's modified Screen4Me service.

J Clin Epidemiol. 2021 Feb;130:23-31. doi: 10.1016/j.jclinepi.2020.09.024. Epub 2020 Sep 30.

Cochrane Centralised Search Service showed high sensitivity identifying randomized controlled trials: A retrospective analysis.

J Clin Epidemiol. 2020 Nov;127:142-150. doi: 10.1016/j.jclinepi.2020.08.008. Epub 2020 Aug 13.

An evaluation of Cochrane Crowd found that crowdsourcing produced accurate results in identifying randomized trials.

J Clin Epidemiol. 2021 May;133:130-139. doi: 10.1016/j.jclinepi.2021.01.006. Epub 2021 Jan 18.

Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide.

Res Synth Methods. 2018 Dec;9(4):602-614. doi: 10.1002/jrsm.1287. Epub 2018 Feb 7.

Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.

J Am Med Inform Assoc. 2015 May;22(3):707-17. doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Crowd-sourcing and automation facilitated the identification and classification of randomized controlled trials in a living review.

J Clin Epidemiol. 2023 Dec;164:1-8. doi: 10.1016/j.jclinepi.2023.10.007. Epub 2023 Oct 21.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited by

Artificial Intelligence and Automation in Evidence Synthesis: An Investigation of Methods Employed in Cochrane, Campbell Collaboration, and Environmental Evidence Reviews.

Cochrane Evid Synth Methods. 2025 Aug 28;3(5):e70046. doi: 10.1002/cesm.70046. eCollection 2025 Sep.

Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models.

ArXiv. 2025 Jun 3:arXiv:2506.03321v1.

Yoga for fatigue in people with cancer.

Cochrane Database Syst Rev. 2025 May 27;5(5):CD015520. doi: 10.1002/14651858.CD015520.

Publication Type Tagging using Transformer Models and Multi-Label Classification.

AMIA Annu Symp Proc. 2025 May 22;2024:818-827. eCollection 2024.

Vaccines for preventing infections in adults with haematological malignancies.

Cochrane Database Syst Rev. 2025 May 21;5(5):CD015530. doi: 10.1002/14651858.CD015530.pub2.

Enhancing automated indexing of publication types and study designs in biomedical literature using full-text features.

medRxiv. 2025 Apr 28:2025.04.23.25326300. doi: 10.1101/2025.04.23.25326300.

Clinical utility of limited channel sleep studies versus polysomnography for obstructive sleep apnoea.

Cochrane Database Syst Rev. 2025 May 6;5(5):CD013810. doi: 10.1002/14651858.CD013810.pub2.

Vaccines for preventing infections in adults with solid tumours.

Cochrane Database Syst Rev. 2025 Apr 16;4(4):CD015551. doi: 10.1002/14651858.CD015551.pub2.

Issues regarding the Indexing of Adaptive Clinical Trial Articles.

medRxiv. 2025 Mar 11:2025.03.10.25323694. doi: 10.1101/2025.03.10.25323694.

Publication Type Tagging using Transformer Models and Multi-Label Classification.

medRxiv. 2025 Mar 7:2025.03.06.25323516. doi: 10.1101/2025.03.06.25323516.
