Automating Quality Assessment of Medical Evidence in Systematic Reviews: Model Development and Validation Study.

Author affiliations

School of Computing and Information Systems, University of Melbourne, Melbourne, Australia.

Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates.

Publication information

J Med Internet Res. 2023 Mar 13;25:e35568. doi: 10.2196/35568.

DOI:10.2196/35568
PMID:36722350
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10131699/
Abstract

BACKGROUND

Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria.

OBJECTIVE

We proposed a quality assessment task that provides an overall quality rating for each body of evidence (BoE), as well as finer-grained justification in terms of the individual quality criteria of the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) framework. For this purpose, we constructed a new data set and developed a machine learning baseline system (EvidenceGRADEr).

METHODS

We algorithmically extracted quality-related data from all summaries of findings found in the Cochrane Database of Systematic Reviews. Each BoE was defined by a set of population, intervention, comparison, and outcome criteria and assigned a quality grade (high, moderate, low, or very low) together with quality criteria (justification) that influenced that decision. Different statistical data, metadata about the review, and parts of the review text were extracted as support for grading each BoE. After pruning the resulting data set with various quality checks, we used it to train several neural-model variants. The predictions were compared against the labels originally assigned by the authors of the systematic reviews.
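The shape of one data set instance described above can be sketched as a small record type. This is only an illustration under stated assumptions: the class name, field names, and the example values are hypothetical and are not taken from the authors' data set or code; only the four grade levels and the five quality criteria come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class Grade(Enum):
    # The four quality grades named in the abstract.
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    VERY_LOW = "very low"


# The five quality criteria named in the abstract.
CRITERIA = (
    "risk of bias",
    "imprecision",
    "inconsistency",
    "indirectness",
    "publication bias",
)


@dataclass
class BodyOfEvidence:
    # PICO fields that define the BoE (field names are hypothetical).
    population: str
    intervention: str
    comparison: str
    outcome: str
    # Overall grade assigned by the review authors.
    grade: Grade
    # Criteria the review authors gave as justification for the grade.
    reasons: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject criteria outside the five listed above.
        unknown = set(self.reasons) - set(CRITERIA)
        if unknown:
            raise ValueError(f"unknown quality criteria: {sorted(unknown)}")
        self.reasons = frozenset(self.reasons)

    def criterion_labels(self):
        """Return one binary label per criterion (1 = cited as a reason)."""
        return {c: int(c in self.reasons) for c in CRITERIA}


# Made-up example values, for illustration only.
example = BodyOfEvidence(
    population="adults with condition X",
    intervention="treatment A",
    comparison="placebo",
    outcome="outcome Y",
    grade=Grade.LOW,
    reasons={"risk of bias", "imprecision"},
)
print(example.criterion_labels())
```

Splitting the justification into one binary label per criterion mirrors the per-criterion binary classifiers the abstract reports, while the `grade` field corresponds to the separate 4-level prediction task.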

RESULTS

Our quality assessment data set, Cochrane Database of Systematic Reviews Quality of Evidence, contains 13,440 instances (BoEs labeled for quality) originating from 2252 systematic reviews published on the internet from 2002 to 2020. In 10-fold cross-validation, the best neural binary classifiers for the quality criteria detected risk of bias with an F1 score of 0.78 (precision=0.68; recall=0.92) and imprecision with an F1 score of 0.75 (precision=0.66; recall=0.86), while performance on the inconsistency, indirectness, and publication bias criteria was lower (F1 in the range of 0.3-0.4). Predicting the overall quality grade as 1 of the 4 levels resulted in an F1 score of 0.5. When casting the task as a binary problem by merging the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) classes (high+moderate vs low+very low quality evidence), we attained an F1 score of 0.74. We also found that the results varied depending on the supporting information provided as input to the models.
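As a quick consistency check on the reported figures, the F score is the harmonic mean of precision and recall, so it can be recomputed from the P and R values given above. The sketch below also shows one way the four grades could be merged into two classes for the binary task. The function names and the binary label names ("higher", "lower") are assumptions made for illustration, not the authors' implementation, and an F score averaged over folds need not equal the F score of the averaged precision and recall exactly.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical label names for the merged binary task:
# high + moderate vs low + very low.
BINARY_CLASS = {
    "high": "higher",
    "moderate": "higher",
    "low": "lower",
    "very low": "lower",
}


def to_binary(grade):
    """Map a 4-level grade string to its merged binary class."""
    return BINARY_CLASS[grade.strip().lower()]


# Precision and recall values reported in the abstract.
print(round(f1(0.68, 0.92), 2))  # risk of bias -> 0.78
print(round(f1(0.66, 0.86), 2))  # imprecision -> 0.75
print(to_binary("Moderate"))     # -> higher
```

Both recomputed values agree with the F scores quoted in the abstract after rounding to two decimals.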

CONCLUSIONS

Different factors affect the quality of evidence in the context of systematic reviews of medical evidence. Some of these (risk of bias and imprecision) can be detected automatically with reasonable accuracy. Other quality dimensions, such as indirectness, inconsistency, and publication bias, prove more challenging for machine learning, largely because they occur much more rarely in the data. This technology could substantially reduce reviewer workload in the future and expedite quality assessment as part of evidence synthesis.
