Collaborative large language models for automated data extraction in living systematic reviews.

Authors

Khan Muhammad Ali, Ayub Umair, Naqvi Syed Arsalan Ahmed, Khakwani Kaneez Zahra Rubab, Sipra Zaryab Bin Riaz, Raina Ammad, Zhou Sihan, He Huan, Saeidi Amir, Hasan Bashar, Rumble Robert Bryan, Bitterman Danielle S, Warner Jeremy L, Zou Jia, Tevaarwerk Amye J, Leventakos Konstantinos, Kehl Kenneth L, Palmer Jeanne M, Murad Mohammad Hassan, Baral Chitta, Riaz Irbaz Bin

Affiliations

Department of Medicine, Mayo Clinic, Phoenix, AZ, 85054, United States.

Department of Medicine, University of Arizona, Tucson, AZ, 85721, United States.

Publication information

J Am Med Inform Assoc. 2025 Apr 1;32(4):638-647. doi: 10.1093/jamia/ocae325.

DOI: 10.1093/jamia/ocae325
PMID: 39836495
Full text:
https://pmc.ncbi.nlm.nih.gov/articles/PMC12005628/
Abstract

OBJECTIVE

Data extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world 2-reviewer process.

MATERIALS AND METHODS

A dataset of 10 trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data. The dataset was split into prompt development (n = 5) and held-out test sets (n = 17). GPT-4-turbo and Claude-3-Opus were used for data extraction. Responses from the 2 LLMs were considered concordant if they were the same for a given variable. The discordant responses from each LLM were provided to the other LLM for cross-critique. Accuracy, ie, the total number of correct responses divided by the total number of responses, was computed to assess performance.
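The concordance check, cross-critique step, and accuracy metric described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the whitespace/case normalization rule, the `revise_a`/`revise_b` callables standing in for the two LLMs, and the toy values are all assumptions introduced here.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(answer.lower().split())


def split_by_concordance(a: dict, b: dict):
    """Compare two models' answers variable by variable."""
    concordant, discordant = {}, {}
    for var in a:
        if normalize(a[var]) == normalize(b[var]):
            concordant[var] = a[var]
        else:
            discordant[var] = (a[var], b[var])
    return concordant, discordant


def cross_critique(discordant: dict, revise_a, revise_b):
    """Show each model the other's answer and collect revised answers.

    revise_a and revise_b are hypothetical callables
    (variable, own_answer, other_answer) -> revised_answer.
    """
    new_a = {v: revise_a(v, ans_a, ans_b) for v, (ans_a, ans_b) in discordant.items()}
    new_b = {v: revise_b(v, ans_b, ans_a) for v, (ans_a, ans_b) in discordant.items()}
    return new_a, new_b


def accuracy(responses: dict, gold: dict) -> float:
    """Correct responses divided by total responses."""
    correct = sum(normalize(responses[v]) == normalize(gold[v]) for v in responses)
    return correct / len(responses)


# Toy values for illustration; they are not taken from the study.
model_a = {"sample_size": "1125", "primary_endpoint": "Overall survival",
           "median_follow_up": "34 months"}
model_b = {"sample_size": "1125", "primary_endpoint": "overall  survival",
           "median_follow_up": "3.4 years"}
gold = {"sample_size": "1125", "primary_endpoint": "overall survival",
        "median_follow_up": "34 months"}

concordant, discordant = split_by_concordance(model_a, model_b)

# Stub revisers: model A keeps its answer, model B adopts A's answer.
new_a, new_b = cross_critique(
    discordant,
    revise_a=lambda var, own, other: own,
    revise_b=lambda var, own, other: other,
)
resolved, unresolved = split_by_concordance(new_a, new_b)
```

In this sketch, variables that remain in `unresolved` after cross-critique would presumably be the ones to route to a human reviewer.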

RESULTS

In the prompt development set, 110 (96%) responses were concordant, achieving an accuracy of 0.99 against the gold standard. In the test set, 342 (87%) responses were concordant. The accuracy of the concordant responses was 0.94. The accuracy of the discordant responses was 0.41 for GPT-4-turbo and 0.50 for Claude-3-Opus. Of the 49 discordant responses, 25 (51%) became concordant after cross-critique, increasing accuracy to 0.76.
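The reported counts can be cross-checked with simple arithmetic, assuming each publication yields exactly one response per variable (an assumption made here; the abstract does not state it explicitly). Under that assumption the figures are internally consistent:

```python
VARIABLES = 23

dev_total = 5 * VARIABLES      # responses in the prompt development set
test_total = 17 * VARIABLES    # responses in the held-out test set

dev_concordant = 110
test_concordant = 342
test_discordant = test_total - test_concordant
resolved_after_critique = 25

dev_rate = round(dev_concordant / dev_total, 2)
test_rate = round(test_concordant / test_total, 2)
resolved_rate = round(resolved_after_critique / test_discordant, 2)

print(dev_total, dev_rate)             # 115 0.96
print(test_total, test_rate)           # 391 0.87
print(test_discordant, resolved_rate)  # 49 0.51
```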

DISCUSSION

Concordant responses by the LLMs are likely to be accurate. In instances of discordant responses, cross-critique can further increase the accuracy.

CONCLUSION

Large language models, when simulated in a collaborative, 2-reviewer workflow, can extract data with reasonable performance, enabling truly "living" systematic reviews.
