SyRACT: zero-shot biomedical document-level relation extraction with synergistic RAG and CoT.

Authors

Dong Xin, Zhao Di, Meng Jiana, Guo Bocheng, Lin Hongfei

Affiliations

School of Computer Science and Engineering, Dalian Minzu University, Liaoning 116600, China.

School of Computer Science and Technology, Dalian University of Technology, Liaoning 116024, China.

Publication information

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf356.

DOI:10.1093/bioinformatics/btaf356
PMID:40577808
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12237500/
Abstract

MOTIVATION

With the advancement of large language models (LLMs), the field of biomedical document-level relation extraction (BioDocRE) has encountered new opportunities. However, LLMs often face challenges such as hallucinated generation, insufficient reasoning capabilities, and a lack of interpretability when performing relation extraction tasks.

RESULTS

To address these issues, we propose SyRACT (Synergistic Retrieval Augmented Generation and Chain of Thought), a framework for high-precision relation extraction from biomedical documents. The framework is built around three core strategies: (i) reframing the relation extraction task as a question-answering problem to better align with the processing logic of LLMs; (ii) leveraging an external database constructed from PubMed to provide LLMs with rich and reliable contextual information, thus mitigating hallucination; and (iii) constructing a task-specific Chain of Thought for BioDocRE, thereby enhancing the model's reasoning ability and the interpretability of its output. We validated this approach on three biomedical relation extraction datasets: CDR, GDA, and ADE. Experimental results show that SyRACT improves F1 scores by 11.04%, 9.10%, and 41.00% on the three datasets, respectively, compared to the DocRE method, which uses standard prompts for LLMs.
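
The three strategies can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked below). It only shows one plausible way to pose a candidate relation as a question, attach retrieved passages, and append step-by-step reasoning instructions. The names `Passage`, `retrieve`, and `build_prompt`, the token-overlap ranking (a stand-in for a real retriever over a PubMed-derived database), the passage identifiers, and the wording of the reasoning steps are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    pmid: str  # hypothetical identifier for a PubMed-derived passage
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by token overlap with the query.

    Assumption: this toy scorer stands in for whatever retriever a real
    system would use; passages with no overlap are dropped.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(p.text)), p) for p in corpus]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(document: str, head: str, tail: str, relation: str,
                 corpus: list[Passage]) -> str:
    """Assemble a zero-shot prompt: QA framing + retrieved evidence + CoT steps."""
    # (i) Relation extraction posed as a yes/no question.
    question = f"Does the document state that {head} {relation} {tail}?"
    # (ii) External context retrieved for the entity pair.
    evidence = retrieve(f"{head} {tail}", corpus)
    evidence_block = "\n".join(f"[{p.pmid}] {p.text}" for p in evidence)
    if not evidence_block:
        evidence_block = "(none retrieved)"
    # (iii) Explicit reasoning steps (illustrative wording only).
    return (
        f"Document:\n{document}\n\n"
        f"Retrieved evidence:\n{evidence_block}\n\n"
        f"Question: {question}\n"
        "Reason step by step:\n"
        "1. Locate every mention of both entities in the document.\n"
        "2. Check whether the document or the evidence links them.\n"
        "3. Answer 'yes' or 'no' and cite the supporting sentence.\n"
    )


if __name__ == "__main__":
    corpus = [
        Passage("PMID-A", "Cisplatin is associated with nephrotoxicity in patients."),
        Passage("PMID-B", "Aspirin reduces fever."),
    ]
    print(build_prompt(
        "Patients receiving cisplatin developed acute nephrotoxicity.",
        "cisplatin", "nephrotoxicity", "induces", corpus,
    ))
```

The resulting string would be sent to an LLM of choice. Keeping the question, the evidence, and the reasoning steps in separate labeled sections makes it easier to inspect which part of the prompt supported a given answer.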

AVAILABILITY AND IMPLEMENTATION

Our source code and data are available at https://github.com/donggggxin/SyRACT.
