Enhancing Biomedical Relation Extraction with Directionality.

Authors

Lai Po-Ting, Wei Chih-Hsuan, Tian Shubo, Leaman Robert, Lu Zhiyong

Publication information

ArXiv. 2025 Jan 23:arXiv:2501.14079v1.

Abstract

Biological relation networks contain rich information for understanding the biological mechanisms behind the relationships among entities such as genes, proteins, diseases, and chemicals. The vast growth of biomedical literature poses significant challenges for updating network knowledge. The recent Biomedical Relation Extraction Dataset (BioRED) provides valuable manual annotations, facilitating the development of machine-learning and pre-trained language model approaches for automatically identifying novel document-level (inter-sentence context) relationships. Nonetheless, its annotations lack directionality (subject/object) for the entity roles, which is essential for studying complex biological networks. Herein we annotate the entity roles of the relationships in the BioRED corpus and subsequently propose a novel multi-task language model with soft-prompt learning to jointly identify the relationship, novel findings, and entity roles. Our results include an enriched BioRED corpus with 10,864 directionality annotations. Moreover, our proposed method outperforms existing large language models such as the state-of-the-art GPT-4 and Llama-3 on two benchmarking tasks. Our source code and dataset are available at https://github.com/ncbi-nlp/BioREDirect.
