JTIS: enhancing biomedical document-level relation extraction through joint training with intermediate steps.

Authors

Li Jiru, Pan Dinghao, Yang Zhihao, Sun Yuanyuan, Lin Hongfei, Wang Jian

Affiliation

School of Computer Science and Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian 116024, China.

Publication information

Database (Oxford). 2024 Dec 19;2024. doi: 10.1093/database/baae125.

DOI: 10.1093/database/baae125
PMID: 39700498
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11658465/

Abstract

Biomedical Relation Extraction (RE) is central to Biomedical Natural Language Processing and is crucial for various downstream applications. Existing RE challenges in the field of biology have primarily focused on intra-sentential analysis. However, with the rapid increase in the volume of literature and the complexity of relationships between biomedical entities, it often becomes necessary to consider multiple sentences to fully extract the relationship between a pair of entities. Current methods often fail to fully capture the complex semantic structures of information in documents, thereby affecting extraction accuracy. Therefore, unlike traditional RE methods that rely on sentence-level analysis and heuristic rules, our method focuses on extracting entity relationships from biomedical literature titles and abstracts and classifying relations that are novel findings. In our method, a multitask training approach is employed for fine-tuning a Pre-trained Language Model in the field of biology. Based on a broad spectrum of carefully designed tasks, our multitask method not only extracts relations of better quality due to more effective supervision but also achieves a more accurate classification of whether the entity pairs are novel findings. Moreover, by applying a model ensemble method, we further enhance our model's performance. The extensive experiments demonstrate that our method achieves significant performance improvements, i.e. surpassing the existing baseline by 3.94% in RE and 3.27% in Triplet Novel Typing in F1 score on BioRED, confirming its effectiveness in handling complex biomedical literature RE tasks. Database URL: https://codalab.lisn.upsaclay.fr/competitions/13377#learn_the_details-dataset.
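The abstract reports results as F1 scores and mentions a model ensemble without giving details. As a rough illustration only, the sketch below computes micro-averaged F1 over predicted relation tuples and combines several models by majority vote. The tuple layout, the function names `micro_f1` and `majority_vote`, and the voting rule are all assumptions made for this example; they are not taken from the paper and may differ from the official BioRED evaluation and from the authors' ensemble.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 between two sets of relation tuples.

    Each tuple is assumed to look like (doc_id, entity_a, entity_b, relation_type);
    this layout is hypothetical and chosen only for the example.
    """
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions):
    """Keep a relation tuple only if more than half of the models predicted it.

    `predictions` is a list of sets, one set per model. Simple majority voting
    is an assumed stand-in for whatever ensemble method the paper uses.
    """
    counts = Counter(t for model_pred in predictions for t in model_pred)
    threshold = len(predictions) / 2
    return {t for t, c in counts.items() if c > threshold}


# Toy data: three gold relations in one document, three hypothetical models.
gold = {
    ("doc1", "GeneA", "DiseaseX", "Association"),
    ("doc1", "ChemB", "DiseaseX", "Negative_Correlation"),
    ("doc1", "GeneA", "ChemB", "Association"),
}
model_1 = {
    ("doc1", "GeneA", "DiseaseX", "Association"),
    ("doc1", "ChemB", "DiseaseX", "Negative_Correlation"),
    ("doc1", "GeneC", "DiseaseX", "Association"),
}
model_2 = {
    ("doc1", "GeneA", "DiseaseX", "Association"),
    ("doc1", "GeneA", "ChemB", "Association"),
}
model_3 = {
    ("doc1", "GeneA", "DiseaseX", "Association"),
    ("doc1", "ChemB", "DiseaseX", "Negative_Correlation"),
    ("doc1", "GeneA", "ChemB", "Positive_Correlation"),
}

ensemble = majority_vote([model_1, model_2, model_3])
ensemble_f1 = micro_f1(gold, ensemble)
```

In this toy setup the vote keeps only the two relations that at least two models agree on, which trades some recall for higher precision.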

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1646/11658465/272378507611/baae125f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1646/11658465/a284833acb0f/baae125f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1646/11658465/2099f28ed3f9/baae125f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1646/11658465/ef62e3b7a03f/baae125f4.jpg
