Towards discovery: an end-to-end system for uncovering novel biomedical relations.

Affiliation

IEETA/DETI, LASI, University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal.

Publication information

Database (Oxford). 2024 Jul 11;2024. doi: 10.1093/database/baae057.

DOI:10.1093/database/baae057

PMID:38994795

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11240158/

Abstract

Biomedical relation extraction is an ongoing challenge within the natural language processing community. Its application is important for understanding scientific biomedical literature, with many use cases, such as drug discovery, precision medicine, disease diagnosis, treatment optimization and biomedical knowledge graph construction. Therefore, the development of a tool capable of effectively addressing this task holds the potential to improve knowledge discovery by automating the extraction of relations from research manuscripts. The first track in the BioCreative VIII competition extended the scope of this challenge by introducing the detection of novel relations within the literature. This paper describes our participating system, which initially focused on jointly extracting relations between biomedical entities and classifying their novelty. We then describe how we subsequently advanced it to an end-to-end model. Specifically, we enhanced our initial system by incorporating it into a cascading pipeline that includes tagger and linker modules. This integration enables the comprehensive extraction of relations and classification of their novelty directly from raw text. Our experiments yielded promising results, and our tagger module managed to attain state-of-the-art named entity recognition performance, with a micro F1-score of 90.24, while our end-to-end system achieved a competitive novelty F1-score of 24.59. The code to run our system is publicly available at https://github.com/ieeta-pt/BioNExt. Database URL: https://github.com/ieeta-pt/BioNExt.
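
To make the cascading design concrete, the following is a minimal Python sketch of a tagger, linker and relation extractor chained over raw text. It is not the BioNExt code: the function names, the toy lexicon, the identifiers (`CHEM:0001`, `DIS:0001`), the `Association` label and the keyword-based novelty rule are all hypothetical stand-ins for the trained models the abstract describes.

```python
from dataclasses import dataclass, replace
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str
    start: int
    end: int
    entity_type: str
    identifier: Optional[str] = None  # filled in by the linker stage


@dataclass(frozen=True)
class Relation:
    head_id: str
    tail_id: str
    relation_type: str
    novel: bool


# Hypothetical toy lexicon; a real system would use trained models here.
LEXICON = {
    "aspirin": ("Chemical", "CHEM:0001"),
    "headache": ("Disease", "DIS:0001"),
}

# Hypothetical cue phrases standing in for a learned novelty classifier.
NOVELTY_CUES = ("we show", "we found", "our results")


def tag(text: str) -> List[Mention]:
    """Stage 1 (tagger): locate entity mentions in raw text."""
    lowered = text.lower()
    mentions = []
    for term, (entity_type, _) in LEXICON.items():
        start = lowered.find(term)
        while start != -1:
            end = start + len(term)
            mentions.append(Mention(text[start:end], start, end, entity_type))
            start = lowered.find(term, end)
    return sorted(mentions, key=lambda m: m.start)


def link(mentions: List[Mention]) -> List[Mention]:
    """Stage 2 (linker): attach a normalized identifier to each mention."""
    return [replace(m, identifier=LEXICON[m.text.lower()][1]) for m in mentions]


def extract_relations(text: str, mentions: List[Mention]) -> List[Relation]:
    """Stage 3: propose relations between linked entities and flag novelty."""
    novel = any(cue in text.lower() for cue in NOVELTY_CUES)
    relations = set()
    for a, b in combinations(mentions, 2):
        if a.identifier and b.identifier and a.entity_type != b.entity_type:
            head, tail = sorted((a.identifier, b.identifier))
            relations.add(Relation(head, tail, "Association", novel))
    return sorted(relations, key=lambda r: (r.head_id, r.tail_id))


def run_pipeline(text: str) -> List[Relation]:
    """Cascade: each stage consumes the output of the previous one."""
    return extract_relations(text, link(tag(text)))


if __name__ == "__main__":
    for relation in run_pipeline("We show that aspirin relieves headache."):
        print(relation)
```

Because each stage only consumes the previous stage's output, errors made by the tagger or linker propagate downstream, which is one plausible reason end-to-end scores sit well below the scores of individual modules.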
