Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource.

Affiliations

Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan.

National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae132.

Abstract

Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5159/11014787/34e84aceae6d/bbae132f1.jpg
