A sequence labeling framework for extracting drug-protein relations from biomedical literature.

Author information

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

Database (Oxford). 2022 Jul 19;2022. doi: 10.1093/database/baac058.

Abstract

Automatically extracting interactions between chemical compounds/drugs and genes/proteins is significantly beneficial to drug discovery, drug repurposing, drug design and biomedical knowledge graph construction. To promote the development of drug-protein relation extraction, the BioCreative VII challenge organized the DrugProt track. This paper describes the approach we developed for this task. In addition to the conventional text classification framework that has been widely used in relation extraction tasks, we propose a sequence labeling framework for drug-protein relation extraction. We first comprehensively compared cutting-edge biomedical pre-trained language models for both frameworks. Then, we explored several ensemble methods to further improve the final performance. In the challenge evaluation, our best submission (i.e. the ensemble of models from the two frameworks via majority voting) achieved an F1-score of 0.795 on the official test set. Further, we found that the sequence labeling framework is more efficient and achieves better performance than the text classification framework. Finally, our ensemble of the sequence labeling models with majority voting achieves the best F1-score of 0.800 on the test set.
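
The majority-voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the dictionary layout (candidate pair id mapped to a predicted label), the `"NONE"` placeholder for "no relation", and the rule that ties fall back to "no relation" are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, no_relation="NONE"):
    """Combine per-model predictions by strict majority vote.

    predictions: list of dicts, one per model, each mapping a candidate
    drug-protein pair id to a predicted relation label (hypothetical layout).
    A pair missing from a model's output counts as a vote for `no_relation`.
    """
    # Collect every candidate pair that any model produced a label for.
    pair_ids = set().union(*(p.keys() for p in predictions))
    ensembled = {}
    for pair_id in sorted(pair_ids):
        votes = Counter(p.get(pair_id, no_relation) for p in predictions)
        label, count = votes.most_common(1)[0]
        # Assumed tie rule: without a strict majority, predict no relation.
        ensembled[pair_id] = label if count > len(predictions) / 2 else no_relation
    return ensembled


# Illustrative outputs from three hypothetical models.
model_outputs = [
    {"d1-g1": "INHIBITOR", "d1-g2": "NONE"},
    {"d1-g1": "INHIBITOR", "d1-g2": "ACTIVATOR"},
    {"d1-g1": "SUBSTRATE", "d1-g2": "ACTIVATOR"},
]
print(majority_vote(model_outputs))
# {'d1-g1': 'INHIBITOR', 'd1-g2': 'ACTIVATOR'}
```

Requiring a strict majority is only one possible choice; the abstract does not say how ties or missing predictions were handled.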

DATABASE URL

https://github.com/lingluodlut/BioCreativeVII_DrugProt
