Named Entity Aware Transfer Learning for Biomedical Factoid Question Answering.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2365-2376. doi: 10.1109/TCBB.2021.3079339. Epub 2022 Aug 8.

Abstract

Biomedical factoid question answering is an important task in biomedical question answering applications. It has attracted much attention because of its reliability. In question answering systems, good word representations are of great importance, and proper word embeddings can significantly improve system performance. Following the success of pretrained models in general natural language processing tasks, pretrained models have been widely used in biomedical areas, and many pretrained model-based approaches have proven effective in biomedical question answering tasks. In addition to proper word embeddings, named entities also provide important information for biomedical question answering. Inspired by the concept of transfer learning, in this study we developed a mechanism to fine-tune BioBERT with a named entity dataset to improve question answering performance. Furthermore, we applied a BiLSTM to encode the question text and obtain sentence-level information. To better combine the question-level and token-level information, we used bagging to further improve the overall performance. The proposed framework was evaluated on the BioASQ 6b and 7b datasets, and the results show that it can outperform all baselines.
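
The abstract does not say how the bagging step combines its models, so the following is only a minimal sketch of one common aggregation choice: averaging the candidate-answer scores produced by several models and ranking the result. The function name `bag_answer_scores`, the two example score dictionaries, and the case-insensitive matching of answer strings are all assumptions made for illustration, not details from the paper. The sketch covers only the aggregation step, not how the individual models are trained.

```python
from collections import defaultdict


def bag_answer_scores(model_predictions):
    """Average candidate-answer scores across models and rank them.

    model_predictions: list of dicts mapping an answer string to a score.
    A candidate missing from a model contributes 0 for that model.
    (Hypothetical helper; the paper's actual aggregation rule is not given.)
    """
    if not model_predictions:
        raise ValueError("at least one model's predictions are required")

    totals = defaultdict(float)
    for predictions in model_predictions:
        for answer, score in predictions.items():
            # Assumption: answers that differ only in case or padding are the same entity.
            totals[answer.strip().lower()] += score

    n_models = len(model_predictions)
    averaged = {answer: total / n_models for answer, total in totals.items()}
    # Highest average score first; ties are broken alphabetically for determinism.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores standing in for a token-level model and a question-level model.
token_level = {"BRCA1": 0.6, "TP53": 0.3}
question_level = {"brca1": 0.5, "EGFR": 0.4}
ranked = bag_answer_scores([token_level, question_level])
```

Here `ranked` is a list of `(answer, average_score)` pairs ordered from most to least supported. Averaging is one option among several; majority voting over each model's top answer would be an equally plausible reading of the abstract.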
