
Integrating Language Model and Reading Control Gate in BLSTM-CRF for Biomedical Named Entity Recognition.

Publication Information

IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):841-846. doi: 10.1109/TCBB.2018.2868346. Epub 2018 Sep 3.

Abstract

Biomedical named entity recognition (Bio-NER) is an important preliminary step for many biomedical text mining tasks. Current mainstream NER methods are based on neural networks, which avoid the complex hand-designed features derived from various linguistic analyses. However, these methods ignore some potential sentence-level semantic information as well as general semantic and syntactic features. We therefore propose a novel Long Short-Term Memory (LSTM) network model that integrates a language model and a sentence-level reading control gate (LS-BLSTM-CRF) for Bio-NER. In our model, a sentence-level reading control gate (SC) is inserted into the network to incorporate the implicit meaning of the entire sentence, and a language model is integrated into the model to learn richer latent features. In addition, character-level embeddings are introduced as input to handle out-of-vocabulary words. Experimental results on the BioCreative II GM corpus show that our method achieves an F-score of 89.94 percent, which outperforms all state-of-the-art systems and is 1.33 percent higher than the best-performing neural network.
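
The abstract only names the components, so the following PyTorch sketch is a rough illustration of how such a tagger might be wired together: character-level BiLSTM features, word embeddings, a word-level BLSTM, a sentence-level gate that rescales each token's hidden state, and per-tag emission scores that a CRF layer would decode. The dimensions, the gate formulation, the mean-pooled sentence vector, and the omission of the language-model training objective and the CRF layer are all assumptions made for illustration, not the authors' exact design.

    # Hypothetical sketch of an LS-BLSTM-CRF-style tagger (CRF decoding omitted).
    import torch
    import torch.nn as nn


    class GatedBLSTMTagger(nn.Module):
        def __init__(self, word_vocab, char_vocab, num_tags,
                     word_dim=100, char_dim=25, hidden=100):
            super().__init__()
            self.word_emb = nn.Embedding(word_vocab, word_dim)
            self.char_emb = nn.Embedding(char_vocab, char_dim)
            # Character-level BiLSTM: builds subword features for OOV words.
            self.char_lstm = nn.LSTM(char_dim, char_dim,
                                     bidirectional=True, batch_first=True)
            # Word-level BLSTM over [word embedding ; char feature].
            self.word_lstm = nn.LSTM(word_dim + 2 * char_dim, hidden,
                                     bidirectional=True, batch_first=True)
            # Sentence-level "reading control" gate: a sigmoid gate computed from
            # a pooled sentence vector that rescales each token's hidden state.
            self.gate = nn.Linear(4 * hidden, 2 * hidden)
            # Per-tag emission scores; a CRF layer would be applied on top.
            self.emissions = nn.Linear(2 * hidden, num_tags)

        def forward(self, words, chars):
            # words: (batch, seq); chars: (batch, seq, max_word_len)
            b, s, c = chars.shape
            _, (h_n, _) = self.char_lstm(self.char_emb(chars.view(b * s, c)))
            # Concatenate final forward and backward char states per word.
            char_feat = torch.cat([h_n[0], h_n[1]], dim=-1).view(b, s, -1)
            token_in = torch.cat([self.word_emb(words), char_feat], dim=-1)
            h, _ = self.word_lstm(token_in)                   # (b, s, 2*hidden)
            sent = h.mean(dim=1, keepdim=True).expand_as(h)   # sentence vector
            g = torch.sigmoid(self.gate(torch.cat([h, sent], dim=-1)))
            return self.emissions(g * h)                      # (b, s, num_tags)


    # Usage with toy shapes:
    model = GatedBLSTMTagger(word_vocab=5000, char_vocab=80, num_tags=3)
    scores = model(torch.randint(0, 5000, (2, 12)), torch.randint(0, 80, (2, 12, 8)))
    print(scores.shape)  # torch.Size([2, 12, 3])

In a full pipeline, the emission scores would feed a linear-chain CRF for joint decoding of the tag sequence, and the language-model component described in the abstract would add an auxiliary word-prediction objective over the BLSTM states.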

