Disease named entity recognition from biomedical literature using a novel convolutional neural network.

Affiliations

College of Computer Science and Technology, Dalian University of Technology, Dalian, 116023, China.

Beijing Institute of Health Administration and Medical Information, Beijing, 100850, China.

Publication information

BMC Med Genomics. 2017 Dec 28;10(Suppl 5):73. doi: 10.1186/s12920-017-0316-8.

Abstract

BACKGROUND

Automatic disease named entity recognition (DNER) is essential for developing more sophisticated BioNLP tools. However, most conventional CRF-based DNER systems rely on carefully designed features, whose selection is labor-intensive and time-consuming. Although most deep learning methods can solve NER problems with little feature engineering, they add a CRF layer to capture the correlation between neighboring labels, which makes them considerably more complicated.

METHODS

In this paper, we propose a novel disease NER approach based on a multiple label convolutional neural network (MCNN). Instead of a CRF layer, the approach employs a multiple label strategy (MLS), which we introduce here. First, the character-level embedding, word-level embedding, and lexicon feature embedding are concatenated. Then several convolutional layers are stacked over the concatenated embedding. Finally, the MLS is applied to the output layer to capture the correlation between neighboring labels.
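The pipeline described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not define how the MLS works, so the sketch assumes one possible reading: each token's output predicts its own label and the labels of its two immediate neighbors, and the final label at a position is chosen by summing the scores that refer to it. The BIO tag set, all dimensions, the random weights, and every function name here are hypothetical.

```python
import random

LABELS = ["O", "B-Disease", "I-Disease"]  # assumed BIO tag set, not from the abstract


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def concat_embeddings(char_emb, word_emb, lex_emb):
    """Concatenate the three per-token embedding vectors."""
    return [c + w + l for c, w, l in zip(char_emb, word_emb, lex_emb)]


def conv1d_same(seq, kernels, biases):
    """1-D convolution over the token axis with zero padding and ReLU.

    seq: T vectors of size D; kernels: F filters, each K vectors of size D (K odd).
    """
    width = len(kernels[0])
    pad = width // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        window = padded[t:t + width]
        row = []
        for kern, bias in zip(kernels, biases):
            s = bias
            for k in range(width):
                s += sum(x * w for x, w in zip(window[k], kern[k]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def linear(seq, weights):
    """Apply an (out_dim x in_dim) weight matrix to every token vector."""
    return [[sum(x * w for x, w in zip(vec, row)) for row in weights] for vec in seq]


def decode_multi_label(scores, n_labels):
    """Assumed MLS decoding: scores[t] = [prev block | own block | next block]."""
    tags = []
    for t in range(len(scores)):
        total = list(scores[t][n_labels:2 * n_labels])           # own prediction
        if t > 0:                                                # t-1 predicting its next
            nxt = scores[t - 1][2 * n_labels:3 * n_labels]
            total = [a + b for a, b in zip(total, nxt)]
        if t < len(scores) - 1:                                  # t+1 predicting its previous
            prv = scores[t + 1][:n_labels]
            total = [a + b for a, b in zip(total, prv)]
        tags.append(max(range(n_labels), key=lambda i: total[i]))
    return tags


def tag_sentence(n_tokens, seed=0, n_layers=2, n_filters=8, width=3):
    """Run the whole sketch on random embeddings and untrained weights."""
    rng = random.Random(seed)
    hidden = concat_embeddings(rand_matrix(n_tokens, 4, rng),   # character-level
                               rand_matrix(n_tokens, 6, rng),   # word-level
                               rand_matrix(n_tokens, 2, rng))   # lexicon feature
    for _ in range(n_layers):                                   # stacked conv layers
        dim = len(hidden[0])
        kernels = [rand_matrix(width, dim, rng) for _ in range(n_filters)]
        hidden = conv1d_same(hidden, kernels, [0.0] * n_filters)
    scores = linear(hidden, rand_matrix(3 * len(LABELS), n_filters, rng))
    return [LABELS[i] for i in decode_multi_label(scores, len(LABELS))]
```

Calling `tag_sentence(5)` returns one label per token. Because the weights are random, the labels are meaningless; the point is only the data flow from concatenated embeddings through stacked convolutions to an output layer whose neighbor predictions are combined without a CRF.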

RESULTS

Experimental results show that MCNN achieves state-of-the-art performance on both the NCBI and CDR corpora.

CONCLUSIONS

The proposed MCNN-based disease NER method achieves state-of-the-art performance with little feature engineering. The experimental results also show that the MLS is effective at capturing the correlation between neighboring labels.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5fb/5751782/b368d39a84d4/12920_2017_316_Fig1_HTML.jpg
