Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition.

Affiliations

Department of Computer Science, Guangdong University of Technology, Guangzhou, China.

Department of Computer Science, Guangdong University of Technology, Guangzhou, China; Department of Computer Science, City University of Hong Kong, Hong Kong, China.

Publication information

Comput Biol Med. 2019 May;108:122-132. doi: 10.1016/j.compbiomed.2019.04.002. Epub 2019 Apr 7.

Abstract

BACKGROUND

Disease named entity recognition (NER) plays an important role in biomedical research. Many challenging issues remain open; among these, the identification of rare diseases and complex disease names and the problem of tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.

METHODS

We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary constructed from the Disease Ontology. Furthermore, DABLC builds a dictionary attention layer by combining the disease dictionary matching method with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) is equipped with this dictionary attention layer, so that the disease dictionary is incorporated into disease NER.
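As a rough illustration of the exact string matching step described above, the following Python sketch tags tokens that match a dictionary entry, preferring the longest match at each position. It is not the authors' implementation: the function names `build_dictionary` and `dictionary_tags`, the lowercase whitespace tokenization, the BIO tag labels, and the tiny example dictionary are all assumptions made for this example, and the abstract does not specify how the matches are encoded for the attention layer.

```python
from typing import List, Set, Tuple


def build_dictionary(names: List[str]) -> Tuple[Set[Tuple[str, ...]], int]:
    """Turn disease names into lowercase token tuples (hypothetical helper).

    Returns the set of entries and the length of the longest entry in tokens.
    """
    entries = {tuple(name.lower().split()) for name in names if name.strip()}
    max_len = max((len(entry) for entry in entries), default=0)
    return entries, max_len


def dictionary_tags(
    tokens: List[str], entries: Set[Tuple[str, ...]], max_len: int
) -> List[str]:
    """Tag tokens by exact, longest-first matching against the dictionary."""
    lowered = [token.lower() for token in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-Disease"
            for j in range(i + 1, i + matched):
                tags[j] = "I-Disease"
            i += matched
        else:
            i += 1
    return tags


# Assumed toy dictionary; the paper builds its dictionary from the Disease Ontology.
entries, max_len = build_dictionary(["breast cancer", "cancer", "cystic fibrosis"])
tokens = "Patients with breast cancer or cystic fibrosis".split()
print(list(zip(tokens, dictionary_tags(tokens, entries, max_len))))
```

Matching on token tuples stored in a set keeps each lookup constant-time on average, which is one simple way to make exact matching efficient; other choices, such as a trie, would serve the same purpose.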

RESULTS

Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each model is executed 10 times, and each test is applied to these runs with a 95% confidence interval. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.
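F1 is the harmonic mean of precision and recall, so the reported figures can be sanity-checked with a few lines of Python. This is only a consistency check on the rounded numbers quoted above; the helper name `f1_score` is chosen for this example, and the abstract does not state whether the reported F1 was computed from averaged precision and recall or averaged over the 10 runs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported = {
    "NCBI": (0.883, 0.89),
    "BioCreative V CDR": (0.891, 0.875),
}

for corpus, (precision, recall) in reported.items():
    print(f"{corpus}: F1 = {f1_score(precision, recall):.3f}")
```

Both values agree with the reported F1 scores after rounding to three decimals.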

CONCLUSION

DABLC combines the advantages of external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names and reduces the impact of tagging inconsistency. Special disease NER and deep learning models that handle long sentences are worthwhile directions for future work.
