Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition.

Affiliations

Department of Computer Science, Guangdong University of Technology, Guangzhou, China.

Department of Computer Science, Guangdong University of Technology, Guangzhou, China; Department of Computer Science, City University of Hong Kong, Hong Kong, China.

Publication information

Comput Biol Med. 2019 May;108:122-132. doi: 10.1016/j.compbiomed.2019.04.002. Epub 2019 Apr 7.

DOI:10.1016/j.compbiomed.2019.04.002

PMID:31003175

Abstract

BACKGROUND

Disease named entity recognition (NER) plays an important role in biomedical research. Many challenging issues remain to be addressed; among these, the identification of rare diseases and complex disease names and the problem of tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.

METHODS

We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary; the dictionary is constructed from the Disease Ontology. Furthermore, DABLC constructs a dictionary attention layer by combining the disease dictionary matching method with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) and the dictionary attention layer is proposed, bringing the disease dictionary into the network to perform disease NER.
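The dictionary matching step can be illustrated with a minimal sketch. The abstract does not specify the matching algorithm or the feature encoding, so everything here is an assumption: a greedy longest exact match over lower-cased, whitespace-split tokens, a B/I/O encoding of the hits, a three-term toy dictionary standing in for one built from the Disease Ontology, and the hypothetical helper names `build_dictionary` and `dictionary_tags`. It is not the authors' implementation.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, ...]


def build_dictionary(terms: List[str]) -> Tuple[Set[Entry], int]:
    """Lower-case and whitespace-tokenize each term.

    Returns the set of token tuples and the length of the longest entry.
    """
    entries = {tuple(term.lower().split()) for term in terms if term.strip()}
    max_len = max((len(entry) for entry in entries), default=0)
    return entries, max_len


def dictionary_tags(tokens: List[str], entries: Set[Entry], max_len: int) -> List[str]:
    """Greedy longest exact match; returns one B/I/O dictionary tag per token."""
    lowered = [token.lower() for token in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so "breast cancer" wins over "cancer".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


# Toy dictionary (assumption): a real one would be derived from the Disease Ontology.
toy_terms = ["breast cancer", "cancer", "ataxia telangiectasia"]
entries, max_len = build_dictionary(toy_terms)

tokens = "Mutations linked to breast cancer and ataxia telangiectasia".split()
tags = dictionary_tags(tokens, entries, max_len)
print(list(zip(tokens, tags)))
```

In a setup like the one described, per-token tags of this kind could be embedded and supplied to the network as additional input features; how DABLC actually consumes the matches inside its dictionary attention layer is not stated in the abstract.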

RESULTS

Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each model is run 10 times, and results are reported with a 95% confidence interval. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.
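As a quick consistency check, the reported F1 values can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The sketch below also shows one common way to form a 95% confidence interval over 10 runs (a Student's t interval with 9 degrees of freedom); the abstract does not say how the interval was computed, so that choice, the helper names, and the placeholder per-run scores are all assumptions rather than values from the paper.

```python
import math
import statistics
from typing import List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported = {
    "NCBI": (0.883, 0.89),
    "BioCreative V CDR": (0.891, 0.875),
}
f1_by_corpus = {name: round(f1_score(p, r), 3) for name, (p, r) in reported.items()}
print(f1_by_corpus)

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262


def mean_ci95_10runs(scores: List[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a t-based 95% interval for exactly 10 runs."""
    if len(scores) != 10:
        raise ValueError("this sketch hard-codes the critical value for 10 runs")
    mean = statistics.mean(scores)
    half_width = T_CRIT_95_DF9 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width


# Placeholder per-run F1 scores for illustration only; NOT taken from the paper.
placeholder_runs = [0.884, 0.887, 0.885, 0.888, 0.886, 0.883, 0.889, 0.886, 0.885, 0.887]
print(mean_ci95_10runs(placeholder_runs))
```

Both recomputed F1 values round to the figures given in the abstract (0.886 and 0.883).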

CONCLUSION

DABLC combines the advantages of both external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names; moreover, it reduces the impact of tagging inconsistency. Special disease NER and deep learning models addressing long sentences are noteworthy areas for future examination.

Similar articles

Biomedical named entity recognition using deep neural networks with contextual information.

BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.

An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition.

Bioinformatics. 2018 Apr 15;34(8):1381-1388. doi: 10.1093/bioinformatics/btx761.

DTranNER: biomedical named entity recognition with deep learning-based label-label transition model.

BMC Bioinformatics. 2020 Feb 11;21(1):53. doi: 10.1186/s12859-020-3393-1.

An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records.

BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):235. doi: 10.1186/s12911-019-0933-6.

Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.

Adversarial active learning for the identification of medical concepts and annotation inconsistency.

J Biomed Inform. 2020 Aug;108:103481. doi: 10.1016/j.jbi.2020.103481. Epub 2020 Jul 18.

D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information.

Bioinformatics. 2018 Oct 15;34(20):3539-3546. doi: 10.1093/bioinformatics/bty356.

A hybrid approach for named entity recognition in Chinese electronic medical record.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):64. doi: 10.1186/s12911-019-0767-2.

Cited by

Psychomedical named entity recognition method based on multi-level feature extraction and multi-granularity embedding fusion.

Sci Rep. 2025 May 15;15(1):16927. doi: 10.1038/s41598-025-90939-8.

Chinese Clinical Named Entity Recognition With Segmentation Synonym Sentence Synthesis Mechanism: Algorithm Development and Validation.

JMIR Med Inform. 2024 Nov 21;12:e60334. doi: 10.2196/60334.

Comparative Analysis of Large Language Models in Chinese Medical Named Entity Recognition.

Bioengineering (Basel). 2024 Sep 29;11(10):982. doi: 10.3390/bioengineering11100982.

A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.

Interdiscip Sci. 2024 Jun;16(2):333-344. doi: 10.1007/s12539-024-00605-2. Epub 2024 Feb 10.

Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach.

BMC Med Inform Decis Mak. 2023 Jan 26;23(1):20. doi: 10.1186/s12911-023-02117-3.

Extraction of knowledge graph of Covid-19 through mining of unstructured biomedical corpora.

Comput Biol Chem. 2023 Feb;102:107808. doi: 10.1016/j.compbiolchem.2022.107808. Epub 2023 Jan 2.

Clinical Application of Detecting COVID-19 Risks: A Natural Language Processing Approach.

Viruses. 2022 Dec 11;14(12):2761. doi: 10.3390/v14122761.

BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.

BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.

An imConvNet-based deep learning model for Chinese medical named entity recognition.

BMC Med Inform Decis Mak. 2022 Nov 21;22(1):303. doi: 10.1186/s12911-022-02049-4.

A machine learning framework for discovery and enrichment of metagenomics metadata from open access publications.

Gigascience. 2022 Aug 11;11. doi: 10.1093/gigascience/giac077.
