DTranNER: biomedical named entity recognition with deep learning-based label-label transition model.

Affiliations

Graduate School of Knowledge Service Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, South Korea.

Department of Industrial & Systems Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, South Korea.

Publication information

BMC Bioinformatics. 2020 Feb 11;21(1):53. doi: 10.1186/s12859-020-3393-1.

DOI:10.1186/s12859-020-3393-1

PMID:32046638

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7014657/

Abstract

BACKGROUND

Biomedical named-entity recognition (BioNER) is commonly framed as a sequence labeling problem and modeled with conditional random fields (CRFs). CRF-based methods yield structured label outputs by imposing connectivity between the labels. Recent BioNER studies have reported state-of-the-art performance by combining deep learning-based models (e.g., bidirectional Long Short-Term Memory) with a CRF. In these methods, the deep learning-based model is dedicated to estimating individual labels, whereas the relationships between connected labels are described by static numbers; consequently, the context of a given input sentence cannot be reflected when generating its most plausible label-label transitions. At the same time, correctly segmenting entity mentions in biomedical texts is challenging because biomedical terms are often descriptive and long compared with general terms. Restricting the label-label transitions to static numbers is therefore a bottleneck for improving BioNER performance.
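To make the "static numbers" concrete, the following is a minimal sketch of how a conventional linear-chain CRF might score one label sequence. It is an illustration only, not the authors' code: the function name `sequence_score`, the toy label set (O, B, I), and all score values are assumptions introduced here.

```python
from typing import List


def sequence_score(unary: List[List[float]],
                   transition: List[List[float]],
                   labels: List[int]) -> float:
    """Score one label sequence under a linear-chain CRF with static transitions.

    unary[t][y]:      score of label y at token t (e.g. produced by a BiLSTM).
    transition[a][b]: score of label a being followed by label b. The same
                      table is reused at every position of every sentence,
                      which is what "static" means here.
    """
    score = sum(unary[t][y] for t, y in enumerate(labels))
    score += sum(transition[a][b] for a, b in zip(labels, labels[1:]))
    return score


# Toy data (hypothetical values). Labels: 0 = O, 1 = B, 2 = I.
unary = [[0.1, 2.0, 0.0],
         [0.3, 0.2, 1.5],
         [1.0, 0.1, 0.4]]
transition = [[0.5, 0.2, -5.0],   # O -> I is strongly discouraged
              [0.0, -1.0, 1.0],
              [0.3, -1.0, 0.8]]

print(sequence_score(unary, transition, [1, 2, 0]))  # B I O
print(sequence_score(unary, transition, [0, 2, 0]))  # O I O, penalized by the table
```

Because `transition` does not depend on the sentence, the penalty or reward for a given label pair is identical in every context; this is the limitation the abstract points to.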

RESULTS

We introduce DTranNER, a novel CRF-based framework that incorporates a deep learning-based label-label transition model into BioNER. DTranNER uses two separate deep learning-based networks: Unary-Network and Pairwise-Network. The former models the input to determine individual labels, and the latter explores the context of the input to describe the label-label transitions. We performed experiments on five benchmark BioNER corpora. Compared with current state-of-the-art methods, DTranNER achieves the best F1-score on four of them: 84.56% (versus 84.40%) on the BioCreative II gene mention (BC2GM) corpus, 91.99% (versus 91.41%) on the BioCreative IV chemical and drug (BC4CHEMD) corpus, and 94.16% (versus 93.44%) on chemical NER and 87.22% (versus 86.56%) on disease NER of the BioCreative V chemical disease relation (BC5CDR) corpus. On the NCBI-Disease corpus it achieves a near-best F1-score of 88.62%.
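As a rough illustration of what context-dependent transitions change at decoding time, the sketch below runs Viterbi decoding with a separate transition table for each pair of adjacent tokens. It is a sketch under stated assumptions, not DTranNER's implementation: the abstract does not say how the outputs of the two networks are combined, so simple addition of scores is assumed here, and the names `viterbi`, `unary`, and `pairwise` as well as all numbers are hypothetical.

```python
from typing import List


def viterbi(unary: List[List[float]],
            pairwise: List[List[List[float]]]) -> List[int]:
    """Return the best label sequence when transition scores vary by position.

    unary[t][y]:       score of label y at token t
                       (stand-in for a Unary-Network output).
    pairwise[t][a][b]: score of label a at token t followed by label b at
                       token t + 1 (stand-in for a Pairwise-Network output);
                       len(pairwise) == len(unary) - 1.
    Assumption: the total score is the plain sum of both kinds of score.
    """
    n_labels = len(unary[0])
    best = list(unary[0])   # best[y]: best score of any path ending in label y
    backptr = []            # backptr[t - 1][y]: best previous label for y at t
    for t in range(1, len(unary)):
        new_best, ptrs = [], []
        for b in range(n_labels):
            prev = max(range(n_labels),
                       key=lambda a: best[a] + pairwise[t - 1][a][b])
            ptrs.append(prev)
            new_best.append(best[prev] + pairwise[t - 1][prev][b] + unary[t][b])
        best = new_best
        backptr.append(ptrs)
    # Trace back from the best final label.
    y = max(range(n_labels), key=lambda a: best[a])
    path = [y]
    for ptrs in reversed(backptr):
        y = ptrs[y]
        path.append(y)
    return path[::-1]


# Toy data (hypothetical): two labels, three tokens, and a different
# transition table for each adjacent token pair.
toy_unary = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
toy_pairwise = [
    [[0.0, 2.0], [0.0, 0.0]],   # between tokens 0 and 1: 0 -> 1 is favored
    [[0.0, 0.0], [0.0, 3.0]],   # between tokens 1 and 2: 1 -> 1 is favored
]
print(viterbi(toy_unary, toy_pairwise))
```

Setting every `pairwise[t]` to the same table recovers the static-transition case, so the only structural difference in this sketch is that the table may change with the input.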

CONCLUSIONS

Our results indicate that incorporating the deep learning-based label-label transition model provides distinctive contextual clues that enhance BioNER over the static transition model. We demonstrate that the proposed framework enables the dynamic transition model to adaptively explore the contextual relations between adjacent labels in a fine-grained way. We expect that our study can be a stepping stone for further advances in biomedical literature mining.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a2fd/7014657/4ee93803b3d8/12859_2020_3393_Fig1_HTML.jpg

Similar articles

Biomedical named entity recognition using deep neural networks with contextual information.

BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.

BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.

BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.

Biomedical named entity recognition using BERT in the machine reading comprehension framework.

J Biomed Inform. 2021 Jun;118:103799. doi: 10.1016/j.jbi.2021.103799. Epub 2021 May 6.

Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition.

Comput Biol Med. 2019 May;108:122-132. doi: 10.1016/j.compbiomed.2019.04.002. Epub 2019 Apr 7.

Towards reliable named entity recognition in the biomedical domain.

Bioinformatics. 2020 Jan 1;36(1):280-286. doi: 10.1093/bioinformatics/btz504.

Dictionary-based matching graph network for biomedical named entity recognition.

Sci Rep. 2023 Dec 8;13(1):21667. doi: 10.1038/s41598-023-48564-w.

GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.

Bioinformatics. 2018 May 1;34(9):1547-1554. doi: 10.1093/bioinformatics/btx815.

Cross-type biomedical named entity recognition with deep multi-task learning.

Bioinformatics. 2019 May 15;35(10):1745-1752. doi: 10.1093/bioinformatics/bty869.

D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information.

Bioinformatics. 2018 Oct 15;34(20):3539-3546. doi: 10.1093/bioinformatics/bty356.

Cited by

Do LLMs Surpass Encoders for Biomedical NER?

Proc (IEEE Int Conf Healthc Inform). 2025 Jun;2025:352-358. doi: 10.1109/ICHI64645.2025.00048. Epub 2025 Jul 22.

BioBBC: a multi-feature model that enhances the detection of biomedical entities.

Sci Rep. 2024 Apr 2;14(1):7697. doi: 10.1038/s41598-024-58334-x.

Biomedical named entity recognition based on fusion multi-features embedding.

Technol Health Care. 2023;31(S1):111-121. doi: 10.3233/THC-236011.

A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles.

Database (Oxford). 2022 Jul 15;2022. doi: 10.1093/database/baac056.

Chinese Clinical Named Entity Recognition with ALBERT and MHA Mechanism.

Evid Based Complement Alternat Med. 2022 May 23;2022:2056039. doi: 10.1155/2022/2056039. eCollection 2022.

Parallel sequence tagging for concept recognition.

BMC Bioinformatics. 2022 Mar 24;22(Suppl 1):623. doi: 10.1186/s12859-021-04511-y.

A pre-training and self-training approach for biomedical named entity recognition.

PLoS One. 2021 Feb 9;16(2):e0246310. doi: 10.1371/journal.pone.0246310. eCollection 2021.

Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies.

Front Genet. 2020 Dec 22;11:618862. doi: 10.3389/fgene.2020.618862. eCollection 2020.

Medical Information Extraction in the Age of Deep Learning.

Yearb Med Inform. 2020 Aug;29(1):208-220. doi: 10.1055/s-0040-1702001. Epub 2020 Aug 21.
