Chinese medical named entity recognition utilizing entity association and gate context awareness.

Authors

Yan Yang, Kang Yufeng, Huang Wenbo, Cai Xudong

Affiliation

Institution of Computer Science and Technology, Changchun Normal University, Changchun, Jilin, China.

Publication

PLoS One. 2025 Feb 25;20(2):e0319056. doi: 10.1371/journal.pone.0319056. eCollection 2025.

Abstract

Recognizing medical named entities is a crucial aspect of applying deep learning in the medical domain. Automated methods for identifying specific entities in medical literature or other texts can enhance the efficiency and accuracy of information processing, elevate medical service quality, and aid clinical decision-making. Nonetheless, current methods are limited in contextual awareness and give insufficient consideration to contextual relevance and interactions between entities. In this study, we first encode medical text inputs using the Chinese pre-trained RoBERTa-wwm-ext model to extract comprehensive contextual features and semantic information. Subsequently, we employ recurrent neural networks in conjunction with the multi-head attention mechanism as the primary gating structure for parallel processing and capturing inter-entity dependencies. Finally, we leverage conditional random fields in combination with the cross-entropy loss function to enhance entity recognition accuracy and ensure label sequence consistency. Extensive experiments on datasets including MCSCSet and CMeEE demonstrate that the proposed model attains F1 scores of 91.90% on MCSCSet and 64.36% on CMeEE, outperforming other related models. These findings confirm the efficacy of our method for recognizing named entities in Chinese medical texts.
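The abstract credits the conditional random field layer with keeping label sequences consistent. The sketch below illustrates that idea only and is not the authors' implementation: it is a small Viterbi decoder that forbids invalid BIO transitions, compared with picking the best label per token independently. The label set (`O`, `B-Symptom`, `I-Symptom`), the emission scores, and the function names are hypothetical, and hard constraints stand in for the transition scores a trained CRF would learn.

```python
from typing import Dict, List

NEG_INF = float("-inf")
LABELS = ["O", "B-Symptom", "I-Symptom"]  # hypothetical label set


def allowed(prev: str, curr: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X; anything else is free."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def greedy(emissions: List[Dict[str, float]]) -> List[str]:
    """Pick the top-scoring label for each token independently."""
    return [max(LABELS, key=e.get) for e in emissions]


def viterbi(emissions: List[Dict[str, float]], labels: List[str]) -> List[str]:
    """Return the highest-scoring label path that respects the BIO rule."""
    # A sequence may not start with an I- label.
    score = {
        y: NEG_INF if y.startswith("I-") else emissions[0][y] for y in labels
    }
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = {}, {}
        for curr in labels:
            # Score of reaching `curr` from each previous label, or -inf
            # when the transition is forbidden.
            cand = {
                p: score[p] if allowed(p, curr) else NEG_INF for p in labels
            }
            best_prev = max(labels, key=cand.get)
            new_score[curr] = cand[best_prev] + emit[curr]
            ptr[curr] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Made-up per-token scores, as an encoder might produce for three tokens.
emissions = [
    {"O": 0.1, "B-Symptom": 0.2, "I-Symptom": 2.0},
    {"O": 0.3, "B-Symptom": 0.1, "I-Symptom": 1.5},
    {"O": 2.0, "B-Symptom": 0.1, "I-Symptom": 0.2},
]
print(greedy(emissions))           # ['I-Symptom', 'I-Symptom', 'O']
print(viterbi(emissions, LABELS))  # ['B-Symptom', 'I-Symptom', 'O']
```

In this toy input, the per-token choice opens an entity with an `I-` label, which is not a valid span, while the constrained decoder returns a well-formed `B-`/`I-` sequence.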
