
CMCN: Chinese medical concept normalization using continual learning and knowledge-enhanced.

Author information

School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210003, China; Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China.

School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210003, China.

Publication information

Artif Intell Med. 2024 Nov;157:102965. doi: 10.1016/j.artmed.2024.102965. Epub 2024 Aug 27.

Abstract

Medical concept normalization (MCN) is a crucial step in deep information extraction and natural language processing, and it plays a vital role in biomedical research. Although MCN for English has seen significant progress, Chinese medical concept normalization (CMCN) remains insufficiently explored because of the complex syntactic structure of Chinese and the paucity of Chinese medical semantic and ontology resources. In recent years, deep learning has been applied extensively across natural language processing tasks owing to its robust learning capabilities, adaptability, and transferability, and it has proven well suited to intricate, specialized knowledge discovery in the biomedical field. In this study, we approach CMCN through the lens of deep learning. Specifically, we introduce a model that leverages polymorphic semantic information and knowledge enhanced through multi-task learning, and that retains more of the important medical features through continual learning. As the cornerstone of CMCN, disease names are the main focus of this research. We evaluated various methods on a Chinese disease dataset that we built ourselves; our best-performing model, GCBM-BSCL, achieved 76.12% Accuracy@1, 87.20% Accuracy@5, and 90.02% Accuracy@10. This research not only advances knowledge mining and medical concept normalization but also supports the integration and application of artificial intelligence in the medical and health field. The source code and results are available at https://github.com/BearLiX/CMCN.
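
The abstract reports Accuracy@1, Accuracy@5, and Accuracy@10 without defining them. A minimal sketch follows, assuming the common definition: the fraction of mentions whose gold concept appears among the model's top-k ranked candidates. The function name `accuracy_at_k` and the toy concept IDs are hypothetical illustrations and are not taken from the CMCN repository.

```python
from typing import Sequence


def accuracy_at_k(
    ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int
) -> float:
    """Fraction of mentions whose gold concept is in the top-k candidates.

    ranked[i] holds the candidate concept IDs for mention i, best first.
    gold[i] is the correct concept ID for mention i.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)


# Toy data (invented for illustration): four mentions, three candidates each.
ranked = [
    ["C001", "C002", "C003"],  # gold at rank 1
    ["C010", "C011", "C012"],  # gold at rank 2
    ["C020", "C021", "C022"],  # gold at rank 3
    ["C030", "C031", "C032"],  # gold not retrieved
]
gold = ["C001", "C011", "C022", "C099"]

for k in (1, 2, 3):
    print(f"Accuracy@{k} = {accuracy_at_k(ranked, gold, k):.2f}")
```

Under this definition Accuracy@k can only stay the same or increase as k grows, which matches the ordering of the three figures reported in the abstract.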

