Zhang Dongfang, Du Haoze, Wang Xiaolei, Zhu Mingdong, Pang Xiaoxiao, Wei Dongqing, Wang Xianfang
School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China.
Department of Computer Science, North Carolina State University, Raleigh, 27695, USA.
Interdiscip Sci. 2025 Jun 5. doi: 10.1007/s12539-025-00715-5.
In the domain of Chinese clinical medical question answering (QA), traditional Large Language Models (LLMs) face challenges in knowledge-intensive tasks, such as hallucination and difficulty in updating their knowledge. To address these issues, this study presents CMedRAGBot, a Chinese clinical medical QA model that integrates Retrieval-Augmented Generation (RAG) with a medical knowledge graph. First, a Chinese medical knowledge graph covering multiple entity types (including diseases, medications, and symptoms) is constructed. Based on this knowledge graph, a Named Entity Recognition (NER) model built on a Chinese-RoBERTa and BiGRU architecture is designed, and data augmentation strategies are employed to improve its generalization capability. In addition, prompt engineering techniques are used to perform intent recognition on user queries, mapping each query to a predefined intent category. Finally, these modules are integrated into a complete Chinese clinical medical QA system. In the experimental evaluation, CMedRAGBot is deployed on five state-of-the-art LLMs (ChatGPT-4o, ChatGPT-o1, DeepSeek-R1, Llama-3.3-70B-Instruct, and Gemini 2.0 Flash) and tested on specialized question banks derived from the Chinese Clinical Medical Qualification Examinations and the Residency Standardization Training Examinations from 2000 to 2023. The results indicate that integrating CMedRAGBot significantly improves the test accuracy of all five models, with gains of up to approximately 10%. Furthermore, ablation experiments show that data augmentation raises the NER model's F1 score from 95.27% to 97.55%, while adding the intent recognition module markedly improves the model's ability to understand complex queries, further boosting answer accuracy. The source code is available at https://github.com/zhdongfang/CMedRAGBot .
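The query path summarized above (entity recognition, intent recognition, knowledge-graph retrieval, and prompt augmentation before generation) can be illustrated with a minimal sketch. It is not the CMedRAGBot implementation and rests on several assumptions: the triples, relation names, and intent labels are hypothetical; a longest-first dictionary matcher stands in for the Chinese-RoBERTa + BiGRU NER model; `keyword_stub_llm` is an offline placeholder for the LLM that performs prompt-based intent classification; and English terms replace Chinese ones for readability.

```python
from typing import Callable

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
# Names and relation labels are illustrative, not the CMedRAGBot schema;
# English terms stand in for Chinese ones for readability.
TRIPLES = [
    ("diabetes", "has_symptom", "polyuria"),
    ("diabetes", "has_symptom", "polydipsia"),
    ("diabetes", "treated_by", "metformin"),
    ("metformin", "side_effect", "nausea"),
]

# Hypothetical intent categories, each mapped to the KG relation it queries.
INTENT_TO_RELATION = {
    "symptom_query": "has_symptom",
    "treatment_query": "treated_by",
    "side_effect_query": "side_effect",
}


def recognize_entities(query: str, vocabulary: set[str]) -> list[str]:
    """Stand-in for the NER step: longest-first dictionary matching.

    This is NOT the Chinese-RoBERTa + BiGRU model; it only marks where
    recognized entities enter the pipeline.
    """
    lowered = query.lower()
    found: list[str] = []
    for term in sorted(vocabulary, key=len, reverse=True):
        # Skip terms that are substrings of an entity already matched.
        if term in lowered and not any(term in f for f in found):
            found.append(term)
    return found


def build_intent_prompt(query: str) -> str:
    """Prompt asking an LLM to map the query to one predefined intent."""
    labels = ", ".join(INTENT_TO_RELATION)
    return (
        f"Classify the medical question into exactly one of: {labels}.\n"
        f"Question: {query}\n"
        "Reply with the label only."
    )


def keyword_stub_llm(prompt: str) -> str:
    """Offline placeholder for a real LLM call, so the sketch runs."""
    question = prompt.split("Question:", 1)[1].lower()
    if "side effect" in question:
        return "side_effect_query"
    if "treat" in question or "drug" in question:
        return "treatment_query"
    return "symptom_query"


def retrieve(entities: list[str], intent: str) -> list[tuple[str, str, str]]:
    """Return triples whose head is a recognized entity and whose relation
    matches the intent; an unknown intent retrieves nothing."""
    relation = INTENT_TO_RELATION.get(intent)
    return [t for t in TRIPLES if t[0] in entities and t[1] == relation]


def build_answer_prompt(query: str, facts: list[tuple[str, str, str]]) -> str:
    """Augment the question with retrieved facts before generation."""
    context = "\n".join(f"- {h} {r} {t}" for h, r, t in facts) or "- (no facts found)"
    return f"Answer using only the facts below.\nFacts:\n{context}\nQuestion: {query}"


def answer_pipeline(query: str, llm: Callable[[str], str]):
    """Entities -> intent -> KG retrieval -> augmented prompt."""
    vocabulary = {h for h, _, _ in TRIPLES} | {t for _, _, t in TRIPLES}
    entities = recognize_entities(query, vocabulary)
    intent = llm(build_intent_prompt(query)).strip()
    facts = retrieve(entities, intent)
    return entities, intent, facts, build_answer_prompt(query, facts)


if __name__ == "__main__":
    *_, prompt = answer_pipeline("Which drug is used to treat diabetes?", keyword_stub_llm)
    print(prompt)
```

Running the script prints an augmented prompt whose facts section holds the single triple linking diabetes to metformin; in a real system that prompt would be passed to the generating LLM. Keeping intent recognition separate from entity recognition lets the same entity (here, diabetes) retrieve different facts depending on whether the question asks about symptoms or treatment.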