Chen Chunliang, Wang Xinyu, Guan Ming, Yue Wenjing, Wu Yuanbin, Zhou Ya, Wang Xiaoling
East China Normal University, No.3663 Zhongshanbei Road, Shanghai, China, 86 18621306726.
Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, China.
JMIR Med Inform. 2025 Jun 20;13:e75103. doi: 10.2196/75103.
Large language models (LLMs) provide new opportunities to advance the intelligent development of traditional Chinese medicine (TCM). Syndrome differentiation thinking is an essential part of TCM, and equipping LLMs with this capability is a crucial step toward more effective clinical applications of TCM. However, given the complexity of TCM syndrome differentiation thinking, acquiring this ability remains a considerable challenge for such models.
This study aims to evaluate the syndrome differentiation thinking ability of LLMs and to design a method that effectively enhances their performance in this area.
We decomposed the process of syndrome differentiation thinking in TCM into three core tasks: pathogenesis inference, syndrome inference, and diagnostic suggestion. To evaluate the performance of LLMs in these tasks, we constructed a high-quality evaluation dataset, forming a reliable foundation for quantitative assessment of their capabilities. Furthermore, we developed a methodology for generating instruction data based on the idea of an "open-book exam," customized three data templates, and dynamically retrieved task-relevant professional knowledge that was inserted into predefined positions within the templates. This approach effectively generates high-quality instruction data that aligns with the unique characteristics of TCM syndrome differentiation thinking. Leveraging this instruction data, we fine-tuned the base model, enhancing the syndrome differentiation thinking ability of the LLMs.
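The "open-book exam" instruction-generation step described above can be sketched as a simple pipeline: retrieve task-relevant knowledge for each case, then insert it into a predefined slot of a task-specific template. This is an illustrative sketch only; the template wording, the `retrieve_knowledge` ranking (toy word-overlap scoring), and all identifiers are assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of "open-book exam" instruction-data generation:
# retrieved knowledge is inserted into a fixed slot of a task template.
# All names and template texts here are hypothetical.

TEMPLATES = {
    "pathogenesis": (
        "Reference knowledge:\n{knowledge}\n\n"
        "Medical case:\n{case}\n\n"
        "Question: infer the pathogenesis.\n"
        "Answer: {answer}"
    ),
    # ... analogous templates for syndrome inference and diagnostic suggestion
}

def retrieve_knowledge(case_text: str, corpus: dict, top_k: int = 2) -> str:
    """Toy retriever: rank corpus entries by word overlap with the case text."""
    case_words = set(case_text.split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(case_words & set(kv[1].split())),
        reverse=True,
    )
    return "\n".join(text for _, text in scored[:top_k])

def build_instruction(task: str, case_text: str, answer: str, corpus: dict) -> str:
    """Fill the task template's predefined slots with case, knowledge, and answer."""
    knowledge = retrieve_knowledge(case_text, corpus)
    return TEMPLATES[task].format(knowledge=knowledge, case=case_text, answer=answer)
```

Instruction examples produced this way pair each question with the professional knowledge needed to answer it, which is the property the fine-tuning data relies on.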
We collected 200 medical cases for the evaluation dataset and standardized them into three types of task questions. We tested general-purpose and TCM-specific LLMs and compared their performance with our proposed solution. The findings demonstrate that our method significantly enhances LLMs' syndrome differentiation thinking: our model achieved 85.7% accuracy in Task 1 and 81.2% in Task 2, surpassing the best-performing TCM-specific and general-purpose LLMs by 26.3% and 15.8%, respectively. In Task 3, our model achieved a similarity score of 84.3, indicating that its output closely matched the advice given by experts.
Existing general-purpose and TCM-specific LLMs still show significant limitations in the core tasks of syndrome differentiation thinking. Our research shows that fine-tuning LLMs with professionally designed instruction templates and high-quality generated instruction data significantly improves their performance on these tasks. The optimized LLMs produce reasoning results highly consistent with the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. These findings have important theoretical and practical significance for the in-depth interpretation of the complexity of the TCM clinical diagnosis and treatment process.