Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study.

Authors

Chen Chunliang, Wang Xinyu, Guan Ming, Yue Wenjing, Wu Yuanbin, Zhou Ya, Wang Xiaoling

Affiliations

East China Normal University, No.3663 Zhongshanbei Road, Shanghai, China, 86 18621306726.

Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, China.

Publication information

JMIR Med Inform. 2025 Jun 20;13:e75103. doi: 10.2196/75103.

Abstract

BACKGROUND

Large language models (LLMs) provide new opportunities to advance the intelligent development of traditional Chinese medicine (TCM). Syndrome differentiation thinking is an essential part of TCM, and equipping LLMs with this capability is a crucial step toward more effective clinical applications of TCM. However, given the complexity of TCM syndrome differentiation thinking, acquiring this ability is a considerable challenge for these models.

OBJECTIVE

This study aims to evaluate the syndrome differentiation thinking ability of LLMs and to design a method that effectively enhances their performance in this area.

METHODS

We decomposed the process of syndrome differentiation thinking in TCM into three core tasks: pathogenesis inference, syndrome inference, and diagnostic suggestion. To evaluate the performance of LLMs in these tasks, we constructed a high-quality evaluation dataset, forming a reliable foundation for quantitative assessment of their capabilities. Furthermore, we developed a methodology for generating instruction data based on the idea of an "open-book exam," customized three data templates, and dynamically retrieved task-relevant professional knowledge that was inserted into predefined positions within the templates. This approach effectively generates high-quality instruction data that aligns with the unique characteristics of TCM syndrome differentiation thinking. Leveraging this instruction data, we fine-tuned the base model, enhancing the syndrome differentiation thinking ability of the LLMs.
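The "open-book exam" idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the template wording, the `KnowledgeEntry` type, the keyword-overlap retriever, and the placeholder knowledge notes are hypothetical and are not taken from the study, which does not specify its retriever or template text in this abstract.

```python
from dataclasses import dataclass

# Hypothetical templates, one per task. The {knowledge} slot marks the
# predefined position where retrieved reference text is inserted.
TEMPLATES = {
    "pathogenesis_inference": (
        "Reference material:\n{knowledge}\n\n"
        "Case record:\n{case}\n\n"
        "Question: infer the pathogenesis of this case."
    ),
    "syndrome_inference": (
        "Reference material:\n{knowledge}\n\n"
        "Case record:\n{case}\n\n"
        "Question: infer the syndrome of this case."
    ),
    "diagnostic_suggestion": (
        "Reference material:\n{knowledge}\n\n"
        "Case record:\n{case}\n\n"
        "Question: give a diagnostic suggestion for this case."
    ),
}


@dataclass(frozen=True)
class KnowledgeEntry:
    keywords: frozenset  # lowercase terms that make this entry relevant
    text: str            # reference text shown to the model


def retrieve(case_text, knowledge_base, top_k=2):
    """Rank entries by keyword overlap with the case text.

    A toy stand-in for whatever retriever is actually used.
    """
    words = set(case_text.lower().split())
    scored = [(len(entry.keywords & words), entry) for entry in knowledge_base]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry.text for _, entry in scored[:top_k]]


def build_instruction(task, case_text, answer, knowledge_base):
    """Fill the task template with the case and the retrieved knowledge."""
    knowledge = retrieve(case_text, knowledge_base)
    prompt = TEMPLATES[task].format(
        knowledge="\n".join(knowledge) if knowledge else "(none retrieved)",
        case=case_text,
    )
    return {"task": task, "instruction": prompt, "output": answer}


# Placeholder knowledge and case text, not real clinical content.
kb = [
    KnowledgeEntry(frozenset({"fatigue", "pale"}), "Placeholder note A."),
    KnowledgeEntry(frozenset({"fever"}), "Placeholder note B."),
]
sample = build_instruction(
    "syndrome_inference",
    "Patient reports fatigue and pale tongue",
    "placeholder syndrome label",
    kb,
)
print(sample["instruction"])
```

Records of this shape (instruction plus expected output) are the kind of data a supervised fine-tuning run would consume; the retrieval step is what makes each prompt "open book".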

RESULTS

We collected 200 medical cases for the evaluation dataset and standardized them into three types of task questions. We tested general and TCM-specific LLMs and compared their performance with that of our proposed solution. The findings demonstrated that our method significantly enhanced the syndrome differentiation thinking of LLMs. Our model achieved 85.7% accuracy in Task 1 and 81.2% in Task 2, surpassing the best-performing TCM and general LLMs by 26.3% and 15.8%, respectively. In Task 3, our model achieved a similarity score of 84.3, indicating that its suggestions were remarkably similar to advice given by experts.
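For readers unfamiliar with how an accuracy figure like those above is typically computed, the following is a minimal sketch. It assumes exact-match scoring after trimming whitespace and lowercasing, which is an assumption made for illustration; the abstract does not state the scoring rule, and the labels below are placeholders rather than study data.

```python
def accuracy(predictions, references):
    """Percentage of predictions that exactly match their reference label.

    Matching ignores case and surrounding whitespace (an assumed rule).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Placeholder labels for four hypothetical questions.
score = accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"])
print(f"{score:.1f}%")
```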

CONCLUSIONS

Existing general and TCM-specific LLMs still have significant limitations in the core tasks of syndrome differentiation thinking. Our research shows that fine-tuning LLMs with professionally designed instruction templates and high-quality instruction data can significantly improve their performance on these core tasks. The reasoning results of the optimized LLMs are highly similar to, and consistent with, the opinions of domain experts, indicating that the models can simulate syndrome differentiation thinking to a certain extent. These findings have important theoretical and practical significance for in-depth interpretation of the complexity of the clinical diagnosis and treatment process of TCM.
