Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study.

Authors

Chen Chunliang, Wang Xinyu, Guan Ming, Yue Wenjing, Wu Yuanbin, Zhou Ya, Wang Xiaoling

Affiliations

East China Normal University, No.3663 Zhongshanbei Road, Shanghai, China, 86 18621306726.

Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, China.

Publication information

JMIR Med Inform. 2025 Jun 20;13:e75103. doi: 10.2196/75103.

DOI:10.2196/75103

PMID:40540614

Abstract

BACKGROUND

Large language models (LLMs) provide new opportunities to advance the intelligent development of traditional Chinese medicine (TCM). Syndrome differentiation thinking is an essential part of TCM, and equipping LLMs with this capability is a crucial step toward more effective clinical applications of TCM. However, given the complexity of TCM syndrome differentiation thinking, acquiring this ability is a considerable challenge for a model.

OBJECTIVE

This study aims to evaluate the ability of LLMs for syndrome differentiation thinking and design a method to effectively enhance their performance in this area.

METHODS

We decomposed the process of syndrome differentiation thinking in TCM into three core tasks: pathogenesis inference, syndrome inference, and diagnostic suggestion. To evaluate the performance of LLMs in these tasks, we constructed a high-quality evaluation dataset, forming a reliable foundation for quantitative assessment of their capabilities. Furthermore, we developed a methodology for generating instruction data based on the idea of an "open-book exam," customized three data templates, and dynamically retrieved task-relevant professional knowledge that was inserted into predefined positions within the templates. This approach effectively generates high-quality instruction data that aligns with the unique characteristics of TCM syndrome differentiation thinking. Leveraging this instruction data, we fine-tuned the base model, enhancing the syndrome differentiation thinking ability of the LLMs.
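The "open-book exam" idea described above can be illustrated with a minimal sketch: for each case, task-relevant knowledge is retrieved and placed into a predefined slot of a task template, and the filled prompt is paired with the reference answer. Everything named here is an assumption made for illustration and is not taken from the paper: the toy knowledge base, the keyword-match retrieval, the template wording, and the helper names `retrieve_knowledge` and `build_instruction`.

```python
from string import Template

# Hypothetical toy knowledge base: symptom keyword -> knowledge snippet.
KNOWLEDGE_BASE = {
    "fatigue": "Fatigue with a pale tongue often points to qi deficiency.",
    "night sweats": "Night sweats with a red tongue often point to yin deficiency.",
    "cold limbs": "Cold limbs with a slow pulse often point to yang deficiency.",
}

# One hypothetical template per core task; $knowledge and $case are the
# predefined positions that get filled in.
TEMPLATES = {
    "pathogenesis_inference": Template(
        "Reference knowledge:\n$knowledge\n\nCase:\n$case\n\nInfer the pathogenesis."
    ),
    "syndrome_inference": Template(
        "Reference knowledge:\n$knowledge\n\nCase:\n$case\n\nInfer the syndrome."
    ),
    "diagnostic_suggestion": Template(
        "Reference knowledge:\n$knowledge\n\nCase:\n$case\n\nGive a diagnostic suggestion."
    ),
}


def retrieve_knowledge(case_text, top_k=2):
    """Toy retrieval: return snippets whose keyword occurs in the case text."""
    lowered = case_text.lower()
    hits = [snippet for keyword, snippet in KNOWLEDGE_BASE.items() if keyword in lowered]
    return hits[:top_k]


def build_instruction(task, case_text, answer):
    """Fill the task template with retrieved knowledge and the case description."""
    snippets = retrieve_knowledge(case_text)
    knowledge_text = "\n".join(f"- {s}" for s in snippets) or "- (no matching entry)"
    prompt = TEMPLATES[task].substitute(knowledge=knowledge_text, case=case_text)
    return {"task": task, "instruction": prompt, "output": answer}


if __name__ == "__main__":
    sample = build_instruction(
        "syndrome_inference",
        "Patient reports fatigue and night sweats.",
        "Qi and yin deficiency",
    )
    print(sample["instruction"])
```

A real pipeline would presumably replace the keyword match with a proper retriever over a curated TCM knowledge source; the sketch only shows where the retrieved text enters the template.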

RESULTS

We collected 200 medical cases for the evaluation dataset and standardized them into three types of task questions. We tested general and TCM-specific LLMs and compared their performance with that of our proposed solution. The findings demonstrated that our method significantly enhanced the syndrome differentiation thinking of LLMs. Our model achieved 85.7% in Task 1 and 81.2% accuracy in Task 2, surpassing the best-performing TCM and general LLMs by 26.3% and 15.8%, respectively. In Task 3, our model achieved a similarity score of 84.3, indicating that its suggestions were remarkably similar to the advice given by experts.
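As a rough illustration of how an accuracy figure like those above could be computed, the sketch below scores model answers against reference answers by exact match after whitespace trimming. The exact-match criterion and the function name `accuracy` are assumptions; the abstract does not state how answers in Tasks 1 and 2 were scored.

```python
def accuracy(predictions, gold_labels):
    """Percentage of predictions that exactly match the reference answer."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p.strip() == g.strip() for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)


if __name__ == "__main__":
    # Hypothetical answers to four multiple-choice questions.
    preds = ["A", "B", "C", "D"]
    gold = ["A", "B", "X", "D"]
    print(f"{accuracy(preds, gold):.1f}%")
```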

CONCLUSIONS

Existing general and TCM-specific LLMs still have significant limitations in the core tasks of syndrome differentiation thinking. Our research shows that fine-tuning LLMs with professionally designed instruction templates and high-quality instruction data can significantly improve their performance on these tasks. The fine-tuned LLMs produce reasoning results that are highly similar to, and consistent with, the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. These findings have important theoretical and practical significance for in-depth interpretation of the complexity of the clinical diagnosis and treatment process of TCM.
