eCBT-I dialogue system: a comparative evaluation of large language models and adaptation strategies for insomnia treatment.

Author information

Bao Xueying, Zhu Xingyu, Yang Dongren, Lou Hao, Wang Ruoyun, Wu Yutong, Li Wenhui, Xia Yu, Zeng Li, Pan Yingying, Wang Xiqin, Zhang Xian, Ling Cheng, Ling Youhui, Zhang Yan, Zhao Qi, Yang Mei

Affiliations

Department of Intensive Care Unit, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.

Department of Physics, Research Institute for Biomimetics and Soft Matter, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, 361005, China.

Publication information

J Transl Med. 2025 Aug 5;23(1):862. doi: 10.1186/s12967-025-06871-y.

DOI:10.1186/s12967-025-06871-y

PMID:40764995

Abstract

BACKGROUND

Traditional face-to-face mental health treatments are often limited by time and space. With the development of advanced large language models (LLMs), digital mental health treatments can provide personalized advice to patients and improve compliance. However, for cognitive behavioral therapy for insomnia (CBT-I), specialized, real-time interactive dialogue platforms have not yet been fully developed.

METHODS

Our research team constructs an eCBT-I intelligent dialogue system based on the retrieval-augmented generation (RAG) architecture, aiming to provide an example of the deep integration of CBT-I knowledge graphs and large language models. Furthermore, to optimize the performance of the system's core language generation module on the insomnia dialogue dataset, we systematically include eight mainstream large language models (ChatGLM2-6b, ChatGLM3-6b, Baichuan-7b, Baichuan-13b, Qwen-7b, Qwen2-7b, Llama-2-7b-chat-hf, and Llama-2-13b-chat-hf) and three adaptation strategies (LoRA, QLoRA, and Freeze). We screen the suitability of each of the three adaptation strategies for each of the eight language models, and thus determine the best adaptation method for each model to maximize its performance improvement. The eight best-adapted language models are then evaluated in three dimensions to compare their performance on the small-sample sleep dialogue dataset and the C-Eval dataset. All subjects evaluated under experimental conditions are either historical medical records or patients who did not exhibit delirium and had normal language expression abilities.
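The retrieval step of a RAG pipeline over a knowledge graph can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not describe the contents of the CBT-I knowledge graph, the retrieval method, or the prompt format, so the toy triples, the word-overlap scoring, and the names `Triple`, `retrieve`, and `build_prompt` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge: head --relation--> tail."""
    head: str
    relation: str
    tail: str


# Hypothetical toy graph; the actual CBT-I knowledge graph is not given in the abstract.
KNOWLEDGE_GRAPH = [
    Triple("sleep restriction", "is_component_of", "CBT-I"),
    Triple("stimulus control", "is_component_of", "CBT-I"),
    Triple("sleep restriction", "limits", "time in bed"),
    Triple("stimulus control", "advises", "leaving bed when unable to sleep"),
]


def tokenize(text):
    """Lowercase the text and split it into a set of words, ignoring basic punctuation."""
    for ch in "?.,":
        text = text.replace(ch, " ")
    return set(text.lower().split())


def retrieve(query, graph, top_k=2):
    """Return up to top_k triples whose head/tail words overlap the query.

    Word overlap stands in for whatever retriever a real system would use
    (an assumption, not the authors' method).
    """
    query_words = tokenize(query)
    scored = []
    for triple in graph:
        overlap = len(query_words & tokenize(f"{triple.head} {triple.tail}"))
        if overlap > 0:
            scored.append((overlap, triple))
    # Stable sort: triples with equal overlap keep their order in the graph.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [triple for _, triple in scored[:top_k]]


def build_prompt(query, facts):
    """Prepend the retrieved facts to the patient's message before generation."""
    lines = [f"- {t.head} {t.relation.replace('_', ' ')} {t.tail}" for t in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return f"Known CBT-I facts:\n{context}\n\nPatient: {query}\nAssistant:"


if __name__ == "__main__":
    question = "How does sleep restriction work?"
    print(build_prompt(question, retrieve(question, KNOWLEDGE_GRAPH)))
```

In a full system the returned prompt would be passed to the fine-tuned language model; that generation call is omitted here because the abstract does not specify it.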

RESULTS

By matching model characteristics to adaptation strategies and evaluating multiple models side by side, we compare how much each fine-tuning strategy contributes to the performance improvement of each language model on the small insomnia dialogue dataset. We determine that Qwen2-7b (Freeze) is the best-performing model on the insomnia dialogue dataset.
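The two-stage selection described here (best adaptation strategy per model, then a comparison across the best-adapted models) can be sketched as follows. The scores are hypothetical placeholders, not results from the study, the model names are deliberately generic, and collapsing the three evaluation dimensions into one number per model-strategy pair is a simplifying assumption.

```python
# Hypothetical placeholder scores (higher is better); NOT results from the study.
SCORES = {
    "model_a": {"LoRA": 0.61, "QLoRA": 0.58, "Freeze": 0.66},
    "model_b": {"LoRA": 0.70, "QLoRA": 0.64, "Freeze": 0.67},
    "model_c": {"LoRA": 0.55, "QLoRA": 0.59, "Freeze": 0.57},
}


def best_strategy_per_model(scores):
    """Stage 1: for each model, keep the (strategy, score) pair with the highest score."""
    return {
        model: max(by_strategy.items(), key=lambda item: item[1])
        for model, by_strategy in scores.items()
    }


def rank_models(best):
    """Stage 2: rank the best-adapted models against each other, best first."""
    rows = [(model, strategy, score) for model, (strategy, score) in best.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for model, strategy, score in rank_models(best_strategy_per_model(SCORES)):
        print(f"{model}: {strategy} ({score:.2f})")
```

With real measurements, the first row of the ranking would play the role that Qwen2-7b (Freeze) plays in the reported result.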

CONCLUSIONS

This study integrates the CBT-I knowledge graph with the large language model through the RAG architecture, which improves the domain specialization of the eCBT-I intelligent dialogue system. The systematic process for selecting a fine-tuning method and the identification of the optimal model improve the adaptability of large language models to the CBT-I task. They also provide a useful paradigm for AI applications in medical subfields where resources are constrained and data are difficult to collect, laying a foundation for more accurate and efficient digital CBT-I clinical practice in the future.
