CALLM: Enhancing Clinical Interview Analysis Through Data Augmentation With Large Language Models

Authors

Wu Yuqi, Mao Kaining, Zhang Yanbo, Chen Jie

Publication information

IEEE J Biomed Health Inform. 2024 Dec;28(12):7531-7542. doi: 10.1109/JBHI.2024.3435085. Epub 2024 Dec 5.

DOI:10.1109/JBHI.2024.3435085
PMID:39074002
Abstract

The global prevalence of mental health disorders is increasing, leading to a significant economic burden estimated in trillions of dollars. In automated mental health diagnosis, the scarcity and imbalance of clinical data pose considerable challenges for researchers and limit the effectiveness of machine learning algorithms. To address this issue, this paper introduces CALLM, a novel clinical transcript data augmentation framework that leverages large language models. The framework follows a "patient-doctor role-playing" intuition to generate realistic synthetic data. In addition, our study introduces a "Textbook-Assignment-Application" (T-A-A) partitioning approach that offers a systematic means of crafting synthetic clinical interview datasets. We also developed a "Response-Reason" prompt engineering paradigm to generate highly authentic and diagnostically valuable transcripts. Using a fine-tuned DistilBERT model on the E-DAIC PTSD dataset, we achieved a balanced accuracy of 0.77, an F1-score of 0.70, and an AUC of 0.78 on the test set, results that showcase robust adaptability in both Zero-Shot Learning (ZSL) and Few-Shot Learning (FSL) scenarios. We further compare the CALLM framework with other data augmentation methods and PTSD diagnostic works and demonstrate consistent improvements. Compared to conventional data collection methods, our synthetic dataset not only demonstrates superior performance but also incurs less than 1% of the associated costs.
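The abstract reports balanced accuracy and F1-score for a binary PTSD classifier. As a reminder of how these two metrics are defined, here is a minimal sketch in plain Python. The labels are a made-up toy example, not data or results from the paper, and the function names are hypothetical; AUC is omitted because it requires predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; less sensitive to class imbalance than accuracy."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy, imbalanced labels invented for illustration only (3 positives, 7 negatives).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")
print(f"F1-score: {f1_score(y_true, y_pred):.3f}")
```

Balanced accuracy averages the per-class recall, which is why it is a common choice when, as the abstract notes, clinical data are imbalanced.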

Similar articles

1. CALLM: Enhancing Clinical Interview Analysis Through Data Augmentation With Large Language Models.
IEEE J Biomed Health Inform. 2024 Dec;28(12):7531-7542. doi: 10.1109/JBHI.2024.3435085. Epub 2024 Dec 5.
2. Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.
JMIR Med Inform. 2025 Jan 6;13:e63020. doi: 10.2196/63020.
3. Enhancing data quality in medical concept normalization through large language models.
J Biomed Inform. 2025 May;165:104812. doi: 10.1016/j.jbi.2025.104812. Epub 2025 Apr 1.
4. ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records.
Prog Cardiovasc Dis. 2024 Nov-Dec;87:44-49. doi: 10.1016/j.pcad.2024.10.010. Epub 2024 Oct 21.
5. Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning.
J Med Internet Res. 2020 Sep 15;22(9):e19133. doi: 10.2196/19133.
6. A large language model-based generative natural language processing framework fine-tuned on clinical notes accurately extracts headache frequency from electronic health records.
Headache. 2024 Apr;64(4):400-409. doi: 10.1111/head.14702. Epub 2024 Mar 25.
7. HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing.
AMIA Annu Symp Proc. 2023 Apr 29;2022:972-981. eCollection 2022.
8. Interdisciplinary approach to identify language markers for post-traumatic stress disorder using machine learning and deep learning.
Sci Rep. 2024 May 30;14(1):12468. doi: 10.1038/s41598-024-61557-7.
9. Enhancing post-traumatic stress disorder patient assessment: leveraging natural language processing for research of domain criteria identification using electronic medical records.
BMC Med Inform Decis Mak. 2024 Jun 4;24(1):154. doi: 10.1186/s12911-024-02554-8.
10. A clinical text classification paradigm using weak supervision and deep representation.
BMC Med Inform Decis Mak. 2019 Jan 7;19(1):1. doi: 10.1186/s12911-018-0723-6.

Cited by

1. Sentiment analysis in public health: a systematic review of the current state, challenges, and future directions.
Front Public Health. 2025 Jun 20;13:1609749. doi: 10.3389/fpubh.2025.1609749. eCollection 2025.
2. The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review.
JMIR Ment Health. 2025 Jun 27;12:e70610. doi: 10.2196/70610.
3. The Applications of Large Language Models in Mental Health: Scoping Review.
J Med Internet Res. 2025 May 5;27:e69284. doi: 10.2196/69284.
4. Opinion: Mental health research: to augment or not to augment.
Front Psychiatry. 2025 Feb 18;16:1539157. doi: 10.3389/fpsyt.2025.1539157. eCollection 2025.