Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data.

Authors

Xu Xuhai, Yao Bingsheng, Dong Yuanzhe, Gabriel Saadia, Yu Hong, Hendler James, Ghassemi Marzyeh, Dey Anind K, Wang Dakuo

Affiliations

Massachusetts Institute of Technology & University of Washington, USA.

Rensselaer Polytechnic Institute, USA.

Publication information

Proc ACM Interact Mob Wearable Ubiquitous Technol. 2024 Mar;8(1). doi: 10.1145/3643540. Epub 2024 Mar 6.

DOI: 10.1145/3643540
PMID: 39925940
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11806945/

Abstract

Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research.
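
The comparisons above are reported in balanced accuracy. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class recall), the metric can be computed as follows. The function name and the toy labels are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall.

    Assumes the standard definition of balanced accuracy; the paper's
    own evaluation code is not reproduced here.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (1 = positive post, 0 = negative post).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_acc = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print(plain_acc)                                   # 0.8
print(balanced_accuracy(y_true, always_negative))  # 0.5
print(balanced_accuracy(y_true, mixed))            # 0.6875
```

On the toy data, a classifier that always predicts the majority class reaches 0.8 plain accuracy but only 0.5 balanced accuracy, which is why the metric suits imbalanced label distributions. In practice, `sklearn.metrics.balanced_accuracy_score` provides the same computation.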
