Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, Dakuo Wang
Massachusetts Institute of Technology & University of Washington, USA.
Rensselaer Polytechnic Institute, USA.
Proc ACM Interact Mob Wearable Ubiquitous Technol. 2024 Mar;8(1). doi: 10.1145/3643540. Epub 2024 Mar 6.
Advances in large language models (LLMs) have empowered a variety of applications. However, a significant research gap remains in understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs (Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4) on various mental health prediction tasks over online text data. We conduct a broad range of experiments covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs on all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger, respectively) by 10.9% in balanced accuracy, and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study of LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines on potential methods to enhance LLMs' capability for mental health tasks. At the same time, we emphasize important limitations that must be addressed before these models are deployable in real-world mental health settings, such as known racial and gender biases. We highlight the important ethical risks accompanying this line of research.
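To illustrate the difference between the zero-shot and few-shot prompt designs the abstract contrasts, here is a minimal hypothetical sketch. The task wording, label set, and example posts are assumptions for illustration only, not the paper's actual prompts or datasets.

```python
def build_prompt(post, examples=None):
    """Build a binary mental-health classification prompt.

    post: the online text to classify (illustrative).
    examples: optional list of (text, label) pairs for few-shot
        prompting; None yields a zero-shot prompt.
    """
    # Hypothetical task instruction; the paper's prompts may differ.
    instruction = (
        "Decide whether the author of the following post shows signs "
        "of depression. Answer 'yes' or 'no'.\n\n"
    )
    # In the few-shot setting, labeled demonstrations precede the query.
    shots = ""
    if examples:
        shots = "".join(
            f"Post: {text}\nAnswer: {label}\n\n"
            for text, label in examples
        )
    return instruction + shots + f"Post: {post}\nAnswer:"

zero_shot = build_prompt("I can't sleep and nothing feels worth doing.")
few_shot = build_prompt(
    "I can't sleep and nothing feels worth doing.",
    examples=[("Had a great day hiking with friends!", "no")],
)
```

Instruction fine-tuning, by contrast, updates the model's weights on many such instruction-response pairs across tasks, which is what the abstract reports as the largest source of improvement.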