Qi Hongzhi, Fu Guanghui, Li Jianqiang, Song Changwei, Zhai Wei, Luo Dan, Liu Shuo, Yu Yijing, Yang Bingxiang, Zhao Qing
College of Computer Science, Beijing University of Technology, Beijing 100124, China.
Institut du Cerveau-Paris Brain Institute-ICM, Sorbonne Université, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, 75013 Paris, France.
Bioengineering (Basel). 2025 Aug 19;12(8):882. doi: 10.3390/bioengineering12080882.
On social media, users often express their personal feelings, which on certain topics may reveal cognitive distortions or even suicidal tendencies. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K, a suicidal-risk classification dataset of 1249 posts, and SocialCD-3K, a multi-label cognitive-distortion detection dataset of 3407 posts. We conduct a comprehensive evaluation of two supervised learning methods and eight large language models (LLMs) on the proposed datasets. From a prompt-engineering perspective, we experiment with two families of prompt strategies: four zero-shot and five few-shot variants. We also evaluate the performance of the LLMs after fine-tuning on the proposed tasks. Experimental results show a significant performance gap between prompted LLMs and supervised learning. Our best supervised model achieves strong results, with an F1-score of 82.76% for the high-risk class in the suicide task and a micro-averaged F1-score of 76.10% in the cognitive-distortion task. Without fine-tuning, the best-performing LLM lags by 6.95 percentage points on the suicide task and by a more pronounced 31.53 points on the cognitive-distortion task. Fine-tuning substantially narrows these gaps to 4.31 and 3.14 percentage points, respectively. While this research highlights the potential of LLMs in psychological contexts, it also shows that supervised learning remains necessary for more challenging tasks.
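To make the reported numbers concrete, the following is a minimal sketch, with toy data, of the two metrics quoted in the abstract: per-class F1 for the high-risk label in the binary suicide-risk task, and micro-averaged F1 for the multi-label cognitive-distortion task. This is illustrative only, not the authors' evaluation code.

```python
# Toy illustration of the two reported metrics (not the paper's code).
from sklearn.metrics import f1_score

# Binary suicide-risk task: 1 = high risk, 0 = low risk.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
# F1 computed only for the high-risk (positive) class, as reported in the paper.
high_risk_f1 = f1_score(y_true, y_pred, pos_label=1)
print(f"High-risk F1: {high_risk_f1:.4f}")

# Multi-label cognitive-distortion task: each post can carry several
# distortion labels, encoded here as a binary indicator matrix
# (rows = posts, columns = distortion types).
y_true_ml = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred_ml = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
# Micro-averaging pools true/false positives and negatives across all labels
# before computing F1, so frequent labels weigh more than rare ones.
micro_f1 = f1_score(y_true_ml, y_pred_ml, average="micro")
print(f"Micro-averaged F1: {micro_f1:.4f}")
```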
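For readers unfamiliar with zero-shot prompting, the sketch below shows one plausible way such a classification prompt could be assembled for the cognitive-distortion task. The label set, wording, and function name are our own illustrative assumptions; the abstract does not disclose the paper's actual prompt strategies.

```python
# Hypothetical zero-shot prompt construction for multi-label
# cognitive-distortion detection. Labels below are an assumed subset,
# not the taxonomy used in SocialCD-3K.
DISTORTION_LABELS = [
    "catastrophizing",
    "overgeneralization",
    "mind reading",
]

def build_zero_shot_prompt(post: str) -> str:
    """Assemble a single zero-shot multi-label classification prompt."""
    label_list = ", ".join(DISTORTION_LABELS)
    return (
        "You are a clinical-psychology annotation assistant. "
        "Given the social-media post below, list every cognitive distortion "
        f"it exhibits from this set: {label_list}. "
        "Answer with a comma-separated list of labels, or 'none'.\n\n"
        f"Post: {post}"
    )

# Example usage: the resulting string would be sent to an LLM without
# any labeled examples, which is what makes the setup zero-shot.
print(build_zero_shot_prompt("Everything always goes wrong for me."))
```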