Qi Hongzhi, Fu Guanghui, Li Jianqiang, Song Changwei, Zhai Wei, Luo Dan, Liu Shuo, Yu Yijing, Yang Bingxiang, Zhao Qing
College of Computer Science, Beijing University of Technology, Beijing 100124, China.
Institut du Cerveau-Paris Brain Institute-ICM, Sorbonne Université, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, 75013 Paris, France.
Bioengineering (Basel). 2025 Aug 19;12(8):882. doi: 10.3390/bioengineering12080882.
On social media, users often express their personal feelings, which on certain topics may reveal cognitive distortions or even suicidal tendencies. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K, a suicidal-risk classification dataset of 1249 posts, and SocialCD-3K, a multi-label cognitive-distortion detection dataset of 3407 posts. We conduct a comprehensive evaluation of two supervised learning methods and eight large language models (LLMs) on the proposed datasets. From a prompt-engineering perspective, we experiment with two families of prompt strategies: four zero-shot and five few-shot variants. We also evaluate the performance of the LLMs after fine-tuning on the proposed tasks. Experimental results show a significant performance gap between prompted LLMs and supervised learning. Our best supervised model achieves strong results, with an F1-score of 82.76% for the high-risk class in the suicide task and a micro-averaged F1-score of 76.10% in the cognitive-distortion task. Without fine-tuning, the best-performing LLM lags by 6.95 percentage points on the suicide task and by a more pronounced 31.53 points on the cognitive-distortion task. Fine-tuning substantially narrows these gaps to 4.31 and 3.14 percentage points, respectively. While this research highlights the potential of LLMs in psychological contexts, it also shows that supervised learning remains necessary for more challenging tasks.
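To make the reported numbers concrete, the following is a minimal sketch, with toy data, of the two metrics quoted in the abstract: per-class F1 for the high-risk label in the binary suicide-risk task, and micro-averaged F1 for the multi-label cognitive-distortion task. This is illustrative only, not the authors' evaluation code.

```python
# Toy illustration of the two reported metrics (not the paper's code).
from sklearn.metrics import f1_score

# Binary suicide-risk task: 1 = high risk, 0 = low risk.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
# F1 computed only for the high-risk (positive) class, as reported in the paper.
high_risk_f1 = f1_score(y_true, y_pred, pos_label=1)
print(f"High-risk F1: {high_risk_f1:.4f}")

# Multi-label cognitive-distortion task: each post can carry several
# distortion labels, encoded here as a binary indicator matrix
# (rows = posts, columns = distortion types).
y_true_ml = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred_ml = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
# Micro-averaging pools true/false positives and negatives across all labels
# before computing F1, so frequent labels weigh more than rare ones.
micro_f1 = f1_score(y_true_ml, y_pred_ml, average="micro")
print(f"Micro-averaged F1: {micro_f1:.4f}")
```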
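For readers unfamiliar with zero-shot prompting, the sketch below shows one plausible way such a classification prompt could be assembled for the cognitive-distortion task. The label set, wording, and function name are our own illustrative assumptions; the abstract does not disclose the paper's actual prompt strategies.

```python
# Hypothetical zero-shot prompt construction for multi-label
# cognitive-distortion detection. Labels below are an assumed subset,
# not the taxonomy used in SocialCD-3K.
DISTORTION_LABELS = [
    "catastrophizing",
    "overgeneralization",
    "mind reading",
]

def build_zero_shot_prompt(post: str) -> str:
    """Assemble a single zero-shot multi-label classification prompt."""
    label_list = ", ".join(DISTORTION_LABELS)
    return (
        "You are a clinical-psychology annotation assistant. "
        "Given the social-media post below, list every cognitive distortion "
        f"it exhibits from this set: {label_list}. "
        "Answer with a comma-separated list of labels, or 'none'.\n\n"
        f"Post: {post}"
    )

# Example usage: the resulting string would be sent to an LLM without
# any labeled examples, which is what makes the setup zero-shot.
print(build_zero_shot_prompt("Everything always goes wrong for me."))
```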