Xu Yijun, Fang Zhaoxi, Lin Weinan, Jiang Yue, Jin Wen, Balaji Prasanalakshmi, Wang Jiangda, Xia Ting
Department of Computer Science and Engineering, Shaoxing University, Shaoxing, China.
Institute of Artificial Intelligence, Shaoxing University, Shaoxing, China.
Front Psychiatry. 2025 Aug 6;16:1646974. doi: 10.3389/fpsyt.2025.1646974. eCollection 2025.
Large language models (LLMs) have opened new possibilities in mental health, with applications in mental health assessment, psychological counseling, and education. This study systematically evaluates 15 state-of-the-art LLMs, including DeepSeek-R1/V3 (March 24, 2025), GPT-4.1 (April 15, 2025), Llama 4 (April 5, 2025), and QwQ (March 6, 2025; developed by Alibaba), on two key tasks in the Chinese context: mental health knowledge testing and mental illness diagnosis. We use publicly available datasets, including Dreaddit, SDCNL, and questions from the CAS Counsellor Qualification Exam. The results indicate that DeepSeek-R1, QwQ, and GPT-4.1 outperform the other models in both knowledge accuracy and diagnostic performance. Our findings highlight the strengths and limitations of current LLMs in Chinese mental health scenarios and provide clear guidance for selecting and improving models in this sensitive domain.