Wang Liying, Bhanushali Tanmay, Huang Zhuoran, Yang Jingyi, Badami Sukriti, Hightow-Weidman Lisa
Institute on Digital Health and Innovation, College of Nursing, Florida State University, 222 S Copeland St, Tallahassee, FL, 32306, United States, 1 (850) 644-3296.
Center of Population Sciences for Health Equity, College of Nursing, Florida State University, Tallahassee, FL, United States.
JMIR Ment Health. 2025 May 15;12:e70014. doi: 10.2196/70014.
BACKGROUND: The global shortage of mental health professionals, exacerbated by increasing mental health needs following the COVID-19 pandemic, has stimulated growing interest in using large language models to address these challenges.
OBJECTIVES: This systematic review aims to evaluate the current capabilities of generative artificial intelligence (GenAI) models in mental health applications.
METHODS: A comprehensive search across 5 databases yielded 1046 references, of which 8 studies met the inclusion criteria. Included studies were original research that used experimental designs (eg, Turing tests, sociocognitive tasks, trials, or qualitative methods), focused on GenAI models, and explicitly measured sociocognitive abilities (eg, empathy and emotional awareness), mental health outcomes, and user experience (eg, perceived trust and empathy).
RESULTS: The studies, published between 2023 and 2024, primarily evaluated models such as ChatGPT-3.5 and 4.0, Bard, and Claude in tasks such as psychoeducation, diagnosis, emotional awareness, and clinical interventions. Most studies used zero-shot prompting and relied on human evaluators to assess the AI responses with standardized rating scales or qualitative analysis. However, these methods were often insufficient to capture the complexity of GenAI capabilities. The reliance on single-shot prompting techniques, limited comparisons, and task-based assessments isolated from context may oversimplify GenAI's abilities and overlook the nuances of human-artificial intelligence interaction, especially in clinical applications that require contextual reasoning and cultural sensitivity. The findings suggest that while GenAI models demonstrate strengths in psychoeducation and emotional awareness, their diagnostic accuracy, cultural competence, and ability to engage users emotionally remain limited. Users frequently reported concerns about trustworthiness, accuracy, and the lack of emotional engagement.
CONCLUSIONS: Future research could use more sophisticated evaluation methods, such as few-shot and chain-of-thought prompting, to fully uncover GenAI's potential. Longitudinal studies and broader comparisons with human benchmarks are needed to explore the effects of GenAI-integrated mental health care.