Scherbakov Dmitry A, Hubig Nina C, Lenert Leslie A, Alekseyenko Alexander V, Obeid Jihad S
Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States.
Interdisciplinary Transformation University, Linz, Austria.
JMIR Ment Health. 2025 Jan 16;12:e67192. doi: 10.2196/67192.
The use of natural language processing (NLP) in mental health research is increasing, with a wide range of applications and datasets being investigated.
This review aims to summarize the use of NLP in mental health research, with a special focus on the types of text datasets and the use of social determinants of health (SDOH) in NLP projects related to mental health.
The search was conducted in September 2024 using a broad search strategy in PubMed, Scopus, and CINAHL Complete. All citations were uploaded to Covidence (Veritas Health Innovation) software. The screening and extraction process took place in Covidence with the help of a custom large language model (LLM) module developed by our team. This LLM module was calibrated and tuned to automate many aspects of the review process.
The screening process, assisted by the custom LLM, led to the inclusion of 1768 studies in the final review. Most of the reviewed studies (n=665, 42.8%) used clinical data as their primary text dataset, followed by social media datasets (n=523, 33.7%). The United States contributed the highest number of studies (n=568, 36.6%), with depression (n=438, 28.2%) and suicide (n=240, 15.5%) being the most frequently investigated mental health issues. Traditional demographic variables, such as age (n=877, 56.5%) and gender (n=760, 49%), were commonly extracted, while SDOH factors were less frequently reported, with urban or rural status being the most used (n=19, 1.2%). Over half of the citations (n=826, 53.2%) did not provide clear information on dataset accessibility, although a sizable number of studies (n=304, 19.6%) made their datasets publicly available.
This scoping review underscores the significant role of clinical notes and social media in NLP-based mental health research. Despite the clear relevance of SDOH to mental health, they remain underused, which leaves a gap in current research. This review can be a starting point for researchers looking for an overview of mental health projects using text data. Shared datasets could be used to place more emphasis on SDOH in future studies.