Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand.
Puey Ungphakorn Institute for Economic Research, Bank of Thailand, Bangkok, Thailand.
J Biomed Inform. 2022 Sep;133:104145. doi: 10.1016/j.jbi.2022.104145. Epub 2022 Jul 28.
Mental health issues are among the most serious public health concerns in many countries. National mental health statistics are typically compiled from reported patient cases or government-sponsored surveys, which are limited in coverage, frequency, and timeliness. Many fields, including public health and biomedical informatics, have recently adopted social media data as a feasible real-time alternative to traditional methods of gathering representative population-level information. However, in countries with limited natural language resources, such as Thailand, the shortage of basic natural language processing tools and labeled corpora makes it challenging to build social media systems that monitor mental health signals. This paper presents LAPoMM, a novel framework for monitoring real-time mental health indicators from social media data in low-resource languages without using labeled datasets in those languages. Specifically, we use cross-lingual methods to train language-agnostic models, and we validate the framework by examining cross-correlations between the aggregate predicted mental signals and real-world administrative data from Thailand's Department of Mental Health, which include monthly depression patients and reported suicide attempts. A combination of a language-agnostic representation and a deep learning classification model outperforms all the other cross-lingual techniques compared for recognizing mental signals in tweets, such as emotions, sentiments, and suicidal tendencies. The correlation analyses reveal strong positive relationships between actual depression cases and predicted negative sentiment signals, and between suicide attempts and both predicted negative signals (e.g., fear, sadness, and disgust) and predicted suicidal tendency. These findings demonstrate the effectiveness of the proposed framework and its potential for monitoring population-level mental health using large-scale social media data. Furthermore, because the language-agnostic model used in the methodology supports a wide range of languages, the LAPoMM framework can be readily generalized to analogous applications in other countries with limited language resources.
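The validation step described above, cross-correlating an aggregate predicted signal with monthly administrative counts, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lagged_cross_correlation`, the lag window, and the synthetic series are all assumptions made for illustration, and plain Pearson correlation at each lag stands in for whatever estimator the paper actually uses.

```python
import numpy as np


def lagged_cross_correlation(signal, reference, max_lag):
    """Pearson correlation between signal[t] and reference[t + lag].

    Hypothetical helper: a positive lag means the signal leads the
    reference by that many periods (months, in this setting).
    """
    signal = np.asarray(signal, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if signal.ndim != 1 or signal.shape != reference.shape:
        raise ValueError("both series must be 1-D and of equal length")
    n = len(signal)
    if max_lag < 0 or n - max_lag < 3:
        raise ValueError("max_lag leaves too few overlapping points")

    correlations = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = signal[: n - lag], reference[lag:]
        else:
            x, y = signal[-lag:], reference[: n + lag]
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(x, y).
        correlations[lag] = float(np.corrcoef(x, y)[0, 1])
    return correlations


# Synthetic data only (an assumption for this sketch, not real statistics):
# the "reported" series is the "predicted" series delayed by one month.
rng = np.random.default_rng(0)
base = rng.normal(size=37)
predicted_signal = base[1:]   # 36 months of aggregate predicted signal
reported_cases = base[:-1]    # the same pattern, one month later

correlations = lagged_cross_correlation(predicted_signal, reported_cases, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

By construction the synthetic series peak at a lag of one month. With real data the peak location and strength are empirical questions, and a constant series would yield an undefined (NaN) correlation.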