Wang Xiaofeng, Chen Shuai, Li Tao, Li Wanting, Zhou Yejie, Zheng Jie, Chen Qingcai, Yan Jun, Tang Buzhou
School of Communication, Shenzhen University, Shenzhen, China.
Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China.
JMIR Med Inform. 2020 Jul 29;8(7):e17958. doi: 10.2196/17958.
Depression is a serious personal and public mental health problem. Self-reporting is the main method used to diagnose depression and to determine its severity. However, patients with depression are not easy to discover, because many feel shame in disclosing or discussing their mental health conditions with others. Moreover, self-reporting is time-consuming and usually misses a certain number of cases. Automatic discovery of patients with depression from other sources, such as social media, has therefore been attracting increasing attention. Social media, as one of the most important daily communication systems, connects large numbers of people, including individuals with depression, and provides a channel for discovering them. In this study, we investigated deep-learning methods for depression risk prediction using data from Chinese microblogs, an approach that has the potential to discover more patients with depression and to trace their mental health conditions.
The aim of this study was to explore the potential of state-of-the-art deep-learning methods on depression risk prediction from Chinese microblogs.
Deep-learning methods with pretrained language representation models, including bidirectional encoder representations from transformers (BERT), robustly optimized BERT pretraining approach (RoBERTa), and generalized autoregressive pretraining for language understanding (XLNet), were investigated for depression risk prediction and compared with previous methods on a manually annotated benchmark dataset. Depression risk was assessed at four levels from 0 to 3, where 0 denotes no inclination and 1, 2, and 3 denote mild, moderate, and severe depression risk, respectively. The dataset was collected from the Chinese microblog Weibo. We also compared the deep-learning methods in two settings: (1) publicly released pretrained language representation models, and (2) language representation models further pretrained on a large-scale unlabeled dataset collected from Weibo. Precision, recall, and F1 scores were used as performance evaluation measures.
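The evaluation measures named above can be illustrated with a minimal sketch in plain Python. It is not the authors' evaluation script: the function names are hypothetical, and the choice of which risk levels enter each average (all four levels for the microaverage, levels 1 to 3 for the macroaverage) is an assumption made for illustration.

```python
from collections import Counter

# Risk levels as defined in the abstract.
RISK_LEVELS = {0: "no inclination", 1: "mild", 2: "moderate", 3: "severe"}


def per_class_counts(gold, pred, labels):
    """Count true positives, false positives, and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g in labels:
                tp[g] += 1
        else:
            if p in labels:
                fp[p] += 1  # predicted label p, but it was wrong
            if g in labels:
                fn[g] += 1  # gold label g was missed
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def micro_f1(gold, pred, labels):
    """Pool the counts over all chosen labels, then compute one F1."""
    tp, fp, fn = per_class_counts(gold, pred, labels)
    return precision_recall_f1(
        sum(tp.values()), sum(fp.values()), sum(fn.values())
    )[2]


def macro_f1(gold, pred, labels):
    """Compute F1 per label, then take the unweighted mean."""
    tp, fp, fn = per_class_counts(gold, pred, labels)
    scores = [precision_recall_f1(tp[l], fp[l], fn[l])[2] for l in labels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels invented for illustration only; not data from the study.
    gold = [0, 0, 0, 0, 1, 1, 2, 3]
    pred = [0, 0, 0, 1, 1, 0, 2, 2]
    print("micro F1, levels 0-3:", micro_f1(gold, pred, [0, 1, 2, 3]))
    print("macro F1, levels 1-3:", macro_f1(gold, pred, [1, 2, 3]))
```

Because the macroaverage weights every level equally, a rare level that is rarely predicted correctly pulls the score down sharply, whereas a microaverage that includes a frequent level is dominated by that level.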
Among the three deep-learning methods, BERT achieved the best microaveraged F1 score (0.856), whereas RoBERTa achieved the best macroaveraged F1 score on depression risk at levels 1, 2, and 3 (0.424), which represents a new benchmark result on the dataset. The further pretrained language representation models demonstrated improvement over the publicly released pretrained language representation models.
We applied deep-learning methods with pretrained language representation models to automatically predict depression risk using data from Chinese microblogs. The experimental results showed that these methods performed better than previous methods and have greater potential to discover patients with depression and to trace their mental health conditions.