Zhang Yipeng, Lyu Hanjia, Liu Yubao, Zhang Xiyang, Wang Yu, Luo Jiebo
University of Rochester, Rochester, NY, United States.
University of Akron, Akron, OH, United States.
JMIR Infodemiology. 2021 Jul 18;1(1):e26769. doi: 10.2196/26769. eCollection 2021 Jan-Dec.
The COVID-19 pandemic has affected people's daily lives and has caused economic loss worldwide. Anecdotal evidence suggests that the pandemic has increased depression levels among the population. However, systematic studies of depression detection and monitoring during the pandemic are lacking.
This study has four aims: (1) to develop a method that automatically creates a large-scale data set of users with depression, so that the method is scalable and can be adapted to future events; (2) to verify the effectiveness of transformer-based deep learning language models in identifying users with depression from their everyday language; (3) to examine the importance of psychological text features in depression classification; and (4) to use the model to monitor fluctuations in the depression levels of different groups as the disease propagates.
We designed an effective regular expression-based search method and created the largest English Twitter depression data set, containing 2575 distinct users identified as having depression and their past tweets. To examine the effect of depression on people's Twitter language, we trained three transformer-based depression classification models on the data set, evaluated their performance with progressively increased training sizes, and compared the models' tweet chunk-level and user-level performance. Furthermore, inspired by psychological studies, we created a fusion classifier that combines deep learning model scores with psychological text features and users' demographic information, and investigated how these features relate to depression signals. Finally, we demonstrated our model's capability of monitoring both group-level and population-level depression trends by presenting two of its applications during the COVID-19 pandemic.
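As an illustration of what a regular expression-based search of this kind might look like, the minimal sketch below flags tweets containing a first-person statement of a depression diagnosis. The abstract does not give the actual expression or the matching criteria, so the pattern `DIAGNOSIS_PATTERN`, the helper `is_self_report`, and the choice to match self-reported diagnoses are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (not the authors' expression): a first-person
# statement of diagnosis, allowing up to three qualifier words such as
# "major" or "clinical" before the word "depression".
DIAGNOSIS_PATTERN = re.compile(
    r"\bI\s+(?:have\s+been|was|am|got)\s+diagnosed\s+with\s+"
    r"(?:\w+\s+){0,3}?depression\b",
    re.IGNORECASE,
)


def is_self_report(tweet: str) -> bool:
    """Return True if the tweet matches the illustrative diagnosis pattern."""
    return DIAGNOSIS_PATTERN.search(tweet) is not None


def candidate_users(tweets):
    """Collect the IDs of users with at least one matching tweet.

    `tweets` is an iterable of (user_id, text) pairs.
    """
    return {user_id for user_id, text in tweets if is_self_report(text)}
```

Requiring the first-person subject keeps statements about other people ("my friend was diagnosed with depression") out of the candidate set, at the cost of missing less explicit phrasings.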
Our fusion model achieved an accuracy of 78.9% on a test set of 446 people, half of whom were identified as having depression. Conscientiousness, neuroticism, the use of first-person pronouns, talking about biological processes such as eating and sleeping, talking about power, and exhibiting sadness were shown to be important features in depression classification. Further, when used to monitor the depression trend, our model showed that users with depression, in general, responded to the pandemic later than the control group, based on their tweets (n=500). Three US states (New York, California, and Florida) also shared a depression trend similar to that of the whole US population (n=9050). Compared with people in New York and California, people in Florida demonstrated a substantially lower level of depression.
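Group-level monitoring of this kind can be pictured as averaging per-user model scores within each group and time window. The sketch below is one simple way to do that and is not taken from the study: the function `group_trend`, the `(group, period, score)` input format, and the use of a plain mean are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def group_trend(scores):
    """Average depression scores per group and time period.

    `scores` is an iterable of (group, period, score) tuples, where
    `score` is a hypothetical per-user model output in [0, 1] and
    `period` is a sortable label such as "2020-03".
    Returns {group: [(period, mean_score), ...]} ordered by period.
    """
    buckets = defaultdict(list)
    for group, period, score in scores:
        buckets[(group, period)].append(score)

    trend = defaultdict(list)
    # Sorting the (group, period) keys yields periods in order per group.
    for group, period in sorted(buckets):
        trend[group].append((period, mean(buckets[(group, period)])))
    return dict(trend)
```

Comparing the resulting series for a state against the series for the whole sample gives the kind of side-by-side trend described above.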
This study proposes an efficient method that can be used to analyze the depression level of different groups of people on Twitter. We hope this study can raise awareness among researchers and the public of COVID-19's impact on people's mental health. The noninvasive monitoring system can also be readily adapted to other big events besides COVID-19 and can be useful during future outbreaks.