Owen David, Antypas Dimosthenis, Hassoulas Athanasios, Pardiñas Antonio F, Espinosa-Anke Luis, Collados Jose Camacho
School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom.
Centre for Medical Education, School of Medicine, Cardiff University, Cardiff, United Kingdom.
JMIR AI. 2023 Mar 24;2:e41205. doi: 10.2196/41205.
Major depressive disorder is a common mental disorder affecting 5% of adults worldwide. Early contact with health care services is critical for achieving accurate diagnosis and improving patient outcomes. Key symptoms of major depressive disorder (depression hereafter) such as cognitive distortions are observed in verbal communication, which can also manifest in the structure of written language. Thus, the automatic analysis of text outputs may provide opportunities for early intervention in settings where written communication is rich and regular, such as social media and web-based forums.
The objective of this study was 2-fold. First, we sought to gauge the effectiveness of different machine learning approaches in identifying users of the mass web-based forum Reddit who eventually disclose a diagnosis of depression. Second, we aimed to determine whether the time between a forum post and a depression diagnosis date was a relevant factor in performing this detection.
A total of 2 Reddit data sets containing posts belonging to users with and without a history of depression diagnosis were obtained. The intersection of these data sets provided users with an estimated date of depression diagnosis. This derived data set was used as an input for several machine learning classifiers, including transformer-based language models (LMs).
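The intersection step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that one data set maps each user to timestamped posts and the other maps each user to an estimated diagnosis date; the names `posts_by_user`, `diagnosis_date_by_user`, and `intersect_users` are hypothetical, and the actual data set formats are not described here.

```python
from datetime import date


def intersect_users(posts_by_user, diagnosis_date_by_user):
    """Keep only users present in both data sets.

    Each retained user is paired with their posts and their estimated
    diagnosis date. Both input layouts are assumptions for this sketch.
    """
    shared_users = posts_by_user.keys() & diagnosis_date_by_user.keys()
    return {
        user: {
            "diagnosis_date": diagnosis_date_by_user[user],
            "posts": posts_by_user[user],
        }
        for user in sorted(shared_users)
    }


# Toy data (invented): only "user_b" appears in both data sets.
posts_by_user = {
    "user_a": [(date(2020, 1, 5), "post text 1")],
    "user_b": [(date(2020, 2, 1), "post text 2")],
}
diagnosis_date_by_user = {
    "user_b": date(2020, 3, 1),
    "user_c": date(2020, 4, 1),
}

derived = intersect_users(posts_by_user, diagnosis_date_by_user)
```

In this toy example, `derived` contains a single user with both a post history and an estimated diagnosis date, which is the kind of record a downstream classifier would need.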
Bidirectional Encoder Representations from Transformers (BERT) and MentalBERT transformer-based LMs proved the most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1-score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3-4 months) of posts before a depressed user's estimated diagnosis date are the most indicative of their illness; including data from before that period did not improve the models' detection accuracy. Furthermore, in the 4- to 8-week period before the user's estimated diagnosis date, their posts exhibited more negative sentiment than in any other 4-week period in their post history.
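Two quantities in the results above, the window of posts before the estimated diagnosis date and the F1-score, can be made concrete with a short sketch. The helper names `posts_in_window` and `f1_score`, the boundary convention of the window, and the use of the binary (positive-class) F1 are assumptions; how the windows were bounded and how F1 was averaged are not specified here.

```python
from datetime import date, timedelta


def posts_in_window(posts, diagnosis_date, min_weeks, max_weeks):
    """Return posts dated more than min_weeks and at most max_weeks
    before diagnosis_date (boundary convention is an assumption)."""
    earliest = diagnosis_date - timedelta(weeks=max_weeks)
    latest = diagnosis_date - timedelta(weeks=min_weeks)
    return [(day, text) for day, text in posts if earliest <= day < latest]


def f1_score(true_labels, predicted_labels, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy post history (invented): 7, 42, and 152 days before diagnosis.
diagnosis = date(2020, 6, 1)
history = [
    (date(2020, 5, 25), "recent post"),
    (date(2020, 4, 20), "post six weeks before"),
    (date(2020, 1, 1), "old post"),
]

# Posts falling in the 4- to 8-week period before the diagnosis date.
window = posts_in_window(history, diagnosis, min_weeks=4, max_weeks=8)

# One true positive, one false positive, one false negative.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Here `window` keeps only the post written 42 days before the diagnosis date, and `score` is 0.5 because precision and recall are both 0.5.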
Transformer-based LMs may be used on data from web-based social media forums to identify users at risk for psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental health care interventions to support those at risk for this condition.