Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study.

Authors

Cheng Qijin, Li Tim MH, Kwok Chi-Leung, Zhu Tingshao, Yip Paul SF

Affiliations

HKJC Center for Suicide Research and Prevention, The University of Hong Kong, Hong Kong, China (Hong Kong).

Department of Paediatrics & Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China (Hong Kong).

Publication information

J Med Internet Res. 2017 Jul 10;19(7):e243. doi: 10.2196/jmir.7276.

Abstract

BACKGROUND

Early identification and intervention are imperative for suicide prevention. However, at-risk people often neither seek help nor undergo professional assessment. A tool that automatically assesses their risk levels in natural settings could increase opportunities for early intervention.

OBJECTIVE

The aim of this study was to explore whether computerized language analysis methods can be used to assess a person's suicide risk and emotional distress from Chinese social media.

METHODS

A Web-based survey of Chinese social media (ie, Weibo) users was conducted to measure their suicide risk factors, including suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress levels. With participants' consent, their publicly available Weibo posts were also downloaded. The posts were parsed and mapped to Simplified Chinese-Linguistic Inquiry and Word Count (SC-LIWC) categories. The associations between SC-LIWC features and the 5 suicide risk factors were examined by logistic regression. Furthermore, a support vector machine (SVM) model was trained on the language features to classify automatically whether a Weibo user exhibited any of the 5 risk factors.
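The mapping of posts to word categories can be illustrated with a minimal sketch. It assumes the posts have already been segmented into words, and it uses a tiny hypothetical lexicon (`TOY_LEXICON`), not the actual SC-LIWC dictionary. LIWC-style tools typically report each category as a share of total words; the abstract does not state the exact normalization used in the study, so the percentages below are an assumption.

```python
from collections import Counter

# Hypothetical toy lexicon for illustration only; the real SC-LIWC
# dictionary is far larger and is not reproduced here.
TOY_LEXICON = {
    "pronoun": {"我", "你", "他们", "你们"},
    "verb": {"走", "看", "想"},
    "work": {"工作", "加班"},
}


def category_rates(tokens, lexicon=TOY_LEXICON):
    """Return the percentage of tokens in each category, plus the word count."""
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for cat, words in lexicon.items():
            if tok in words:
                counts[cat] += 1
    # Guard against empty input so the function never divides by zero.
    features = {
        cat: (100.0 * counts[cat] / total if total else 0.0) for cat in lexicon
    }
    features["word_count"] = total
    return features


# Assumed to be the output of a Chinese word segmenter (not shown).
tokens = ["我", "想", "工作", "我", "累"]
features = category_rates(tokens)
print(features)
```

Feature vectors of this kind, one per user, would then serve as predictors in the logistic regression and as inputs to the SVM classifier.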

RESULTS

A total of 974 Weibo users participated in the survey. Those with high suicide probability were marked by a higher usage of pronouns (odds ratio, OR=1.18, P=.001), prepend words (OR=1.49, P=.02), and multifunction words (OR=1.12, P=.04), a lower usage of verbs (OR=0.78, P<.001), and a greater total word count (OR=1.007, P=.008). Second-person plural was positively associated with severe depression (OR=8.36, P=.01) and stress (OR=11, P=.005), whereas work-related words were negatively associated with WSC (OR=0.71, P=.008), severe depression (OR=0.56, P=.005), and anxiety (OR=0.77, P=.02). Inconsistently, third-person plural was negatively associated with WSC (OR=0.02, P=.047) but positively associated with severe stress (OR=41.3, P=.04). Achievement-related words were positively associated with depression (OR=1.68, P=.003), whereas health-related (OR=2.36, P=.004) and death-related (OR=2.60, P=.01) words were positively associated with stress. The machine classifiers did not achieve satisfactory performance in the full sample but could classify high suicide probability (area under the curve, AUC=0.61, P=.04) and severe anxiety (AUC=0.75, P<.001) among those who had exhibited WSC.
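To make the two reported statistics concrete, here is a small sketch of how an odds ratio and an AUC can be read. In logistic regression, an odds ratio multiplies the odds of the outcome for each one-unit increase in the predictor; the abstract does not say what one unit of a SC-LIWC feature is, and the 20% baseline probability below is purely an illustrative assumption. The AUC function uses the pairwise-ranking definition on toy scores, not data from the study.

```python
def shifted_probability(base_prob, odds_ratio, units=1.0):
    """Probability after a predictor rises by `units`, given a per-unit odds ratio."""
    odds = base_prob / (1.0 - base_prob)
    new_odds = odds * odds_ratio ** units
    return new_odds / (1.0 + new_odds)


def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative only: a 0.20 baseline is an assumption, not a study figure.
p = shifted_probability(0.20, 1.18)

# Toy classifier scores and labels, unrelated to the study data.
toy_auc = auc([0.9, 0.7, 0.6, 0.2, 0.5], [1, 0, 1, 0, 1])
print(round(p, 3), round(toy_auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why a value of 0.61 indicates only modest discrimination.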

CONCLUSIONS

SC-LIWC is useful for examining language markers of suicide risk and emotional distress in Chinese social media and can identify characteristics that differ from previous findings in the English-language literature. Some findings suggest new hypotheses for future verification. Machine classifiers based on SC-LIWC features are promising but require further optimization before real-world application.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f5f/5525005/430234f2fd75/jmir_v19i7e243_fig1.jpg
