Extracting Suicide-Related Psychiatric Stressors from Social Media Using Deep Learning

BACKGROUND: Suicide is one of the leading causes of death in the United States, and psychiatric stressors are one of its major causes. Detecting psychiatric stressors in an at-risk population can facilitate the early prevention of suicidal behaviors and suicide. In recent years, the widespread popularity of social media and its real-time flow of shared information have made early intervention potentially feasible across a large population. However, few automated approaches have been proposed for extracting psychiatric stressors from Twitter. The goal of this study was to investigate techniques for recognizing suicide-related psychiatric stressors from Twitter using deep learning-based methods and a transfer learning strategy that leverages an existing annotated dataset of clinical text.

METHODS: First, a dataset of suicide-related tweets was collected from Twitter streaming data with a multi-step pipeline: keyword-based retrieval, filtering, and further refinement using an automated binary classifier. Specifically, a convolutional neural network (CNN)-based algorithm was used to build the binary classifier. Next, psychiatric stressors were annotated in the suicide-related tweets. Stressor recognition was framed as a typical named entity recognition (NER) task and tackled using recurrent neural network (RNN)-based methods. Moreover, to reduce annotation cost and improve performance, a transfer learning strategy was adopted that leverages existing annotations from clinical text.

RESULTS & CONCLUSIONS: To the best of our knowledge, this is the first effort to extract psychiatric stressors from Twitter data using deep learning-based approaches. Comparison with traditional machine learning algorithms shows the superiority of the deep learning-based approaches. The CNN performs best at identifying suicide-related tweets, with a precision of 78% and an F-1 measure of 83%, outperforming Support Vector Machine (SVM), Extra Trees (ET), and other traditional algorithms. RNN-based psychiatric stressor recognition obtains the best F-1 measure, 53.25% by exact match and 67.94% by inexact match, outperforming Conditional Random Fields (CRF). Moreover, transfer learning from clinical notes to the Twitter corpus outperforms training on the Twitter corpus alone, reaching an F-1 measure of 54.9% by exact match. The results indicate the advantages of deep learning-based methods for automated stressor recognition from social media.
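The difference between the exact-match and inexact-match F-1 measures reported above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the abstract does not define inexact match, so the sketch assumes that a predicted span counts as an inexact match when it shares at least one token with a gold span. The plain B/I/O tag scheme, the function names `bio_to_spans`, `overlaps`, and `prf`, and the example sentence are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a sequence of B/I/O tags."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity begins here; close any span that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Span], pred: List[Span], exact: bool) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F-1.

    exact=True:  a span is correct only if both boundaries agree.
    exact=False: ASSUMPTION: any token overlap counts as a match.
    """
    if exact:
        pred_hits = sum(p in gold for p in pred)
        gold_hits = sum(g in pred for g in gold)
    else:
        pred_hits = sum(any(overlaps(p, g) for g in gold) for p in pred)
        gold_hits = sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: "i lost my job and failed my exam"
gold_tags = ["O", "B", "I", "I", "O", "B", "I", "I"]
pred_tags = ["O", "O", "B", "I", "O", "B", "I", "I"]  # first span starts one token late

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
print("exact:  ", prf(gold_spans, pred_spans, exact=True))
print("inexact:", prf(gold_spans, pred_spans, exact=False))
```

In this toy case the first predicted span misses one boundary token, so it is wrong under exact match but counted under the assumed overlap criterion. The same kind of boundary error would be consistent with the gap between the 53.25% and 67.94% figures, although the abstract does not break that gap down.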
