King's College London, IoPPN, London, SE5 8AF, UK.
School of Computer Science and Communication, KTH, Stockholm.
Sci Rep. 2017 Mar 22;7:45141. doi: 10.1038/srep45141.
The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.