Doan Son, Ritchart Amanda, Perry Nicholas, Chaparro Juan D, Conway Mike
Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, United States.
Linguistics Department, University of California, San Diego, La Jolla, CA, United States.
JMIR Public Health Surveill. 2017 Jun 13;3(2):e35. doi: 10.2196/publichealth.5939.
Stress is a contributing factor to many major health problems in the United States, such as heart disease, depression, and autoimmune diseases. Relaxation is often recommended in mental health treatment as a frontline strategy to reduce stress, thereby improving health conditions. Twitter is a microblog platform that allows users to post their own personal messages (tweets), including expressions of feelings and actions related to stress and stress management (eg, relaxing). While Twitter is increasingly used as a source of data for understanding mental health from a population perspective, the specific issue of stress, as manifested on Twitter, has not yet been the focus of any systematic study.
The objective of our study was to understand how people express their feelings of stress and relaxation through Twitter messages. In addition, we aimed to investigate automated natural language processing methods to (1) classify stress versus nonstress and relaxation versus nonrelaxation tweets, and (2) identify first-hand experience (that is, who the experiencer is) in stress and relaxation tweets.
We first performed a qualitative content analysis of 1326 and 781 tweets containing the keywords "stress" and "relax," respectively. We then investigated the use of machine learning algorithms, in particular naive Bayes and support vector machines, to automatically classify tweets as stress versus nonstress and relaxation versus nonrelaxation. Finally, we applied these classifiers to sample datasets drawn from 4 cities in the United States (Los Angeles, New York, San Diego, and San Francisco) obtained from Twitter's streaming application programming interface, with the goal of evaluating the extent of any correlation between our automatic classification of tweets and results from public stress surveys.
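The classification step described above can be illustrated with a minimal sketch of a multinomial naive Bayes text classifier. This is not the authors' implementation: the tokenizer, the add-one smoothing, the class `NaiveBayes`, and the four training tweets are all hypothetical and chosen only to show how a keyword-matched tweet might be labeled as stress versus nonstress.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of tweets per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training tweets; all contain a form of the keyword "stress",
# but only some report the feeling of stress.
train_texts = [
    "so much stress from finals this week",
    "work deadlines are stressing me out",
    "stress test for the new server passed",
    "reading an article about stress fractures",
]
train_labels = ["stress", "stress", "nonstress", "nonstress"]

clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("finals week is stressing me out"))
print(clf.predict("new server stress test"))
```

The toy data reflects why keyword matching alone is insufficient: a tweet can contain "stress" without describing the experience of stress, which is the distinction a trained classifier is meant to capture.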
Content analysis showed that the most frequent topic of stress tweets was education, followed by work and social relationships. The most frequent topic of relaxation tweets was rest & vacation, followed by nature and water. When we applied the classifiers to the cities dataset, the proportion of stress tweets in New York and San Diego was substantially higher than that in Los Angeles and San Francisco. In addition, we found that characteristic expressions of stress and relaxation varied for each city based on its geolocation.
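The city-level comparison amounts to computing, for each city, the share of tweets that the classifier labeled as stress. A minimal sketch follows, assuming the predictions are available as `(city, label)` pairs; the function name, the placeholder city names, and the labels are hypothetical and do not reproduce the study's data or results.

```python
from collections import Counter


def stress_proportion_by_city(labeled_tweets):
    """Return the fraction of tweets labeled "stress" for each city."""
    totals = Counter()
    stress = Counter()
    for city, label in labeled_tweets:
        totals[city] += 1
        if label == "stress":
            stress[city] += 1
    return {city: stress[city] / totals[city] for city in totals}


# Made-up classifier output for two placeholder cities.
toy_predictions = [
    ("City A", "stress"),
    ("City A", "nonstress"),
    ("City A", "nonstress"),
    ("City B", "stress"),
    ("City B", "stress"),
    ("City B", "nonstress"),
]

proportions = stress_proportion_by_city(toy_predictions)
for city, share in sorted(proportions.items()):
    print(f"{city}: {share:.2f}")
```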
This content analysis and infodemiology study revealed that Twitter, when used in conjunction with natural language processing techniques, is a useful data source for understanding stress and stress management strategies, and can potentially supplement infrequently collected survey-based stress data.