Author Affiliations: Department of Biomedical Informatics, School of Medicine (Drs Yang, Al-Garadi, and Sarker, and Ms Xie), and Nell Hodgson Woodruff School of Nursing (Ms Hair), Emory University, Atlanta, GA.
Comput Inform Nurs. 2023 Sep 1;41(9):717-724. doi: 10.1097/CIN.0000000000000985.
Americans bear a high chronic stress burden, particularly during the COVID-19 pandemic. Although social media have strengths that complement the weaknesses of conventional stress measures such as surveys, they have rarely been used to detect individuals who self-report chronic stress. This study therefore aimed to develop and evaluate an automatic system that identifies Twitter users who have self-reported chronic stress experiences. Using the Twitter public streaming application programming interface, we collected tweets containing certain stress-related keywords (eg, "chronic," "constant," "stress") and then filtered the data using predefined text patterns. We manually annotated each tweet as positive if it contained a self-report of chronic stress and as negative otherwise. We trained multiple classifiers and evaluated them using accuracy and F1 score. We annotated 4195 tweets (1560 positives, 2635 negatives), achieving an inter-annotator agreement of 0.83 (Cohen's kappa). The classifier based on Bidirectional Encoder Representations from Transformers performed best (accuracy of 83.6% [81.0-86.1]), outperforming the second-best classifier (support vector machines: 76.4% [73.5-79.3]). Past tweets from the authors of positive tweets contained useful information, including the sources and health impacts of chronic stress. Our study demonstrates that users' self-reported chronic stress experiences can be automatically identified on Twitter, an approach with high potential for surveillance and large-scale intervention.
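The evaluation measures named above (Cohen's kappa for inter-annotator agreement, and accuracy and F1 score for the classifiers) can be illustrated with a minimal sketch. The function names and the toy labels below are assumptions for illustration only. They are not the study's code or data, and the sketch does not reproduce the reported figures or their intervals.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def accuracy_and_f1(gold, predicted, positive=1):
    """Accuracy and F1 score of the positive class for binary predictions."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels (hypothetical): 1 = self-report of chronic stress, 0 = no self-report.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0]

print(cohen_kappa(annotator_1, annotator_2))  # 0.75

# Treat annotator_1 as the gold standard and annotator_2 as classifier output.
acc, f1 = accuracy_and_f1(annotator_1, annotator_2)
print(acc, round(f1, 3))  # 0.875 0.857
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported for annotation rather than simple percent agreement. F1 is reported alongside accuracy because the classes are imbalanced (1560 positives vs 2635 negatives).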